Benchmarking scientific code is painful because Package A usually requires numpy<1.20 while Package B requires numpy>=1.24. Instead of managing five different Conda environments and manually switching contexts, you can use uv's conflicting dependency groups or independent directory execution to run everything from a single command line interface.
Last month, I wrote a long-form piece on why science is infrastructure-constrained, not ideas-constrained. In this note, I'll give one example of how better infra (in this case uv) let me benchmark 5 distinct codebases for chemical synthesis planning (almost) without the dependency headaches that usually come with it. I also have a comprehensive tutorial on dependency management with uv.
Let's say you have two planners: AiZynthFinder and SynPlanner. Most likely there is at least one dependency conflict between them, so using both from the same virtual environment is impossible. You'd have to resort to creating two separate environments (with venv or conda; either way you'd waste gigabytes duplicating shared dependencies), and every time you need to run one of the planners, you'd have to manually switch environments before executing scripts.
With uv you can declare dependency groups as conflicting, create a single compartmentalized environment, and switch between the groups with a simple CLI argument, --extra group-name.
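For reference, here is a minimal sketch of what that declaration can look like in pyproject.toml. The extras mirror the commands below; the packages listed under each extra are placeholders, not the planners' actual requirements.

[project.optional-dependencies]
aizyn = ["aizynthfinder"]               # placeholder spec for AiZynthFinder
syntheseus = ["syntheseus", "rdkit"]    # placeholder spec for the second planner

[tool.uv]
# Declare the extras as mutually incompatible so uv resolves them separately
# (the lockfile keeps a fork per extra) instead of failing on the conflict.
conflicts = [
    [
        { extra = "aizyn" },
        { extra = "syntheseus" },
    ],
]

With that in place, you select a group at run time: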
uv run --extra aizyn scripts/aizynthfinder/3-run-aizyn-mcts.py \
    --benchmark mkt-lin-500

or

uv run --extra syntheseus scripts/synplanner/2-run-synp-val.py \
    --benchmark mkt-lin-500

This approach worked decently for up to four planners, but the empire always strikes back: eventually I had to migrate to a dedicated pyproject.toml and uv.lock in each planner's directory. The execution stays as simple as before:
uv run --directory scripts/planning/run-synplanner 2-run-synp-val.py \
    --benchmark mkt-lin-500

It's hard to overstate how significant this feature is. Ask anyone who has ever tried to benchmark academic codebases, and you'll immediately learn that it is a PTSD-inducing experience, usually reserved as a punishment (or a rite of passage) for junior researchers. In my personal experience, there were several occasions when the scientifically right thing to do was to rerun all evals from scratch, but my collaborators were running a mental cost-benefit analysis because they had a prior (correct at the time) that re-running multiple academic codebases would take days. With uv it's a breeze. So it's not an exaggeration to say that the quality of tools dictates scientific possibility.
In fact, I'll go on record with quite a sacrilegious take: the next time a Nobel prize is awarded for a computational breakthrough, the Astral team will be an unacknowledged co-recipient.
As a bonus, a few words for those curious about the rationale for the shift from --extra group-name to --directory dir-name. Using uv run --extra is functionally equivalent to running:
source .venv/bin/activate # activate an environment
uv sync --extra group-name # install all main dependencies and group-name extras using uv.lock specification
python script.py

This procedure relies on the existence of a lockfile, uv.lock, or, more precisely, on the ability to create it (uv will create it if it doesn't exist). During construction of the lockfile, uv attempts universal resolution: it tries to produce a single lockfile valid for all supported platforms (Linux, macOS, Windows) and architectures (x86, ARM). If your dependencies rely on packages with C extensions (RDKit, DGL) that don't publish wheels for every platform, the resolution fails because no single version satisfies the constraints across all targets. So, unless you update pyproject.toml to specify that a certain dependency should only be installed on a particular platform (the per-planner pyproject.toml sketch at the end of this note shows one way to do that), uv will fail to create the lockfile. As a result, you won't be able to run uv sync --extra group-name, but you'd be perfectly capable of running
source .venv/bin/activate
uv pip install -e ".[group-name]" # install all main dependencies and group-name extras from pyproject.toml spec
python script.py

and because it wouldn't have to satisfy the constraints of the other dependency groups, it would just install whatever versions the planner specifies. While this works, you lose all the benefits of having a lockfile (namely, being able to perfectly reproduce the environment). A different solution is to create a new uv project inside the planner's directory, with its own uv.lock file. If your main/root project is where you run scoring and analysis, you can have a distinct uv.lock for each planner and invoke it with the --directory argument. For example:
main-repo/
├── analysis.py
├── pyproject.toml        # root project (orchestrator)
├── uv.lock
├── src/
│   └── package/
│       ├── __init__.py
│       └── ...
├── planner1/             # independent sub-project
│   ├── script1.py
│   ├── pyproject.toml    # defines deps for planner 1
│   └── uv.lock           # locked specifically for planner 1
└── planner2/             # independent sub-project
    ├── script2.py
    ├── pyproject.toml    # defines deps for planner 2
    └── uv.lock           # locked specifically for planner 2

So if you run uv run analysis.py, it'll use uv.lock from the root, but if you run uv run --directory planner1 script1.py, it'll use uv.lock from the planner1 directory. This way, you can have a separate environment for each planner while still keeping the benefits of a lockfile.
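To make the sub-project idea concrete, here is a minimal sketch of what planner1/pyproject.toml could contain. Everything in it is a placeholder rather than the planner's real requirements: the project name, the dependency list, and the Linux-only restriction are assumptions you'd adapt to your own setup.

# planner1/pyproject.toml -- hypothetical sub-project definition
[project]
name = "planner1-runner"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "aizynthfinder",                  # placeholder: whatever planner1 actually needs
    "rdkit",
    "dgl ; sys_platform == 'linux'",  # PEP 508 marker for a package without wheels on every platform
]

[tool.uv]
# Optional: limit universal resolution to the platforms you actually run on,
# so missing wheels elsewhere can't block lockfile creation.
environments = ["sys_platform == 'linux'"]

With this file (and the uv.lock generated next to it), uv run --directory planner1 script1.py resolves and syncs against the planner's own lockfile, independently of the root project.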