This is a zero-to-hero guide on managing Python dependencies using uv. Whether you are a PhD student stuck in Conda hell or a developer looking to modernize your workflow, this post covers everything from creating virtual environments to handling conflicting dependencies and publishing reproducible code.
In one of my recent long-forms, I claimed that software quality dictates scientific possibility and went on to argue that a good computational scientist must be a good software engineer. That sounds great if you're a college freshman, but what if you're already midway through your PhD and can't really ask your PI to let you spend a semester taking CS courses? The good news is that a significant chunk of the problems typically associated with academic code stems from a few bad practices, one of which is the lack of proper dependency management. This tutorial is a comprehensive, one-stop guide to managing dependencies in your projects.
If you want your software to be used by other people, it should:
1. be easy to install and run on any machine, and
2. be easy to integrate as a dependency into other people's projects.
The first one is trivially easy to achieve if you use a good package manager like uv and follow best practices, to which most of this tutorial will be dedicated. The second one is a bit more tricky since it's primarily downstream of engineering decisions, and we'll cover it at the end.
The tutorial is intentionally written to be accessible to any level of coding experience, so it begins with the basics (e.g., why we need virtual environments). If you're already a user of the built-in venv or conda, you can skip ahead to the final overview of commands.
Python is a programming language with a remarkably low barrier to entry, not least because you can get quite far without ever worrying about dependencies. I'd been using python for at least 3 years, automating tasks like writing a script to create a queue of olympiad participants for arbitration (such that every student can talk to every unique problem author and no member of the jury sits without students), all without ever using a virtual environment: I was simply pip installing things like numpy, openpyxl, and plotly without ever worrying where that installation happened.
I imagine many computational scientists, for whom python is a tool (not a sacred craft with a rigid set of puritan rules), might share this experience and wonder: what could possibly go wrong? Why isn't it enough that something works on my machine?
Let's say you wrote some code in 2023 using RDKit (a popular cheminformatics library), and you even specified that you used version 2023.9.5. RDKit requires numpy, and at the time the latest version of numpy was 1.24, which is what you had installed on your computer. When someone in 2026 who already has numpy installed (and since the latest is 2.4, that's what they'll have) tries to install RDKit 2023.9.5 according to your specification, they're going to get screamed at with errors like:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.4.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
File "/Users/morgunov/.cache/uv/archive-v0/pVEn_MEolmd-ckXXQs20Z/lib/python3.11/site-packages/rdkit/Chem/__init__.py", line 15, in <module>
from rdkit import DataStructs, RDConfig, rdBase
File "/Users/morgunov/.cache/uv/archive-v0/pVEn_MEolmd-ckXXQs20Z/lib/python3.11/site-packages/rdkit/DataStructs/__init__.py", line 13, in <module>
from rdkit.DataStructs import cDataStructs
AttributeError: _ARRAY_API not found
It's remarkably easy to demonstrate the problem above with uv because you can do things like:
uv run --python 3.11 --with "rdkit==2023.9.5" \
--with "numpy>=2.0" python -c "import rdkit.Chem"which would trigger the error above. And changing spec to numpy<2 fixes the issue.
uv run --python 3.11 --with "rdkit==2023.9.5" \
--with "numpy<2.0" python -c "import rdkit.Chem"You might think this problem is easy to solve, just let your user know that they need numpy version 1.23.4 or lower, but for any real-world project, you'd have up to 10 more such specifications,also, what if a user needs to use a different project which needs latest numpy? and you'd basically rederive best practices for dependency management from first principles.
The real problem above is that the default behavior of python when you install a package is to install it globally. E.g., when you run
pip install numpy
it installs it in the global site-packages directory. The good thing is that you can then use numpy in any python script, and this, arguably, is part of the reason why python has such a low barrier to entry. On the other hand, it also makes it almost impossible to reliably specify requirements for any single project (even if you currently have numpy v2 installed, it doesn't mean that the project you worked on 2 years ago can be executed with it).
A virtual environment is basically a compartmentalized directory with a fresh (raw) python interpreter that you can (and should) create separately for each project. As you build the project, you install the required dependencies into that directory, and so it becomes a minimal set of dependencies required to run your code. (You might intuitively think that you can enable perfect reproducibility by just sharing that virtual environment directory, and directionally that would be correct, but we'll see more space-efficient approaches to achieve that in a bit.)
To create a virtual environment (hereafter referred to as a venv), you can simply type uv venv in the terminal, which will create a new directory called .venv. (Because it starts with a dot, it's hidden by default, but you can always see it by typing ls -a in the terminal.) If you want to create a venv with a specific python version, you can specify it like:
uv venv --python 3.10
To install uv on any platform, you can simply execute this one-liner in the terminal:
curl -LsSf https://astral.sh/uv/install.sh | sh
Fast and easy! You can find this command on the official install page.
If you've ever used pyenv or conda before, you might be familiar with the concept of having to activate the virtual environment before using it. If you want, you can still do that by running
source .venv/bin/activate
after which you can enter the python REPL (python) or run any file:
python my-awesome-script.py
With uv, however, you can simply replace python with uv run and forget about having to activate the virtual environment. In other words, uv run my-awesome-script.py is equivalent to running
source .venv/bin/activate
python my-awesome-script.py
Small quality-of-life UX improvements like this might seem minor, but they add up, and that's what makes uv such a game changer for python devs.
Whenever you start a new python project, you should run uv init, which creates 3 files:
my-project/
├── main.py
├── README.md
├── pyproject.toml
pyproject.toml is a modern universal configuration file for python projects, introduced in PEP 621. Initially, it'll look like:
[project]
name = "my-project"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = []
Notice that by default, uv init sets the project as requiring the latest version of python (which, in general, is a very good practice). You can also specify the desired python version, and you can probably already guess how:
uv init --python 3.11
Let's say you want to use RDKit in your project. You can just add it:
uv add rdkit
which will print something like:
uv add rdkit
Using CPython 3.13.3
Creating virtual environment at: .venv # if you didn't create a venv manually, it'll create one for you
Resolved 4 packages in 142ms
Installed 3 packages in 24ms
+ numpy==2.4.1
+ pillow==12.1.0
+ rdkit==2025.9.3
This results in two changes. First, pyproject.toml is updated:
dependencies = [
"rdkit>=2025.9.3",
]
Second, a new file uv.lock is created:
version = 1
revision = 3
requires-python = ">=3.11"
[[package]]
name = "numpy"
version = "2.4.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/24/62/ae72ff66c0f1fd959925b4c11f8c2dea61f47f6acaea75a08512cdfe3fed/numpy-2.4.1.tar.gz", hash = "sha256:a1ceafc5042451a858231588a104093474c6a5c57dcc724841f5c888d237d690", size = 20721320, upload-time = "2026-01-10T06:44:59.619Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a5/34/2b1bc18424f3ad9af577f6ce23600319968a70575bd7db31ce66731bbef9/numpy-2.4.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:0cce2a669e3c8ba02ee563c7835f92c153cf02edff1ae05e1823f1dde21b16a5", size = 16944563, upload-time = "2026-01-10T06:42:14.615Z" },
...,
]
[[package]]
name = "pillow"
version = "12.1.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/d0/02/d52c733a2452ef1ffcc123b68e6606d07276b0e358db70eabad7e40042b7/pillow-12.1.0.tar.gz", hash = "sha256:5c5ae0a06e9ea030ab786b0251b32c7e4ce10e58d983c0d5c56029455180b5b9", size = 46977283, upload-time = "2026-01-02T09:13:29.892Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/43/c4/bf8328039de6cc22182c3ef007a2abfbbdab153661c0a9aa78af8d706391/pillow-12.1.0-cp311-cp311-macosx_10_10_x86_64.whl", hash = "sha256:a83e0850cb8f5ac975291ebfc4170ba481f41a28065277f7f735c202cd8e0af3", size = 5304057, upload-time = "2026-01-02T09:10:46.627Z" },
...
]
[[package]]
name = "rdkit"
version = "2025.9.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy" },
{ name = "pillow" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/b1/19/4a606ef1c090b0abc54627a4aa5313c8bbef50da888841101bf1aa89ff70/rdkit-2025.9.3-cp311-cp311-macosx_10_15_x86_64.whl", hash = "sha256:520e572f70ac81a75091f953d4d9805bc1455879ad85724591c2b4b03bf1faf1", size = 31862854, upload-time = "2025-12-03T13:33:11.974Z" },
...
]
[[package]]
name = "temp"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "rdkit" },
]
[package.metadata]
requires-dist = [{ name = "rdkit", specifier = ">=2025.9.3" }]
Notice how we only asked to install RDKit, but uv also notified us that it installed numpy and pillow because they are dependencies of RDKit. A lockfile is like a log book, which keeps track of all the libraries that were installed into your virtual environment, both directly and indirectly. Each entry specifies not only the name and version of the library, but also URLs for all existing wheels of that library (across all possible platforms). In other words, a lockfile is a set of instructions for exact recreation of a virtual environment on any other platform.
A lockfile might seem novel to python developers, but it has been an industry standard for over a decade (though, funnily enough, some ecosystems like JS/TS had to survive the introduction of a completely new package manager, yarn, just to force the de-facto default package manager, npm, to adopt lockfiles). Let's walk through a simple example demonstrating the necessity of a lockfile.
A good way to think about the dependencies block in pyproject.toml (which removes the need for requirements.txt, if you've ever used one) is that it is a record of the developer's intent.
For example, since we typed uv add rdkit, we effectively asked "just give me the latest RDKit", so pyproject.toml reflects that any RDKit version at or above the current one will do.
dependencies = [
"rdkit>=2025.9.3",
]
If you share this pyproject.toml specification publicly and someone tries to install it 3 years from now, the package manager will pick the latest version of RDKit (because it satisfies the constraint ">=2025.9.3"), which might lead to issues if RDKit changes the behavior of the functions your code relies on. A common workaround has been pinning the exact version, like:
dependencies = [
"rdkit==2025.9.3",
]
This solves the immediate problem, but it makes your package practically impossible to integrate into any other project which uses similar dependencies (but different versions of them). This is where the lockfile comes in.
Your pyproject.toml defines the list of dependencies as requested by the library author when it was developed. Usually, this is just "give me the latest version of the package." And in most cases, if something worked with, say, RDKit 2025.9.3, it will likely work with, say 2027.1.1, so someone who develops a project with a requirement of RDKit 2027.1.1 can still install your package as a dependency.
However, in case something breaks, and you don't have time (or the need) to debug which dependency is causing an issue, you can recreate the exact environment used to develop the library by installing the packages following the specification of the lockfile (which, as we've seen, will define not only the versions of the top-level requirements, but also of all 2nd-order dependencies).
Installing packages following the specification of the uv.lock is a simple sync command: uv sync.
Notably, when you run some code with uv run my-awesome-script.py in a directory containing a uv.lock file, uv will automatically install the packages following the lockfile specification.
The proper way to add dependencies when you use uv is by running uv add <package_name>, which is a shortcut for running uv pip install <package_name> and updating the pyproject.toml file to reflect a new dependency. In other words, if you try to pip install things, it doesn't update the pyproject.toml file, and you'll have to update it manually.
Also, uv sync is a proper replacement for uv pip install -e .: the latter installs packages using the pyproject.toml specification, so if someone runs it a year from now, it'll install newer versions than the ones you used during development, whereas uv sync installs packages using the uv.lock specification.
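To make the distinction concrete, here's a small sketch of the three commands side by side (requests is just an example package, not one used in this project):
uv add requests          # installs requests into the venv AND records it in pyproject.toml + uv.lock
uv pip install requests  # installs requests into the venv only; pyproject.toml and uv.lock stay untouched
uv sync                  # recreates the environment exactly as recorded in uv.lock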
Pretty much! If every academic project were published with a lockfile (like uv.lock), we'd solve the reproducibility problem once and for all (reproducibility here means the ability to run the code on a different machine and get the same result as the code's author; it's only part of the bigger problem of reproducibility of scientific results). So, to recap, here's what you should do whenever you start working on a new python project.
curl -LsSf https://astral.sh/uv/install.sh | sh # Install uv
uv init # create new project
Whenever you need to add a new package, simply type uv add <package_name>. Whenever you need to execute a script, type uv run <script_name>.
Here's the best part. Let's say you publish your code on github:
my-project/
├── my-awesome-script.py
├── README.md
├── pyproject.toml
└── uv.lock
Normally, you'd have to instruct your users to follow a particular ritual:
git clone https://github.com/username/my-project.git
cd my-project
# create a virtual environment
python -m venv venv
# activate the virtual environment
source venv/bin/activate
# install dependencies
pip install -r requirements.txt
# run code
python my-awesome-script.py
With uv, it's just a single line:
git clone https://github.com/username/my-project.git
cd my-project
uv run my-awesome-script.py
The beautiful part is that uv run will always work as intended, regardless of whether this is your 100th time running the script on your personal machine or someone else running it for the first time, in which case it'll install the python version specified in pyproject.toml, create a venv, install all dependencies from the lockfile, and then run the script.
Above we discussed the basics: creating projects, adding dependencies, and running code. It's enough to get you started, and if the above was new material for you, you probably should stop reading and come back after you get used to the new workflows.
Let's say at some point you decide to start using linters and formatters for your code, e.g. Ruff (created by the same team that made uv). To do so, you'll simply run uv add ruff, which will work. But when you publish your project to PyPI, do you really need to require users who simply want to run your code to have Ruff installed? (And never think this is an insignificant consideration just because Ruff "only" takes 22 MB of space: how you do anything is how you do everything.) No, and for any similar package that is only needed during development, you can use a dev dependency group by adding the --dev flag:
uv add ruff --dev
In pyproject.toml, you'll now see two lists:
[project]
dependencies = [
"rdkit>=2025.9.3",
]
[dependency-groups]
dev = [
"ruff>=0.14.14",
]
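One thing worth knowing about the dev group: uv installs it by default when you work on the project locally, but it is not part of the package metadata that gets published to PyPI, so your end users never download Ruff. A quick sketch of the relevant flags (standard uv options, shown here purely as an illustration):
uv sync           # installs your dependencies plus the dev group (the default for local development)
uv sync --no-dev  # installs only the runtime dependencies, skipping the dev group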
A related, but slightly different scenario: say you build some package that does some analysis (like RetroCast), e.g. it can calculate and print the average stock termination rate for some predicted routes. And say you also write a module that creates plots comparing the metric for different route lengths. To make a plot, you'd need some library like matplotlib or plotly. (I really, really recommend you try plotly: you can create interactive HTML plots where you can zoom in, toggle visibility of different lines, etc. It's an absolute game changer.) But what if 90% of your users will never use that module? Do you need to require them to install plotly? No, so you can define an optional dependency group, say related to visualizations:
uv add plotly --optional viz
Now your pyproject.toml has a third dependency list:
[project]
dependencies = [
"rdkit>=2025.9.3",
]
[project.optional-dependencies]
viz = [
"plotly>=6.5.2",
]
[dependency-groups]
dev = [
"ruff>=0.14.14",
]
When someone tries to run regular code, say uv run my-awesome-script.py, which doesn't use plotly, they'll be able to do so without having to install it. And if you want someone to run code which does need plotly, you can simply instruct uv to use an extra (optional) dependency:
uv run --extra viz create-some-plots.py
Having both optional dependencies and dependency groups is a bit confusing, and in the beginning you can think of dependency groups and "dev" dependencies as being synonymous because, for the most part, that's the only dep group you'll define. If you end up working on large codebases, you might eventually find yourself defining separate dependency groups for testing or web servers. The overarching principle is always the desire to avoid installing things you're not going to be using.
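If you do reach that point, the workflow mirrors the dev group, just with a named group; a minimal sketch (the group name test is purely illustrative):
uv add pytest --group test  # add pytest to a dependency group called "test"
uv sync --group test        # install the project plus that group
uv run --group test pytest  # run the test suite with that group available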
Let's say you want to incorporate a package that hasn't been published to PyPI and only exists as a github repository. For example, while I was working on SynthArena, I wanted to benchmark the original Retro* implementation. A defeatist option would be to git clone that repo, run that model in a separate repository, and then move the results over. Instead, I forked the repo, created a pyproject.toml file, and added all dependencies with uv add. Now I can install that package like this:
uv add "retro-star @ git+https://github.com/anmorgunov/retro_star" --optional retrostarand voila!
[project.optional-dependencies]
retrostar = [
"retro-star",
]
[tool.uv.sources]
retro-star = { git = "https://github.com/anmorgunov/retro_star" }
What if you want to define two optional dependency groups which have direct conflicts with each other? For example, you want to incorporate two different models, one strictly requiring numpy<2 and the other strictly requiring numpy>=2. Though this might seem impossible, you can actually declare them as conflicting dependencies, and although you'll never be able to use them together, like:
uv run --extra group1 --extra group2 some-script.py
you'll be perfectly capable of using them separately in the same project!
uv run --extra group1 run-model1.py # OR
uv run --extra group2 run-model2.py # OR
I actually described how I used this feature to benchmark 5 different multistep planners and why I eventually had to switch to a different uv run --directory approach.
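For reference, the conflict declaration itself lives in pyproject.toml; here's a minimal sketch of uv's conflicting-extras syntax (group1 and group2 are placeholder names matching the example above):
[tool.uv]
conflicts = [
    [
        { extra = "group1" },
        { extra = "group2" },
    ],
]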
Like I said, even if you just follow level 1 practices, you'll make it seamless for anyone to run your code on any platform. But it still won't be enough to make it easy to integrate your package as a dependency into someone else's project, because ultimately it doesn't prevent a situation where your package was built with libraryA v0.2, which strictly requires numpy<2, while someone else is creating a new project that justifiably uses numpy>=2. (Declaring conflicting groups only works if the conflict is between optional groups, not between an optional group and a core dependency.)
In the limit, the only way to ease the integration of your package into other projects is by following a simple maxim: do not use dependencies.
Now, obviously, rewriting all the core functionality of torch if you need to train a model is not a good use of your time, so the rule shouldn't be taken literally, but in general you should err on the side of thinking twice before installing a new library. For example, even though in pretty much every tutorial on data science you'll see a line like:
import pandas as pd
df = pd.read_csv('path/to/file.csv')
you don't really need to install pandas just to read a csv. Unless you work with gigantic datasets where having core iteration loops implemented in C is crucial for speed, there's no reason not to use the built-in csv module:
import csv
with open('path/to/file.csv', 'r') as f:
    data = list(csv.DictReader(f))
or, if you want a one-liner:
from pathlib import Path
import csv
data = list(csv.DictReader(Path('path/to/file.csv').open()))
Similarly, you really shouldn't install numpy if you only intend to use np.mean() and/or np.std(), as sketched below. You'll be surprised how much you can achieve with built-in functionality, and with LLMs, implementing the feature you need directly is easier than ever. Deciding whether adding a new dependency is worth it is one of my major use cases for Gemini models in AI Studio: you can dump a significant chunk of your codebase, so that the LLM has enough context to decide whether the simplicity of using an existing package is worth the complexity of adding a new dependency.
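Coming back to the np.mean()/np.std() point: the standard library's statistics module already covers both. A small sketch (not from the original post):
import statistics

values = [1.0, 2.5, 3.5, 4.0]
mean = statistics.mean(values)         # equivalent to np.mean(values)
pop_std = statistics.pstdev(values)    # population std, equivalent to np.std(values)
sample_std = statistics.stdev(values)  # sample std, equivalent to np.std(values, ddof=1)
print(mean, pop_std, sample_std)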
If you're already using some solution for dependency management, here's why you should consider migrating to uv. The simple answer is: uv is fast. Blazingly fast. This might seem like an insufficient reason, and it's probably why a lot of people express hesitation when it's brought up as an argument: after all, how often do you install your dependencies?
But this is precisely the problem. Before uv, I had multiple occasions when I suggested upgrading to some newer version of torch (because it had some nice new feature) or python altogether (e.g. moving from 3.10 to 3.11, which had significantly improved error tracebacks per PEP 657), and my collaborators hesitated because they had a working conda environment, and they didn't want to end up wasting half a day on fixing it if things go wrong (a very justifiable prior).
uv is so fast that I can spin up a fresh Lambda instance (so CUDA is installed), and start training a state-of-the-art model in literally a minute. Once again we see that software quality dictates scientific possibility.
I mean, even installing python itself, which used to be a whole process, is now just a single CLI arg:
uv run --python 3.11 python -c "print('Hello, World!')" # or uv venv --python 3.11
uv run --python 3.13 python -c "print('Hello, World!')" # or uv venv --python 3.13
Also, with uv you can bring debugging to a whole new level. One of my colleagues has been working on a codebase that relied on HuggingFace's transformers library, but unfortunately it didn't specify the version used during development, and the version available at the moment was running into a RuntimeError. With uv, you could do something like:
uv run --with transformers==4.44.2 --with torch --with accelerate bfloat-gen.py # runs fine
uv run --with transformers==4.45.0 --with torch --with accelerate bfloat-gen.py # RuntimeError
uv run --with transformers==4.56.2 --with torch --with accelerate bfloat-gen.py # RuntimeError
to confirm that the issue was introduced with the release of 4.45.0. Take a look at this issue if you're curious how the story ended.
Every time you create a new conda environment, it creates a hard copy of python and of every dependency you install. So, if you have 2 different projects, both using python 3.11 and torch 2.8, conda will install identical copies of python and torch twice. That's roughly an extra 1 GB of disk space. uv, on the other hand, uses aggressive caching: packages are installed once into a global cache, and every time you want to reuse them, uv links them into the new venv.
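If you're curious, you can inspect that shared cache yourself (a small sketch for a POSIX shell; uv cache dir is a built-in uv command):
uv cache dir              # print the path of uv's global package cache
du -sh "$(uv cache dir)"  # total size of the cache shared across all your projects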
Sure, you can create a venv with the built-in venv module like:
python -m venv myvenv
But you'd need python installed in the first place, and if you need a specific version, the closest thing to a seamless experience would be installing pyenv; so you can think of uv as a replacement for both pyenv and venv. But also, why would you even need to name your virtual environment? Typing python -m venv venv feels wrong, so you start being creative (python -m venv yovenv), then you have to remember what the venv was called when you type source yovenv/bin/activate, and then you have to make sure not to forget to include it in .gitignore.
uv is just better UX.
Poetry tried to introduce a lot of best practices, including a lockfile, to the python ecosystem. The only problem is that it's written in python, and so resolving complex dependencies can be excruciatingly slow (uv is literally 10x faster).