The Combinatorial Explosion
Why retrosynthesis is hard: the search tree grows exponentially with route depth. At a typical branching factor of 50 templates per step, a 10-step synthesis would require exploring roughly 10^17 nodes (50^10 ≈ 9.8 × 10^16). This is the central computational challenge that motivates every planning algorithm in the review.
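This back-of-envelope figure is easy to check. A minimal sketch (the 50-template branching factor is the review's; the function name is ours):

```python
def tree_leaves(b: int, d: int) -> int:
    """Leaves of a uniform search tree with branching factor b and depth d.
    Interior nodes add only another ~b/(b-1) factor on top of this count."""
    return b ** d

print(f"{tree_leaves(50, 10):.2e}")  # 9.77e+16, i.e. ~10^17 nodes
```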
Interactive Calculator
Adjust the sliders or click any cell in the heatmap to explore how the search space scales. The formula itself is simple (b^d), but the implications are profound.
Combinatorial Explosion Calculator
Retrosynthetic search scales as O(b^d). Adjust the sliders to feel how quickly exhaustive search becomes physically impossible, even with a compute cluster. Typical depths: drug syntheses, 3-8 steps; natural products, 10-20+. Perfect parallelism is assumed (optimistic).

Example readout at b = 50, d = 5:
- Tree nodes: b^d = 50^5 ≈ 3.1 × 10^8
- Wall-clock time (1 core, 50 ms/template application): 180.8 days. Impractical: guided search (MCTS, A*) essential.
- Core-hours: ~4,340 total compute (regardless of parallelism)
- AWS cost: ~$174 at $0.04/core-hr (EC2 c7g)
- Memory: 145.5 GiB at 500 B/node
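These readouts follow from the stated assumptions by straight arithmetic. A sketch reproducing them (the function name and dict keys are ours; GiB uses 2^30 bytes):

```python
def search_cost(b, d, ms_per_step=50, usd_per_core_hr=0.04, bytes_per_node=500):
    """Back-of-envelope cost of exhaustively expanding a b**d search tree,
    using the calculator's assumptions (50 ms/step, $0.04/core-hr, 500 B/node)."""
    nodes = b ** d
    seconds = nodes * ms_per_step / 1000          # single-core wall clock
    return {
        "nodes": nodes,
        "days": seconds / 86400,
        "core_hours": seconds / 3600,
        "usd": seconds / 3600 * usd_per_core_hr,
        "gib": nodes * bytes_per_node / 2**30,
    }

c = search_cost(50, 5)
print(f"{c['nodes']:.1e} nodes, {c['days']:.1f} days, ${c['usd']:.0f}, {c['gib']:.1f} GiB")
# 3.1e+08 nodes, 180.8 days, $174, 145.5 GiB
```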
Table 3: Wall-clock time by branching factor and route depth
| Depth | b=5 | b=50 | b=100 |
|---|---|---|---|
| d=1 | 0.25 s | 2.5 s | 5 s |
| d=2 | 1.25 s | 2.1 min | 8.3 min |
| d=3 | 6.25 s | 1.7 hr | 13.9 hr |
| d=5 | 2.6 min | 180.8 days | 15.9 years |
| d=10 | 5.7 days | 1.5 × 10^8 years | 1.6 × 10^11 years |
| d=15 | 48.4 years | 4.8 × 10^16 years | 1.6 × 10^21 years |
Assumes 50 ms per template application (Figure 6 of the review) on a single core.
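The table values can be regenerated from the same 50 ms assumption; the duration formatting below is our own choice, not the review's:

```python
def humanize(seconds: float) -> str:
    """Render a duration at the coarsest sensible unit."""
    for unit, size in [("years", 3.156e7), ("days", 86400),
                       ("hr", 3600), ("min", 60)]:
        if seconds >= size:
            v = seconds / size
            return f"{v:.3g} {unit}" if v < 1e4 else f"{v:.1e} {unit}"
    return f"{seconds:g} s"

# Rows of Table 3: single-core wall clock at 50 ms per template application.
for d in (1, 2, 3, 5, 10, 15):
    row = [humanize(b ** d * 0.05) for b in (5, 50, 100)]
    print(f"d={d:<3}", " | ".join(row))
```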
Assumptions & methodology
Time per template application: 50 ms. The paper (Figure 6) reports ~90 ms for a top-50 expansion policy application. We use 50 ms as a representative cost per template application including scoring and bookkeeping.
Storage per node: 500 bytes. Each node stores a SMILES string (~100 bytes average for drug-like molecules), parent pointer, reaction ID, expansion scores, and search metadata.
Cloud cost: $0.04/core-hour, based on AWS EC2 c7g (Graviton3) compute-optimized on-demand pricing. Spot instances would be ~60-70% cheaper, but the total core-hours (and thus the order of magnitude) remain the same.
Parallelism: Perfect linear scaling assumed — no communication overhead, no shared-memory contention. Real speedup would be lower, making these estimates optimistic.
Tree model: Pure b^d exhaustive expansion with uniform branching. Real search trees are irregular; MCTS and A* prune aggressively, exploring only a tiny fraction of the full tree. That's the point: intelligent search is not optional.
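To put a number on "tiny fraction": assume a guided planner expands 10^5 nodes (a hypothetical but generous budget) against the full b = 50, d = 10 tree:

```python
budget = 10 ** 5        # nodes a guided search might expand (assumed budget)
full_tree = 50 ** 10    # exhaustive tree at b=50, d=10
print(f"fraction explored: {budget / full_tree:.1e}")  # fraction explored: 1.0e-12
```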
Implications for Planning
Search algorithms are essential, not optional.
MCTS, A*, and neural heuristics exist because brute-force enumeration is impossible beyond trivial depths. The choice of search strategy determines which regions of the tree get explored within a fixed computational budget.
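As an illustration of guided search under a fixed budget, here is a minimal best-first (A*-style) loop over states of open molecules. `expand` and `in_stock` are hypothetical interfaces for this sketch, not any specific tool's API:

```python
import heapq

def best_first_route_search(target, expand, in_stock, max_nodes=10_000):
    """Minimal best-first sketch. `expand(mol)` yields (cost, precursors)
    pairs; `in_stock(mol)` tests purchasability. Returns the cheapest route
    cost found within the node budget, or None."""
    frontier = [(0.0, 0, [target])]    # (cost so far, tiebreak, open molecules)
    tiebreak = 0
    for _ in range(max_nodes):
        if not frontier:
            return None
        cost, _, mols = heapq.heappop(frontier)
        mols = [m for m in mols if not in_stock(m)]
        if not mols:
            return cost                # every leaf purchasable: route solved
        head, rest = mols[0], mols[1:]
        for step_cost, precursors in expand(head):
            tiebreak += 1              # keeps heap comparisons well-defined
            heapq.heappush(frontier, (cost + step_cost, tiebreak,
                                      rest + list(precursors)))
    return None
```

A toy run, with "molecules" as strings that split in half and single characters purchasable, finds a 3-step route for a 4-character target. The priority queue is exactly where a neural heuristic would plug in: it decides which region of the tree is expanded next under the fixed `max_nodes` budget.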
Model throughput is a chemical capability.
When expansion models are embedded in search loops, their inference speed becomes a structural constraint on planning depth. A marginally less accurate but orders-of-magnitude faster model may be the superior planner—it can explore deeper trees within the same time budget.
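The trade is easy to quantify for exhaustive expansion: within a time budget T and per-call latency t, the deepest fully expandable level is floor(log_b(T/t)). A sketch (the 90 ms figure is the review's Figure 6 measurement; the 5 ms fast model is a hypothetical):

```python
import math

def max_depth(b: int, budget_s: float, s_per_call: float) -> int:
    """Largest d such that b**d calls fit in the budget: floor(log_b(T/t))."""
    return math.floor(math.log(budget_s / s_per_call, b))

budget = 3600  # one hour, assumed
print(max_depth(50, budget, 0.090))  # slower, more accurate model: depth 2
print(max_depth(50, budget, 0.005))  # hypothetical 18x faster model: depth 3
```

A whole extra level of the tree per hour, despite identical hardware, is why inference speed is a planning capability and not just an engineering nicety.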
Stock set size changes the game.
Expanding the starting material inventory from ~100K to ~230M compounds dramatically increases the density of termination points, shortening the effective depth the planner must reach. This is why inventory size functions as a hidden difficulty dial and why STR values are not portable across studies.
Direct sequence generation bypasses the tree.
Full-route sequence models (DirectMultiStep) sidestep the combinatorial explosion entirely by generating complete routes as single sequences. They amortize the search cost into training, trading formal validity guarantees for speed and global conditioning.