The Combinatorial Explosion
Why retrosynthesis is hard: the search tree grows exponentially with route depth. At a typical branching factor of 50 templates per step, a 10-step synthesis would require exploring roughly 10^17 nodes (50^10 ≈ 9.8 × 10^16). This is the central computational challenge that motivates every planning algorithm in the review.
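This back-of-envelope figure is easy to check. A minimal sketch (the 50-template branching factor is the review's; the function name is ours):

```python
def tree_leaves(b: int, d: int) -> int:
    """Leaves of a uniform search tree with branching factor b and depth d.
    Interior nodes add only another ~b/(b-1) factor on top of this count."""
    return b ** d

print(f"{tree_leaves(50, 10):.2e}")  # 9.77e+16, i.e. ~10^17 nodes
```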
Interactive Calculator
Adjust the sliders or click any cell in the heatmap to explore how the search space scales. The formula itself is simple (b^d), but the implications are profound.
Combinatorial Explosion Calculator
Retrosynthetic search scales as O(b^d). Adjust the sliders to feel how quickly exhaustive search becomes physically impossible, even with a compute cluster. Typical depths: drug syntheses, 3-8 steps; natural products, 10-20+. Perfect parallelism is assumed (optimistic).

Example readout at b = 50, d = 5:
- Tree nodes: b^d = 50^5 ≈ 3.1 × 10^8
- Wall-clock time (1 core, 50 ms/template application): 180.8 days. Impractical: guided search (MCTS, A*) essential.
- Core-hours: ~4,340 total compute (regardless of parallelism)
- AWS cost: ~$174 at $0.04/core-hr (EC2 c7g)
- Memory: 145.5 GiB at 500 B/node
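These readouts follow from the stated assumptions by straight arithmetic. A sketch reproducing them (the function name and dict keys are ours; GiB uses 2^30 bytes):

```python
def search_cost(b, d, ms_per_step=50, usd_per_core_hr=0.04, bytes_per_node=500):
    """Back-of-envelope cost of exhaustively expanding a b**d search tree,
    using the calculator's assumptions (50 ms/step, $0.04/core-hr, 500 B/node)."""
    nodes = b ** d
    seconds = nodes * ms_per_step / 1000          # single-core wall clock
    return {
        "nodes": nodes,
        "days": seconds / 86400,
        "core_hours": seconds / 3600,
        "usd": seconds / 3600 * usd_per_core_hr,
        "gib": nodes * bytes_per_node / 2**30,
    }

c = search_cost(50, 5)
print(f"{c['nodes']:.1e} nodes, {c['days']:.1f} days, ${c['usd']:.0f}, {c['gib']:.1f} GiB")
# 3.1e+08 nodes, 180.8 days, $174, 145.5 GiB
```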
Table 3: Wall-clock time by branching factor and route depth
| Depth | b=5 | b=50 | b=100 |
|---|---|---|---|
| d=1 | 0.25 s | 2.5 s | 5 s |
| d=2 | 1.25 s | 2.1 min | 8.3 min |
| d=3 | 6.25 s | 1.7 hr | 13.9 hr |
| d=5 | 2.6 min | 180.8 days | 15.9 years |
| d=10 | 5.7 days | 1.5 × 10^8 years | 1.6 × 10^11 years |
| d=15 | 48.4 years | 4.8 × 10^16 years | 1.6 × 10^21 years |
Assumes 50 ms per template application (Figure 6 of the review) on a single core.
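The table values can be regenerated from the same 50 ms assumption; the duration formatting below is our own choice, not the review's:

```python
def humanize(seconds: float) -> str:
    """Render a duration at the coarsest sensible unit."""
    for unit, size in [("years", 3.156e7), ("days", 86400),
                       ("hr", 3600), ("min", 60)]:
        if seconds >= size:
            v = seconds / size
            return f"{v:.3g} {unit}" if v < 1e4 else f"{v:.1e} {unit}"
    return f"{seconds:g} s"

# Rows of Table 3: single-core wall clock at 50 ms per template application.
for d in (1, 2, 3, 5, 10, 15):
    row = [humanize(b ** d * 0.05) for b in (5, 50, 100)]
    print(f"d={d:<3}", " | ".join(row))
```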
Assumptions & methodology
Time per template application: 50 ms. The paper (Figure 6) reports ~90 ms for a top-50 expansion policy application. We use 50 ms as a representative cost per template application including scoring and bookkeeping.
Storage per node: 500 bytes. Each node stores a SMILES string (~100 bytes average for drug-like molecules), parent pointer, reaction ID, expansion scores, and search metadata.
Cloud cost: $0.04/core-hour, based on AWS EC2 c7g (Graviton3) compute-optimized on-demand pricing. Spot instances would be ~60-70% cheaper, but the total core-hours (and thus the order of magnitude) remain the same.
Parallelism: Perfect linear scaling assumed — no communication overhead, no shared-memory contention. Real speedup would be lower, making these estimates optimistic.
Tree model: Pure b^d exhaustive expansion with uniform branching. Real search trees are irregular; MCTS and A* prune aggressively, exploring only a tiny fraction of the full tree. That's the point: intelligent search is not optional.
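To put a number on "tiny fraction": assume a guided planner expands 10^5 nodes (a hypothetical but generous budget) against the full b = 50, d = 10 tree:

```python
budget = 10 ** 5        # nodes a guided search might expand (assumed budget)
full_tree = 50 ** 10    # exhaustive tree at b=50, d=10
print(f"fraction explored: {budget / full_tree:.1e}")  # fraction explored: 1.0e-12
```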
Implications for Planning
Search algorithms are essential, not optional.
MCTS, A*, and neural heuristics exist because brute-force enumeration is impossible beyond trivial depths. The choice of search strategy determines which regions of the tree get explored within a fixed computational budget.
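As an illustration of guided search under a fixed budget, here is a minimal best-first (A*-style) loop over states of open molecules. `expand` and `in_stock` are hypothetical interfaces for this sketch, not any specific tool's API:

```python
import heapq

def best_first_route_search(target, expand, in_stock, max_nodes=10_000):
    """Minimal best-first sketch. `expand(mol)` yields (cost, precursors)
    pairs; `in_stock(mol)` tests purchasability. Returns the cheapest route
    cost found within the node budget, or None."""
    frontier = [(0.0, 0, [target])]    # (cost so far, tiebreak, open molecules)
    tiebreak = 0
    for _ in range(max_nodes):
        if not frontier:
            return None
        cost, _, mols = heapq.heappop(frontier)
        mols = [m for m in mols if not in_stock(m)]
        if not mols:
            return cost                # every leaf purchasable: route solved
        head, rest = mols[0], mols[1:]
        for step_cost, precursors in expand(head):
            tiebreak += 1              # keeps heap comparisons well-defined
            heapq.heappush(frontier, (cost + step_cost, tiebreak,
                                      rest + list(precursors)))
    return None
```

A toy run, with "molecules" as strings that split in half and single characters purchasable, finds a 3-step route for a 4-character target. The priority queue is exactly where a neural heuristic would plug in: it decides which region of the tree is expanded next under the fixed `max_nodes` budget.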
Model throughput is a chemical capability.
When expansion models are embedded in search loops, their inference speed becomes a structural constraint on planning depth. A marginally less accurate but orders-of-magnitude faster model may be the superior planner—it can explore deeper trees within the same time budget.
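The trade is easy to quantify for exhaustive expansion: within a time budget T and per-call latency t, the deepest fully expandable level is floor(log_b(T/t)). A sketch (the 90 ms figure is the review's Figure 6 measurement; the 5 ms fast model is a hypothetical):

```python
import math

def max_depth(b: int, budget_s: float, s_per_call: float) -> int:
    """Largest d such that b**d calls fit in the budget: floor(log_b(T/t))."""
    return math.floor(math.log(budget_s / s_per_call, b))

budget = 3600  # one hour, assumed
print(max_depth(50, budget, 0.090))  # slower, more accurate model: depth 2
print(max_depth(50, budget, 0.005))  # hypothetical 18x faster model: depth 3
```

A whole extra level of the tree per hour, despite identical hardware, is why inference speed is a planning capability and not just an engineering nicety.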
Stock set size changes the game.
Expanding the starting material inventory from ~100K to ~230M compounds dramatically increases the density of termination points, shortening the effective depth the planner must reach. This is why inventory size functions as a hidden difficulty dial and why STR values are not portable across studies.
Direct sequence generation bypasses the tree.
Full-route sequence models (DirectMultiStep) sidestep the combinatorial explosion entirely by generating complete routes as single sequences. They amortize the search cost into training, trading formal validity guarantees for speed and global conditioning.