Planning Methods

A comprehensive taxonomy of 20 retrosynthetic planning architectures from 2018 to 2026, organized by planning paradigm. Based on Tables 5 and 6 of the review.

Explicit Graph Search (11 methods)

Routes assembled step-by-step via tree search (MCTS, A*) with single-step expansion models.

Hybrid / Neurosymbolic (6 methods)

Neural policies combined with symbolic search, LLM critics, or ensemble verification.

Direct Sequence Generation (3 methods)

Full retrosynthetic routes generated as a single sequence via autoregressive decoding.

Architecture Taxonomy

A table of all 20 methods, covering name, architecture, and key contribution.

Method Families

A qualitative comparison of the four major method families, from Table 6 of the review.

Expert-Encoded Planners

Paradigm: Explicit Graph Search
Representative Systems: LHASA, Chematica, and other systems built on hand-curated rule sets.
Training Signal: Human expertise encoded as reaction rules, often with explicit steric and electronic guards.
Explicit Search: Yes (typically best-first or proof-number search over a symbolic graph).
Solv-N Level: Can formally guarantee Solv-2 constraints if they are explicitly encoded in the rules; otherwise defaults to Solv-1.
Strength: High interpretability and direct linkage to mechanistic precedent. Crucially, this is the only paradigm where Solv-2 (selectivity) constraints can be formally guaranteed by construction, provided the rules are sufficiently detailed.
Limitation: Coverage is strictly bounded by the rule library; expensive and slow to update; brittle when faced with novel scaffolds or reaction classes.
Possible Failure Modes: Fails silently when a required transformation is absent from its knowledge base; misapplies over-generalized rules in novel contexts.
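The rule-plus-guard structure, and the silent-failure mode, can be sketched in a few lines. Everything here is a toy illustration: the rule name, the substring "patterns", and the guard are hypothetical stand-ins for the full structural pattern matching a real expert system performs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """A hand-encoded retrosynthetic rule: an applicability pattern, an
    explicit guard (the expert's steric/electronic constraint, i.e. the
    Solv-2 check), and the precursors the disconnection yields."""
    name: str
    matches: Callable[[str], bool]
    guard: Callable[[str], bool]
    precursors: Callable[[str], list[str]]

def expand(target: str, rules: list[Rule]) -> list[tuple[str, list[str]]]:
    """One retrosynthetic step: apply every rule whose pattern and guard
    both fire. If no rule matches, the result is simply empty -- the
    planner fails silently, which is the coverage limit in miniature."""
    return [(r.name, r.precursors(target))
            for r in rules
            if r.matches(target) and r.guard(target)]

# Hypothetical "amide disconnection" over string-encoded molecules.
amide = Rule(
    name="amide_disconnection",
    matches=lambda m: "C(=O)N" in m,
    guard=lambda m: "N(C)(C)" not in m,  # e.g. exclude a forbidden motif
    precursors=lambda m: [m.replace("C(=O)N", "C(=O)O", 1), "N"],
)

print(expand("CC(=O)NC", [amide]))  # rule fires: one disconnection
print(expand("CC#N", [amide]))      # no rule applies: [] (silent failure)
```

Because the guard is an explicit predicate written by the expert, any route this planner emits satisfies the guard by construction, which is exactly the sense in which Solv-2 constraints can be guaranteed here and nowhere else.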

Data-Driven Search Planners

Paradigm: Explicit Graph Search
Representative Systems: 3N-MCTS, Retro*, AiZynthFinder, MEEA*. The workhorses of the "Navigability Era".
Training Signal: Supervised single-step policies and/or value functions learned from large reaction corpora (e.g., USPTO).
Explicit Search: Yes (MCTS, A*, or hybrid tree search).
Solv-N Level: Primarily operates at Solv-1 (topological connectivity). Any performance at Solv-2 is implicit and statistical, not guaranteed.
Strength: Systematic exploration of vast search spaces; modular architecture (the chemical model can be upgraded independently of the search algorithm). The definitive Solv-1 machines.
Limitation: High inference cost; suffers from horizon effects on deep routes; chemical knowledge is purely statistical and limited to patterns in the training data.
Possible Failure Modes: Finds topologically valid but chemically naive routes; high STR masking low route quality; performance collapses on deep routes; over-reliance on statistically common but strategically poor disconnections.
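A minimal sketch of this family's core loop, in the spirit of Retro*'s best-first search (not its actual implementation): a priority queue ordered by cost-so-far plus a learned value estimate of the unsolved molecules. The single-step model, value function, and molecule names are all toy stand-ins.

```python
import heapq
import itertools

def best_first_route(target, one_step, value_fn, stock, max_iter=100):
    """Best-first route search: always expand the partial route whose
    estimated total cost (accumulated step cost + value estimate of the
    unsolved frontier) is lowest. `one_step(mol)` is the learned
    single-step model, returning [(precursors, step_cost), ...];
    `value_fn(mol)` estimates the remaining cost to reach the stock set.
    The design is modular: either component can be swapped independently."""
    tie = itertools.count()  # heap tie-breaker
    frontier = [(value_fn(target), next(tie), 0.0, (target,), [])]
    for _ in range(max_iter):
        if not frontier:
            return None                  # search space exhausted
        _, _, g, unsolved, route = heapq.heappop(frontier)
        if not unsolved:
            return route                 # every leaf is purchasable
        mol, rest = unsolved[0], unsolved[1:]
        for precursors, step_cost in one_step(mol):
            open_mols = rest + tuple(p for p in precursors if p not in stock)
            h = sum(value_fn(m) for m in open_mols)
            heapq.heappush(frontier, (g + step_cost + h, next(tie),
                                      g + step_cost, open_mols,
                                      route + [(mol, precursors)]))
    return None                          # budget exhausted: horizon effect

# Toy problem: T -> X + B, X -> A, with A and B in stock.
stock = {"A", "B"}
model = {"T": [(["X", "B"], 1.0)], "X": [(["A"], 1.0)]}
route = best_first_route("T", lambda m: model.get(m, []),
                         lambda m: 0.0 if m in stock else 1.0, stock)
# route == [("T", ["X", "B"]), ("X", ["A"])]
```

Note what the sketch does and does not check: every returned route terminates in the stock set (Solv-1), but nothing in the loop asks whether a disconnection is chemically sensible, which is exactly why this family's failures are topologically valid but chemically naive.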

Hybrid / Neurosymbolic Planners

Paradigm: Hybrid / Neurosymbolic
Representative Systems: Llamole, LARC, RetroChimera, AOT*. The emerging frontier aiming to bridge the validity gap.
Training Signal: Mixed; combines learned policies with symbolic verifiers, or uses LLMs as heuristic guides, critics, or macro-planners.
Explicit Search: Yes (symbolic search is typically retained as the scaffold ensuring rigor).
Solv-N Level: Explicitly targets the gap between Solv-1 and Solv-2 by adding layers of validation or heuristic guidance.
Strength: Attempts to combine the rigor of symbolic methods with the coverage of data-driven models; LLMs can inject non-topological constraints such as safety or cost.
Limitation: High architectural complexity; performance is sensitive to the calibration between neural and symbolic components; LLMs introduce reproducibility and hallucination challenges.
Possible Failure Modes: Miscalibrated neural heuristics pruning valid symbolic branches; LLM hallucinations leading search astray; inconsistent logic between components causing dead ends.
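The propose-then-verify pattern shared by these systems, and its calibration risk, fits in a few lines. This is a generic sketch of the pattern, not any specific system's architecture; the policy, verifier, and scores are hypothetical.

```python
def hybrid_expand(target, neural_policy, symbolic_verifier, k=5):
    """Neurosymbolic one-step expansion: a learned policy proposes its
    top-k ranked disconnections, and a symbolic verifier vetoes the
    chemically invalid ones. The calibration risk is visible in the
    truncation: if the valid disconnection never reaches the policy's
    top-k, the verifier cannot recover it."""
    proposals = neural_policy(target)[:k]  # ranked [(precursors, score), ...]
    return [(prec, score) for prec, score in proposals
            if symbolic_verifier(target, prec)]

# Toy components: the policy ranks an invalid move above a valid one;
# the verifier filters it out, at the cost of an extra check per proposal.
policy = lambda t: [(["bad"], 0.9), (["good"], 0.6)]
verifier = lambda t, prec: prec == ["good"]
# hybrid_expand("T", policy, verifier) == [(["good"], 0.6)]
```

Both failure directions named above appear here: a verifier that is too strict prunes valid branches the policy proposed, while a small k lets a miscalibrated policy starve the verifier of the right candidates.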

Direct Full-Route Generators

Paradigm: Direct Sequence Generation
Representative Systems: DirectMultiStep, SynLlama, RetroSynFormer.
Training Signal: Supervised learning on complete, serialized route trajectories; learns the joint probability of the entire synthesis plan.
Explicit Search: No (inference is via autoregressive decoding, e.g., beam search).
Solv-N Level: Lacks formal guarantees at any tier. Validity (from Solv-0 upwards) must be externally verified after a route is generated.
Strength: Extremely fast inference; learns global, route-level conditioning (e.g., convergent strategies) that myopic search misses; more robust on deep routes.
Limitation: No formal validity guarantees at any tier, so routes require rigorous post-hoc validation. Boundary conditions (e.g., the stock set) are baked into model weights, hindering adaptation.
Possible Failure Modes: Hallucinates chemically impossible steps or invalid SMILES (Solv-0 failure); proposes syntactically valid but chemically incoherent sequences; generates routes disconnected from the specified stock set.
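What "no explicit search" means in practice is just beam-search decoding over a learned next-token distribution. The sketch below uses a hypothetical hand-written "language model" over serialized reaction tokens in place of a trained network; the token scheme is invented for illustration.

```python
import math

def decode_route(next_logprobs, bos="<s>", eos="</s>", beam=3, max_len=8):
    """Autoregressive route decoding: keep the `beam` highest-probability
    prefixes, extend each by one token, and collect sequences that emit
    the end token. `next_logprobs(prefix)` -> {token: logprob}. Nothing
    here checks chemical validity -- that must happen post hoc."""
    beams, done = [([bos], 0.0)], []
    for _ in range(max_len):
        grown = []
        for seq, lp in beams:
            for tok, tlp in next_logprobs(tuple(seq)).items():
                cand = (seq + [tok], lp + tlp)
                (done if tok == eos else grown).append(cand)
        beams = sorted(grown, key=lambda c: -c[1])[:beam]
        if not beams:
            break
    return max(done, key=lambda c: c[1]) if done else None

# Hypothetical model over serialized steps ("T>X.B" = disconnect T into X, B).
lm = {
    ("<s>",):                {"T>X.B": math.log(0.9), "T>A": math.log(0.1)},
    ("<s>", "T>X.B"):        {"X>A": 0.0},
    ("<s>", "T>X.B", "X>A"): {"</s>": 0.0},
    ("<s>", "T>A"):          {"</s>": 0.0},
}
best = decode_route(lambda p: lm.get(p, {}))
# best[0] == ["<s>", "T>X.B", "X>A", "</s>"]
```

The contrast with the search families is in what is absent: there is no frontier of molecules, no stock-set check, and no backtracking, so speed comes at the price of verifying every decoded route after the fact.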

The Endgame: A Strategic Outlook

The following is a personal interpretation of where the field is heading.

The diversity of planning methods presents a confusing landscape, but it is likely a transient one. The apparent competition between paradigms obscures a clear trajectory governed by a single, overriding constraint.

1. The Scalability Lesson from Expert Systems

The history of expert-encoded systems demonstrates the fundamental limits of manual curation. Their inability to achieve comprehensive coverage, despite decades of effort, reveals that any paradigm reliant on hand-encoded knowledge will be outpaced by data-driven methods that can scale with automated data acquisition.

2. The Pragmatism of Hybrids and the Bitter Lesson

Hybrid and neurosymbolic architectures are pragmatic, effective solutions for the current landscape, where high-fidelity data is scarce. They leverage symbolic logic to compensate for noisy, incomplete training sets. However, the long-term trend in computationally driven fields, as described by Sutton's Bitter Lesson, favors general architectures that scale with data and compute over complex, hand-engineered integrations.

3. The Inevitable Convergence

This points toward a unified architecture where a rigorous, physics-aware search process serves as the high-fidelity data generator, and a large-scale sequence model serves as the inference policy. The computationally expensive work of ensuring chemical validity is performed once to create a training corpus, then amortized into a fast, generalizable model. This is the logical endpoint of search-augmented generation.
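The amortization argument reduces to a short loop: pay the validation cost once, offline, then distill. This is an outline of the idea only; every name (the verified planner, the distillation step) is an illustrative stand-in, not an existing system.

```python
def amortize_search(targets, verified_search, distill):
    """Search-augmented generation in outline: a slow, validity-checked
    planner is run once per target to build a verified route corpus,
    which is then distilled into a fast sequence model that serves as
    the inference policy. The expensive Solv-2 work is paid at corpus
    construction time, not at inference time."""
    corpus = [(t, verified_search(t)) for t in targets]
    corpus = [(t, r) for t, r in corpus if r is not None]  # keep solved only
    return distill(corpus)  # returns the amortized policy

# Toy stand-ins: the "planner" solves two of three targets; "distill"
# just reports how many verified training pairs it received.
search = lambda t: [t, "route"] if t != "hard" else None
policy = amortize_search(["a", "b", "hard"], search, lambda c: len(c))
# policy == 2
```

The loop also makes the bottleneck claim concrete: if `verified_search` has no reliable Solv-2 oracle inside it, the corpus it emits is noisy, and everything downstream inherits that noise.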

This reframes the central challenge for the field. The primary obstacle to progress is no longer a deficit in planning algorithms; it is the absence of a scalable, automated Solv-2 validator. Without a reliable oracle to certify chemical plausibility at scale, the data generator cannot run, and policy models are left to train on noisy, low-validity data.

Therefore, developing this validator is the highest-leverage research problem. The team that solves it will not just create a better planner; they will unlock the data required to train the first true chemical foundation models.