The Syntax of Matter
Synthesis Planning as the Foundation of Generative Chemistry
Anton Morgunov*, Yu Shee, Alexander V. Soudackov, Victor S. Batista* · Department of Chemistry, Yale University
The Argument
We have been training models to predict what molecules do before teaching them how molecules are made. This review argues that synthesis planning — the step-by-step logic of constructing a molecule — is the chemical equivalent of next-token prediction: the foundational objective that will unlock genuine chemical reasoning. We survey the field from 2020 to 2026, diagnose why current benchmarks are misleading, and propose a new validity framework (Solv-N) to measure what actually matters.
Explore the Review
Core arguments, frameworks, and open problems, distilled into web-native, directly linkable references.
The 10 core arguments from the review, spanning both domain ideas and meta ideas, distilled for the web. Each is a self-contained, directly linkable section.
An interactive calculator showing why retrosynthetic search is hard. Explore how O(b^d) scaling makes brute-force enumeration intractable.
An interactive taxonomy of 18+ retrosynthetic planning architectures (2018–2026) — filterable by paradigm, searchable, and sortable. The landscape at a glance.
Why benchmark numbers are misleading. Interactive tables showing how inventory size and metric choice inflate reported performance.
The Solv-N hierarchy — a new vocabulary for evaluating synthesis planners. The interactive reference the field has been missing.
A living document of grand challenges, research-sized problems, and engineering gaps. Updated as the field evolves.
The complete review with interactive hover definitions, AI-powered explanations, and audio narration. The paper of the future.
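To make the O(b^d) scaling from the search-complexity calculator above concrete, here is a minimal sketch in Python. It assumes an idealized search tree in which every molecule has exactly b candidate disconnections and every route is d steps deep. The function names and the sample values of b and d are illustrative assumptions, not figures from the review.

```python
def leaf_count(branching_factor: int, depth: int) -> int:
    """Number of depth-d paths in a uniform tree: b^d."""
    return branching_factor ** depth


def total_nodes(branching_factor: int, depth: int) -> int:
    """All nodes from the root down to depth d: sum of b^k for k = 0..d."""
    if branching_factor == 1:
        return depth + 1  # a chain has one node per level
    # Closed form of the geometric series; the division is exact.
    return (branching_factor ** (depth + 1) - 1) // (branching_factor - 1)


# Hypothetical values chosen only to show the growth rate.
for b in (10, 50):
    for d in (3, 6, 9):
        print(f"b={b:>2}, d={d}: {leaf_count(b, d):.3e} candidate paths")
```

Under these assumptions, each additional step multiplies the number of candidate paths by b, which is why exhaustive enumeration stops being an option after a handful of steps and planners rely on guided search instead.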
Reading Paths
Not sure where to start? Pick a guided path tailored to your background.
Quick Overview
The core argument in 15 minutes. For busy researchers who want the key takeaways without reading the full paper.
Chemist's Path
For synthetic and computational chemists. Focuses on the domain arguments, method landscape, and practical implications for bench chemistry.
ML Researcher Path
For ML/AI researchers. Focuses on the computational arguments, benchmark critique, and algorithmic open problems.
Complete Deep Dive
The full paper in recommended reading order. For readers who want comprehensive understanding of every argument and its evidence.
Central Claims
Structure precedes quantity.
We are attempting to solve chemistry's quantitative problems (toxicity, binding affinity) before we have mastered its structural grammar. The history of AI shows this order is backwards.
Navigability is solved. Validity is not.
Modern planners achieve 99%+ stock-termination rates. But stock-termination measures graph connectivity, not chemical correctness. The field is celebrating victory on the wrong metric.
Synthesis planning is the chemical analogue of next-token prediction.
Just as language models acquired generalizable reasoning by mastering the grammar of text, chemical models may acquire robust physical reasoning by training on the causal logic of molecular transformation.
Evaluation requires a validity hierarchy.
The proposed Solv-N framework separates syntactic correctness (Solv-0) and topological connectivity (Solv-1) from selectivity (Solv-2) and executability (Solv-3). Most published results measure only Solv-1.
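One way to picture the four Solv-N levels named above is as an ordered enumeration. The sketch below is a hypothetical illustration, not the review's implementation: the names `SolvLevel` and `highest_level` are invented here, and it assumes the hierarchy is cumulative, meaning a route counts as reaching a level only if it also passes every level below it.

```python
from enum import IntEnum
from typing import Mapping, Optional


class SolvLevel(IntEnum):
    SOLV_0 = 0  # syntactic correctness
    SOLV_1 = 1  # topological connectivity
    SOLV_2 = 2  # selectivity
    SOLV_3 = 3  # executability


def highest_level(passed: Mapping[SolvLevel, bool]) -> Optional[SolvLevel]:
    """Return the highest level reached, or None if Solv-0 fails.

    Assumption (not confirmed by the source): levels are cumulative,
    so the first failed or missing check caps the result.
    """
    reached: Optional[SolvLevel] = None
    for level in SolvLevel:  # iterates in definition order, 0 through 3
        if not passed.get(level, False):
            break
        reached = level
    return reached


# Hypothetical route: well-formed and stock-connected, but selectivity fails.
route_checks = {
    SolvLevel.SOLV_0: True,
    SolvLevel.SOLV_1: True,
    SolvLevel.SOLV_2: False,
    SolvLevel.SOLV_3: True,
}
print(highest_level(route_checks).name)
```

Under this reading, a planner that reports only connectivity to stock says nothing about whether its routes would clear the selectivity or executability checks.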
Cite this work
Anton Morgunov, Yu Shee, Alexander V. Soudackov, Victor S. Batista. “The Syntax of Matter: Synthesis Planning as the Foundation of Generative Chemistry.” ChemRxiv (2026). Preprint.