ChemRxiv 2026 · Preprint

The Syntax of Matter

Synthesis Planning as the Foundation of Generative Chemistry

Anton Morgunov*, Yu Shee, Alexander V. Soudackov, Victor S. Batista* · Department of Chemistry, Yale University

The Argument

We have been training models to predict what molecules do before teaching them how molecules are made. This review argues that synthesis planning — the step-by-step logic of constructing a molecule — is the chemical equivalent of next-token prediction: the foundational objective that will unlock genuine chemical reasoning. We survey the field from 2020 to 2026, diagnose why current benchmarks are misleading, and propose a new validity framework (Solv-N) to measure what actually matters.

The Paradigm Shift Toward Artificial Chemical Intelligence. Analogous to the evolution of NLP and structural biology, where optimizing for structural grammar yielded emergent reasoning, chemical AI must transition from static graph representations to causal reaction trajectories, rigorously evaluated via the Solv-N hierarchy.

Explore the Review

Core arguments, frameworks, and open problems — distilled into web-native, directly-linkable references.

Key Ideas

The 10 core arguments from the review — domain ideas and meta ideas — distilled for the web. Each is a self-contained, directly-linkable section.

Combinatorial Explosion

An interactive calculator showing why retrosynthetic search is hard. Explore how O(b^d) scaling makes brute-force enumeration intractable.
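The O(b^d) scaling can be made concrete with a few lines of arithmetic. The sketch below assumes a uniform search tree in which every molecule has the same number b of candidate disconnections and routes run d steps deep; real retrosynthetic trees are not uniform, and the values b = 50 and d = 1 to 6 are illustrative choices, not figures from the review.

```python
def leaf_count(b: int, d: int) -> int:
    """Number of distinct depth-d paths in a uniform tree: b^d."""
    if b < 1 or d < 0:
        raise ValueError("need b >= 1 and d >= 0")
    return b ** d


def tree_size(b: int, d: int) -> int:
    """Total nodes from the root (depth 0) down to depth d: sum of b^i for i = 0..d."""
    if b < 1 or d < 0:
        raise ValueError("need b >= 1 and d >= 0")
    if b == 1:
        return d + 1  # degenerate case: a single chain of d + 1 nodes
    # Closed form of the geometric series; exact in integer arithmetic.
    return (b ** (d + 1) - 1) // (b - 1)


if __name__ == "__main__":
    b = 50  # assumed branching factor, for illustration only
    for d in range(1, 7):
        print(f"depth {d}: {leaf_count(b, d):>18,} paths, {tree_size(b, d):>18,} nodes")
```

Each additional step multiplies the number of paths by b, which is why exhaustive enumeration stops being an option after a handful of steps and planners rely on guided search instead.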

Methods

An interactive taxonomy of 18+ retrosynthetic planning architectures (2018–2026) — filterable by paradigm, searchable, and sortable. The landscape at a glance.

Evaluation

Why benchmark numbers are misleading. Interactive tables showing how inventory size and metric choice inflate reported performance.

Validity Framework

The Solv-N hierarchy — a new vocabulary for evaluating synthesis planners. The interactive reference the field has been missing.

Open Problems

A living document of grand challenges, research-sized problems, and engineering gaps. Updated as the field evolves.

Full Text

The complete review with interactive hover definitions, AI-powered explanations, and audio narration. The paper of the future.

Reading Paths

Not sure where to start? Pick a guided path tailored to your background.

Quick Overview

15 min · 8 steps

The core argument in 15 minutes. For busy researchers who want the key takeaways without reading the full paper.


Chemist's Path

35 min · 15 steps

For synthetic and computational chemists. Focuses on the domain arguments, method landscape, and practical implications for bench chemistry.


ML Researcher Path

35 min · 15 steps

For ML/AI researchers. Focuses on the computational arguments, benchmark critique, and algorithmic open problems.


Complete Deep Dive

90 min · 16 steps

The full paper in recommended reading order. For readers who want comprehensive understanding of every argument and its evidence.


Central Claims

Structure precedes quantity.

We are attempting to solve chemistry's quantitative problems (toxicity, binding affinity) before we have mastered its structural grammar. The history of AI shows this order is backwards.

Navigability is solved. Validity is not.

Modern planners achieve 99%+ stock-termination rates. But stock-termination measures graph connectivity, not chemical correctness. The field is celebrating victory on the wrong metric.
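To see why stock-termination is a connectivity check rather than a chemistry check, consider a minimal sketch of how such a metric could be computed. The `RouteNode` type, the function names, and the placeholder molecule labels are all hypothetical, introduced only for illustration; they are not taken from the review or from any particular planner.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class RouteNode:
    """One molecule in a route tree; precursors are the molecules it is made from."""
    molecule: str
    precursors: List["RouteNode"] = field(default_factory=list)


def leaf_molecules(node: RouteNode) -> Iterator[str]:
    """Yield every molecule in the tree that has no further precursors."""
    if not node.precursors:
        yield node.molecule
    else:
        for child in node.precursors:
            yield from leaf_molecules(child)


def is_stock_terminated(route: RouteNode, stock: Set[str]) -> bool:
    """True if every leaf is in the stock set. No reaction is ever inspected."""
    return all(m in stock for m in leaf_molecules(route))


# Placeholder labels stand in for real structures.
route = RouteNode("target", [RouteNode("A"), RouteNode("B", [RouteNode("C")])])
print(is_stock_terminated(route, {"A"}))       # leaf C is missing from stock
print(is_stock_terminated(route, {"A", "C"}))  # a larger stock closes the route
```

The check only looks at leaves and set membership, so it passes regardless of whether the individual steps would work at the bench, and enlarging the stock set can flip a failing route to a passing one without changing the route at all.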

Synthesis planning is the chemical analogue of next-token prediction.

Just as language models acquired generalizable reasoning by mastering the grammar of text, chemical models may acquire robust physical reasoning by training on the causal logic of molecular transformation.

Evaluation requires a validity hierarchy.

The proposed Solv-N framework separates syntactic correctness (Solv-0) and topological connectivity (Solv-1) from selectivity (Solv-2) and executability (Solv-3). Most published results only measure Solv-1.
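One way to picture the hierarchy is as an ordered set of levels, sketched below. The four level names follow the description above; everything else is an assumption made for illustration, in particular the cumulative rule that a route only reaches a level if it also passes every lower one, and the idea that each level reduces to a single pass/fail flag. The review's actual definitions may differ.

```python
from enum import IntEnum
from typing import Mapping, Optional


class SolvLevel(IntEnum):
    SYNTACTIC = 0      # Solv-0: syntactic correctness
    CONNECTIVITY = 1   # Solv-1: topological connectivity
    SELECTIVITY = 2    # Solv-2: selectivity
    EXECUTABILITY = 3  # Solv-3: executability


def highest_level(checks: Mapping[SolvLevel, bool]) -> Optional[SolvLevel]:
    """Return the highest level reached, assuming levels are cumulative.

    `checks` maps each level to a pass/fail result from some external
    validator (hypothetical here). A missing entry counts as a failure.
    """
    reached = None
    for level in SolvLevel:  # iterates in definition order: 0, 1, 2, 3
        if not checks.get(level, False):
            break
        reached = level
    return reached


# A route that is well-formed and reaches stock, but was never tested further.
result = highest_level({SolvLevel.SYNTACTIC: True, SolvLevel.CONNECTIVITY: True})
print(result.name if result is not None else "below Solv-0")
```

Under this reading, a reported stock-termination rate only says that routes reach the connectivity level, and says nothing about the two levels above it.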

Cite this work

Anton Morgunov, Yu Shee, Alexander V. Soudackov, Victor S. Batista. “The Syntax of Matter: Synthesis Planning as the Foundation of Generative Chemistry.” ChemRxiv (2026). Preprint.