The Validity Framework

The Solv-N hierarchy — a new vocabulary for evaluating synthesis planners. When someone says “we evaluated at Solv-2,” this is where they point.

The Solvability Hierarchy

Four tiers of constraints that a proposed synthetic transformation must satisfy, ordered from basic (Solv-0) to comprehensive (Solv-3).

At a Glance

Solv-0 (Solved)

Syntactic

Is the output a valid molecule?

Solv-1 (Solved)

Topological

Is the bond rearrangement chemically legal?

Solv-2 (Open)

Selectivity

Would this reaction give the correct product?

Solv-3 (Open)

Executability

Would this route work in a real lab?

Tier Details

Each tier defines a level of chemical validity. For each tier, this section gives its definition, how current planners handle it, and example failure modes.

Syntactic

The output is a well-formed molecular graph — valid SMILES, balanced atoms, legal valences.

Syntactic validity is the most basic constraint: the output must parse into a chemically meaningful molecular graph. This means valid SMILES strings, balanced atom counts, legal valences, and no orphaned bonds. It is the chemical equivalent of a grammatically correct sentence.

Template-based methods enforce Solv-0 by construction — the output is generated by applying a graph-edit rule to a valid input, guaranteeing a valid output. Template-free sequence models must learn syntactic constraints from training data, and violations (malformed SMILES, illegal valences) are a known failure mode, particularly for complex molecules.
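One ingredient of Solv-0, atom balance, can be illustrated with a small standard-library sketch. It is not the authors' checker: the helper names `atom_counts` and `is_atom_balanced` are hypothetical, and the parser assumes flat molecular formulas (element symbols with optional counts, no parentheses, charges, or isotopes). A real check would also parse SMILES and validate valences with a cheminformatics toolkit.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional integer count, e.g. "C2", "H", "Cl3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H6O' (hypothetical helper)."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def is_atom_balanced(reactants: list[str], products: list[str]) -> bool:
    """Return True when both sides of a reaction contain the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Esterification: acetic acid + ethanol -> ethyl acetate + water.
balanced = is_atom_balanced(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])
# The same reaction with the water by-product omitted no longer balances.
unbalanced = is_atom_balanced(["C2H4O2", "C2H6O"], ["C4H8O2"])
```

The second call shows why the check is only meaningful when by-products are recorded: a dropped water molecule is enough to fail it.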

Section 5.2.1

Topological

The proposed reaction center transformation is a legal bond rearrangement — correct atom mapping, valid bond changes, and recognized reaction topology.

Topological validity requires that the proposed transformation represents a legal reaction center modification. The atom-atom mapping must be consistent, the bond changes must correspond to a recognized reaction topology, and the transformation must be derivable from a known or plausible mechanistic pattern.

This is what most published planners actually measure when they report "solvability." A high stock-termination rate (STR) demonstrates that the planner can connect a target to purchasable starting materials through a chain of topologically valid transformations. Template-based methods guarantee Solv-1 by construction — every proposed step corresponds to a reaction template extracted from experimental data.

The critical insight is that Solv-1 says nothing about selectivity. A topologically valid transformation may propose a nucleophilic addition to the wrong carbonyl in a molecule with three carbonyls. The graph transformation is legal; the chemistry is wrong.

Example Failures

Nucleophilic addition — topological validity vs. selectivity

Three nucleophilic addition reactions illustrating the boundaries between Solv-0 (syntactic), Solv-1 (topological), and Solv-2 (selectivity). Reaction 1 is Solv-1 valid but fails chemoselectivity (Solv-2C) because the nucleophile attacks the wrong electrophilic center. Reaction 2 is topologically invalid (a Solv-1 failure). Reaction 3 is syntactically invalid (a Solv-0 failure).


Section 5.2.1

Selectivity

The transformation is chemically plausible given competing functional groups and stereochemical requirements — all five selectivity sub-constraints must be satisfied simultaneously.

Selectivity validity is where current planners fail most catastrophically. A Solv-2 valid transformation must satisfy five sub-constraints simultaneously: chemoselectivity (the right functional group reacts), regioselectivity (at the right position), diastereoselectivity (correct relative stereochemistry), enantioselectivity (correct absolute stereochemistry), and stoichiometric control (single vs. multiple equivalent transformations).

The challenge is that these constraints must hold jointly, and satisfying one does not guarantee another. A reaction may be chemoselective but not regioselective, or regioselective but not stereoselective. The combinatorial nature of selectivity makes it fundamentally harder than topological validity, and it cannot be guaranteed by template matching alone. A template extracted from one substrate may not transfer to a structurally similar substrate with different electronic or steric properties.

Most published planners provide no formal control over selectivity. Template-based methods inherit whatever selectivity the training data encoded, but do not verify it for new substrates. Sequence-based methods generate products without explicit selectivity reasoning. Enantioselectivity and stoichiometric balance are largely ignored across the field.

Sub-constraints

Solv-2C: Chemoselectivity

The correct functional group reacts in the presence of competing reactive sites. A carbonyl reduction must target the intended carbonyl when multiple are present.

Solv-2R: Regioselectivity

The reaction occurs at the correct position within the targeted functional group. For electrophilic aromatic substitution, the substituent must be directed to the correct ring position.

Solv-2D: Diastereoselectivity

The correct relative stereochemistry is produced. Reactions forming new stereocenters must yield the intended diastereomer, not a mixture.

Solv-2E: Enantioselectivity

The correct absolute stereochemistry is produced. Asymmetric reactions must employ appropriate chiral catalysts or auxiliaries and yield the intended enantiomer.

Solv-2S: Stoichiometry

Control of single vs. multiple equivalent transformations. When a molecule contains two identical reactive sites (e.g., two equivalent carbonyls), the planner must specify whether the reaction is intended to occur once or twice — and provide a mechanism that enforces that control. Proposing mono-addition to a symmetric dicarbonyl without a control mechanism fails this constraint, as the forward reaction will inevitably produce the double-addition product.
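The requirement that all five sub-constraints hold at once can be expressed as a small record type. The sketch below is illustrative only: the class name `Solv2Verdict` and its methods are hypothetical, and it assumes that a sub-constraint which does not apply to a reaction (for example, enantioselectivity when no stereocenter is formed) is recorded as satisfied. How each boolean is actually determined is the open problem and is not addressed here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Solv2Verdict:
    """Per-reaction outcome of the five Solv-2 sub-constraints (hypothetical type)."""

    chemoselectivity: bool      # Solv-2C
    regioselectivity: bool      # Solv-2R
    diastereoselectivity: bool  # Solv-2D
    enantioselectivity: bool    # Solv-2E
    stoichiometry: bool         # Solv-2S

    def failed(self) -> list[str]:
        """Names of the sub-constraints that are not satisfied, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_valid(self) -> bool:
        """Solv-2 valid only when every sub-constraint is satisfied simultaneously."""
        return not self.failed()


# Mono-addition proposed to a symmetric dicarbonyl with no control mechanism:
# everything else is fine, but stoichiometric control is missing.
mono_addition = Solv2Verdict(
    chemoselectivity=True,
    regioselectivity=True,
    diastereoselectivity=True,
    enantioselectivity=True,
    stoichiometry=False,
)
print(mono_addition.is_valid())  # False
print(mono_addition.failed())    # ['stoichiometry']
```

Reporting the failed sub-constraints rather than a single boolean keeps the diagnosis visible, which matters when failures are tabulated per category.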

Example Failures

Nucleophilic addition — selectivity failures

Four nucleophilic addition reactions at Solv-2. Reaction 1 is fully Solv-2 valid: exhaustive alkylation of two equivalent carbonyls. Reaction 2 fails stoichiometry (Solv-2S) — mono-addition is proposed to a symmetric dicarbonyl with no control mechanism; the forward reaction inevitably over-reacts to the double-addition product. Reaction 3 fails regioselectivity (Solv-2R) — selective addition is attempted at one of two competing electrophilic sites where the intrinsic reactivity difference is insufficient. Reaction 4 fails chemoselectivity (Solv-2C) — the strongly basic organolithium reagent is quenched by the unprotected carboxylic acid before the intended addition can occur.


Diels-Alder — stereoselectivity failures

Four Diels-Alder reactions demonstrating stereoselectivity at Solv-2. Reaction 1 is fully valid with the correct chiral auxiliary (Evans oxazolidinone). Reaction 2 fails enantioselectivity (Solv-2E): the wrong chiral controller is specified. Reaction 3 also fails enantioselectivity (Solv-2E): no chiral controller is used at all. Reaction 4 fails diastereoselectivity (Solv-2D): the wrong relative stereochemistry is produced.


Section 8.1; Table 4

Executability

The route is experimentally viable — realistic conditions, adequate yields, compatible purification, and successful execution from start to finish.

Executability is the ultimate test: would the proposed route succeed in a laboratory? This requires not just valid individual steps (Solv-0 through Solv-2) but route-level coherence. Reaction conditions must be realistic, yields must be adequate at each step, purification must be feasible between steps, and the cumulative yield must be practical.

Executability is fundamentally non-Markovian — it cannot be assessed step-by-step. Impurities from step 1 may poison the catalyst in step 3. A protecting group installed in step 2 may be incompatible with conditions required in step 5. Cumulative yield over a 10-step route with 80% per-step yield is only 10.7%. These route-level interactions make Solv-3 the hardest tier to verify computationally.
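The cumulative-yield figure quoted above is a plain product of per-step yields, which a few lines of Python can reproduce. The function name `cumulative_yield` is hypothetical, and the sketch assumes step yields multiply independently. It therefore covers only the arithmetic, not the cross-step interactions (catalyst poisoning, protecting-group incompatibility) that make Solv-3 non-Markovian.

```python
import math


def cumulative_yield(step_yields: list[float]) -> float:
    """Overall yield of a linear route, assuming per-step yields simply multiply."""
    return math.prod(step_yields)


# Ten steps at 80% each, as in the text.
overall = cumulative_yield([0.80] * 10)
print(f"{overall:.1%}")  # 10.7%
```

Under the same assumption, raising every step to 90% gives roughly 35% overall, which shows how sensitive long routes are to modest per-step losses.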

No current computational system provides Solv-3 verification. The closest proxies are condition prediction models and yield estimation, but these operate on individual steps without route-level context. True Solv-3 verification may ultimately require integration with automated synthesis platforms for experimental validation.

Section 8.1

Why Solv-1 is Not Enough

The stock-termination rate (STR) — the field's dominant success metric — measures Solv-1 solvability: can the planner find a topologically valid path from the target to purchasable starting materials? By this criterion, the navigation problem is solved. Modern planners routinely exceed 99% STR.
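To make the metric concrete, here is a minimal sketch of how an STR could be computed. It is an assumption about the bookkeeping, not a reference implementation from any planner: each route is reduced to the list of its leaf molecules, molecules are compared as plain strings (real pipelines would normalize to canonical identifiers first), and the names `is_stock_terminated` and `stock_termination_rate` are hypothetical.

```python
from typing import Mapping, Sequence, Set


def is_stock_terminated(route_leaves: Sequence[str], stock: Set[str]) -> bool:
    """A route terminates in stock when every leaf molecule is purchasable."""
    return all(leaf in stock for leaf in route_leaves)


def stock_termination_rate(
    routes_by_target: Mapping[str, Sequence[Sequence[str]]],
    stock: Set[str],
) -> float:
    """Fraction of targets with at least one route whose leaves are all in stock."""
    if not routes_by_target:
        return 0.0
    solved = sum(
        any(is_stock_terminated(leaves, stock) for leaves in routes)
        for routes in routes_by_target.values()
    )
    return solved / len(routes_by_target)


# Toy data: identifiers are illustrative strings, not a real stock catalogue.
stock = {"CCO", "CC(=O)O", "c1ccccc1"}
routes_by_target = {
    "target_A": [["CCO", "CC(=O)O"]],                 # all leaves purchasable
    "target_B": [["CCO", "unknown_1"], ["c1ccccc1"]], # second route terminates
    "target_C": [["unknown_2"]],                      # no terminating route
}
print(stock_termination_rate(routes_by_target, stock))  # 2 of 3 targets solved
```

Nothing in this computation inspects whether any step is selective or executable, which is why a high STR can coexist with chemically invalid routes.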

But 99% STR does not mean 99% chemical correctness. The experimental constraints of selectivity — which functional group reacts, at which position, with which stereochemistry — impose a filter that drastically sparsifies the space of viable routes. A planner achieving 99.7% STR recovered only 11.9% of ground-truth routes. On the RetroCast benchmark, search planners collapse from 81% reconstruction at route length 2 to just 9% at length 6.

This is the core conceptual point of the Solv-N framework: of all routes that are topologically valid (Solv-1), we expect only a fraction to survive the selectivity filter (Solv-2), and fewer still to be experimentally executable (Solv-3). We lack automated metrics to quantify this attrition precisely — but the figure below illustrates the principle. Reporting high STR without chemical validation is misleading — it conflates navigability with validity.

The three eras of computational synthesis planning, showing the transition from navigability to validity
The synthesis planning search tree. Left: the Era of Navigability — all paths to starting materials are treated equally. Right: the Era of Validity — paths are filtered by chemical correctness, revealing that most "solved" routes are invalid.

MRR-V: Measuring What Matters

Solv-N replaces the traditional binary solvability metric — "did the planner find any route?" — with a tiered framework that asks which level of chemical correctness was achieved. But even Solv-N as a pass/fail criterion at each tier is incomplete. A planner that finds a Solv-2 valid route ranked 50th is less useful than one that ranks it first. Tier-level solvability alone cannot distinguish these cases.

The Mean Reciprocal Rank for Validity (MRR-V) captures this distinction. For each target, it takes the reciprocal rank 1/k, where k is the rank of the first route that satisfies a given validity tier, and then averages these values over all targets. A target whose first valid route is ranked first contributes 1; one whose first valid route is ranked tenth contributes 0.1; one with no valid route contributes 0.

MRR-V is parameterized by validity tier, yielding a family of metrics across the entire Solv-N hierarchy. MRR-V@Solv-1 asks "how highly is the first topologically valid route ranked?" MRR-V@Solv-2 asks "how highly is the first selectively valid route ranked?" Each metric is meaningful on its own terms — a planner that generates 10,000 routes but buries the only valid one at rank 8,000 has a near-zero MRR-V even if its Solv-N solvability rate looks acceptable.

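MRR-V@Solv-n = (1/|Q|) * Σ_{q ∈ Q} (1 / rank_q^{Solv-n})

Here Q is the set of evaluated targets and rank_q^{Solv-n} is the rank of the first route for target q that satisfies tier Solv-n; a target with no such route contributes 0 to the sum.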
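The formula can be sketched directly in Python. The input representation is an assumption made for illustration: for each target, a ranked list of booleans says whether the route at that rank satisfies the chosen tier. How those booleans are obtained is outside the scope of the sketch, and the function names `reciprocal_rank` and `mrr_v` are hypothetical.

```python
from typing import Sequence


def reciprocal_rank(route_is_valid: Sequence[bool]) -> float:
    """Return 1/k for the first valid route at 1-indexed rank k, or 0.0 if none is valid."""
    for rank, ok in enumerate(route_is_valid, start=1):
        if ok:
            return 1.0 / rank
    return 0.0


def mrr_v(validity_by_target: Sequence[Sequence[bool]]) -> float:
    """Mean reciprocal rank of the first valid route, averaged over all targets."""
    if not validity_by_target:
        return 0.0
    total = sum(reciprocal_rank(flags) for flags in validity_by_target)
    return total / len(validity_by_target)


# Three targets: first valid route at rank 1, at rank 10, and not found.
validity_by_target = [
    [True, False],
    [False] * 9 + [True],
    [False, False],
]
score = mrr_v(validity_by_target)  # (1 + 0.1 + 0) / 3
```

Evaluating the same ranked lists against Solv-1 and Solv-2 validity flags yields MRR-V@Solv-1 and MRR-V@Solv-2 without changing the code, which matches the tier-parameterized definition in the text.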

Solv-N Quick Reference

The solvability hierarchy for evaluating synthesis planners

Solv-0 (Solved)

Syntactic

The output is a well-formed molecular graph — valid SMILES, balanced atoms, legal valences.

Solv-1 (Solved)

Topological

The proposed reaction center transformation is a legal bond rearrangement — correct atom mapping, valid bond changes, and recognized reaction topology.

Solv-2 (Open)

Selectivity

The transformation is chemically plausible given competing functional groups and stereochemical requirements — all five selectivity sub-constraints must be satisfied simultaneously.

Solv-3 (Open)

Executability

The route is experimentally viable — realistic conditions, adequate yields, compatible purification, and successful execution from start to finish.

Morgunov et al., in review 2026