The Challenge of Balancing Model Sensitivity and Robustness in Predicting Yields: A Benchmarking Study of Amide Coupling Reactions

Zhen Liu, Yurii S. Moroz, Olexandr Isayev

2023

Highlight

Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis, but current models have failed to generalize to large literature datasets.

Abstract

Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41,239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features describing 2D and 3D structure as well as physical and electronic properties. These descriptors were paired with four categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding benchmarks of both feature and model performance. Despite excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best result, obtained with a stacking technique, was an R² of 0.395 ± 0.020. Error analysis revealed that reactivity cliffs and yield uncertainty are the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions raised the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to reactivity changes caused by subtle structural variations, while remaining robust to the uncertainty associated with yield measurements.
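The two quantities the abstract leans on, the coefficient of determination (R²) and the notion of a reactivity cliff, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the set-based Tanimoto similarity, and the cutoffs `sim_cutoff` and `yield_gap` are all assumptions chosen for illustration, and reading the reported "±" as a sample standard deviation over repeated splits is likewise an assumption.

```python
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true yields are equal")
    return 1.0 - ss_res / ss_tot


def summarize(scores):
    """Mean and sample standard deviation of R^2 over repeated splits.

    Assumption: a '0.395 +/- 0.020' style figure is a mean and a standard
    deviation; the abstract does not say how the spread was computed.
    """
    return mean(scores), stdev(scores)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_cliffs(reactions, sim_cutoff=0.8, yield_gap=50.0):
    """Flag index pairs of similar reactions with very different yields.

    `reactions` is a list of (feature_set, yield_percent) tuples.
    Both cutoffs are hypothetical, not values taken from the paper.
    """
    cliffs = []
    for i in range(len(reactions)):
        for j in range(i + 1, len(reactions)):
            (fi, yi), (fj, yj) = reactions[i], reactions[j]
            if tanimoto(fi, fj) >= sim_cutoff and abs(yi - yj) >= yield_gap:
                cliffs.append((i, j))
    return cliffs


# Toy data: two near-identical reactions with very different yields,
# plus one unrelated reaction.
toy = [
    ({"amine", "acid", "HATU", "DMF"}, 92.0),
    ({"amine", "acid", "HATU", "DMF", "ortho-Me"}, 15.0),
    ({"x", "y"}, 50.0),
]
cliff_pairs = find_cliffs(toy)
```

In this toy set the first two reactions share four of five features yet differ by 77 yield points, so they are flagged as a cliff pair. A model has to respond to that small structural change without overreacting to ordinary measurement noise in the yields, which is the tension the abstract describes.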

Keywords

neural network, machine learning, high-throughput

Cite This Paper

@article{Liu2023,
  author = {Liu, Zhen and Moroz, Yurii S. and Isayev, Olexandr},
  title = {The Challenge of Balancing Model Sensitivity and Robustness in Predicting Yields: A Benchmarking Study of Amide Coupling Reactions},
  year = {2023},
  doi = {10.26434/chemrxiv-2023-j9h92},
  url = {http://dx.doi.org/10.26434/chemrxiv-2023-j9h92},
  publisher = {American Chemical Society (ACS)},
  keywords = {neural network, machine learning, high-throughput},
  researchAreas = {reactions-reactivity, experiment-automation},
  highlight = {Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis, but current models have failed to generalize to large literature datasets.}
}

Related Research Areas

Reactions & Reactivity, Experiment Automation

Related Publications

2025

Cited by 3

Transferable Machine Learning Interatomic Potential for Pd-Catalyzed Cross-Coupling Reactions

Anstine D., Zubatyuk R., Gallegos L., Paton R., Wiest O., Nebgen B., Jones T., Gomes G., Tretiak S., Isayev O.

(2025)

ML Potentials

Reactions & Reactivity

Experiment Automation

Materials Informatics

Finding efficient substrate-catalyst combinations for palladium-catalyzed cross-coupling reactions remains a critical challenge in synthetic chemistry, with broad implications for pharmaceutical and materials manufacturing.

2024

Cited by 2

Accurate Ring Strain Energy Predictions with Machine Learning and Application in Strain-Promoted Reactions

Liu Z., Vinskus J., Fu Y., Liu P., Noonan K., Isayev O.

(2024)

ML Potentials

Experiment Automation

Reactions & Reactivity

Ring strain energy (RSE) is crucial for understanding molecular reactivity.

2024

Cited by 85

Exploring the frontiers of condensed-phase chemistry with a general reactive machine learning potential

Zhang S., Makoś M. Z., Jadrich R. B., Kraka E., Barros K., Nebgen B. T., Tretiak S., Isayev O., Lubbers N., Messerly R. A., Smith J. S.

Nature Chemistry, 16, 727–734 (2024)

ML Potentials

Experiment Automation

Atomistic simulation has a broad range of applications from drug design to materials discovery.

2023

Exploring the frontiers of condensed-phase chemistry with a general reactive machine learning potential

Messerly R., Zhang S., Makoś M., Jadrich R., Kraka E., Barros K., Nebgen B., Tretiak S., Isayev O., Lubbers N., Smith J.

(2023)

ML Potentials

Experiment Automation

Reactions & Reactivity

Reactive chemistry atomistic simulation has a broad range of applications from drug design to energy to materials discovery.

2021

Cited by 94

Teaching a neural network to attach and detach electrons from molecules

Zubatyuk R., Smith J. S., Nebgen B. T., Tretiak S., Isayev O.

Nature Communications, 12 (2021)

ML Potentials

Reactions & Reactivity

Drug Discovery

Quantum Chemistry

Interatomic potentials derived with machine learning algorithms such as deep neural networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow performing massive simulations.
