Proto-Yield: An Uncertainty-Aware Prototype Network for Yield Prediction in Real-world Chemical Reactions
Kehan Guo, Zhen Liu, Zhichun Guo, Bozhao Nan, Olexandr Isayev, Nitesh Chawla, Olaf Wiest, Xiangliang Zhang
Abstract
Reaction yield prediction underpins computer-aided synthesis prediction (CASP). Formulated as a regression problem that takes both reactants and products as input, this task has been extensively studied with machine learning methods based on handcrafted fingerprint features, SMILES strings encoded by Transformers, and molecular graphs encoded by Graph Neural Networks. However, a major limitation of these methods is their inability to capture and model the underlying uncertainties. These uncertainties arise both from the inherently stochastic nature of chemical reactions and from inconsistencies or noise in how yields are measured and reported. What makes this seemingly simple regression problem even harder is that there is no principled way to account for them, because details of the experimental process are often missing or unrecorded, as commonly happens in chemical labs. Given these challenges, we propose a new formulation for yield prediction. Rather than assuming a single deterministic yield value for a given reaction, we model the outcome as a probability distribution over three discrete yield regimes (high, medium, and low), reflecting the inherent uncertainty of a reaction process that is often only partially observed. Accordingly, we propose Proto-Yield, an encoder-agnostic prototype network that models each reaction as occurring in one of these three regimes. Without access to full reaction processes, Proto-Yield learns to infer latent regimes and their associated yield distributions from noisy, incomplete training data. At inference time, Proto-Yield outputs both a calibrated probability distribution over the yield regimes and the predicted yield conditioned on each regime. Extensive experiments on a 41,000-reaction patent corpus and two high-throughput benchmarks show that Proto-Yield improves R² by up to 15% and reduces RMSE/MAE by 13% compared with baseline methods.
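The abstract specifies what Proto-Yield outputs (a probability for each yield regime plus a yield conditioned on each regime) but not how it computes them. The Python sketch below illustrates that output format under our own assumptions; it is not the authors' implementation. It assumes that regime probabilities come from a softmax over negative squared distances to one prototype per regime (the usual prototypical-network choice), and that each regime has a hypothetical linear yield head squashed into a hypothetical yield range (0-30%, 30-70%, 70-100%). The names `predict`, `prototypes`, `heads`, and `REGIME_RANGES` are illustrative only.

```python
import math

# Hypothetical regime boundaries in % yield; the paper's actual thresholds may differ.
REGIME_RANGES = {"low": (0.0, 30.0), "medium": (30.0, 70.0), "high": (70.0, 100.0)}
REGIMES = tuple(REGIME_RANGES)  # ("low", "medium", "high")


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sigmoid(z):
    # Branch on sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(embedding, prototypes, heads, temperature=1.0):
    """Prototype-based regime mixture (illustrative sketch, not Proto-Yield itself).

    embedding:  reaction vector from any encoder (fingerprint, Transformer, GNN)
    prototypes: regime -> prototype vector (hypothetical learned parameters)
    heads:      regime -> (weights, bias) of a hypothetical per-regime linear head
    """
    # Regime probabilities: softmax over negative squared distances to prototypes.
    logits = [-sq_dist(embedding, prototypes[r]) / temperature for r in REGIMES]
    probs = dict(zip(REGIMES, softmax(logits)))

    # Regime-conditioned yield: squash a linear score into that regime's range.
    cond = {}
    for r in REGIMES:
        weights, bias = heads[r]
        lo, hi = REGIME_RANGES[r]
        score = sum(w * x for w, x in zip(weights, embedding)) + bias
        cond[r] = lo + (hi - lo) * sigmoid(score)

    # Mixture summary: expected yield and the between-regime standard deviation.
    mean = sum(probs[r] * cond[r] for r in REGIMES)
    spread = math.sqrt(sum(probs[r] * (cond[r] - mean) ** 2 for r in REGIMES))
    return probs, cond, mean, spread


# Toy usage: a 2-D embedding that lies closest to the "high" prototype.
prototypes = {"low": [0.0, 1.0], "medium": [0.5, 0.5], "high": [1.0, 0.0]}
heads = {r: ([0.0, 0.0], 0.0) for r in REGIMES}  # zero heads -> midpoint of each range
probs, cond, mean, spread = predict([0.9, 0.1], prototypes, heads)
print({r: round(p, 3) for r, p in probs.items()}, round(mean, 1))
```

Returning the full regime distribution alongside the conditional yields, rather than only `mean`, is what lets a caller see when a reaction is ambiguous between regimes. The `spread` here captures only between-regime disagreement. A fuller treatment would also add each regime's own within-regime variance (law of total variance).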
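Keywords
yield prediction, uncertainty modeling, prototype network, probabilistic distributions, chemical reactions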
Cite This Paper
@inproceedings{Guo2025,
author = {Guo, Kehan and Liu, Zhen and Guo, Zhichun and Nan, Bozhao and Isayev, Olexandr and Chawla, Nitesh and Wiest, Olaf and Zhang, Xiangliang},
title = {Proto-Yield: An Uncertainty-Aware Prototype Network for Yield Prediction in Real-world Chemical Reactions},
year = {2025},
booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
pages = {791--801},
doi = {10.1145/3746252.3761323},
url = {http://dx.doi.org/10.1145/3746252.3761323},
publisher = {ACM},
keywords = {yield prediction, uncertainty modeling, prototype network, probabilistic distributions, chemical reactions},
researchAreas = {reactions, generative-ai},
}