Publications
2025
AIMNet2-rxn: A Machine Learned Potential for Generalized Reaction Modeling on a Millions-of-Pathways Scale
Anstine D. M., Zhao Q., Zubatiuk R., Zhang S., Singla V., Nikitin F., Savoie B. M., Isayev O.
AIMNet2-rxn is a machine-learned interatomic potential trained on 4.7 × 10^6 range-separated DFT calculations that accelerates reaction modeling by about six orders of magnitude while retaining approximately 1–2 kcal/mol accuracy along reaction coordinates. By leveraging three‑dimensional chemical information and a batched nudged elastic band (BNEB) method, the model searches millions of reaction pathways and enables high‑throughput mechanistic analysis for complex transformations such as glucose pyrolysis.
Mechanistic modeling of chemical transformations offers a compelling basis for understanding reactivity and allows for prediction of reaction outcomes before attempting experiments. Despite progress in machine learned interatomic potentials (MLIPs), we demonstrate that available models lack the accuracy for diverse reaction modeling. With this motivation, we developed a general MLIP for mechanistic modeling of organics, AIMNet2-rxn, using a dataset of ~4.7 × 10^6 range-separated DFT calculations. AIMNet2-rxn enables reaction modeling ~10^6 times faster than the reference quantum mechanical (QM) methods while significantly outperforming graph-based ML, reaffirming the value of using 3D chemical information for training. On a test suite of well-known reaction mechanisms, such as amide formation, proton transfers, and pericyclics, AIMNet2-rxn yields 1-2 kcal mol⁻¹ accuracy across reaction coordinates without retraining or system-specific fine-tuning. To exploit GPU parallelism and AIMNet2-rxn efficiency, we introduce a batched nudged elastic band (BNEB) method that readily achieves minimum energy pathway search on a millions-of-reactions scale. To demonstrate complex reaction characterization, the thermodynamics of an 11-step pathway producing hydroxymethylfurfural, the experimentally observed major product of glucose pyrolysis, is evaluated. Overall, the accuracy and efficiency afforded by AIMNet2-rxn create opportunities in high-throughput reaction discovery and deep reaction network analysis that would be infeasible with QM methods.
Transferable Machine Learning Interatomic Potential for Pd-Catalyzed Cross-Coupling Reactions
Anstine D., Zubatyuk R., Gallegos L., Paton R., Wiest O., Nebgen B., Jones T., Gomes G., Tretiak S., Isayev O.
Finding efficient substrate-catalyst combinations for palladium-catalyzed cross-coupling reactions remains a critical challenge in synthetic chemistry, with broad implications for pharmaceutical and materials manufacturing. We report AIMNet2-Pd, a machine learned interatomic potential that enables rapid, accurate computational studies of palladium-catalyzed cross-coupling reactions. AIMNet2-Pd replaces computationally expensive electronic structure calculations with a neural network-based model that performs geometry optimization, transition state searches, and energy calculations in seconds while maintaining accuracy within 1-2 kcal mol⁻¹ and ~0.1 Å compared to the reference QM calculations. AIMNet2-Pd makes computational high-throughput catalyst screening and mechanistic studies of realistic systems feasible by providing on-demand thermodynamic and kinetic predictions for each step of a catalytic cycle. Importantly, the model's applicability extends beyond the monophosphine ligands in Pd(0)/Pd(II) cycles on which it was trained to chemically diverse Pd complexes. This demonstrates AIMNet2-Pd's utility as a general-purpose, high-throughput tool for studying catalytic reactions.
AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs
Anstine D. M., Zubatyuk R., Isayev O.
Chemical Science 16, 10228–10244
Machine learned interatomic potentials (MLIPs) are reshaping computational chemistry practices because of their ability to drastically exceed the accuracy-length/time scale tradeoff.
All That Glitters Is Not Gold: Importance of Rigorous Evaluation of Proteochemometric Models
Avdiunina P., Jamal S., Gusev F., Isayev O.
Journal of Chemical Information and Modeling 65, 10239–10252
Democratizing Reaction Kinetics through Machine Vision and Learning
Baumer M., Gallegos L., Anstine D., Kubaney A., Regio J., Isayev O., Bernhard S., Gomes G.
We present PRISM (Parallelized Reaction-rates via Indicator Spectrometry using Machine-vision), an innovative methodology for measuring amide coupling reaction rates by monitoring pH changes via indicator dyes, achieving precision comparable to traditional NMR techniques. The experimental design, enabled by a serial dilution, allowed twelve rate constants to be measured concurrently in 96-well plates, spanning more than four orders of magnitude, with 1,162 total rate constants collected. Moreover, the instrumentation is 3D-printed, with the remaining components comprising readily available and cost-effective hardware, promoting the democratized use of this technique to generate uniform data sets. Validation with ¹⁹F NMR confirmed PRISM’s reliability. Computational investigations reveal a concerted asynchronous SN2 mechanism, with base-catalyzed pathways exhibiting the lowest energy barriers. To complement the PRISM rate dataset, we developed a classification model that achieves high accuracy for out-of-distribution reactants in determining rate measurability, and a chemically rich graph neural network regression model for predicting quantitative reaction rates. This approach offers a resource-efficient strategy for studying reaction kinetics that can be applied to other reaction classes.
Anticipating the Selectivity of Intramolecular Cyclization Reaction Pathways with Neural Network Potentials
Casetti N., Anstine D., Isayev O., Coley C. W.
Journal of Chemical Theory and Computation 21, 10362–10372
AIQM3: Targeting Coupled-Cluster Accuracy with Semi-Empirical Speed Across Seven Main Group Elements
Chen Y., Hou Y., Zubatyuk R., Isayev O., Dral P. O.
The AIQM series methods are successful neural network-based models that target coupled-cluster accuracy while maintaining high robustness and transferability across various tasks by leveraging Δ-learning. However, the previous AIQM1 and AIQM2 models are limited to molecular systems with four elements (H, C, N, and O), which falls short of meeting the common needs for atomistic simulations. Here, we introduce the extension, AIQM3, that covers three additional chemical elements (S, F, and Cl) and approaches coupled-cluster-level accuracy at the speed of a semi-empirical method. AIQM3 maintains the accuracy of its predecessor AIQM2, surpasses commonly used density functional theory (DFT) methods for different types of molecular interactions, and its efficiency is competitive with that of machine learning interatomic potentials on commodity CPU hardware. AIQM3's superiority is showcased for reaction simulations and tasks related to drug design, where it delivers accurate torsion profiles for various real-world drug-like molecules. In addition, AIQM3 can be used for infrared (IR) spectra calculations at a low cost. We provide a web service for AIQM3 calculations on the Aitomistic Hub at aitomistic.xyz, to democratize and facilitate its use with the assistance of AI agents.
Proto-Yield: An Uncertainty-Aware Prototype Network for Yield Prediction in Real-world Chemical Reactions
Guo K., Liu Z., Guo Z., Nan B., Isayev O., Chawla N., Wiest O., Zhang X.
Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 791–801
Machine learning anomaly detection of automated HPLC experiments in the cloud laboratory
Gusev F., Kline B. C., Quinn R., Xu A., Smith B., Frezza B., Isayev O.
Digital Discovery 4, 3445–3454
Autonomous experiments are vulnerable to unforeseen adverse events. We developed a transferable ML framework that flags affected HPLC runs in real time and provides expert-level quality control without human oversight.
Machine learning interatomic potentials at the centennial crossroads of quantum mechanics
Kalita B., Gokcan H., Isayev O.
Nature Computational Science 5, 1120–1132
AIMNet2‐NSE: A Transferable Reactive Neural Network Potential for Open‐Shell Chemistry
Kalita B., Zubatyuk R., Anstine D. M., Bergeler M., Settels V., Stork C., Spicher S., Isayev O.
Angewandte Chemie International Edition
Open‐shell systems such as radical intermediates are central to radical polymerization (RP), combustion, catalysis, and many other chemical and industrial processes, yet their accurate modeling presents significant computational challenges. Most of the current machine learning interatomic potentials do not distinguish between different spin states, making them unsuitable for open‐shell reactive chemistry. Here we present AIMNet2‐NSE (neural spin‐charge equilibration), a neural network potential that incorporates spin‐charge equilibration for accurate treatment of molecules and reactions with arbitrary charge and spin multiplicities. Built upon the AIMNet2 framework, AIMNet2‐NSE is trained on an extensive dataset comprising 20 million closed‐shell neutral and charged molecules, 13 million open‐shell radical configurations, and 200K radical reaction profiles. With explicit handling of spin charges, AIMNet2‐NSE enables prediction of spin‐resolved properties with near‐DFT accuracy while maintaining a favorable linear scaling compared to the polynomial scaling of electronic structure methods. The predictive capabilities and generalizability of our model are confirmed by evaluations on large‐scale radical test sets, the industrially relevant BASChem19 benchmark, and RP reactions. Overall, AIMNet2‐NSE represents a significant advancement in machine learning interatomic potentials, allowing efficient exploration of complex open‐shell systems, and significantly advancing our ability to model radical reaction pathways and reactive intermediates in chemical processes where traditional quantum mechanical methods are computationally prohibitive.
AIMNet2‐NSE: A Transferable Reactive Neural Network Potential for Open‐Shell Chemistry
Kalita B., Zubatyuk R., Anstine D. M., Bergeler M., Settels V., Stork C., Spicher S., Isayev O.
Angewandte Chemie
Fast and Accurate Ring Strain Energy Predictions with Machine Learning and Application in Strain-Promoted Reactions
Liu Z., Vinskus J., Fu Y., Liu P., Noonan K. J. T., Isayev O.
JACS Au 5, 4750–4761
Efficient Molecular Crystal Structure Prediction and Stability Assessment with AIMNet2 Neural Network Potentials
Nayal K. S., O’Connor D., Zubatyuk R., Anstine D. M., Yang Y., Tom R., Deng W., Tang K., Marom N., Isayev O.
Crystal Growth & Design 25, 9092–9106
Scalable Low-Energy Molecular Conformer Generation with Quantum Mechanical Accuracy
Nikitin F., Anstine D. M., Zubatyuk R., Paliwal S. G., Isayev O.
Molecular geometry is crucial for biological activity and chemical reactivity; however, computational methods for generating 3D structures are limited by the vast scale of conformational space and the complexities of stereochemistry. Here we present an approach that combines an expansive dataset of molecular conformers with generative diffusion models to address this problem. We introduce ChEMBL3D, which contains over 250 million molecular geometries for 1.8 million drug-like compounds, optimized using AIMNet2 neural network potentials to a near-quantum mechanical accuracy with implicit solvent effects included. This dataset captures complex organic molecules in various protonation states and stereochemical configurations. We then developed LoQI, a stereochemistry-aware diffusion model that learns molecular geometry distributions directly from this data. Through graph augmentation, LoQI accurately generates molecular structures with targeted stereochemistry, representing a significant advance in modeling capabilities over previous generative methods. The model outperforms traditional approaches, achieving up to tenfold improvements in energy accuracy and effective recovery of optimal conformations. Benchmark tests on complex systems, including macrocycles and flexible molecules, as well as validation with crystal structures, show LoQI can perform low energy conformer search efficiently. The model code and dataset are available at https://github.com/isayevlab/LoQI.
GEOM-drugs revisited: toward more chemically accurate benchmarks for 3D molecule generation
Nikitin F., Dunn I., Koes D. R., Isayev O.
Digital Discovery 4, 3282–3291
Revisiting GEOM drugs: corrected metrics and novel energy-based structural benchmark enable rigorous evaluation of 3D molecule generative models.
Design of Tough 3D Printable Elastomers with Human‐in‐the‐Loop Reinforcement Learning
Rapp J. L., Anstine D. M., Gusev F., Nikitin F., Yun K. H., Borden M. A., Bhat V., Isayev O., Leibfarth F. A.
Angewandte Chemie 137
The development of high‐performance elastomers for additive manufacturing requires overcoming complex property trade‐offs that challenge conventional material discovery pipelines. Here, a human‐in‐the‐loop reinforcement learning (RL) approach is used to discover polyurethane elastomers that overcome pervasive stress–strain property tradeoffs. Starting with a diverse training set of 92 formulations, a coupled multi‐component reward system was identified that guides RL agents toward materials with both high strength and extensibility. Through three rounds of iterative optimization combining RL predictions with human chemical intuition, we identified elastomers with more than double the average toughness compared to the initial training set. The final exploitation round, aided by solubility prescreening, predicted twelve materials exhibiting both high strength (>10 MPa) and high strain at break (>200%). Analysis of the high‐performing materials revealed structure‐property insights, including the benefits of high molar mass urethane oligomers, a high density of urethane functional groups, and incorporation of rigid low molecular weight diols and unsymmetric diisocyanates. These findings demonstrate that machine‐guided, human‐augmented design is a powerful strategy for accelerating polymer discovery in applications where data is scarce and expensive to acquire, with broad applicability to multi‐objective materials optimization.
Design of Tough 3D Printable Elastomers with Human‐in‐the‐Loop Reinforcement Learning
Rapp J. L., Anstine D. M., Gusev F., Nikitin F., Yun K. H., Borden M. A., Bhat V., Isayev O., Leibfarth F. A.
Angewandte Chemie International Edition 64
Machine Learning-Accelerated Screening of Hydroquinone Analogs for Proton-Coupled Electron Transfer
Sarma R., Wang Y., Hebert D., Tran E., Shao C., Fu S., Cho I., Isayev O., Garcia-Bosch I.
Proton-coupled electron transfer (PCET) mediated by hydroquinone and related molecules is key to natural and artificial energy conversion. The reactivity of these molecules depends on their bond dissociation free energy (BDFE), but studying the relationship between structure and thermochemistry across chemical space has been limited by computational expense. Here, we present the first use of the AIMNet2 neural network potential to calculate average BDFE (BDFEavg) values for the 2H+/2e− dehydrogenation of about 200,000 hydroquinone-like compounds, including vicinal diamines, diols, and dithiols. Benchmarking against DFT calculations for 168 substituted ortho-phenylenediamines (opda) shows good agreement (R² > 0.9). Our analysis finds that BDFEavg ranges from 50 to 80 kcal/mol and can be systematically tuned by modifying the backbone and N-substitution: electron-withdrawing groups raise BDFEavg by up to 15 kcal/mol, while lower aromaticity in furan and thiophene backbones decreases BDFEavg by approximately 10 kcal/mol compared to phenyl systems. We developed an additive "offset model" that allows separate investigation of backbone and sidechain effects. Validation through cyclic voltammetry and reactivity studies with quinone oxidants for selected compounds supports the computational results. This extensive thermochemical database and web-based prediction tool offer valuable resources for designing PCET reagents for catalysis, energy storage, and biomedical uses.
ANI-1xBB: An ANI-Based Reactive Potential for Small Organic Molecules
Zhang S., Zubatyuk R., Yang Y., Roitberg A., Isayev O.
Journal of Chemical Theory and Computation 21, 4365–4374
Including Physics-Informed Atomization Constraints in Neural Networks for Reactive Chemistry
Zhang S., Chigaev M., Isayev O., Messerly R. A., Lubbers N.
Journal of Chemical Information and Modeling 65, 4367–4380
Discovery of Novel Celecoxib Polymorphs Using AIMNet2 Machine Learning Interatomic Potential
Zheng P., Abramov Y., Sun C. C., Isayev O.
Polymorphism plays a pivotal role in defining the solid-state properties of pharmaceutical compounds, yet the discovery and accurate energy ranking of polymorphs remain a challenge. Here, we leverage a fine-tuned machine-learned interatomic potential AIMNet2 to explore the polymorphic landscape of celecoxib, a clinically important COX-2 inhibitor. Our approach combines GPU-accelerated crystal structure generation, active learning-guided model refinement, and quasi-harmonic free-energy corrections. The workflow successfully reproduces the experimental energy hierarchy of known polymorphs and identifies several novel low-energy structures with distinct packing motifs. In addition, we evaluate the elastic properties and thermal expansion effects across polymorphs, revealing structural features that underpin mechanical flexibility and thermodynamic preferences. This study demonstrates the power of AIMNet2-based crystal structure prediction for resolving complex pharmaceutical polymorphism and offers a powerful tool for future polymorph discovery and solid-state optimization.
High-throughput electronic property prediction of cyclic molecules with 3D-enhanced machine learning
Zheng P., Isayev O.
Chemical Science 16, 20553–20563
Ring Vault contains 201 546 cyclic molecules across 11 elements. AIMNet2 with 3D information outperformed 2D models in predicting the electronic properties of cyclic molecules.
2024
Uncertainty-Aware Yield Prediction with Multimodal Molecular Features
Chen J., Guo K., Liu Z., Isayev O., Zhang X.
AAAI Conference on Artificial Intelligence 38, 8274–8282
Predicting chemical reaction yields is pivotal for efficient chemical synthesis, an area that focuses on the creation of novel compounds for diverse uses.
MLatom 3: A Platform for Machine Learning-Enhanced Computational Chemistry Simulations and Workflows
Dral P. O., Ge F., Hou Y., Zheng P., Chen Y., Barbatti M., Isayev O., Wang C., Xue B., Pinheiro Jr M., et al.
J. Chem. Theory Comput. 20, 1193–1213
In silico screening of LRRK2 WDR domain inhibitors using deep docking and free energy simulations
Gutkin E., Gusev F., Gentile F., Ban F., Koby S. B., Narangoda C., Isayev O., Cherkasov A., Kurnikova M. G.
Chemical Science 15, 8800–8812
In this work, we combined Deep Docking and free energy MD simulations for the in silico screening and experimental validation for potential inhibitors of leucine rich repeat kinase 2 (LRRK2) targeting the WD40 repeat (WDR) domain.
ANI/EFP: Modeling Long-Range Interactions in ANI Neural Network with Effective Fragment Potentials
Haghiri S., Viquez Rojas C., Bhat S., Isayev O., Slipchenko L.
Journal of Chemical Theory and Computation 20, 9138–9147
Discovery of Crystallizable Organic Semiconductors with Machine Learning
Johnson H. M., Gusev F., Dull J. T., Seo Y., Priestley R. D., Isayev O., Rand B. P.
J. Am. Chem. Soc. 146, 21583–21590
De novo molecule design towards biased properties via a deep generative framework and iterative transfer learning
Sattari K., Li D., Kalita B., Xie Y., Lighvan F. B., Isayev O., Lin J.
Digital Discovery 3, 410–421
The RRCGAN, validated through DFT, successfully generates chemically valid molecules that target specified energy gap values, with 75% of the generated molecules having an RE of <20% of the targeted values.
Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR
Tropsha A., Isayev O., Varnek A., Schneider G., Cherkasov A.
Nat. Rev. Drug Discov. 23, 141–155
Exploring the frontiers of condensed-phase chemistry with a general reactive machine learning potential
Zhang S., Makoś M. Z., Jadrich R. B., Kraka E., Barros K., Nebgen B. T., Tretiak S., Isayev O., Lubbers N., Messerly R. A., et al.
Nature Chemistry 16, 727–734
Atomistic simulation has a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. For this reason, chemistry and materials science would greatly benefit from a general reactive MLIP, that is, an MLIP that is applicable to a broad range of reactive chemistry without the need for refitting. Here we develop a general reactive MLIP (ANI-1xnr) through automated sampling of condensed-phase reactions. ANI-1xnr is then applied to study five distinct systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early earth small molecules. In all studies, ANI-1xnr closely matches experiment (when available) and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N and O elements in the condensed phase, enabling high-throughput in silico reactive chemistry experimentation.
2023
Generative Models as an Emerging Paradigm in the Chemical Sciences
Anstine D. M., Isayev O.
J. Am. Chem. Soc. 145, 8736–8750
Machine Learning Interatomic Potentials and Long-Range Physics
Anstine D. M., Isayev O.
J. Phys. Chem. A 127, 2417–2431
Themed collection on Insightful Machine Learning for Physical Chemistry
Clark A. E., Dral P. O., Tamblyn I., Isayev O.
Physical Chemistry Chemical Physics 25, 22563–22564
This themed collection brings together articles on Insightful Machine Learning for Physical Chemistry.
Synergy of semiempirical models and machine learning in computational chemistry
Fedik N., Nebgen B., Lubbers N., Barros K., Kulichenko M., Li Y. W., Zubatyuk R., Messerly R., Isayev O., Tretiak S.
J. Chem. Phys. 159, 110901
Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types, new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, and this limitation cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage domain knowledge and employ machine learning only as a corrective tool. In the context of material science, we will focus on semi-empirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.
Active Learning Guided Drug Design Lead Optimization Based on Relative Binding Free Energy Modeling
Gusev F., Gutkin E., Kurnikova M. G., Isayev O.
J. Chem. Inf. Model. 63, 583–594
Scalable hybrid deep neural networks/polarizable potentials biomolecular simulations including long-range effects
Jaffrelot Inizan T., Plé T., Adjoua O., Ren P., Gokcan H., Isayev O., Lagardère L., Piquemal J.
Chem. Sci. 14 , 5438–5452
Deep-HP is a scalable extension of the Tinker-HP multi-GPU molecular dynamics (MD) package enabling the use of PyTorch/TensorFlow Deep Neural Network (DNN) models.
The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
Liu Z., Moroz Y. S., Isayev O.
Chem. Sci. 14 , 10835–10846
A sensitive model captures the reactivity cliffs but overfits to yield outliers. On the other hand, a robust model disregards the yield outliers but underfits the reactivity cliffs.
Structure Prediction of Epitaxial Organic Interfaces with Ogre, Demonstrated for Tetracyanoquinodimethane (TCNQ) on Tetrathiafulvalene (TTF)
Moayedpour S., Bier I., Wen W., Dardzinski D., Isayev O., Marom N.
J. Phys. Chem. C 127 , 10398–10410
Comprehensive exploration of graphically defined reaction spaces
Zhao Q., Vaddadi S. M., Woulfe M., Ogunfowora L. A., Garimella S. S., Isayev O., Savoie B. M.
Sci. Data 10 , 145
Existing reaction transition state (TS) databases are comparatively small and lack chemical diversity.
$Δ^2$ machine learning for reaction property prediction
Zhao Q., Anstine D. M., Isayev O., Savoie B. M.
Chem. Sci. 14 , 13392–13401
Newly developed Δ²-learning models enable state-of-the-art accuracy in predicting the properties of chemical reactions.
2022
Extending machine learning beyond interatomic potentials for predicting molecular properties
Fedik N., Zubatyuk R., Kulichenko M., Lubbers N., Smith J. S., Nebgen B., Messerly R., Li Y. W., Boldyrev A. I., Barros K., et al.
Nat. Rev. Chem. 6 , 653–672
Learning molecular potentials with neural networks
Gokcan H., Isayev O.
WIREs Comput. Mol. Sci. 12 , e1564
The potential energy of molecular species and their conformers can be computed with a wide range of computational chemistry methods, from molecular mechanics to ab initio quantum chemistry. However, the proper choice of the computational approach based on computational cost and reliability of calculated energies is a dilemma, especially for large molecules. This dilemma proves to be even more problematic for studies that require hundreds or thousands of calculations, such as drug discovery. On the other hand, driven by their pattern recognition capabilities, neural networks have started to gain popularity in the computational chemistry community. During the last decade, many neural network potentials have been developed to predict a variety of chemical information of different systems. Neural network potentials have been shown to predict chemical properties with accuracy comparable to quantum mechanical approaches but at a cost approaching that of molecular mechanics calculations. As a result, the development of more reliable, transferable, and extensible neural network potentials has become an attractive field of study for researchers. In this review, we give an overview of the status of current neural network potentials and strategies to improve their accuracy. We provide recent examples of studies that prove the applicability of these potentials. We also discuss the capabilities and shortcomings of the current models and the challenges and future aspects of their development and applications. It is expected that this review will provide guidance for the development of neural network potentials and the exploitation of their applicability.
This article is categorized under: Data Science > Artificial Intelligence/Machine Learning; Molecular and Statistical Mechanics > Molecular Interactions; Software > Molecular Modeling
Simulations of Pathogenic E1α Variants: Allostery and Impact on Pyruvate Dehydrogenase Complex-E1 Structure and Function
Gokcan H., Bedoyan J. K., Isayev O.
Journal of Chemical Information and Modeling 62 , 3463–3475
Prediction of Protein pKa with Representation Learning
Gokcan H., Isayev O.
The behavior of proteins is closely related to the protonation states of the residues. Therefore, prediction and measurement of pKa are essential to understand the basic functions of proteins. In this work, we develop a new empirical scheme for protein pKa prediction that is based on deep representation learning. It combines machine learning with atomic environment vector (AEV) and learned quantum mechanical representation from ANI-2x neural network potential (J. Chem. Theory Comput. 2020, 16, 4192). The scheme requires only the coordinate information of a protein as the input and separately estimates the pKa for all five titratable amino acid types. The accuracy of the approach was analyzed with both cross-validation and an external test set of proteins. Obtained results were compared with the widely used empirical approach PROPKA. The new empirical model provides accuracy with MAEs below 0.5 for all amino acid types. It surpasses the accuracy of PROPKA and performs significantly better than the null model. Our model is also sensitive to the local conformational changes and molecular interactions.
Prediction of protein pKa with representation learning
Gokcan H., Isayev O.
Chemical Science 13 , 2462–2474
We developed a new empirical ML model for protein pKa prediction with MAEs below 0.5 for all amino acid types.
Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds
Korshunova M., Huang N., Capuzzi S., Radchenko D. S., Savych O., Moroz Y. S., Wells C. I., Willson T. M., Tropsha A., Isayev O.
Communications Chemistry 5
Deep generative neural networks have been used increasingly in computational chemistry for de novo design of molecules with desired properties. Many deep learning approaches employ reinforcement learning for optimizing the target properties of the generated molecules. However, the success of this approach is often hampered by the problem of sparse rewards, as the majority of the generated molecules are expectedly predicted as inactives. We propose several technical innovations to address this problem and improve the balance between exploration and exploitation modes in reinforcement learning. In a proof-of-concept study, we demonstrate the application of the deep generative recurrent neural network architecture enhanced by several proposed technical tricks to design inhibitors of the epidermal growth factor receptor (EGFR) and further experimentally validate their potency. The proposed technical solutions are expected to substantially improve the success rate of finding novel bioactive compounds for specific biological targets using generative and reinforcement learning approaches.
Roadmap on Machine learning in electronic structure
Kulik H. J., Hammerschmidt T., Schmidt J., Botti S., Marques M. A. L., Boley M., Scheffler M., Todorović M., Rinke P., Oses C., et al.
Electron. Struct. 4 , 023004
In recent years, we have been witnessing a paradigm shift in computational materials science.
Auto3D: Automatic Generation of the Low-Energy 3D Structures with ANI Neural Network Potentials
Liu Z., Zubatiuk T., Roitberg A., Isayev O.
J. Chem. Inf. Model. 62 , 5373–5382
The transformational role of GPU computing and deep learning in drug discovery
Pandey M., Fernandez M., Gentile F., Isayev O., Tropsha A., Stern A. C., Cherkasov A.
Nature Machine Intelligence 4 , 211–221
Toward Chemical Accuracy in Predicting Enthalpies of Formation with General-Purpose Data-Driven Methods
Zheng P., Yang W., Wu W., Isayev O., Dral P. O.
J. Phys. Chem. Lett. 13 , 3479–3491
2021
Best practices in machine learning for chemistry
Artrith N., Butler K. T., Coudert F., Han S., Isayev O., Jain A., Walsh A.
Nature Chemistry 13 , 505–508
Crowdsourced mapping of unexplored target space of kinase inhibitors
Cichońska A., Ravikumar B., Allaway R. J., Wan F., Park S., Isayev O., Li S., Mason M., Lamb A., Tanoli Z., et al.
Nature Communications 12
Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound–kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.
Harnessing the Power of Smart and Connected Health to Tackle COVID-19: IoT, AI, Robotics, and Blockchain for a Better World
Firouzi F., Farahani B., Daneshmand M., Grise K., Song J., Saracco R., Wang L. L., Lo K., Angelov P., Soares E., et al.
IEEE Internet of Things Journal 8 , 12826–12846
Active Learning in Bayesian Neural Networks for Bandgap Predictions of Novel Van der Waals Heterostructures
Fronzi M., Isayev O., Winkler D. A., Shapter J. G., Ellis A. V., Sherrell P. C., Shepelin N. A., Corletto A., Ford M. J.
Advanced Intelligent Systems 3
The bandgap is one of the most fundamental properties of condensed matter. However, an accurate calculation of its value, which could potentially allow experimentalists to identify materials suitable for device applications, is very computationally expensive. Here, active machine learning algorithms are used to leverage a limited number of accurate density functional theory calculations to robustly predict the bandgap of a very large number of novel 2D heterostructures. Using this approach, a database of ≈2.2 million bandgap values for various novel 2D van der Waals heterostructures is produced.
Prediction of Protein pKa with Representation Learning
Gokcan H., Isayev O.
The behavior of proteins is closely related to the protonation states of the residues. Therefore, prediction and measurement of pKa are essential to understand the basic functions of proteins. In this work, we develop a new empirical scheme for protein pKa prediction that is based on deep representation learning. It combines machine learning with atomic environment vector (AEV) and learned quantum mechanical representation from ANI-2x neural network potential (J. Chem. Theory Comput. 2020, 16, 4192). The scheme requires only the coordinate information of a protein as the input and separately estimates the pKa for all five titratable amino acid types. The accuracy of the approach was analyzed with both cross-validation and an external test set of proteins. Obtained results were compared with the widely used empirical approach PROPKA. The new empirical model provides accuracy with MAEs below 0.5 for all amino acid types. It surpasses the accuracy of PROPKA and performs significantly better than the null model. Our model is also sensitive to the local conformational changes and molecular interactions.
A Bag of Tricks for Automated De Novo Design of Molecules with the Desired Properties: Application to EGFR Inhibitor Discovery
Korshunova M., Huang N., Capuzzi S., Radchenko D. S., Savych O., Moroz Y. S., Wells C., Willson T. M., Tropsha A., Isayev O.
Deep generative neural networks have been used increasingly in computational chemistry for de novo design of molecules with desired properties. Many deep learning approaches employ reinforcement learning for optimizing the target properties of the generated molecules. However, the success of this approach is often hampered by the problem of sparse rewards, as the majority of the generated molecules are expectedly predicted as inactives. We propose several technical innovations to address this problem and improve the balance between exploration and exploitation modes in reinforcement learning. In a proof-of-concept study, we demonstrate the application of the deep generative recurrent neural network enhanced by several novel technical tricks to designing experimentally validated potent inhibitors of the epidermal growth factor receptor (EGFR). The proposed technical solutions are expected to substantially improve the success rate of finding novel bioactive compounds for specific biological targets using generative and reinforcement learning approaches.
OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design
Korshunova M., Ginsburg B., Tropsha A., Isayev O.
Journal of Chemical Information and Modeling 61 , 7–13
A critical overview of computational approaches employed for COVID-19 drug discovery
Muratov E. N., Amaro R., Andrade C. H., Brown N., Ekins S., Fourches D., Isayev O., Kozakov D., Medina-Franco J. L., Merz K. M., et al.
Chemical Society Reviews 50 , 9121–9151
We cover diverse methodologies, computational approaches, and case studies illustrating the ongoing efforts to develop viable drug candidates for treatment of COVID-19.
Machine-Learning-Guided Discovery of 19F MRI Agents Enabled by Automated Copolymer Synthesis
Reis M., Gusev F., Taylor N. G., Chung S. H., Verber M. D., Lee Y. Z., Isayev O., Leibfarth F. A.
Journal of the American Chemical Society 143 , 17677–17689
Artificial intelligence-enhanced quantum chemical method with broad applicability
Zheng P., Zubatyuk R., Wu W., Isayev O., Dral P. O.
Nature Communications 12
High-level quantum mechanical (QM) calculations are indispensable for accurate explanation of natural phenomena on the atomistic level. Their staggering computational cost, however, poses great limitations, which luckily can be lifted to a great extent by exploiting advances in artificial intelligence (AI). Here we introduce the general-purpose, highly transferable artificial intelligence–quantum mechanical method 1 (AIQM1). It approaches the accuracy of the gold-standard coupled cluster QM method with high computational speed of the approximate low-level semiempirical QM methods for the neutral, closed-shell species in the ground state. AIQM1 can provide accurate ground-state energies for diverse organic compounds as well as geometries for even challenging systems such as large conjugated compounds (fullerene C60) close to experiment. This opens an opportunity to investigate chemical compounds with previously unattainable speed and accuracy, as we demonstrate by determining geometries of polyyne molecules, a task difficult for both experiment and theory. Noteworthy, our method’s accuracy is also good for ions and excited-state properties, although the neural network part of AIQM1 was never fitted to these properties.
Machine learned Hückel theory: Interfacing physics and deep neural networks
Zubatiuk T., Nebgen B., Lubbers N., Smith J. S., Zubatyuk R., Zhou G., Koh C., Barros K., Isayev O., Tretiak S.
The Journal of Chemical Physics 154
The Hückel Hamiltonian is an incredibly simple tight-binding model known for its ability to capture qualitative physics phenomena arising from electron interactions in molecules and materials. Part of its simplicity arises from using only two types of empirically fit physics-motivated parameters: the first describes the orbital energies on each atom and the second describes electronic interactions and bonding between atoms. By replacing these empirical parameters with machine-learned dynamic values, we vastly increase the accuracy of the extended Hückel model. The dynamic values are generated with a deep neural network, which is trained to reproduce orbital energies and densities derived from density functional theory. The resulting model retains interpretability, while the deep neural network parameterization is smooth and accurate and reproduces insightful features of the original empirical parameterization. Overall, this work shows the promise of utilizing machine learning to formulate simple, accurate, and dynamically parameterized physics models.
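To make the two parameter types named in this abstract concrete, the sketch below evaluates the textbook Hückel π-orbital energies of a uniform ring from one on-site energy (alpha) and one nearest-neighbour coupling (beta). It relies on the closed-form ring solution with fixed parameters; the function names and the numerical values of alpha and beta are assumptions chosen for illustration, and the sketch does not reproduce the paper's machine-learned, environment-dependent parameters.

```python
import math


def huckel_ring_levels(n_sites, alpha, beta):
    """Orbital energies of a uniform n-site Hückel ring.

    alpha: on-site (orbital) energy; beta: nearest-neighbour coupling.
    Uses the closed-form result E_k = alpha + 2*beta*cos(2*pi*k/n).
    """
    return sorted(
        alpha + 2.0 * beta * math.cos(2.0 * math.pi * k / n_sites)
        for k in range(n_sites)
    )


def pi_energy(levels, n_electrons):
    """Fill the lowest levels with up to two electrons each and sum."""
    energy = 0.0
    remaining = n_electrons
    for level in levels:
        if remaining == 0:
            break
        occupancy = min(2, remaining)
        energy += occupancy * level
        remaining -= occupancy
    return energy


# Illustrative values only (energies in units of |beta|): alpha = 0, beta = -1.
levels = huckel_ring_levels(6, alpha=0.0, beta=-1.0)
benzene_pi_energy = pi_energy(levels, n_electrons=6)
```

For a six-membered ring with six π electrons this gives the familiar total of 6·alpha + 8·beta. In a dynamically parameterized model, alpha and beta would instead vary from atom to atom and bond to bond, so the Hamiltonian would have to be diagonalized numerically.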
Development of Multimodal Machine Learning Potentials: Toward a Physics-Aware Artificial Intelligence
Zubatiuk T., Isayev O.
Accounts of Chemical Research 54 , 1575–1585
Teaching a neural network to attach and detach electrons from molecules
Zubatyuk R., Smith J. S., Nebgen B. T., Tretiak S., Isayev O.
Nature Communications 12
Interatomic potentials derived with Machine Learning algorithms such as Deep-Neural Networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow performing massive simulations. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2–3 kcal/mol and spin-charges with errors of ~0.01e for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model makes it possible to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions.
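The conceptual DFT quantities listed in this abstract follow from simple finite differences once energies and atomic charges are available for the cation, neutral, and anion. The sketch below shows those standard textbook formulas; the function names, the use of plain partial charges, and all numerical inputs are assumptions for illustration and are not taken from the paper or from any AIMNet-NSE interface.

```python
def conceptual_dft(e_cation, e_neutral, e_anion):
    """Finite-difference descriptors from three total energies.

    Energies refer to the N-1 (cation), N (neutral) and N+1 (anion)
    electron systems at the same geometry, all in the same unit.
    """
    ip = e_cation - e_neutral  # ionization potential
    ea = e_neutral - e_anion   # electron affinity
    return {
        "IP": ip,
        "EA": ea,
        "electronegativity": 0.5 * (ip + ea),  # Mulliken definition
        "hardness": 0.5 * (ip - ea),           # some authors omit the 1/2
    }


def condensed_fukui(q_cation, q_neutral, q_anion):
    """Condensed Fukui functions from per-atom partial charges.

    f_plus[k]  = q_k(N)   - q_k(N+1): site reactivity toward nucleophiles
    f_minus[k] = q_k(N-1) - q_k(N):   site reactivity toward electrophiles
    """
    f_plus = [qn - qa for qn, qa in zip(q_neutral, q_anion)]
    f_minus = [qc - qn for qc, qn in zip(q_cation, q_neutral)]
    return f_plus, f_minus


# Made-up inputs for a two-atom toy system (energies in eV, charges in e).
descriptors = conceptual_dft(e_cation=9.0, e_neutral=0.0, e_anion=-1.0)
f_plus, f_minus = condensed_fukui(
    q_cation=[0.6, 0.4], q_neutral=[0.1, -0.1], q_anion=[-0.5, -0.5]
)
```

Each set of condensed Fukui values sums to one over the atoms, because the total charge changes by exactly one electron between neighbouring charge states. The atom with the largest f_minus is the predicted preferred site for electrophilic attack.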
2020
Crowdsourced mapping extends the target space of kinase inhibitors
Cichonska A., Ravikumar B., Allaway R. J., Park S., Wan F., Isayev O., Li S., Mason M., Lamb A., Tanoli Z., et al.
Despite decades of intensive search for compounds that modulate the activity of particular targets, there are currently small molecules available only for a small proportion of the human proteome. Effective approaches are therefore required to map the massive space of unexplored compound-target interactions for novel and potent activities. Here, we carried out a crowdsourced benchmarking of predictive models for kinase inhibitor potencies across multiple kinase families using unpublished bioactivity data. The top-performing predictions were based on kernel learning, gradient boosting and deep learning, and their ensemble resulted in predictive accuracy exceeding that of kinase activity assays. We then made new experiments based on the model predictions, which further improved the accuracy of experimental mapping efforts and identified unexpected potencies even for under-studied kinases. The open-source algorithms together with the novel bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking new prediction algorithms and for extending the druggable kinome.
Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens
Devereux C., Smith J. S., Huddleston K. K., Barros K., Zubatyuk R., Isayev O., Roitberg A. E.
Journal of Chemical Theory and Computation 16 , 4192–4202
High Throughput Screening of Millions of van der Waals Heterostructures for Superlubricant Applications
Fronzi M., Tawfik S. A., Ghazaleh M. A., Isayev O., Winkler D. A., Shapter J., Ford M. J.
Advanced Theory and Simulations 3
The screening of novel materials is an important topic in the field of materials science. Although traditional computational modeling, especially first‐principles approaches, is a very useful and accurate tool to predict the properties of novel materials, it still demands extensive and expensive state‐of‐the‐art computational resources. Additionally, such approaches can often be extremely time consuming. A time and resource efficient machine learning approach to create a dataset of structural properties of 18 million van der Waals layered structures is described. In particular, the authors focus on the interlayer energy and the elastic constant of layered materials composed of two different 2D structures that are important for novel solid lubricant and super‐lubricant materials. It is shown that machine learning models can predict results of computationally expensive approaches (i.e., density functional theory) with high accuracy.
TorchANI: A Free and Open Source PyTorch-Based Deep Learning Implementation of the ANI Neural Network Potentials
Gao X., Ramezanghorbani F., Isayev O., Smith J. S., Roitberg A. E.
Journal of Chemical Information and Modeling 60 , 3408–3415
TorchANI: A Free and Open Source PyTorch Based Deep Learning Implementation of the ANI Neural Network Potentials
Gao X., Ramezanghorbani F., Isayev O., Smith J., Roitberg A.
This paper presents TorchANI, a PyTorch based software for training/inference of ANI (ANAKIN-ME) deep learning models to obtain potential energy surfaces and other physical properties of molecular systems. ANI is an accurate neural network potential originally implemented using C++/CUDA in a program called NeuroChem. Compared with NeuroChem, TorchANI has a design emphasis on being light weight, user friendly, cross platform, and easy to read and modify for fast prototyping, while allowing acceptable sacrifice on running performance. Because the computation of atomic environmental vectors (AEVs) and atomic neural networks are all implemented using PyTorch operators, TorchANI is able to use PyTorch’s autograd engine to automatically compute analytical forces and Hessian matrices, as well as do force training without additional codes required.
Review for: Assessing Conformer Energies using Electronic Structure and Machine Learning Methods
Isayev O.
Correction: QSAR without borders
Muratov E. N., Bajorath J., Sheridan R. P., Tetko I. V., Filimonov D., Poroikov V., Oprea T. I., Baskin I. I., Varnek A., Roitberg A., et al.
Chemical Society Reviews 49 , 3716–3716
Correction for ‘QSAR without borders’ by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.
QSAR without borders
Muratov E. N., Bajorath J., Sheridan R. P., Tetko I. V., Filimonov D., Poroikov V., Oprea T. I., Baskin I. I., Varnek A., Roitberg A., et al.
Chemical Society Reviews 49 , 3525–3564
Word cloud summary of diverse topics associated with QSAR modeling that are discussed in this review.
DRACON: disconnected graph neural network for atom mapping in chemical reactions
Nikitin F., Isayev O., Strijov V.
Physical Chemistry Chemical Physics 22 , 26478–26486
We formulate a reaction prediction problem in terms of node-classification in a disconnected graph of source molecules and generalize a graph convolution neural network for disconnected graphs.
Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning / molecular mechanics potentials
Rufa D. A., Bruce Macdonald H. E., Fass J., Wieder M., Grinaway P. B., Roitberg A. E., Isayev O., Chodera J. D.
Alchemical free energy methods with molecular mechanics (MM) force fields are now widely used in the prioritization of small molecules for synthesis in structure-enabled drug discovery projects because of their ability to deliver 1–2 kcal mol−1 accuracy in well-behaved protein-ligand systems. Surpassing this accuracy limit would significantly reduce the number of compounds that must be synthesized to achieve desired potencies and selectivities in drug design campaigns. However, MM force fields pose a challenge to achieving higher accuracy due to their inability to capture the intricate atomic interactions of the physical systems they model. A major limitation is the accuracy with which ligand intramolecular energetics, especially torsions, can be modeled, as poor modeling of torsional profiles and coupling with other valence degrees of freedom can have a significant impact on binding free energies. Here, we demonstrate how a new generation of hybrid machine learning / molecular mechanics (ML/MM) potentials can deliver significant accuracy improvements in modeling protein-ligand binding affinities. Using a nonequilibrium perturbation approach, we can correct a standard, GPU-accelerated MM alchemical free energy calculation in a simple post-processing step to efficiently recover ML/MM free energies and deliver a significant accuracy improvement with small additional computational effort. To demonstrate the utility of ML/MM free energy calculations, we apply this approach to a benchmark system for predicting kinase:inhibitor binding affinities, a congeneric ligand series for non-receptor tyrosine kinase TYK2 (Tyk2), wherein state-of-the-art MM free energy calculations (with OPLS2.1) achieve inaccuracies of 0.93±0.12 kcal mol−1 in predicting absolute binding free energies. Applying an ML/MM hybrid potential based on the ANI2x ML model and AMBER14SB/TIP3P with the OpenFF 1.0.0 (“Parsley”) small molecule force field as an MM model, we show that it is possible to significantly reduce the error in absolute binding free energies from 0.97 [95% CI: 0.68, 1.21] kcal mol−1 (MM) to 0.47 [95% CI: 0.31, 0.63] kcal mol−1 (ML/MM).
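The post-processing correction described in this abstract works from the work accumulated while switching an end state from the MM to the ML/MM description. The sketch below shows one common way to turn such work values into a free energy difference, the Jarzynski exponential average. The abstract does not say which estimator the authors used, so this is an assumption, as are the function name and the sample work values (given in units of kT).

```python
import math


def jarzynski_free_energy(works_kt):
    """Free energy difference (in kT) from nonequilibrium work values (in kT).

    Implements dF = -ln(mean(exp(-W))) with a shift by the smallest work
    value so that the exponentials stay well scaled.
    """
    if not works_kt:
        raise ValueError("need at least one work value")
    shift = min(works_kt)  # the largest exp(-W) term comes from the smallest W
    total = sum(math.exp(-(w - shift)) for w in works_kt)
    return shift - math.log(total / len(works_kt))


# Hypothetical work values for switching one end state from MM to ML/MM.
works = [1.2, 0.8, 1.5, 0.9, 1.1]
correction = jarzynski_free_energy(works)
```

By Jensen's inequality the estimate never exceeds the mean work and never falls below the smallest work value. It is dominated by rare low-work samples, which is why estimators of this kind need enough switching trajectories to converge.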
The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules
Smith J. S., Zubatyuk R., Nebgen B., Lubbers N., Barros K., Roitberg A. E., Isayev O., Tretiak S.
Scientific Data 7
Abstract Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
2019
Inter-Modular Linkers play a crucial role in governing the biosynthesis of non-ribosomal peptides
Farag S., Bleich R. M., Shank E. A., Isayev O., Bowers A. A., Tropsha A.
Bioinformatics 35, 3584–3591
Abstract Motivation Non-ribosomal peptide synthetases (NRPSs) are modular enzymatic machines that catalyze the ribosome-independent production of structurally complex small peptides, many of which have important clinical applications as antibiotics, antifungals and anti-cancer agents. Several groups have tried to expand natural product diversity by intermixing different NRPS modules to create synthetic peptides. This approach has not been as successful as anticipated, suggesting that these modules are not fully interchangeable. Results We explored whether Inter-Modular Linkers (IMLs) impact the ability of NRPS modules to communicate during the synthesis of NRPs. We developed a parser to extract 39 804 IMLs from both well annotated and putative NRPS biosynthetic gene clusters from 39 232 bacterial genomes and established the first IMLs database. We analyzed these IMLs and identified a striking relationship between IMLs and the amino acid substrates of their adjacent modules. More than 92% of the identified IMLs connect modules that activate a particular pair of substrates, suggesting that significant specificity is embedded within these sequences. We therefore propose that incorporating the correct IML is critical when attempting combinatorial biosynthesis of novel NRPS. Availability and implementation The IMLs database as well as the NRPS-Parser have been made available on the web at https://nrps-linker.unc.edu. The entire source code of the project is hosted in GitHub repository (https://github.com/SWFarag/nrps-linker). Supplementary information Supplementary data are available at Bioinformatics online.
Quantitative Structure–Price Relationship (QS$R) Modeling and the Development of Economically Feasible Drug Discovery Projects
Fernandez M., Ban F., Woo G., Isaev O., Perez C., Fokin V., Tropsha A., Cherkasov A.
Journal of Chemical Information and Modeling 59, 1306–1313
Text mining facilitates materials discovery
Isayev O.
Nature 571, 42–43
Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen
Menden M. P., Wang D., Mason M. J., Szalai B., Bulusu K. C., Guan Y., Yu T., Kang J., Jeon M., Wolfinger R., et al.
Nature Communications 10
Abstract The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca’s large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. 160 teams participated to provide a comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationale for synergy predictions are identified, including ADAM17 inhibitor antagonism when combined with PIK3CB/D inhibition contrasting to synergy when combined with other PI3K-pathway inhibitors in PIK3CA mutant cells.
Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning
Smith J. S., Nebgen B. T., Zubatyuk R., Lubbers N., Devereux C., Barros K., Tretiak S., Isayev O., Roitberg A. E.
Nature Communications 10
Abstract Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist’s toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.
Predicting Thermal Properties of Crystals Using Machine Learning
Tawfik S. A., Isayev O., Spencer M. J. S., Winkler D. A.
Advanced Theory and Simulations 3
Abstract Calculating vibrational properties of crystals using quantum mechanical (QM) methods is a challenging problem in computational material science. This problem is solved using complementary machine learning methods that rapidly and reliably recapitulate entropy, specific heat, effective polycrystalline dielectric function, and a non‐vibrational property (band gap) for materials calculated by accurate but lengthy QM methods. The materials are described mathematically using property‐labeled materials fragment descriptors. The machine learning models predict the QM properties with root mean square errors of 0.31 meV per atom per K for entropy, 0.18 meV per atom per K for specific heat, 4.41 for the trace of the dielectric tensor, and 0.5 eV for band gap. These models are sufficiently accurate to allow rapid screening of large numbers of crystal structures to accelerate material discovery.
Adsorption of nitrogen-containing compounds on hydroxylated α-quartz surfaces
Tsendra O., Boese A. D., Isayev O., Gorb L., Scott A. M., Hill F. C., Ilchenko M. M., Lobanov V., Leszczynska D., Leszczynski J.
RSC Advances 9, 36066–36074
Adsorption energies of different nitrogen-containing compounds on two hydroxylated (001) and (100) quartz surfaces are computed.
Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network
Zubatyuk R., Smith J. S., Leszczynski J., Isayev O.
Science Advances 5
We introduce a modular, chemically inspired deep neural network model for prediction of several atomic and molecular properties.
2018
Machine learning for molecular and materials science
Butler K. T., Davies D. W., Cartwright H., Isayev O., Walsh A.
Nature 559, 547–555
Materials discovery by chemical analogy: role of oxidation states in structure prediction
Davies D. W., Butler K. T., Isayev O., Walsh A.
Faraday Discussions 211, 553–568
We have built a model that ascribes probabilities to the formation of hypothetical compounds, given the proposed oxidation states of the constituent species.
Diffusion of energetic compounds through biological membrane: Application of classical MD and COSMOmic approximations
Golius A., Gorb L., Isayev O., Leszczynski J.
Journal of Biomolecular Structure and Dynamics 37, 247–255
AFLOW-ML: A RESTful API for machine-learning predictions of materials properties
Gossett E., Toher C., Oses C., Isayev O., Legrain F., Rose F., Zurek E., Carrete J., Mingo N., Tropsha A., et al.
Computational Materials Science 152, 134–145
Transferable Dynamic Molecular Charge Assignment Using Deep Neural Networks
Nebgen B., Lubbers N., Smith J. S., Sifain A. E., Lokhov A., Isayev O., Roitberg A. E., Barros K., Tretiak S.
Journal of Chemical Theory and Computation 14, 4687–4698
Deep reinforcement learning for de novo drug design
Popova M., Isayev O., Tropsha A.
Science Advances 4
We introduce an artificial intelligence approach to de novo design of molecules with desired physical or biological properties.
Discovering a Transferable Charge Assignment Model Using Machine Learning
Sifain A. E., Lubbers N., Nebgen B. T., Smith J. S., Lokhov A. Y., Isayev O., Roitberg A. E., Barros K., Tretiak S.
The Journal of Physical Chemistry Letters 9, 4495–4501
Transforming Computational Drug Discovery with Machine Learning and AI
Smith J. S., Roitberg A. E., Isayev O.
ACS Medicinal Chemistry Letters 9, 1065–1069
Less is more: Sampling chemical space with active learning
Smith J. S., Nebgen B., Lubbers N., Isayev O., Roitberg A. E.
The Journal of Chemical Physics 148
The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble’s prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach, we develop the COmprehensive Machine-learning Potential (COMP6) benchmark (publicly available on GitHub) which contains a diverse set of organic molecules. Active learning-based ANI potentials outperform the original random sampled ANI-1 potential with only 10% of the data, while the final active learning-based model vastly outperforms ANI-1 on the COMP6 benchmark after training to only 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements CHNO.
Efficient Prediction of Structural and Electronic Properties of Hybrid 2D Materials Using Complementary DFT and Machine Learning Approaches
Tawfik S. A., Isayev O., Stampfl C., Shapter J., Winkler D. A., Ford M. J.
Advanced Theory and Simulations 2
Abstract There are now, in principle, a limitless number of hybrid van der Waals (vdW) heterostructures that can be built from the rapidly growing number of 2D layers. The key question is how to explore this vast parameter space in a practical way. Computational methods can guide experimental work. However, even the most efficient electronic structure methods, such as density functional theory, are too time consuming to explore more than a tiny fraction of all possible hybrid 2D materials. A combination of density functional theory (DFT) and machine learning techniques provides a practical method for exploring this parameter space much more efficiently than by DFT or experiments. As a proof of concept, this methodology is applied to predict the interlayer distance and band gap of bilayer heterostructures. The methods quickly and accurately predict these important properties for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of vdW heterostructures to identify new hybrid materials with useful and interesting properties.
Accurate and Transferable Multitask Prediction of Chemical Properties with an Atoms-in-Molecule Neural Network
Zubatyuk R., Smith J. S., Leszczynski J., Isayev O.
Atomic and molecular properties could be evaluated from the fundamental Schrödinger equation and therefore represent different modalities of the same quantum phenomena. Here we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of the atom in a molecular system. The resulting model shows state-of-the-art accuracy on several benchmark datasets, comparable to the results of orders of magnitude more expensive DFT methods. It can simultaneously predict several atomic and molecular properties without an increase in computational cost. With AIMNet we show a new dimension of transferability: the ability to learn new targets utilizing multimodal information from previous training. The model can learn implicit solvation energy (like SMD) utilizing only a fraction of the original training data, and achieve a MAD error of 1.1 kcal/mol compared to experimental solvation free energies in the MNSol database.
2017
Universal fragment descriptors for predicting properties of inorganic crystals
Isayev O., Oses C., Toher C., Gossett E., Curtarolo S., Tropsha A.
Nature Communications 8
Abstract Although historically materials discovery has been driven by a laborious trial-and-error process, knowledge-driven materials design can now be enabled by the rational combination of Machine Learning methods and materials databases. Here, data from the AFLOW repository for ab initio calculations is combined with Quantitative Materials Structure-Property Relationship models to predict important properties: metal/insulator classification, band gap energy, bulk/shear moduli, Debye temperature and heat capacities. The prediction’s accuracy compares well with the quality of the training data for virtually any stoichiometric inorganic crystalline material, reciprocating the available thermomechanical experimental data. The universality of the approach is attributed to the construction of the descriptors: Property-Labelled Materials Fragments. The representations require only minimal structural input allowing straightforward implementations of simple heuristic design rules.
ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules
Smith J. S., Isayev O., Roitberg A. E.
Scientific Data 4
Abstract One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. They have been successfully applied in a variety of applications in chemistry, biology, catalysis, and solid-state physics. However, these models are heavily dependent on the quality and quantity of data used in their fitting. Fitting highly flexible ML potentials, such as neural networks, comes at a cost: a vast amount of reference data is required to properly train these models. We address this need by providing access to a large computational DFT database, which consists of more than 20 M off-equilibrium conformations for 57,462 small organic molecules. We believe it will become a new standard benchmark for comparison of current and future methods in the ML potential community.
ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost
Smith J. S., Isayev O., Roitberg A. E.
Chemical Science 8, 3192–3203
We demonstrate how a deep neural network (NN) trained on a data set of quantum mechanical (QM) DFT calculated energies can learn an accurate and transferable atomistic potential for organic molecules containing H, C, N, and O atoms.
2016
QSAR Modeling of Tox21 Challenge Stress Response and Nuclear Receptor Signaling Toxicity Assays
Capuzzi S. J., Politi R., Isayev O., Farag S., Tropsha A.
Frontiers in Environmental Science 4
Atlas Regeneration Company, Inc.
Makarev E., Isayev O., Atala A.
Regenerative Medicine 11, 141–143
Material informatics driven design and experimental validation of lead titanate as an aqueous solar photocathode
Moot T., Isayev O., Call R. W., McCullough S. M., Zemaitis M., Lopez R., Cahoon J. F., Tropsha A.
Materials Discovery 6, 9–16
2015
Materials Cartography: Representing and Mining Materials Space Using Structural and Electronic Fingerprints
Isayev O., Fourches D., Muratov E. N., Oses C., Rasch K., Tropsha A., Curtarolo S.
Chemistry of Materials 27, 735–743
Are the reduction and oxidation properties of nitrocompounds dissolved in water different from those produced when adsorbed on a silica surface? A DFT M05-2X computational study
Sviatenko L. K., Isayev O., Gorb L., Hill F. C., Leszczynska D., Leszczynski J.
Journal of Computational Chemistry 36, 1029–1035
2012
Validation of a novel secretion modification region (SMR) of HIV-1 Nef using cohort sequence analysis and molecular modeling
Campbell P. E., Isayev O., Ali S. A., Roth W. W., Huang M., Powell M. D., Leszczynski J., Bond V. C.
Journal of Molecular Modeling 18, 4603–4613
Mechanical properties of silicon nanowires
Furmanchuk A., Isayev O., Dinadayalane T. C., Leszczynska D., Leszczynski J.
WIREs Computational Molecular Science 2, 817–828
Abstract Silicon nanowires (SiNWs) are at the top of the list of materials used in conventional electromechanical devices as well as in strained nanotechnology. Both experimental and theoretical studies showed the size‐dependent character of mechanical properties of SiNWs. However, the surface contaminations, local surface strains, ‘boundary conditions’, native oxide, equipment‐induced errors, and the errors caused by postprocessing of results lead to softening of Young's modulus and extension of the region where the size dependency is seen by experimentalists. Application of improved potentials or advanced theoretical modeling, such as inclusion of explicit treatment of temperature and quantum‐mechanical effects, makes it possible to show the specificity of Young's modulus to the size and shape in the case of small (width <4 nm) nanowires. The ductile‐brittle transitions of SiNWs at different temperatures are revealed. Some suggestions on postprocessing techniques are discussed. © 2012 John Wiley & Sons, Ltd.
In silico structure–function analysis of E. cloacae nitroreductase
Isayev O., Crespo‐Hernández C. E., Gorb L., Hill F. C., Leszczynski J.
Proteins: Structure, Function, and Bioinformatics 80, 2728–2741
Abstract Reduction, catalyzed by the bacterial nitroreductases, is the quintessential first step in the biodegradation of a variety of nitroaromatic compounds from contaminated waters and soil. The Enterobacter cloacae nitroreductase (EcNR) enzyme is considered a prospective biotechnological tool for bioremediation of hazardous nitroaromatic compounds. Using diverse computational methods, we obtain insights into the structural basis of activity and mechanism of its function. We have performed molecular dynamics simulation of EcNR in three different states (free EcNR in oxidized form, fully reduced EcNR with benzoate inhibitor and fully reduced EcNR with nitrobenzene) in explicit solvent and with full electrostatics. Principal Component Analysis (PCA) of the variance‐covariance matrix showed that the complexed nitroreductase becomes more flexible overall upon complexation, particularly helix H6, in the vicinity of the binding site. A multiple sequence alignment was also constructed in order to examine positional constraints on substitution in EcNR. Five regions which are highly conserved within the flavin mononucleotide (FMN) binding site were identified. The results obtained and their implications for EcNR functioning are discussed, and a new plausible mechanism has been proposed. Proteins 2012; © 2012 Wiley Periodicals, Inc.
2011
Evaluation of natural and nitramine binding energies to 3-D models of the S1S2 domains in the N-methyl-D-aspartate receptor
Ford-Green J., Isayev O., Gorb L., Perkins E. J., Leszczynski J.
Journal of Molecular Modeling 18, 1273–1284
Car–Parrinello Molecular Dynamics Simulations of Tensile Tests on Si⟨001⟩ Nanowires
Furmanchuk A., Isayev O., Dinadayalane T. C., Leszczynski J.
The Journal of Physical Chemistry C 115, 12283–12292
Novel view on the mechanism of water-assisted proton transfer in the DNA bases: bulk water hydration
Furmanchuk A., Isayev O., Gorb L., Shishkin O. V., Hovorun D. M., Leszczynski J.
Physical Chemistry Chemical Physics 13, 4311
Effect of Solvation on the Vertical Ionization Energy of Thymine: From Microhydration to Bulk
Ghosh D., Isayev O., Slipchenko L. V., Krylov A. I.
The Journal of Physical Chemistry A 115, 6028–6038
Toward robust computational electrochemical predicting the environmental fate of organic pollutants
Sviatenko L., Isayev O., Gorb L., Hill F., Leszczynski J.
Journal of Computational Chemistry 32, 2195–2203
Abstract A number of density functionals were utilized for the calculation of electron attachment free energy for nitrocompounds, quinones and azacyclic compounds. Different solvation models have been tested on the calculation of the difference in free energies of solvation of oxidized and reduced forms of nitrocompounds in aqueous solution, quinones in acetonitrile, and azacyclic compounds in dimethylformamide. Gas‐phase free energies evaluated at the mPWB1K/tzvp level and solvation energies obtained using the SMD model to compute solvation energies of neutral oxidized forms and PCM(Pauling) to compute solvation energies of anion‐radical reduced forms provide reasonable accuracy in the prediction of electron attachment free energy and of the difference in free solvation energies of oxidized and reduced forms, and as a consequence yield reduction potentials in good agreement with experimental data (mean absolute deviation is 0.15 V). It was also found that the SMD/M05‐2X/tzvp method provides reduction potentials with a deviation of 0.12 V from the experimental values, but in the cases of nitrocompounds and quinones this accuracy is achieved due to the cancellation of errors. To predict the reduction ability of naturally occurring iron‐containing species with respect to organic pollutants, we exploited experimental data within the framework of Pourbaix (Eh − pH) diagrams. We conclude that surface‐bound Fe(II) as well as certain forms of aqueous Fe(II)aq are capable of reducing a variety of nitroaromatic compounds, quinones and novel high energy materials under basic conditions (pH > 8). At the same time, zero‐valent iron is expected to be active under neutral and acidic conditions. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
2010
Hydration of nucleic acid bases: a Car–Parrinello molecular dynamics approach
Furmanchuk A., Isayev O., Shishkin O. V., Gorb L., Leszczynski J.
Physical Chemistry Chemical Physics 12, 3363
New insight on structural properties of hydrated nucleic acid bases from ab initio molecular dynamics
Furmanchuk A., Shishkin O. V., Isayev O., Gorb L., Leszczynski J.
Physical Chemistry Chemical Physics 12, 9945
Reaction of bicyclo[2.2.1]hept‐5‐ene‐endo‐2‐ylmethylamine and nitrophenyl glycidyl ethers
Kasyan L. I., Prid’ma S. A., Palchikov V. A., Karat L. D., Turov A. V., Isayev O.
Journal of Physical Organic Chemistry 24 , 705–713
Reactions of 2‐nitro‐, 4‐nitro‐ and 2,4‐dinitrophenylglycidyl ethers with bicyclo[2.2.1]hept‐5‐ene‐endo‐2‐ylmethylamine in isopropanol have been studied. The mixtures of products were chromatographed on silica gel and eluted with ether or ether/2‐propanol (1:1). The structures of the individual products have been confirmed by IR spectra, 1H and 13C NMR spectra, experiments that involve homonuclear and heteronuclear scalar coupling interactions (COSY, TOCSY, HMQC, HMBC), and mass spectrometry. Amino alcohols have been obtained as the major products of regioselective aminolysis of epoxides (according to the Krasusky rule). The minor products were the compounds with two hydroxyalkyl fragments at the nitrogen atom. In the case of dinitrophenylglycidyl ether, it was the minor product of aryl nucleophilic substitution (SNAr). The abnormal course of aminolysis has been confirmed by the results of quantum‐chemical calculations of activation barriers and Gibbs free energies of the competitive reactions of epoxides (at the B3LYP/6‐311+G(d,p) level of theory). Copyright © 2010 John Wiley & Sons, Ltd.
One-electron standard reduction potentials of nitroaromatic and cyclic nitramine explosives
Uchimiya M., Gorb L., Isayev O., Qasim M. M., Leszczynski J.
Environmental Pollution 158 , 3048–3053
2008
Ab Initio Molecular Dynamics Study on the Initial Chemical Events in Nitramines: Thermal Decomposition of CL-20
Isayev O., Gorb L., Qasim M., Leszczynski J.
The Journal of Physical Chemistry B 112 , 11005–11013
Efficient and accurate ab initio prediction of thermodynamic parameters for intermolecular complexes
Isayev O., Furmanchuk A., Gorb L., Leszczynski J.
Chemical Physics Letters 451 , 147–152
2007
Are Isolated Nucleic Acid Bases Really Planar? A Car−Parrinello Molecular Dynamics Study
Isayev O., Furmanchuk A., Shishkin O. V., Gorb L., Leszczynski J.
The Journal of Physical Chemistry B 111 , 3476–3480
Electronic Structure and Bonding of Fe(PhNO2)6 Complexes: A Density Functional Theory Study
Isayev O., Gorb L., Zilberberg I., Leszczynski J.
The Journal of Physical Chemistry A 111 , 3571–3576
Theoretical calculations: Can Gibbs free energy for intermolecular complexes be predicted efficiently and accurately?
Isayev O., Gorb L., Leszczynski J.
Journal of Computational Chemistry 28 , 1598–1609
A theoretical study has been performed to refine the procedure for calculations of Gibbs free energy with a relative accuracy of less than 1 kcal/mol. Three benchmark intermolecular complexes are examined via several quantum‐chemical methods, including second‐order Møller‐Plesset perturbation (MP2), coupled cluster (CCSD(T)), and density functional (BLYP, B3LYP) theories augmented by Dunning's correlation‐consistent basis sets. The effects of electron correlation, basis set size, and anharmonicity are systematically analyzed, and the results are compared with available experimental data. The results of the calculations suggest that experimental accuracy can be reached only by extrapolation of MP2 and CCSD(T) total energies to the complete basis set. The contribution of anharmonicity to the zero point energy and TΔSint values is fairly small. A new, economical way to reach chemical accuracy in the calculations of the thermodynamic parameters of intermolecular interactions is proposed. In addition, the interaction energy (De) and free energy change (ΔA) for the considered species have been evaluated by Car‐Parrinello molecular dynamics (CPMD) simulations and static BLYP‐plane wave calculations. The free energy changes along the reaction paths were determined by the thermodynamic integration/"Blue Moon Ensemble" technique. The obtained values have been compared with available experimental and conventional ab initio results. We found that the accuracy of CPMD simulations is affected by several factors, including statistical uncertainty and convergence of constrained forces (TD integration), and the nature of the density functional. The results show that the CPMD technique is capable of reproducing interaction and free energies with an accuracy of 1 kcal/mol and 2–3 kcal/mol, respectively. © 2007 Wiley Periodicals, Inc. J Comput Chem, 2007
Carboxamides and amines having two and three adamantane fragments
Kas’yan L. I., Karpenko D. V., Kas’yan A. O., Isaev A. K., Prid’ma S. A.
Russian Journal of Organic Chemistry 43 , 1642–1650
2006
Structure-toxicity relationships of nitroaromatic compounds: Full-length paper
Isayev O., Rasulev B., Gorb L., Leszczynski J.
Molecular Diversity 10 , 233–245
2005
Acylation of Aminopyridines and Related Compounds with Endic Anhydride
Kas’yan L. I., Tarabara I. N., Pal’chikov V. A., Krishchik O. V., Isaev A. K., Kas’yan A. O.
Russian Journal of Organic Chemistry 41 , 1530–1538
Synthesis and Reactivity of Amines Containing Several Cage-like Fragments
Kas’yan L. I., Karpenko D. V., Kas’yan A. O., Isaev A. K.
Russian Journal of Organic Chemistry 41 , 678–688
2004
Amides containing two norbornene fragments. Synthesis and chemical transformations
Kas’yan L. I., Isaev A. K., Kas’yan A. O., Golodaeva E. A., Karpenko D. V., Tarabara I. N.
Russian Journal of Organic Chemistry 40 , 1415–1426
Reaction of Endic Anhydride with Hydrazines and Acylhydrazines
Krishchik O. V., Tarabara I. N., Kas’yan A. O., Shishkina S. V., Shishkin O. V., Isaev A. K., Kas’yan L.
Russian Journal of Organic Chemistry 40 , 1140–1145
Modeling the Gas-Phase Reduction of Nitrobenzene to Nitrosobenzene by Iron Monoxide: A Density Functional Theory Study
Zilberberg I., Ilchenko M., Isayev O., Gorb L., Leszczynski J.
The Journal of Physical Chemistry A 108 , 4878–4886
2003
Amino Alcohols with Bicyclic Carbon Skeleton. Alternative Functionalization of Nucleophilic Reaction Centers
Kas’yan L. I., Golodaeva E. A., Kas’yan A. O., Isaev A. K.
Russian Journal of Organic Chemistry 39 , 1398–1405
2002
New N-(Arylsulfonyl)-5-aminomethylbicyclo[2.2.1]hept-2-enes. Synthesis, 1H and 13C NMR Spectra, and Chemical Reactions
Kas’yan A. O., Isaev A. K., Kas’yan L. I.
Russian Journal of Organic Chemistry 38 , 553–563