Computer simulations are foundational to theoretical chemistry. Quantum-mechanical (QM) methods provide the highest accuracy for simulating molecules but have difficulty scaling to large systems. Empirical interatomic potentials (classical force fields) are scalable, but lack transferability to new systems and are hard to systematically improve. Automated, data-driven machine learning is close to achieving the best of both approaches. Here we use transfer learning to retrain a general-purpose neural network potential, ANI-1x, on a dataset of gold-standard QM calculations (CCSD(T)/CBS level) that is relatively small but designed to optimally span chemical space. The resulting potential, ANI-1ccx, approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. ANI-1ccx is broadly applicable to materials science, biology, and chemistry, and is billions of times faster than the parent CCSD(T)/CBS calculations.
Read the ChemRxiv preprint for full details.
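The transfer-learning idea in the abstract (pretrain on plentiful lower-accuracy data, then retrain on a small high-accuracy set) can be illustrated with a toy example. The sketch below is not the ANI-1ccx training code: the one-dimensional targets `low_fidelity` and `high_fidelity`, the tiny network, and the choice to freeze the hidden layer and retrain only the readout are all assumptions made for illustration. Which layers are actually frozen or retrained in ANI-1ccx is described in the preprint.

```python
import math
import random

def forward(x, p):
    """One hidden tanh layer followed by a linear readout."""
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    return h, sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"]

def mse(data, p):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((forward(x, p)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, p, train_hidden, lr=0.05, epochs=300):
    """Full-batch gradient descent; w1/b1 stay frozen unless train_hidden is True."""
    p = {"w1": list(p["w1"]), "b1": list(p["b1"]), "w2": list(p["w2"]), "b2": p["b2"]}
    k, n = len(p["w2"]), len(data)
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * k, [0.0] * k, [0.0] * k, 0.0
        for x, y in data:
            h, pred = forward(x, p)
            d = 2.0 * (pred - y) / n  # dLoss/dpred for this sample
            g_b2 += d
            for i in range(k):
                g_w2[i] += d * h[i]
                dh = d * p["w2"][i] * (1.0 - h[i] ** 2)  # backprop through tanh
                g_w1[i] += dh * x
                g_b1[i] += dh
        if train_hidden:
            p["w1"] = [w - lr * g for w, g in zip(p["w1"], g_w1)]
            p["b1"] = [b - lr * g for b, g in zip(p["b1"], g_b1)]
        p["w2"] = [w - lr * g for w, g in zip(p["w2"], g_w2)]
        p["b2"] -= lr * g_b2
    return p

# Toy stand-ins (assumptions, not chemistry): a cheap target with many labels
# and a related, "more accurate" target with only a few labels.
def low_fidelity(x):
    return math.sin(2.0 * x)

def high_fidelity(x):
    return math.sin(2.0 * x) + 0.3 * x

rng = random.Random(0)
big_set = [(x, low_fidelity(x)) for x in (rng.uniform(-1, 1) for _ in range(200))]
small_set = [(x, high_fidelity(x)) for x in (rng.uniform(-1, 1) for _ in range(12))]

k = 4  # number of hidden units
init = {
    "w1": [rng.uniform(-1, 1) for _ in range(k)],
    "b1": [rng.uniform(-1, 1) for _ in range(k)],
    "w2": [rng.uniform(-1, 1) for _ in range(k)],
    "b2": 0.0,
}

# Stage 1: train every layer on the large low-fidelity set.
pretrained = train(big_set, init, train_hidden=True)
# Stage 2: keep the hidden layer fixed, retrain only the readout on the small set.
fine_tuned = train(small_set, pretrained, train_hidden=False)

print("high-fidelity MSE before fine-tuning:", mse(small_set, pretrained))
print("high-fidelity MSE after fine-tuning: ", mse(small_set, fine_tuned))
```

Freezing the hidden layer means the 12 high-fidelity points only have to determine the five readout parameters, which is the usual motivation for retraining a subset of a pretrained network when accurate labels are scarce.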
Accuracy in predicting reaction and isomerization energy differences on the (a) HC7/11 and (b) ISOL6 benchmarks, relative to CCSD(T)/CBS. Methods compared are the ANI-1ccx transfer learning potential, ANI-1x trained only on DFT data, the DFT reference (ωB97X-D3), and our coupled cluster extrapolation scheme CCSD(T)*/CBS.
Accuracy in predicting atomization energies Ea on the GDB-10to13 benchmark relative to CCSD(T)*/CBS.
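Benchmarks like those in the captions above compare predicted energy differences with reference values. The following is a minimal sketch of that bookkeeping, assuming mean absolute error and root-mean-square error as the statistics (the captions do not say which is plotted). The species names `A`, `B`, `C` and all energies are made-up placeholders, not benchmark data.

```python
import math

def reaction_energy(energies, reactants, products):
    """Energy difference: sum over products minus sum over reactants."""
    return sum(energies[m] for m in products) - sum(energies[m] for m in reactants)

def mae(pred, ref):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

# Placeholder total energies in kcal/mol (hypothetical values for illustration).
reference = {"A": -10.0, "B": -12.0, "C": -15.0}   # stand-in for the reference method
model = {"A": -10.5, "B": -11.0, "C": -15.5}       # stand-in for a fitted potential

# Hypothetical isomerization-style reactions: (reactants, products).
reactions = [(["A"], ["B"]), (["A"], ["C"]), (["B"], ["C"])]

ref_dE = [reaction_energy(reference, r, p) for r, p in reactions]
mod_dE = [reaction_energy(model, r, p) for r, p in reactions]

benchmark_mae = mae(mod_dE, ref_dE)
benchmark_rmse = rmse(mod_dE, ref_dE)
print("MAE  (kcal/mol):", benchmark_mae)
print("RMSE (kcal/mol):", benchmark_rmse)
```

Errors are computed on the energy differences rather than on the total energies, so a constant offset shared by reactants and products cancels out.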
Overall, this work builds on the ANAKIN-ME method for developing the ANI-1 potential [1], which is, to the best of our knowledge, the first example of an extensible and transferable (universal) ML atomistic potential for organic molecules. For training such models, we also developed a data set of 22 million structural conformations (conformers) from ~60K distinct organic molecules [2]. This data set was built through an exhaustive sampling of the subset of the GDB-11 data set containing between one and eight C, N, and O atoms.
References: