Machine learning prediction of accurate atomization energies of organic molecules from low-fidelity quantum chemical calculations

  • Artificial Intelligence Research Letter
  • MRS Communications

Abstract

Recent studies illustrate how machine learning (ML) can be used to bypass a core challenge of molecular modeling: the trade-off between accuracy and computational cost. Here, we assess multiple ML approaches for predicting the atomization energy of organic molecules. The resulting models learn the difference between low-fidelity (B3LYP) and high-accuracy (G4MP2) atomization energies, and predict the G4MP2 atomization energy to within 0.005 eV (mean absolute error) for molecules with fewer than nine heavy atoms (training set of 117,232 entries, test set of 13,026) and to within 0.012 eV for a small set of 66 molecules with between 10 and 14 heavy atoms. Our two best models, which offer different accuracy/speed trade-offs, enable efficient prediction of G4MP2-level energies for large molecules and are available through a simple web interface.
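The underlying Δ-learning strategy (Ref. 8) can be sketched in a few lines: rather than predicting the G4MP2 energy directly, a model is trained on the small, smooth difference between the two fidelities, and its output is added to the cheap B3LYP result. The sketch below is purely illustrative: the descriptors and energies are synthetic placeholders, and the plain linear fit stands in for the far more sophisticated models assessed in the paper (e.g., FCHL-based kernel regression and SchNet).

```python
# Illustrative sketch of Delta-learning: learn the correction
# Delta = E_G4MP2 - E_B3LYP, then add it to a low-fidelity calculation.
# All data here are synthetic; this is not the paper's dataset or model.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 10-dimensional molecular descriptors for 200 molecules
X = rng.normal(size=(200, 10))

# Synthetic high-accuracy energies, and low-fidelity energies whose error
# is (by construction) a simple function of the descriptors
w_true = rng.normal(size=10)
w_err = 0.05 * rng.normal(size=10)
e_g4mp2 = X @ w_true           # "G4MP2" energies (eV)
e_b3lyp = e_g4mp2 - X @ w_err  # "B3LYP" energies with a systematic error

# Fit a model to the difference between fidelities only
w_delta, *_ = np.linalg.lstsq(X, e_g4mp2 - e_b3lyp, rcond=None)

# Predicted high-accuracy energy = cheap calculation + learned correction
e_pred = e_b3lyp + X @ w_delta
mae = np.abs(e_pred - e_g4mp2).mean()
```

Because the correction varies more smoothly with structure than the total energy, the Δ model typically needs far less training data than a model that predicts the high-accuracy energy from scratch.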



References

  1. L.A. Curtiss, P.C. Redfern, and K. Raghavachari: Gn theory. Wiley Interdiscip. Rev. Comput. Mol. Sci. 1, 810–825 (2011).

  2. L.A. Curtiss, P.C. Redfern, and K. Raghavachari: Gaussian-4 theory using reduced order perturbation theory. J. Chem. Phys. 127, 124105 (2007).

  3. N. Mardirossian and M. Head-Gordon: Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Mol. Phys. 115, 2315–2372 (2017).

  4. A.D. Becke: A new mixing of Hartree–Fock and local density-functional theories. J. Chem. Phys. 98, 1372 (1993).

  5. L. Ward and C. Wolverton: Atomistic calculations and materials informatics: a review. Curr. Opin. Solid State Mater. Sci. 21, 167–176 (2017).

  6. C.M. Handley and J. Behler: Next generation interatomic potentials for condensed systems. Eur. Phys. J. B 87, 152 (2014).

  7. M. Rupp: Machine learning for quantum mechanics in a nutshell. Int. J. Quantum Chem. 115, 1058–1073 (2015).

  8. R. Ramakrishnan, P.O. Dral, M. Rupp, and O.A. von Lilienfeld: Big data meets quantum chemistry approximations: the Δ-machine learning approach. J. Chem. Theory Comput. 11, 2087–2096 (2015).

  9. P. Zaspel, B. Huang, H. Harbrecht, and O.A. von Lilienfeld: Boosting quantum machine learning models with a multilevel combination technique: Pople diagrams revisited. J. Chem. Theory Comput. 15, 1546–1559 (2019).

  10. G. Pilania, J.E. Gubernatis, and T. Lookman: Multi-fidelity machine learning models for accurate bandgap predictions of solids. Comput. Mater. Sci. 129, 156–163 (2017).

  11. A. Seko, T. Maekawa, K. Tsuda, and I. Tanaka: Machine learning with systematic density-functional theory calculations: application to melting temperatures of single- and binary-component solids. Phys. Rev. B 89, 054303 (2014).

  12. J.S. Smith, B.T. Nebgen, R. Zubatyuk, N. Lubbers, C. Devereux, K. Barros, S. Tretiak, O. Isayev, and A.E. Roitberg: Outsmarting quantum chemistry through transfer learning: universal neural network potentials for organic molecules. ChemRxiv (2018). doi:10.26434/chemrxiv.6744440.

  13. K.T. Schütt, H.E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller: SchNet: a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).

  14. F.A. Faber, A.S. Christensen, B. Huang, and O.A. von Lilienfeld: Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 148, 241717 (2018).

  15. R. Chard, Z. Li, K. Chard, L. Ward, Y. Babuji, A. Woodard, S. Tuecke, B. Blaiszik, M.J. Franklin, and I. Foster: DLHub: Model and Data Serving for Science (2018). https://arxiv.org/abs/1811.11213.

  16. B. Narayanan, P.C. Redfern, R.S. Assary, and L.A. Curtiss: Accurate quantum chemical energies for 133 000 organic molecules. Chem. Sci. (2019). doi:10.1039/C9SC02834J.

  17. R. Ramakrishnan, P.O. Dral, M. Rupp, and O.A. von Lilienfeld: Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).

  18. B. Blaiszik, K. Chard, J. Pruyne, R. Ananthakrishnan, S. Tuecke, and I. Foster: The Materials Data Facility: data services to advance materials science research. JOM 68, 2045–2052 (2016).

  19. L. Ward, B. Blaiszik, I. Foster, R.S. Assary, B. Narayanan, and L.A. Curtiss: Dataset for Machine Learning Prediction of Accurate Atomization Energies of Organic Molecules from Low-Fidelity Quantum Chemical Calculations (Materials Data Facility, 2019). doi:10.18126/M2V65Z.

  20. https://github.com/globus-labs/g4mp2-atomization-energy.

  21. J. Gilmer, S.S. Schoenholz, P.F. Riley, O. Vinyals, and G.E. Dahl: Neural Message Passing for Quantum Chemistry (2017). http://arxiv.org/abs/1704.01212.

  22. Z. Wu, B. Ramsundar, E.N. Feinberg, J. Gomes, C. Geniesse, A.S. Pappu, K. Leswing, and V. Pande: MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

  23. A. Paul, D. Jha, R. Al-Bahrani, W. Liao, A. Choudhary, and A. Agrawal: CheMixNet: Mixed DNN Architectures for Predicting Chemical Properties Using Multiple Molecular Representations (2018). http://arxiv.org/abs/1811.08283.

  24. K.T. Schütt, P. Kessel, M. Gastegger, K.A. Nicoli, A. Tkatchenko, and K.-R. Müller: SchNetPack: a deep learning toolbox for atomistic systems. J. Chem. Theory Comput. 15, 448–455 (2019).

  25. B. Huang and O.A. von Lilienfeld: The “DNA” of Chemistry: Scalable Quantum Machine Learning with “Amons” (2017). http://arxiv.org/abs/1707.04146.

  26. A.S. Christensen, F.A. Faber, B. Huang, L.A. Bratholm, A. Tkatchenko, K.-R. Müller, and O.A. von Lilienfeld: qmlcode/qml: Release v0.3.1 (2017). doi:10.5281/ZENODO.817332.

  27. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  28. J. Baxter: A Bayesian/information theoretic model of learning to learn via multiple task sampling. Mach. Learn. 28, 7–39 (1997).

  29. N.J. Browning, R. Ramakrishnan, O.A. von Lilienfeld, and U. Roethlisberger: Genetic optimization of training sets for improved machine learning models of molecular properties. J. Phys. Chem. Lett. 8, 1351–1359 (2017).

  30. T.S. Hy, S. Trivedi, H. Pan, B.M. Anderson, and R. Kondor: Predicting molecular properties with covariant compositional networks. J. Chem. Phys. 148 (2018).

  31. S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley: Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595–608 (2016).

  32. C.W. Coley, W. Jin, L. Rogers, T.F. Jamison, T.S. Jaakkola, W.H. Green, R. Barzilay, and K.F. Jensen: A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

  33. T.A. Halgren: Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 17, 490–519 (1996).

  34. N.M. O’Boyle, M. Banck, C.A. James, C. Morley, T. Vandermeersch, and G.R. Hutchison: Open Babel: an open chemical toolbox. J. Cheminform. 3, 33 (2011).

  35. N.W.A. Gebauer, M. Gastegger, and K.T. Schütt: Generating Equilibrium Molecules with Deep Neural Networks (2018). http://arxiv.org/abs/1810.11347.

  36. K. Yao, J.E. Herr, D.W. Toth, R. Mckintyre, and J. Parkhill: The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics. Chem. Sci. 9, 2261–2269 (2018).

  37. M. Nakata, T. Shimazaki, M. Hashimoto, and T. Maeda: PubChemQC PM6: A Dataset of 221 Million Molecules with Optimized Molecular Geometries and Electronic Properties (2019). http://arxiv.org/abs/1904.06046.

  38. J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G.D. Peterson, R. Roskies, J.R. Scott, and N. Wilkens-Diehr: XSEDE: accelerating scientific discovery. Comput. Sci. Eng. 16, 62–74 (2014).

  39. C.A. Stewart, G. Turner, M. Vaughn, N.I. Gaffney, T.M. Cockerill, I. Foster, D. Hancock, N. Merchant, E. Skidmore, D. Stanzione, J. Taylor, and S. Tuecke: Jetstream: a self-provisioned, scalable science and engineering cloud environment. In Proc. 2015 XSEDE Conf.: Sci. Adv. Enabled by Enhanc. Cyberinfrastructure (XSEDE ’15) (ACM Press, New York, NY, 2015); pp. 1–8.


Acknowledgments

This research was supported in part by the Exascale Computing Project (17-SC-20-SC) of the U.S. Department of Energy (DOE), by DOE’s Office of Advanced Scientific Computing Research (ASCR) under contract DE-AC02-06CH11357, and by the Joint Center for Energy Storage Research (JCESR), an Energy Innovation Hub funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences. This work used resources from the Extreme Science and Engineering Discovery Environment (XSEDE), supported by National Science Foundation Grant No. ACI-1548562;[38] specifically, Jetstream at the Texas Advanced Computing Center through allocation CIE170012;[39] the University of Chicago Research Computing Center; and the Argonne Leadership Computing Facility. This material is based upon work supported by Laboratory Directed Research and Development (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357. This work was performed under financial assistance award 70NANB14H012 from the U.S. Department of Commerce, National Institute of Standards and Technology, as part of the Center for Hierarchical Materials Design (CHiMaD). This work was also supported by the National Science Foundation as part of the Midwest Big Data Hub under NSF Award Number 1636950, “BD Spokes: SPOKE: MIDWEST: Collaborative: Integrative Materials Design (IMaD): Leverage, Innovate, and Disseminate.”

Author information

Corresponding author

Correspondence to Logan Ward.

About this article

Cite this article

Ward, L., Blaiszik, B., Foster, I. et al. Machine learning prediction of accurate atomization energies of organic molecules from low-fidelity quantum chemical calculations. MRS Communications 9, 891–899 (2019). https://doi.org/10.1557/mrc.2019.107
