SAMPL6 logP challenge: machine learning and quantum mechanical approaches

Abstract

Two types of approaches were used to predict the logP values of 11 molecules as part of the SAMPL6 logP blind prediction challenge: (a) approaches that combine quantitative structure-activity relationships, quantum mechanical electronic structure methods, and machine learning, and (b) electronic structure vertical solvation approaches. Using electronic structures optimized with density functional theory (DFT), several molecular descriptors were calculated for each molecule, including van der Waals areas and volumes, HOMO/LUMO energies, dipole moments, polarizabilities, and electrophilic and nucleophilic superdelocalizabilities. A multilinear regression model and a partial least squares model were trained on a set of 97 molecules. In addition, descriptors were generated using the Molecular Operating Environment (MOE) and used to create additional machine learning models. The electronic structure vertical solvation approaches considered include DFT and domain-based local pair natural orbital methods combined with the solvated variant of the correlation consistent composite approach.
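
The two families of approaches named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The descriptor matrix below is synthetic random data (only the training-set size of 97 comes from the abstract), the helper names fit_mlr, predict_mlr, and logp_from_solvation are hypothetical, and the second function assumes the standard relation between the water-to-octanol transfer free energy and logP, with solvation free energies in kcal/mol.

```python
import math
import numpy as np

# Gas constant in kcal/(mol K); the energy unit is an assumption of this sketch.
R_KCAL = 1.98720425864083e-3


def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + X @ b (multilinear regression)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] the descriptor weights


def predict_mlr(coef, X):
    """Apply fitted coefficients to a new descriptor matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def logp_from_solvation(dg_water, dg_octanol, temperature=298.15):
    """logP = (dG_solv(water) - dG_solv(octanol)) / (R T ln 10).

    A more favorable (more negative) solvation free energy in octanol
    than in water gives a positive logP.
    """
    return (dg_water - dg_octanol) / (R_KCAL * temperature * math.log(10))


# Synthetic stand-in for a descriptor table: 97 "molecules", 3 "descriptors".
# These numbers are made up for illustration and carry no chemical meaning.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(97, 3))
true_coef = np.array([1.5, 0.8, -0.4, 0.2])
y_train = true_coef[0] + X_train @ true_coef[1:]  # noise-free toy target

coef = fit_mlr(X_train, y_train)
y_fit = predict_mlr(coef, X_train)

# Hypothetical solvation free energies (kcal/mol) for a single solute.
example_logp = logp_from_solvation(dg_water=-4.0, dg_octanol=-6.0)
```

In practice the descriptor columns would be the computed properties listed above (areas, volumes, orbital energies, and so on), and a partial least squares model would replace the plain least-squares step when descriptors are strongly correlated.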

Acknowledgements

The authors gratefully acknowledge the University of North Texas Academic Computing Services for the use of the UNT Research Clusters. Computational resources were provided via the NSF Major Research Instrumentation program supported by the National Science Foundation under Grant No. CHE-1531468. This research was supported in part by the Intramural Research Program of the National Heart, Lung, and Blood Institute of the National Institutes of Health and utilized the high-performance computational capabilities of the LoBoS and Biowulf Linux clusters at the National Institutes of Health (http://www.lobos.nih.gov and http://biowulf.nih.gov).

Author information

Corresponding author

Correspondence to Angela K. Wilson.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 96.5 kb)

Cite this article

Patel, P., Kuntz, D.M., Jones, M.R. et al. SAMPL6 logP challenge: machine learning and quantum mechanical approaches. J Comput Aided Mol Des 34, 495–510 (2020). https://doi.org/10.1007/s10822-020-00287-0

  • DOI: https://doi.org/10.1007/s10822-020-00287-0

Keywords

  • SAMPL6
  • Machine learning
  • DFT
  • DLPNO-ccCA
  • QSAR
  • Partition coefficient