Abstract
Two types of approaches were used to predict the octanol-water partition coefficients (logP) of 11 molecules as part of the SAMPL6 logP blind prediction challenge: (a) approaches that combine quantitative structure-activity relationships, quantum mechanical electronic structure methods, and machine learning, and (b) electronic structure vertical solvation approaches. Using structures optimized with density functional theory (DFT), several molecular descriptors were calculated for each molecule, including van der Waals areas and volumes, HOMO/LUMO energies, dipole moments, polarizabilities, and electrophilic and nucleophilic superdelocalizabilities. A multilinear regression model and a partial least squares model were trained on a set of 97 molecules. In addition, descriptors generated with the Molecular Operating Environment (MOE) were used to build additional machine learning models. The electronic structure vertical solvation approaches considered include DFT and domain-based local pair natural orbital (DLPNO) methods combined with the solvated variant of the correlation consistent composite approach (Solv-ccCA).
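As a rough illustration of the two strategies named in the abstract, the sketch below shows (i) how a logP value follows from solvation free energies in water and 1-octanol through the standard transfer free energy relation, logP = -(ΔG_octanol - ΔG_water)/(RT ln 10), and (ii) a generic ordinary least squares fit of logP against molecular descriptors. This is a minimal sketch and not the authors' workflow: the function names, the two-descriptor toy data, and the example free energies are hypothetical and are not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def logp_from_solvation(dg_water, dg_octanol, temperature=298.15):
    """Estimate log P from solvation free energies (kcal/mol).

    Assumes the solute geometry is the same in both solvents, so the
    water -> octanol transfer free energy is a simple difference.
    """
    dg_transfer = dg_octanol - dg_water
    return -dg_transfer / (R_KCAL * temperature * math.log(10))


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X is a list of descriptor rows, y the reference log P values.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[k][c] -= factor * aug[col][c]
    # Back substitution
    beta = [0.0] * n
    for i in reversed(range(n)):
        rhs = aug[i][n] - sum(aug[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = rhs / aug[i][i]
    return beta


def predict(beta, descriptors):
    """Apply fitted coefficients to one row of descriptors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], descriptors))


if __name__ == "__main__":
    # Hypothetical solvation free energies, for illustration only
    print(round(logp_from_solvation(-5.0, -7.0), 3))

    # Synthetic two-descriptor training set (not data from the paper)
    X_train = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    y_train = [0.5 + 2.0 * a - 1.0 * b for a, b in X_train]
    coefficients = fit_mlr(X_train, y_train)
    print([round(c, 6) for c in coefficients])
    print(round(predict(coefficients, [2, 2]), 6))
```

A solute that is more strongly solvated in octanol than in water (more negative ΔG_octanol) yields a positive logP under this sign convention. The regression helper solves the normal equations directly to stay dependency-free; for a real descriptor set with many correlated columns, a library routine with better numerical conditioning, or a partial least squares model as in the abstract, would be the more robust choice.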
Acknowledgements
The authors gratefully acknowledge the University of North Texas Academic Computing Services for the use of the UNT Research Clusters. Computational resources were provided through the National Science Foundation Major Research Instrumentation program under Grant No. CHE-1531468. This research was supported in part by the Intramural Research Program of the National Heart, Lung, and Blood Institute of the National Institutes of Health and utilized the high-performance computational capabilities of the LoBoS and Biowulf Linux clusters at the National Institutes of Health (http://www.lobos.nih.gov and http://biowulf.nih.gov).
Cite this article
Patel, P., Kuntz, D.M., Jones, M.R. et al. SAMPL6 logP challenge: machine learning and quantum mechanical approaches. J Comput Aided Mol Des 34, 495–510 (2020). https://doi.org/10.1007/s10822-020-00287-0
DOI: https://doi.org/10.1007/s10822-020-00287-0