Abstract
Accurate prediction of the acid dissociation constants (pKa) of organic and drug molecules is a long-standing challenge in computational quantum chemistry. Predictions based on density functional theory (DFT) depend strongly on both the nature of the functional group and the underlying exchange–correlation functional. In addition, explicit solvent molecules are known to be important for accurate pKa prediction for many functional groups in water, which adds to the difficulty. Including only implicit solvation effects, though highly efficient, is often inadequate. In this paper, we consider a data set of 303 molecules containing 13 different functional groups to assess how well DFT predicts pKa values. Using only implicit solvation models with DFT, each functional group shows a linear correlation with experiment, though the slope differs from one functional group to another. Using simple linear regression-based corrections for the systematic errors of each functional group, we show that DFT with implicit solvation can make reliable pKa predictions, with a mean absolute deviation of only 0.397 pKa units. For a test set of 100 larger and more complex drug molecules, the model continues to perform well, with a slightly larger mean absolute deviation of 0.629 pKa units. More importantly, our pKa protocol is general and applicable to any underlying density functional, making it an effective computational tool for pKa predictions.
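The functional-group-specific linear correction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the group labels, and every number below are hypothetical and chosen only to show the idea of fitting one straight line per functional group (experimental pKa against raw computed pKa) and then scoring the corrected values by mean absolute deviation.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_group_corrections(records):
    """Fit one (slope, intercept) pair per functional group.

    records: iterable of (group, raw_computed_pka, experimental_pka).
    """
    by_group = defaultdict(list)
    for group, raw, exp in records:
        by_group[group].append((raw, exp))
    return {
        group: fit_line([r for r, _ in pts], [e for _, e in pts])
        for group, pts in by_group.items()
    }


def correct(models, group, raw_pka):
    """Apply the group's linear correction to a raw computed pKa."""
    slope, intercept = models[group]
    return slope * raw_pka + intercept


def mean_absolute_deviation(models, records):
    """Mean |corrected - experimental| over all records."""
    errors = [abs(correct(models, g, raw) - exp) for g, raw, exp in records]
    return sum(errors) / len(errors)


# Hypothetical, made-up values for illustration only (not data from the paper).
training = [
    ("carboxylic_acid", 6.0, 3.0),
    ("carboxylic_acid", 8.0, 4.0),
    ("carboxylic_acid", 10.0, 5.0),
    ("phenol", 14.0, 8.0),
    ("phenol", 16.0, 9.5),
    ("phenol", 18.0, 11.0),
]

models = fit_group_corrections(training)
mad = mean_absolute_deviation(models, training)
```

Because each group gets its own slope and intercept, a systematic over- or underestimate that differs between, say, carboxylic acids and phenols is absorbed separately for each group, which is the behaviour the abstract attributes to the regression-based correction.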
Acknowledgements
We acknowledge financial support from the National Science Foundation Grant CHE-2102583 at Indiana University. The Big Red 3 supercomputing facility at Indiana University was used for most of the calculations in this study.
Author information
Contributions
A.J.S. carried out the research project under the supervision of Prof. K.R. The initial draft of the manuscript was prepared by A.J.S. and was reviewed and revised by K.R.
Ethics declarations
Conflict of interest
The authors declare no competing financial interest.
About this article
Cite this article
Sanchez, A.J., Raghavachari, K. Development and assessment of a ChemInformatics model for accurate pKa prediction in aqueous medium. Theor Chem Acc 142, 86 (2023). https://doi.org/10.1007/s00214-023-03024-6
DOI: https://doi.org/10.1007/s00214-023-03024-6