Development and assessment of a ChemInformatics model for accurate pKa prediction in aqueous medium

  • Research
Theoretical Chemistry Accounts

Abstract

Accurately predicting the acid dissociation constants (pKa) of organic and drug molecules is a well-known challenge in computational quantum chemistry. Predictions based on density functional theory (DFT) depend strongly on both the nature of the functional group and the underlying exchange–correlation functional. In addition, explicit solvent molecules are known to be important for accurately predicting the pKa values of many functional groups in water, which makes the problem particularly demanding. Including only implicit solvation effects is highly efficient but often inadequate for pKa prediction. In this paper, we consider a data set of 303 molecules containing 13 different functional groups to assess how well DFT predicts pKa values. With DFT and implicit solvation models alone, each functional group shows a linear correlation with experiment, though the slopes differ between functional groups. Using simple linear regression-based corrections for the systematic errors of each functional group, we show that DFT with implicit solvation can make reliable pKa predictions with a mean absolute deviation of only 0.397 pKa units. For a test set of 100 larger and more complex drug molecules, the model still performs very well, with a slightly larger mean absolute deviation of 0.629 pKa units. More importantly, our pKa protocol is general and applicable to any underlying density functional, making it an effective computational tool for pKa prediction.
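The per-functional-group correction described in the abstract can be illustrated with a minimal sketch. It assumes the correction takes the form pKa(exp) ≈ m · pKa(calc) + b, fitted separately for each functional group by ordinary least squares; the abstract does not give the exact regression form, so this is an assumption. The function names (`fit_group_corrections`, `apply_corrections`, `mean_absolute_deviation`) are hypothetical, and the numbers are synthetic values built to be exactly linear, not data from the paper.

```python
import numpy as np


def fit_group_corrections(groups, pka_calc, pka_exp):
    """Fit pKa_exp ~ slope * pKa_calc + intercept for each functional group.

    Hypothetical helper: the regression form is an assumption, not the
    authors' published protocol.
    """
    groups = np.asarray(groups)
    pka_calc = np.asarray(pka_calc, dtype=float)
    pka_exp = np.asarray(pka_exp, dtype=float)
    params = {}
    for g in np.unique(groups):
        mask = groups == g
        # polyfit with deg=1 returns [slope, intercept]
        slope, intercept = np.polyfit(pka_calc[mask], pka_exp[mask], deg=1)
        params[str(g)] = (float(slope), float(intercept))
    return params


def apply_corrections(params, groups, pka_calc):
    """Apply each molecule's group-specific linear correction."""
    return np.array(
        [params[str(g)][0] * x + params[str(g)][1] for g, x in zip(groups, pka_calc)]
    )


def mean_absolute_deviation(pred, ref):
    """Mean absolute deviation between predicted and reference pKa values."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))


# Synthetic illustration only: made-up "experimental" values, and "raw"
# values generated from an exact linear relation with a different slope
# and intercept for each group.
groups = ["carboxylic_acid"] * 4 + ["phenol"] * 4
pka_exp = np.array([3.0, 4.0, 4.5, 5.0, 8.0, 9.0, 10.0, 10.5])
pka_calc = np.concatenate(
    [(pka_exp[:4] - 1.0) / 0.5, (pka_exp[4:] - 2.0) / 0.8]
)

params = fit_group_corrections(groups, pka_calc, pka_exp)
pka_corrected = apply_corrections(params, groups, pka_calc)

raw_mad = mean_absolute_deviation(pka_calc, pka_exp)
corrected_mad = mean_absolute_deviation(pka_corrected, pka_exp)
print(f"raw MAD = {raw_mad:.3f}, corrected MAD = {corrected_mad:.3f}")
```

Because the synthetic data are exactly linear within each group, the corrected deviation here is essentially zero; real data would leave a residual scatter. Fitting one line per group rather than a single global line reflects the abstract's observation that the slopes differ between functional groups.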

Acknowledgements

We acknowledge financial support from the National Science Foundation Grant CHE-2102583 at Indiana University. The Big Red 3 supercomputing facility at Indiana University was used for most of the calculations in this study.

Author information

Contributions

A.J.S. carried out the research project under the supervision of Prof. K.R. The initial draft of the manuscript was prepared by A.J.S. and was reviewed and revised by K.R.

Corresponding author

Correspondence to Krishnan Raghavachari.

Ethics declarations

Conflict of interest

The authors declare no competing financial interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 431 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sanchez, A.J., Raghavachari, K. Development and assessment of a ChemInformatics model for accurate pKa prediction in aqueous medium. Theor Chem Acc 142, 86 (2023). https://doi.org/10.1007/s00214-023-03024-6

  • DOI: https://doi.org/10.1007/s00214-023-03024-6
