Journal of Computer-Aided Molecular Design, Volume 32, Issue 10, pp 1117–1138

pKa measurements for the SAMPL6 prediction challenge for a set of kinase inhibitor-like fragments

  • Mehtap Işık
  • Dorothy Levorse
  • Ariën S. Rustenburg
  • Ikenna E. Ndukwe
  • Heather Wang
  • Xiao Wang
  • Mikhail Reibarkh
  • Gary E. Martin
  • Alexey A. Makarov
  • David L. Mobley
  • Timothy Rhodes
  • John D. Chodera


Determining the net charge and protonation states populated by a small molecule in an environment of interest, or the cost of altering those protonation states upon transfer to another environment, is a prerequisite for predicting its physicochemical and pharmaceutical properties. The environment of interest can be aqueous, an organic solvent, a protein binding site, or a lipid bilayer. Predicting the protonation state of a small molecule is essential to predicting its interactions with biological macromolecules using computational models. Incorrectly modeling the dominant protonation state, shifts in the dominant protonation state, or the population of significant mixtures of protonation states can lead to large modeling errors that degrade the accuracy of physical modeling, and low accuracy hinders the use of physical modeling approaches for molecular design. For small molecules, the acid dissociation constant (pKa) is the primary quantity needed to determine the ionic states populated by a molecule in aqueous solution at a given pH. As part of the SAMPL6 community challenge, we organized a blind pKa prediction component to assess the accuracy with which contemporary pKa prediction methods can predict this quantity, with the ultimate aim of assessing the modeling errors that inaccurate pKa predictions would be expected to induce. While a multitude of approaches for predicting pKa values currently exist, predicting the pKas of drug-like molecules can be difficult due to challenging properties such as multiple titratable sites, heterocycles, and tautomerization. For this challenge, we focused on a set of 24 small molecules selected to resemble selective kinase inhibitors, an important class of therapeutics replete with titratable moieties.
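As a minimal sketch of how a pKa determines the populated ionic states, the Python snippet below applies the Henderson–Hasselbalch relation to a hypothetical compound with a single titratable acidic site, then converts the equilibrium population f of a state into the free energy needed to confine the molecule to that state, \(\Delta G = -RT \ln f\). It assumes an ideal dilute aqueous solution at 298.15 K; the function names and the example pKa of 6.0 are illustrative assumptions, not values from the SAMPL6 set.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of a hypothetical monoprotic acid HA present as A- (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def state_penalty_kcal(fraction: float, temperature: float = 298.15) -> float:
    """Free energy (kcal/mol) to confine the molecule to a state whose
    equilibrium population is `fraction`: dG = -RT ln(fraction)."""
    return -R_KCAL * temperature * math.log(fraction)


if __name__ == "__main__":
    pka, ph = 6.0, 7.4  # hypothetical acid at physiological pH
    f_deprot = fraction_deprotonated(pka, ph)
    f_prot = 1.0 - f_deprot
    print(f"deprotonated: {f_deprot:.3f}, protonated: {f_prot:.3f}")
    print(f"penalty for protonated form: {state_penalty_kcal(f_prot):.2f} kcal/mol")
```

For this hypothetical acid at pH 7.4, only about 4% of molecules remain protonated, so a model that treats the protonated form as the only state carries an error of roughly 1.9 kcal/mol from the protonation state alone.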
With a Sirius T3 instrument, which performs automated acid–base titrations, we used UV absorbance-based pKa measurements to construct a high-quality experimental reference dataset of macroscopic pKas, which was used in the SAMPL6 pKa challenge to evaluate computational pKa prediction methodologies. For several compounds in which the microscopic protonation states associated with macroscopic pKas were ambiguous, we performed follow-up NMR experiments to disambiguate the microstates involved in the transition. This dataset provides a useful standard benchmark for evaluating pKa prediction methodologies on kinase inhibitor-like compounds.
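The distinction between macroscopic and microscopic pKas can be illustrated with a hypothetical molecule carrying two titratable sites, A and B. The sketch below assumes the standard thermodynamic-cycle treatment of a diprotic system (it is not the analysis procedure used for the measurements described here): it computes the two macroscopic pKas from three microscopic pKas and the pH-dependent populations of the four microstates. The site labels, function names, and numbers are illustrative assumptions.

```python
import math
from typing import Dict, Tuple


def macro_pkas(pk_a: float, pk_b: float, pk_ba: float) -> Tuple[float, float]:
    """Return (pKa1, pKa2) for a hypothetical two-site molecule.

    pk_a:  fully protonated -> site A deprotonated (B still protonated)
    pk_b:  fully protonated -> site B deprotonated (A still protonated)
    pk_ba: A-deprotonated state -> fully deprotonated (B loses its proton)
    The fourth microscopic pKa is fixed by the cycle: pk_ab = pk_a + pk_ba - pk_b.
    """
    k_a, k_b, k_ba = (10.0 ** -pk for pk in (pk_a, pk_b, pk_ba))
    big_k1 = k_a + k_b            # either site can lose the first proton
    big_k2 = k_a * k_ba / big_k1  # K1 * K2 equals the overall constant k_a * k_ba
    return -math.log10(big_k1), -math.log10(big_k2)


def microstate_fractions(pk_a: float, pk_b: float, pk_ba: float, ph: float) -> Dict[str, float]:
    """Equilibrium fraction of each of the four microstates at the given pH."""
    h = 10.0 ** -ph
    k_a, k_b, k_ba = (10.0 ** -pk for pk in (pk_a, pk_b, pk_ba))
    # Statistical weights relative to the fully protonated microstate.
    weights = {
        "AH/BH": 1.0,
        "A-/BH": k_a / h,
        "AH/B-": k_b / h,
        "A-/B-": k_a * k_ba / h ** 2,
    }
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


if __name__ == "__main__":
    pka1, pka2 = macro_pkas(4.0, 5.0, 6.0)
    print(f"macroscopic pKa1 = {pka1:.2f}, pKa2 = {pka2:.2f}")  # 3.96, 6.04
    for state, frac in microstate_fractions(4.0, 5.0, 6.0, ph=5.0).items():
        print(f"{state}: {frac:.3f}")
```

With microscopic pKas of 4.0 and 5.0 for the first deprotonation, the first macroscopic pKa (about 3.96) reports the combined loss of a proton from either site, and the singly deprotonated macrostate is a 10:1 mixture of two microstates at every pH. A titration that tracks a bulk signal generally reports only the macroscopic transition; assigning it to a particular site requires a structurally resolved technique such as NMR.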


Keywords: Acid dissociation constants; Spectrophotometric pKa measurement; Blind prediction challenge; SAMPL; Macroscopic pKa; Microscopic pKa; Macroscopic protonation state; Microscopic protonation state



Abbreviations

SAMPL: Statistical Assessment of the Modeling of Proteins and Ligands
pKa: \(-\log_{10}\) acid dissociation equilibrium constant
psKa: \(-\log_{10}\) apparent acid dissociation equilibrium constant in the presence of cosolvent
DMSO: Dimethyl sulfoxide
ISA: Ionic-strength adjusted
SEM: Standard error of the mean
TFA: Target factor analysis
LC–MS: Liquid chromatography–mass spectrometry
NMR: Nuclear magnetic resonance spectroscopy
HMBC: Heteronuclear multiple-bond correlation
TFA-d: Deutero-trifluoroacetic acid



Acknowledgements

MI, ASR, and JDC acknowledge support from the Sloan Kettering Institute. JDC acknowledges support from NIH grant P30 CA008748. MI, JDC, ASR, and DLM gratefully acknowledge support from NIH grant R01GM124270 supporting SAMPL blind challenges. MI acknowledges support from a Doris J. Hutchinson Fellowship. DLM appreciates financial support from the National Institutes of Health (1R01GM108889-01) and the National Science Foundation (CHE 1352608). IEN acknowledges support from the MRL Postdoctoral Research Program. The authors are extremely grateful for the assistance and support of the MRL Preformulations and NMR Structure Elucidation groups for materials, expertise, and instrument time, without which this SAMPL challenge would not have been possible. MI and DL are grateful to Pion/Sirius Analytical for their technical support in the planning and execution of this study. We are especially thankful to Karl Box (Sirius Analytical) for guidance on the optimization and interpretation of pKa measurements with the Sirius T3, as well as for feedback on the manuscript. We thank Brad Sherborne (MRL; ORCID: 0000-0002-0037-3427) for his valuable insights at the conception of the pKa challenge and for connecting us with TR and DL, who were able to provide resources for experimental measurements. We acknowledge Paul Czodrowski (Merck KGaA; ORCID: 0000-0002-7390-8795), who provided feedback on multiple stages of this work: challenge construction, purchasable compound selection, and the manuscript. We acknowledge contributions from Caitlin Bannan, who provided feedback on experimental data collection and on the structure of the pKa challenge from a computational chemist’s perspective. We are also grateful to Marilyn Gunner (CCNY) for her feedback on this manuscript. We thank the anonymous reviewers for their input and constructive comments that improved this manuscript. MI, ASR, and JDC are grateful to OpenEye Scientific for providing a free academic software license for use in this work.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author contributions

Conceptualization, MI, JDC, TR, ASR, DLM; Methodology, MI, DL, IEN; Software, MI, ASR; Formal Analysis, MI; Investigation, MI, DL, IEN, HW, XW, MR; Resources, TR, DL; Data Curation, MI; Writing – Original Draft, MI, JDC, IEN; Writing – Review and Editing, MI, DL, ASR, IEN, HW, XW, MR, GEM, DLM, TR, JDC; Visualization, MI, IEN; Supervision, JDC, TR, DLM, GEM, AAM; Project Administration, MI; Funding Acquisition, JDC, DLM, TR, MI.

Compliance with ethical standards

Conflict of interest

JDC was a member of the Scientific Advisory Board for Schrödinger, LLC during part of this study. JDC and DLM are current members of the Scientific Advisory Board of OpenEye Scientific Software. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, the Molecular Sciences Software Institute, the Starr Cancer Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, and the Sloan Kettering Institute. A complete list of funding can be found at

Supplementary material

Supplementary material 1 (PDF 3731 KB): 10822_2018_168_MOESM1_ESM.pdf
Supplementary material 2 (ZIP 70025 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Mehtap Işık (1, 2)
  • Dorothy Levorse (3)
  • Ariën S. Rustenburg (1, 4)
  • Ikenna E. Ndukwe (5)
  • Heather Wang (6)
  • Xiao Wang (5)
  • Mikhail Reibarkh (5)
  • Gary E. Martin (5)
  • Alexey A. Makarov (6)
  • David L. Mobley (7)
  • Timothy Rhodes (3)
  • John D. Chodera (1)
  1. Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, USA
  2. Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, USA
  3. Pharmaceutical Sciences, MRL, Merck & Co., Inc., Rahway, USA
  4. Graduate Program in Physiology, Biophysics, and Systems Biology, Weill Cornell Medical College, New York, USA
  5. Process and Analytical Research and Development, Merck & Co., Inc., Rahway, USA
  6. Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, USA
  7. Department of Pharmaceutical Sciences and Department of Chemistry, University of California, Irvine, Irvine, USA
