Outliers detection in the statistical accuracy test of a pKa prediction

  • Original Paper
Journal of Mathematical Chemistry

Abstract

The regression diagnostics algorithm REGDIA in S-Plus is introduced to examine the accuracy of pKa values predicted with four programs: PALLAS, MARVIN, PERRIN and SYBYL. On the basis of a statistical analysis of residuals, outlier diagnostics are proposed. Residual analysis in the ADSTAT program examines goodness-of-fit via graphical diagnostics of 15 exploratory data analysis plots: bar plots, box-and-whisker plots, dot plots, midsum plots, symmetry plots, kurtosis plots, differential quantile plots, quantile-box plots, frequency polygons, histograms, quantile plots, quantile-quantile plots, rankit plots, scatter plots, and autocorrelation plots. Outliers in pKa correspond to molecules that are poorly characterized by the pKa program considered. Of the seven most efficient diagnostic plots (the Williams graph, Graph of predicted residuals, Pregibon graph, Gray L–R graph, Index graph of Atkinson measure, Index graph of diagonal elements of the hat matrix and Rankit Q–Q graph of jackknife residuals), the Williams graph was selected as giving the most reliable detection of outliers. The six statistical characteristics, \(F_{\rm exp}, R^{2}, R_{\rm P}^{2}, {\it MEP}, {\it AIC}\), and s in pKa units, successfully examine a sample of 25 acids and bases from Perrin's data set and classify the four pKa prediction algorithms. The highest values of \(F_{\rm exp}, R^{2}, R_{\rm P}^{2}\), the lowest values of MEP and s, and the most negative AIC were found for the PERRIN algorithm of pKa prediction, so this algorithm achieves the best predictive power and the most accurate results. The proposed accuracy test of the REGDIA program can also be extended to test other predicted values, such as log P, log D, aqueous solubility or other physicochemical properties.
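The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the REGDIA or ADSTAT code. It assumes a straight-line model of predicted pKa against experimental pKa (m = 2 parameters) and uses common textbook definitions: MEP as the mean of squared predicted (leave-one-out) residuals, R_P^2 = 1 - n MEP / TSS, and AIC = n ln(RSS/n) + 2m. The Williams-graph cut-offs (|jackknife residual| above a t quantile for outliers, hat-matrix diagonal above 2m/n for high-leverage points) are likewise assumed conventions. The function name, the `t_crit` argument and the data are hypothetical, and the numbers are synthetic, not taken from the paper.

```python
import math


def regression_diagnostics(x, y, t_crit):
    """Fit y = b0 + b1*x by least squares and return accuracy statistics
    plus the two coordinates of a Williams graph for every point.

    x      : experimental pKa values (assumed regressor)
    y      : predicted pKa values (assumed response)
    t_crit : caller-supplied t quantile used as the outlier cut-off
    """
    n, m = len(x), 2
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Diagonal elements of the hat matrix for a straight-line model.
    hat = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    rss = sum(e * e for e in resid)
    tss = sum((yi - y_mean) ** 2 for yi in y)
    s = math.sqrt(rss / (n - m))            # residual standard deviation
    r2 = 1.0 - rss / tss                    # coefficient of determination
    mep = sum((e / (1.0 - h)) ** 2 for e, h in zip(resid, hat)) / n
    r2_pred = 1.0 - n * mep / tss           # predicted R^2
    aic = n * math.log(rss / n) + 2 * m     # Akaike information criterion
    f_exp = r2 * (n - m) / ((1.0 - r2) * (m - 1))  # undefined if r2 == 1

    # Jackknife residuals computed from standardized residuals.
    jack = []
    for e, h in zip(resid, hat):
        e_s = e / (s * math.sqrt(1.0 - h))
        denom = n - m - e_s ** 2
        if denom <= 0.0:  # fit without this point is (numerically) perfect
            jack.append(math.copysign(math.inf, e_s))
        else:
            jack.append(e_s * math.sqrt((n - m - 1) / denom))

    return {
        "b0": b0, "b1": b1, "s": s, "R2": r2, "R2_pred": r2_pred,
        "MEP": mep, "AIC": aic, "F_exp": f_exp,
        "hat": hat, "jackknife": jack,
        "outliers": [i for i, ej in enumerate(jack) if abs(ej) > t_crit],
        "leverages": [i for i, h in enumerate(hat) if h > 2.0 * m / n],
    }


# Synthetic data: ten "experimental" values and "predictions" that follow
# them closely, except for the fifth molecule, which is badly predicted.
pka_exp = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
pka_pred = [1.1, 1.9, 3.1, 3.9, 8.0, 5.9, 7.1, 7.9, 9.1, 9.9]

# 1.895 is the 0.95 quantile of Student's t with n - m - 1 = 7 degrees
# of freedom; it is passed in so the sketch needs no statistics library.
result = regression_diagnostics(pka_exp, pka_pred, t_crit=1.895)
print("outlier indices:", result["outliers"])
print("high-leverage indices:", result["leverages"])
```

Plotting `result["jackknife"]` against `result["hat"]` with the two cut-off lines gives the kind of display the abstract calls a Williams graph. Comparing `s`, `R2`, `R2_pred`, `MEP`, `AIC` and `F_exp` across several prediction programs follows the ranking logic described above.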

Correspondence to Milan Meloun.

Meloun, M., Bordovská, S. & Kupka, K. Outliers detection in the statistical accuracy test of a pKa prediction. J Math Chem 47, 891–909 (2010). https://doi.org/10.1007/s10910-009-9609-2

  • DOI: https://doi.org/10.1007/s10910-009-9609-2