Journal of Computer-Aided Molecular Design, Volume 26, Issue 7, pp 883–895

Multi-task learning for pKa prediction

  • Grigorios Skolidis
  • Katja Hansen
  • Guido Sanguinetti
  • Matthias Rupp


Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. We show that, if data for related classes are available, multi-task learning can improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85 % of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.
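The three model types named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes NumPy, a linear kernel k(x, x') = x·x', and the standard form of the intrinsic model of co-regionalization, in which the covariance between input x in task s and input x' in task t is B[s, t]·k(x, x'). The function names, the synthetic data, the noise variance, and the hand-set task matrices B are all hypothetical. In particular, B is fixed by hand here rather than learned, and the incomplete Cholesky decomposition mentioned in the abstract is not implemented.

```python
import numpy as np

def linear_kernel(X1, X2):
    """Linear kernel k(x, x') = x . x' between all rows of X1 and X2."""
    return X1 @ X2.T

def icm_kernel(X1, t1, X2, t2, B):
    """Co-regionalization covariance: B[task, task'] * k(x, x')."""
    return B[np.ix_(t1, t2)] * linear_kernel(X1, X2)

def gp_posterior_mean(K_train, K_cross, y, noise_var):
    """GP regression mean: K_cross (K_train + noise_var * I)^-1 y."""
    A = K_train + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(A)  # A is positive definite for noise_var > 0
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K_cross @ alpha

# Synthetic stand-in data (an assumption, not the data set from the paper):
# task 0 has 10 samples, task 1 is the "new compound series" with only 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
tasks = np.array([0] * 10 + [1] * 2)
w_shared = np.array([1.0, -2.0, 0.5])
w_task = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, -0.1]])  # small task offsets
y = np.sum(X * (w_shared + w_task[tasks]), axis=1) + 0.05 * rng.normal(size=12)

X_new = rng.normal(size=(4, 3))
t_new = np.ones(4, dtype=int)  # predict for the small task
noise_var = 0.1                # hypothetical noise variance

def predict(B):
    """Predict for (X_new, t_new) under a given task covariance matrix B."""
    K = icm_kernel(X, tasks, X, tasks, B)
    K_cross = icm_kernel(X_new, t_new, X, tasks, B)
    return gp_posterior_mean(K, K_cross, y, noise_var)

B_single = np.eye(2)                         # tasks independent: single-task models
B_pooled = np.ones((2, 2))                   # tasks identical: one pooled model
B_multi = np.array([[1.0, 0.8], [0.8, 1.0]]) # tasks correlated: multi-task model

pred_single = predict(B_single)
pred_pooled = predict(B_pooled)
pred_multi = predict(B_multi)
```

With B set to the identity, the model reduces to separate single-task models, and with B set to all ones it reduces to pooling all classes into one model. Intermediate values of B let a class with few measurements borrow strength from related classes, which is the setting the abstract describes.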


pKa prediction; Multi-task learning; Quantitative structure–property relationships; Gaussian processes



The authors thank Klaus-Robert Müller, Gisbert Schneider, Tiago Rodrigues, and an anonymous referee for helpful suggestions, and David Manallack for providing the data. M. Rupp and K. Hansen acknowledge partial support by the FP7-ICT programme of the European Community (PASCAL2) and the DFG (grant MU 987/4-2). M. Rupp acknowledges partial support by the FP7 programme of the European Community (Marie Curie IEF 273039). G. Sanguinetti and G. Skolidis acknowledge support from the Engineering and Physical Sciences Research Council (EPSRC, grant EP/F009461/2). G. Sanguinetti is funded by the Scottish government through the SICSA initiative.

Supplementary material

10822_2012_9582_MOESM1_ESM.pdf (PDF, 1610 KB)



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Grigorios Skolidis (1)
  • Katja Hansen (2, 4)
  • Guido Sanguinetti (3)
  • Matthias Rupp (4, 5)

  1. Department of Statistical Science, University College London, London, UK
  2. Theory Department, Fritz Haber Institute of the Max Planck Society, Berlin, Germany
  3. School of Informatics, University of Edinburgh, Edinburgh, Scotland
  4. Machine Learning Group, TU Berlin, Berlin, Germany
  5. Institute of Pharmaceutical Sciences, ETH Zurich, Zürich, Switzerland
