Multi-task learning for pKa prediction

Journal of Computer-Aided Molecular Design

Abstract

Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. We show that, if data for related classes are available, multi-task learning can improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low-sample-size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties for which few measurements are available.
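The three model types compared above can be viewed as one Gaussian process with an intrinsic-model-of-co-regionalization (ICM) kernel, cov(f_s(x), f_t(x′)) = B[s, t] · k(x, x′), differing only in the task similarity matrix B: B = I gives independent per-class models (sharing kernel hyper-parameters), B = 11ᵀ pools all classes into a single model, and a general positive semi-definite B lets related classes borrow strength from each other. The Python/NumPy sketch below illustrates this under stated assumptions: a linear kernel on descriptor vectors, a fixed noise variance, synthetic data, and hypothetical function names. We assume that the abstract's "incomplete Cholesky decomposition" refers to a low-rank factorization B = LLᵀ of the task similarity matrix (one common way to parameterize the task matrix in multi-task GPs [30]). Here L is drawn at random rather than learned by maximizing the marginal likelihood, so this is not the authors' implementation.

```python
import numpy as np


def linear_kernel(X1, X2):
    """Linear kernel on descriptor vectors: k(x, x') = <x, x'>."""
    return X1 @ X2.T


def icm_kernel(X1, t1, X2, t2, B):
    """ICM covariance: cov(f_s(x), f_t(x')) = B[s, t] * k(x, x').

    Only entries for the given (compound, task) pairs are formed; the
    full MN x MN Kronecker product is never built.
    """
    return B[np.ix_(t1, t2)] * linear_kernel(X1, X2)


def gp_predict(X, t, y, X_new, t_new, B, noise=0.1):
    """GP posterior mean under the ICM kernel with noise variance `noise`."""
    K = icm_kernel(X, t, X, t, B) + noise * np.eye(len(y))
    alpha = np.linalg.solve(K, y)  # regression weights
    return icm_kernel(X_new, t_new, X, t, B) @ alpha


def low_rank_task_matrix(L, jitter=1e-6):
    """Hypothetical helper: task similarity B = L L^T for L of shape (M, P).

    A small jitter on the diagonal keeps B positive definite when P < M.
    """
    return L @ L.T + jitter * np.eye(L.shape[0])


# Synthetic toy data: 3 tasks (compound classes), 10 compounds each.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
t = np.repeat(np.arange(3), 10)
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=30)
X_new = rng.normal(size=(5, 4))
t_new = np.zeros(5, dtype=int)  # predict for task 0

M = 3
B_single = np.eye(M)      # independent tasks: separate single-task models
B_pool = np.ones((M, M))  # identical tasks: one pooled model
B_mtl = low_rank_task_matrix(rng.normal(size=(M, 2)))  # rank-2 similarity

pred_single = gp_predict(X, t, y, X_new, t_new, B_single)
pred_pool = gp_predict(X, t, y, X_new, t_new, B_pool)
pred_mtl = gp_predict(X, t, y, X_new, t_new, B_mtl)
```

Comparing `pred_single`, `pred_pool`, and `pred_mtl` on held-out compounds of a sparsely measured class mirrors the kind of experiment described in the abstract; with a random L, `pred_mtl` only demonstrates the mechanics.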

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1. As opposed to parametric approaches, where the information from the training data is summarized in the parameters of a distribution, non-parametric approaches require the training data for later predictions. This distinction does not prevent non-parametric approaches from having parameters; here, these are the regression weights \(\varvec{\alpha}\) and the hyper-parameters \(\varvec{\theta}\). The parameters \(\varvec{\alpha}\), which belong directly to the model itself, are computed from the data by solving an optimization problem. The hyper-parameters \(\varvec{\theta}\) parameterize the kernel and can be estimated via gradient-based optimization by maximizing the marginal likelihood.

  2. The GP predictions (predictive means) are technically equivalent to those of kernel ridge regression [28], a regularized form of ordinary least-squares regression in which the noise variance acts as the regularization parameter. We do not use additional features of GPs such as the predictive variance. However, the GP MTL methods used here do make use of Bayesian aspects of GPs.

  3. Technically, \({\mathbf{K}^\mathbf{t} \otimes \mathbf{K}^\mathbf{x} \in {\mathbb{R}}^{MN \times MN}}\) for M tasks and N compounds. In our setting, each sample (compound) occurs in one task only. After removing (marginalizing out) the rows and columns corresponding to combinations of compounds and tasks that do not occur, the resulting matrix is N × N. In practice, it is not necessary to construct the MN × MN matrix explicitly.

  4. Task similarity matrices are positive definite. Their entries thus correspond to evaluations of an inner product in some Hilbert space, which can be converted to squared Euclidean distances using \(\|\mathbf{x}-\mathbf{z}\|_{2}^{2} = \sum_{i=1}^{d} |x_i-z_i|^2 = \langle \mathbf{x}-\mathbf{z}, \mathbf{x}-\mathbf{z} \rangle = \langle \mathbf{x},\mathbf{x} \rangle - 2\langle \mathbf{x},\mathbf{z} \rangle + \langle \mathbf{z},\mathbf{z} \rangle\).

  5. The comparison is based on Table S2 of the supplement of Ref. [20], using column R’ and the third line of each row for the common tasks.
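Notes 1 and 2 can be checked numerically. The sketch below assumes a linear kernel with unit prior weight variance and synthetic data. It computes the GP predictive mean through the weights α = (K + σ²I)⁻¹y, confirms that it matches primal ridge regression with regularization λ = σ², and selects σ² by maximizing the log marginal likelihood. A plain grid search stands in for the gradient-based optimization mentioned in note 1; the function names are illustrative only.

```python
import numpy as np


def gp_mean(X, y, X_new, noise):
    """GP posterior mean with linear kernel k(x, x') = <x, x'>."""
    K = X @ X.T
    alpha = np.linalg.solve(K + noise * np.eye(len(y)), y)  # weights alpha
    return (X_new @ X.T) @ alpha


def ridge_predict(X, y, X_new, lam):
    """Primal ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return X_new @ w


def log_marginal_likelihood(X, y, noise):
    """log p(y | X, noise) for a zero-mean GP with linear kernel."""
    n = len(y)
    C = X @ X.T + noise * np.eye(n)
    L = np.linalg.cholesky(C)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))  # a = C^{-1} y
    # -0.5 log|C| equals -sum(log(diag(L))) for the Cholesky factor L.
    return -0.5 * y @ a - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=20)
X_new = rng.normal(size=(4, 3))

# Noise variance (the only hyper-parameter here) chosen by maximizing the
# marginal likelihood over a grid instead of by gradient-based optimization.
grid = np.logspace(-3, 1, 41)
best_noise = max(grid, key=lambda s2: log_marginal_likelihood(X, y, s2))

pred_gp = gp_mean(X, y, X_new, best_noise)
pred_ridge = ridge_predict(X, y, X_new, best_noise)
```

The agreement of `pred_gp` and `pred_ridge` follows from the identity X*(XᵀX + λI)⁻¹Xᵀy = X*Xᵀ(XXᵀ + λI)⁻¹y; the GP view additionally supplies the marginal likelihood used to set σ².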
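Note 3 can be illustrated by building the MN × MN Kronecker product for a toy problem and comparing its observed-pair submatrix with the N × N matrix formed directly from the task labels. This is a minimal NumPy sketch under the following assumptions: a linear compound kernel, a random task similarity matrix, and task-major ordering, so that pair (s, i) sits at index s·N + i (the layout `np.kron(B, Kx)` produces). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 6                      # number of tasks, number of compounds
Z = rng.normal(size=(M, 2))
B = Z @ Z.T                      # task similarity K^t (M x M)
X = rng.normal(size=(N, 4))
Kx = X @ X.T                     # compound kernel K^x (N x N), linear
task = np.array([0, 0, 1, 2, 2, 1])  # each compound belongs to one task

# Explicit route: build the MN x MN Kronecker product, then keep only the
# rows and columns of the observed (task, compound) pairs.
full = np.kron(B, Kx)
idx = task * N + np.arange(N)
K_explicit = full[np.ix_(idx, idx)]

# Implicit route: index B by task labels and multiply elementwise (N x N only).
K_implicit = B[np.ix_(task, task)] * Kx
```

The implicit route needs O(N²) memory instead of O(M²N²), which is why the full matrix need not be formed in practice.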
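The identity in note 4 turns a task similarity matrix into squared distances entry by entry, d²(s, t) = B_ss − 2B_st + B_tt. The short sketch below assumes a hypothetical matrix B built from random task embeddings, so that the result can be checked against directly computed Euclidean distances; `similarity_to_sq_distance` is an illustrative name.

```python
import numpy as np


def similarity_to_sq_distance(B):
    """Squared distances d2[s, t] = B[s, s] - 2 B[s, t] + B[t, t]."""
    d = np.diag(B)
    return d[:, None] - 2.0 * B + d[None, :]


rng = np.random.default_rng(3)
Z = rng.normal(size=(4, 3))       # hypothetical task embeddings
B = Z @ Z.T                       # inner-product (similarity) matrix
D2 = similarity_to_sq_distance(B)
D = np.sqrt(np.maximum(D2, 0.0))  # clip tiny negative round-off before sqrt
```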

References

  1. Rupp M, Körner R, Tetko IV (2010) Predicting the pKa of small molecules. Comb Chem High Throughput Screen 14(5):307–327

  2. Lee A, Crippen G (2009) Predicting pKa. J Chem Inf Model 49(9):2013–2033

  3. Fraczkiewicz R (2006) In silico prediction of ionization. In: Testa B, Waterbeemd H (eds) Comprehensive medicinal chemistry II, vol 5, Elsevier, Oxford, pp 603–626

  4. Wan H, Ulander J (2006) High-throughput pKa screening and prediction amenable for ADME profiling. Expert Opin Drug Metab Toxicol 2(1):139–155

  5. Ho J, Coote M (2010) A universal approach for continuum solvent pKa calculations: are we there yet? Theor Chem Acc 125(1–2):3–21

  6. Tehan B, Lloyd E, Wong M, Pitt W, Gancia E, Manallack D (2002) Estimation of pKa using semiempirical molecular orbital methods. Part 2: application to amines, anilines and various nitrogen containing heterocyclic compounds. Quant Struct Act Rel 21(5):473–485

  7. Caruana R (1997) Multi-task learning. Mach Learn 28:41–75

  8. Jacob L, Vert JP (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19):2149–2156

  9. Varnek A, Gaudin C, Marcou G, Baskin I, Pandey A, Tetko I (2009) Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients. J Chem Inf Model 49(1):133–144

  10. Ning X, Rangwala H, Karypis G (2009) Multi-assay-based structure-activity relationship models: improving structure-activity relationship models by incorporating activity information from related targets. J Chem Inf Model 49(11):2444–2456

  11. Mordelet F, Vert JP (2011) ProDiGe: PRioritization of disease genes with multitask machine learning from positive and unlabeled examples. BMC Bioinf 12:389

  12. Rossotti F, Rossotti H (1961) The determination of stability constants and other equilibrium constants in solution. McGraw-Hill, New York

  13. Hasselbalch KA (1916) Die Berechnung der Wasserstoffzahl des Blutes aus der freien und gebundenen Kohlensäure desselben, und die Sauerstoffbindung des Blutes als Funktion der Wasserstoffzahl. Biochem Z 78:112–144

  14. Clark J, Perrin D (1964) Prediction of the strength of organic bases. Q Rev Chem Soc 18:295–320

  15. Perrin DD, Dempsey B, Serjeant EP (1981) pKa prediction for organic acids and bases. Chapman and Hall/CRC Press, Boca Raton

  16. Lyman W, Reehl W, Rosenblatt D (eds) (1982) Handbook of chemical property estimation methods: environmental behavior of organic compounds. McGraw-Hill, New York

  17. Livingstone D (2003) Theoretical property predictions. Curr Top Med Chem 3(10):1171–1192

  18. Hammett L (1937) The effect of structure upon the reactions of organic compounds. Benzene derivatives. J Am Chem Soc 59(1):96–103

  19. Ertl P (1997) Simple quantum chemical parameters as an alternative to the Hammett sigma constants in QSAR studies. Quant Struct Act Rel 16(5):377–382

  20. Rupp M, Körner R, Tetko IV (2010) Estimation of acid dissociation constants using graph kernels. Mol Inf 29(10):731–740

  21. Tehan B, Lloyd E, Wong M, Pitt W, Montana J, Manallack D, Gancia E (2002) Estimation of pKa using semiempirical molecular orbital methods. Part 1: application to phenols and carboxylic acids. Quant Struct Act Rel 21(5):457–472

  22. Howard P, Meylan W (1999) Physical/chemical property database (PHYSPROP). Syracuse Research Corporation, Environmental Science Center, 6225 Running Ridge Road, North Syracuse, New York

  23. Fukui K, Yonezawa T, Nagata C (1954) Theory of substitution in conjugated molecules. Bull Chem Soc Jpn 27(7):423–427

  24. Sadowski J, Gasteiger J (1993) From atoms and bonds to three-dimensional atomic coordinates: automatic model builders. Chem Rev 93(7):2567–2581

  25. Stewart J (1997) MOPAC: a general molecular orbital package. Quant Chem Prog Exch 10:86

  26. Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY, Todeschini R, Varnek A, Marcou G, Ertl P, Potemkin V, Grishina M, Gasteiger J, Schwab C, Baskin II, Palyulin VA, Radchenko EV, Welsh WJ, Kholodovych V, Chekmarev D, Cherkasov A, de Sousa JA, Zhang QY, Bender A, Nigsch F, Patiny L, Williams A, Tkachenko V, Tetko IV (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25(6):533–554

  27. Rasmussen CE, Williams CK (2005) Gaussian processes for machine learning. MIT Press, Cambridge

  28. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Data mining, inference, and prediction, 2nd edn. Springer, New York

  29. Cressie NA (1993) Statistics for spatial data. Wiley, New York

  30. Bonilla E, Chai KM, Williams C (2008) Multi-task Gaussian process prediction. In: Platt J, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems 20. MIT Press, Cambridge, pp 153–160

  31. Rebonato R, Jäckel P (1999) The most general methodology for creating a valid correlation matrix for risk management and option pricing purposes. J Risk 2(2):17–27

  32. Skolidis G, Sanguinetti G (2011) Bayesian multitask classification with Gaussian process priors. IEEE Trans Neural Netw 22(12):2011–2021

  33. Wilcoxon F (1945) Individual comparisons by ranking methods. Biometr Bull 1(6):80–83

  34. Manallack D (2007) The pKa distribution of drugs: application to drug discovery. Perspect Med Chem 1:25–38

  35. Liao C, Nicklaus M (2009) Comparison of nine programs predicting pKa values of pharmaceutical substances. J Chem Inf Model 49(12):2801–2812

Acknowledgments

The authors thank Klaus-Robert Müller, Gisbert Schneider, Tiago Rodrigues, and an anonymous referee for helpful suggestions, and David Manallack for the provision of data. M. Rupp and K. Hansen acknowledge partial support from the FP7-ICT programme of the European Community (PASCAL2) and the DFG (grant MU 987/4-2). M. Rupp acknowledges partial support from the FP7 programme of the European Community (Marie Curie IEF 273039). G. Sanguinetti and G. Skolidis acknowledge support from the Engineering and Physical Sciences Research Council (EPSRC, grant EP/F009461/2). G. Sanguinetti is funded by the Scottish government through the SICSA initiative.

Author information

Corresponding author

Correspondence to Matthias Rupp.

Electronic supplementary material

The electronic supplementary material is available as a PDF (1610 KB).

About this article

Cite this article

Skolidis, G., Hansen, K., Sanguinetti, G. et al. Multi-task learning for pKa prediction. J Comput Aided Mol Des 26, 883–895 (2012). https://doi.org/10.1007/s10822-012-9582-x

  • DOI: https://doi.org/10.1007/s10822-012-9582-x
