Journal of Computer-Aided Molecular Design, Volume 26, Issue 7, pp 883–895

Multi-task learning for pKa prediction

  • Grigorios Skolidis
  • Katja Hansen
  • Guido Sanguinetti
  • Matthias Rupp
Article

DOI: 10.1007/s10822-012-9582-x

Cite this article as:
Skolidis, G., Hansen, K., Sanguinetti, G. et al. J Comput Aided Mol Des (2012) 26: 883. doi:10.1007/s10822-012-9582-x

Abstract

Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. We show that, if data for related classes are available, multi-task learning can improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties for which few measurements are available.
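To illustrate the idea of sharing data across compound classes, the following is a minimal sketch of Gaussian process regression with an intrinsic co-regionalization kernel, in which the covariance between two samples is the product of a task covariance B[t, t'] and a linear input kernel. It is not the authors' implementation: the function names, the fixed task matrix `B`, the noise level, and the synthetic data are all assumptions made for illustration. In particular, `B` is set by hand here rather than learned, and the incomplete Cholesky parametrization mentioned in the abstract is not reproduced.

```python
import numpy as np

def icm_kernel(X1, t1, X2, t2, B):
    """ICM covariance: B[t, t'] * <x, x'> with a linear input kernel."""
    return B[np.ix_(t1, t2)] * (X1 @ X2.T)

def gp_predict(X_train, t_train, y_train, X_test, t_test, B, noise=0.1):
    """Posterior mean of a zero-mean GP under the ICM kernel."""
    K = icm_kernel(X_train, t_train, X_train, t_train, B)
    K_star = icm_kernel(X_test, t_test, X_train, t_train, B)
    # Solve (K + noise * I) alpha = y using a Cholesky factorization.
    L = np.linalg.cholesky(K + noise * np.eye(len(y_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star @ alpha

# Synthetic stand-in data (hypothetical): task 0 has many samples,
# task 1 has few, and the two tasks have similar linear weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
t = np.array([0] * 9 + [1] * 3)
w0 = np.array([1.0, -2.0, 0.5])
w1 = w0 + 0.1
y = np.where(t == 0, X @ w0, X @ w1)

X_new = rng.normal(size=(4, 3))
t_new = np.ones(4, dtype=int)  # predict for the data-poor task

# Assumed task covariance: strong positive coupling between the tasks.
B_coupled = np.array([[1.0, 0.9],
                      [0.9, 1.0]])
pred_multi = gp_predict(X, t, y, X_new, t_new, B_coupled)

# With B equal to the identity the tasks decouple, which corresponds to
# fitting a separate single-task model per class.
pred_indep = gp_predict(X, t, y, X_new, t_new, np.eye(2))
```

Setting all entries of `B` to one would instead treat every sample as belonging to a single task, which corresponds to the pooling model; intermediate values let the data-poor class borrow strength from related classes.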

Keywords

pKa prediction; Multi-task learning; Quantitative structure–property relationships; Gaussian processes

Supplementary material

10822_2012_9582_MOESM1_ESM.pdf (PDF, 1610 KB)

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Grigorios Skolidis (1)
  • Katja Hansen (2, 4)
  • Guido Sanguinetti (3)
  • Matthias Rupp (4, 5)

  1. Department of Statistical Science, University College London, London, UK
  2. Theory Department, Fritz Haber Institute of the Max Planck Society, Berlin, Germany
  3. School of Informatics, University of Edinburgh, Edinburgh, Scotland
  4. Machine Learning Group, TU Berlin, Berlin, Germany
  5. Institute of Pharmaceutical Sciences, ETH Zurich, Zürich, Switzerland
