Automatic QSAR modeling of ADME properties: blood–brain barrier penetration and aqueous solubility

  • Olga Obrezanova
  • Joelle M. R. Gola
  • Edmund J. Champness
  • Matthew D. Segall


In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation; splitting of the data into training, validation and test sets; descriptor filtering; application of modeling techniques; and selection of the best model. We apply this automatic process to data sets of blood–brain barrier penetration and aqueous solubility, and use external test sets to compare the resulting automatically generated models with ‘manually’ built models. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models: a small set of in vivo data and a large set of physico-chemical data.
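The stages listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic data, the function names, the 70/15/15 split, the variance-based filter, and the two stand-in models (a mean predictor and a k-nearest-neighbour regressor, used here in place of Gaussian Processes) are all assumptions made for illustration. Descriptor calculation is skipped, so descriptors are assumed to be already computed.

```python
import random
import statistics
from math import sqrt

def split_data(rows, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def filter_descriptors(train, min_std=1e-6):
    """Keep indices of descriptors that actually vary across the training set."""
    n_desc = len(train[0][0])
    return [j for j in range(n_desc)
            if statistics.pstdev([x[j] for x, _ in train]) > min_std]

def fit_mean(train, keep):
    """Stand-in baseline: always predict the training-set mean."""
    mean_y = statistics.fmean([y for _, y in train])
    return lambda x: mean_y

def fit_knn(train, keep, k=3):
    """Stand-in model: average the k nearest training compounds."""
    def predict(x):
        def dist(a):
            return sum((a[j] - x[j]) ** 2 for j in keep)
        nearest = sorted(train, key=lambda row: dist(row[0]))[:k]
        return statistics.fmean([y for _, y in nearest])
    return predict

def rmse(model, rows):
    """Root-mean-square error of a model on (descriptors, value) rows."""
    return sqrt(statistics.fmean([(model(x) - y) ** 2 for x, y in rows]))

def build_best_model(rows, seed=0):
    """Split, filter, fit candidates, select on validation, report on test."""
    train, val, test = split_data(rows, seed=seed)
    keep = filter_descriptors(train)
    candidates = {"mean": fit_mean(train, keep), "knn": fit_knn(train, keep)}
    val_rmse = {name: rmse(m, val) for name, m in candidates.items()}
    best = min(val_rmse, key=val_rmse.get)
    return best, keep, val_rmse, rmse(candidates[best], test)

# Synthetic data (an assumption): two informative descriptors, one constant.
rng = random.Random(1)
rows = []
for _ in range(200):
    x = [rng.uniform(0, 1), rng.uniform(0, 1), 1.0]
    rows.append((x, 2 * x[0] - x[1]))

best, keep, val_rmse, test_rmse = build_best_model(rows)
print(best, keep, val_rmse, test_rmse)
```

The test set is touched only once, after the best candidate has been chosen on the validation set, so the reported test error is not biased by the selection step.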


Automatic model generation process; QSAR modeling; ADME properties; Blood–brain barrier penetration; Aqueous solubility; Gaussian Processes; Drug discovery


Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Olga Obrezanova (1)
  • Joelle M. R. Gola (1)
  • Edmund J. Champness (1)
  • Matthew D. Segall (1)

  1. BioFocus DPI Ltd., Saffron Walden, UK
