DemQSAR: predicting human volume of distribution and clearance of drugs

Journal of Computer-Aided Molecular Design

Abstract

In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure–activity/-property relationship (QSAR/QSPR) models correlate the structure and physico-chemical properties of molecular compounds with a specific functional activity or property under study. Typically, a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e. the method adjusts to very specific features of the training data that are not characteristic of the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach that combines feature generation, feature selection, model building and control of overtraining in a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VDss) and human clearance (CL). To control overtraining, quadratic and linear regularization terms are employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power.
A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VDss and CL is available on the webpage of DemPRED: http://agknapp.chemie.fu-berlin.de/dempred/.
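
The combination of quadratic and linear regularization with recursive feature selection described in the abstract can be illustrated with a small sketch. The abstract does not give DemQSAR's loss function, solver, or elimination criterion, so everything below is an assumption made for illustration: a least-squares loss with an L1 (linear) and an L2 (quadratic) penalty, fitted by coordinate descent, and an elimination loop that drops the feature with the smallest absolute weight after each fit. The function names and the default penalty strengths are hypothetical and are not taken from the DemQSAR application.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, l1=0.01, l2=0.01, sweeps=200):
    """Minimize (1/2n)*||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||^2.

    Assumes (for this sketch) standardized features and a centered
    target, so no intercept term is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the residual that ignores w[j].
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, l1) / (z + l2)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_wj
    return w


def recursive_feature_elimination(X, y, keep, l1=0.01, l2=0.01):
    """Refit repeatedly, dropping the smallest-|weight| feature each round.

    Returns the indices of the surviving features and their weights.
    """
    active = list(range(len(X[0])))
    while True:
        X_active = [[row[j] for j in active] for row in X]
        w = fit_elastic_net(X_active, y, l1, l2)
        if len(active) <= keep:
            return active, w
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
```

Calling `recursive_feature_elimination(X, y, keep=2)` on a descriptor matrix `X` (a list of rows) and a property vector `y` returns the two surviving column indices together with their fitted weights. The linear penalty drives weights of uninformative descriptors to exactly zero, which makes them natural candidates for elimination, while the quadratic penalty keeps the fit stable when descriptors outnumber compounds.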

Abbreviations

VDss:

Volume of distribution at steady state

CL:

Clearance

QSAR/QSPR:

Quantitative structure–activity/-property relationship

GMFE:

Geometric mean fold-error

RFE:

Recursive feature elimination

CDK:

Chemistry Development Kit

PST:

Performance size trade-off
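
The geometric mean fold-error (GMFE) in the list above is commonly defined in the pharmacokinetic prediction literature as 10 raised to the mean absolute log10 ratio of predicted to observed values, so that a GMFE of 2 means predictions are off by a factor of two on average. This page does not state the exact formula used in the article, so the sketch below assumes that common definition; the function name is illustrative.

```python
import math


def geometric_mean_fold_error(predicted, observed):
    """Return 10 ** mean(|log10(predicted_i / observed_i)|).

    Assumes all values are strictly positive, as is the case for
    volume of distribution and clearance.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    abs_logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(abs_logs) / len(abs_logs))
```

Because the absolute value is taken before averaging, a twofold overprediction and a twofold underprediction do not cancel; both contribute a fold-error of 2, and a perfect prediction gives a GMFE of 1.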

Acknowledgments

The authors are grateful to the developers of the CDK library, which made it possible to make this application publicly available. We would also like to acknowledge useful comments from an anonymous referee. This work was supported by the International Research Training Group (IRTG) on “Genomics and Systems Biology of Molecular Networks” (GRK1360 and GRK1772, German Research Foundation (DFG)).

Author information

Corresponding author

Correspondence to Ernst-Walter Knapp.

Electronic supplementary material

Supplementary material 1 (DOC 265 kb)

About this article

Cite this article

Demir-Kavuk, O., Bentzien, J., Muegge, I. et al. DemQSAR: predicting human volume of distribution and clearance of drugs. J Comput Aided Mol Des 25, 1121–1133 (2011). https://doi.org/10.1007/s10822-011-9496-z

  • DOI: https://doi.org/10.1007/s10822-011-9496-z
