Abstract
In silico methods that characterize molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure–activity/property relationship (QSAR/QSPR) models correlate the structure and physicochemical properties of molecular compounds with a specific functional activity or property under study. Typically, a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, that is, they adapt to very specific features of the training data that are not characteristic of the property under consideration. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach that combines feature generation, feature selection, model building and control of overtraining in a single application called DemQSAR. DemQSAR is used to predict human volume of distribution at steady state (VDss) and human clearance (CL). To control overtraining, quadratic and linear regularization terms are employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power.
A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VDss and CL is available on the webpage of DemPRED: http://agknapp.chemie.fu-berlin.de/dempred/.
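To make the combination of regularization and recursive feature selection concrete, the following is a minimal sketch, not the DemQSAR implementation. It assumes a linear model, standardized descriptors, a centered target, and an elastic-net style objective with one linear (L1) and one quadratic (L2) penalty fitted by coordinate descent. The function names, the penalty parameters `l1` and `l2`, and the toy data are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, l1=0.01, l2=0.01, sweeps=200):
    """Minimize (1/2n)*||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||^2.

    X is a list of rows; columns are assumed standardized, y centered.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, l1) / (z + l2)
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w


def recursive_feature_elimination(X, y, keep, **penalties):
    """Refit repeatedly, dropping the feature with the smallest |weight|."""
    active = list(range(len(X[0])))
    while True:
        sub = [[row[j] for j in active] for row in X]
        w = fit_elastic_net(sub, y, **penalties)
        if len(active) <= keep:
            return active, w
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]


# Hypothetical toy data: the target depends only on descriptors 0 and 2.
X = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1],
     [-1, 1, 1, 1], [-1, 1, -1, -1], [-1, -1, 1, -1], [-1, -1, -1, 1]]
y = [2.0 * row[0] - 1.0 * row[2] for row in X]
selected, weights = recursive_feature_elimination(X, y, keep=2)
print(selected, weights)
```

Dropping one descriptor per round keeps the sketch short. With thousands of descriptors, removing a fixed fraction per round is a common and cheaper variant.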
Abbreviations
- VDss: Volume of distribution at steady state
- CL: Clearance
- QSA/PR: Quantitative structure–activity/-property relationship
- GMFE: Geometric mean fold-error
- RFE: Recursive feature elimination
- CDK: Chemistry development kit
- PST: Performance size trade-off
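As an illustration of the GMFE listed above, the sketch below uses the definition common in pharmacokinetic modeling: ten raised to the mean absolute log10 ratio of predicted to observed values. Whether the article uses exactly this form is an assumption, and the function name and example values are hypothetical.

```python
import math


def geometric_mean_fold_error(predicted, observed):
    """Return 10 ** mean(|log10(predicted_i / observed_i)|).

    Both sequences must have equal length and strictly positive values,
    as is the case for quantities such as VDss or CL.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, o in zip(predicted, observed):
        if p <= 0 or o <= 0:
            raise ValueError("values must be strictly positive")
        total += abs(math.log10(p / o))
    return 10 ** (total / len(predicted))


# Hypothetical predicted and observed values, for illustration only.
print(geometric_mean_fold_error([1.2, 0.4, 3.0], [1.0, 0.8, 2.5]))
```

A GMFE of 1 means perfect agreement. A GMFE of 2 means predictions deviate from observations by a factor of two on (geometric) average, regardless of whether they over- or under-predict.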
Acknowledgments
The authors are grateful to the developers of the CDK library, which made it possible to make this application publicly available. We would also like to acknowledge useful comments from an anonymous referee. This work was supported by the International Research Training Group (IRTG) on “Genomics and Systems Biology of Molecular Networks” (GRK1360 and GRK1772) of the German Research Foundation (DFG).
Cite this article
Demir-Kavuk, O., Bentzien, J., Muegge, I. et al. DemQSAR: predicting human volume of distribution and clearance of drugs. J Comput Aided Mol Des 25, 1121–1133 (2011). https://doi.org/10.1007/s10822-011-9496-z
DOI: https://doi.org/10.1007/s10822-011-9496-z