Molecular Diversity, Volume 16, Issue 2, pp 389–400

QSAR classification of metabolic activation of chemicals into covalently reactive species

Full-Length Paper


Metabolic activation of chemicals into covalently reactive species might lead to toxicological consequences such as tissue necrosis, carcinogenicity, teratogenicity, or immune-mediated toxicities. Early prediction of this undesirable outcome can help in selecting candidates with an increased chance of success, thus reducing attrition at all stages of drug development. Ensemble modelling of mixed features was used to develop a model that classifies the metabolic activation of chemicals into covalently reactive species. The effects of the quality of the base classifiers and of the performance measure used for sorting were examined. An ensemble model of 13 naive Bayes classifiers was built from a diverse set of 1,479 compounds. The ensemble model was validated internally with five-fold cross-validation, and it achieved a sensitivity of 67.4% and a specificity of 93.4% when tested on the training set. The final ensemble model was made available for public use.
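As a rough illustration of the quantities reported above, the following minimal sketch combines binary predictions from several base classifiers and computes sensitivity and specificity. The abstract does not state how the 13 naive Bayes classifiers were combined, so the majority-vote rule here is an assumption, and the function names `majority_vote` and `sensitivity_specificity` are hypothetical.

```python
from typing import List, Sequence, Tuple


def majority_vote(base_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 labels from several base classifiers by majority vote.

    base_predictions[i][j] is classifier i's label for compound j
    (1 = forms a reactive metabolite, 0 = does not).
    ASSUMPTION: simple majority voting; the paper's combination rule
    is not given in the abstract. Ties go to class 0.
    """
    n_classifiers = len(base_predictions)
    n_compounds = len(base_predictions[0])
    combined = []
    for j in range(n_compounds):
        positive_votes = sum(preds[j] for preds in base_predictions)
        combined.append(1 if 2 * positive_votes > n_classifiers else 0)
    return combined


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sensitivity: fraction of true positives recovered.
    # Specificity: fraction of true negatives recovered.
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (illustrative only, not from the paper): 3 classifiers, 6 compounds.
y_true = [1, 1, 1, 0, 0, 0]
votes = [
    [1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
]
ensemble = majority_vote(votes)
sens, spec = sensitivity_specificity(y_true, ensemble)
print(ensemble, round(sens, 3), round(spec, 3))
```

With an odd number of base classifiers, such as the 13 used in the paper, a majority vote can never tie, so the tie-breaking rule in the sketch would not come into play.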


QSAR, QSTR, Ensemble, Toxicity, Reactive metabolite, Prediction, Screening software


Supplementary material

ESM 1: 11030_2012_9364_MOESM1_ESM.xls (XLS, 221 kb)



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Pharmaceutical Data Exploration Laboratory, Department of Pharmacy, National University of Singapore, Singapore, Singapore
  2. NUS High School of Mathematics and Science, Singapore, Singapore
