Molecular Diversity, Volume 11, Issue 1, pp 23–36

Comparing the chemical spaces of metabolites and available chemicals: models of metabolite-likeness

  • Sunil Gupta
  • João Aires-de-Sousa
Full Length Paper


The chemical space covered by compounds involved in metabolic reactions was compared, using chemoinformatics techniques, with that of a random dataset of purchasable compounds. The comparison was based on 3D structure, 2D structure, or descriptors of global properties, and used self-organizing maps, random forests, and classification trees. The overlap between metabolites and non-metabolites was smallest in the space defined by the global descriptors, where the most discriminating features were the number of OH groups, the presence of aromatic systems, and molecular weight. The two datasets were discriminated with an accuracy of up to 97%. Models were built to produce a metabolite-likeness parameter, and a relationship between metabolite-likeness and ready biodegradability was observed.
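One way to picture the "overlap" between two compound sets on a self-organizing map is sketched below. This is a minimal pure-Python illustration, not the software or settings the authors used: a Kohonen map is trained on the pooled descriptor vectors, each compound is assigned to its winning neuron, and overlap is measured as the fraction of occupied neurons that receive compounds from both sets. The grid size, the linear decay of learning rate and radius, the Gaussian neighbourhood, the `overlap_fraction` measure, and the toy 2D data are all assumptions made for illustration.

```python
import math
import random


def winner(weights, x):
    """Return the grid position (row, col) of the neuron closest to x (Euclidean)."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular Kohonen map on descriptor vectors.

    Returns a dict mapping (row, col) -> weight vector. The linear decay of the
    learning rate and neighbourhood radius is an illustrative choice (assumption).
    """
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2.0
    # Initialise each neuron with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = winner(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the 2D grid around the winning neuron.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def overlap_fraction(weights, class_a, class_b):
    """Hypothetical overlap measure: share of occupied neurons hit by both classes."""
    hits_a = {winner(weights, x) for x in class_a}
    hits_b = {winner(weights, x) for x in class_b}
    return len(hits_a & hits_b) / len(hits_a | hits_b)


if __name__ == "__main__":
    # Toy 2D "descriptors" for two hypothetical, well-separated compound sets.
    rng = random.Random(1)
    metabolites = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(20)]
    purchasable = [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(20)]
    som = train_som(metabolites + purchasable, rows=4, cols=4, epochs=20)
    print("overlap:", overlap_fraction(som, metabolites, purchasable))
```

Training on the pooled data, rather than on each set separately, places both sets on the same map so that shared neurons are directly comparable; an overlap near 0 means the two sets occupy distinct regions of the descriptor space, and a value near 1 means they are mixed.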


Keywords: Chemical diversity; Chemoinformatics; Computer chemistry; Metabolism; Neural networks



Abbreviations

Counterpropagation neural network
Japanese Ministry of International Trade and Industry (MITI)
Radial distribution function
Random forest
Self-organizing map
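The abstract reports a comparison based on 3D structure, and the radial distribution function is listed among the abbreviations. A common 3D descriptor built on it is the RDF code in the form introduced by Hemmer, Steinhauer and Gasteiger, g(r) = sum over atom pairs i<j of A_i A_j exp(-B (r - r_ij)^2), where A_i is an atomic property, r_ij is the interatomic distance, and B is a smoothing parameter. The sketch below assumes that form; the atomic properties, the value of B, and the sampling grid used in the paper are not stated in this section, so the values here are placeholders.

```python
import math


def rdf_code(coords, props, r_values, b=100.0):
    """Sample g(r) = sum over atom pairs i<j of A_i*A_j*exp(-B*(r - r_ij)**2).

    coords   -- list of (x, y, z) atomic positions in angstrom (e.g. from a 3D model)
    props    -- atomic property A_i for each atom, in the same order as coords
    r_values -- distances (angstrom) at which g(r) is sampled
    b        -- smoothing parameter B (1/angstrom^2); 100.0 is a placeholder assumption
    """
    if len(coords) != len(props):
        raise ValueError("coords and props must have the same length")
    # Pre-compute every pair product A_i*A_j and interatomic distance r_ij once.
    pairs = [
        (props[i] * props[j], math.dist(coords[i], coords[j]))
        for i in range(len(coords) - 1)
        for j in range(i + 1, len(coords))
    ]
    return [sum(a * math.exp(-b * (r - rij) ** 2) for a, rij in pairs) for r in r_values]


# Usage: a hypothetical three-atom fragment with unit properties, sampled from 0.5 to 3.0 angstrom.
fragment = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
print(rdf_code(fragment, [1.0, 1.0, 1.0], [0.5 * k for k in range(1, 7)]))
```

Because g(r) is sampled at a fixed set of distances, every molecule yields a vector of the same length regardless of its atom count, which is what makes such a descriptor usable as input to a self-organizing map or a random forest.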





S. Gupta acknowledges Fundação para a Ciência e Tecnologia (Lisbon, Portugal) for the postdoctoral grant SFRH/BPD/14475/2003 co-funded by the POCI 2010 EU program. Molecular Networks GmbH (Erlangen, Germany) is acknowledged for access to PETRA and CORINA software packages. The authors thank Dr Robert Boethling for assistance with the MITI data.



Copyright information

© Springer Science+Business Media, Inc. 2007

Authors and Affiliations

  1. REQUIMTE, CQFB, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
