Bio- and Chemoinformatics Approaches for Metabolomics Data Analysis

  • Michael Witting
Part of the Methods in Molecular Biology book series (MIMB, volume 1738)


Metabolomics data analysis involves several repetitive tasks, such as sorting data, calculating exact masses or other physicochemical properties, and searching for identifiers in different databases. Many of these tasks can be automated with command-line tools or short scripts in scripting languages such as Perl, Python, or R. This chapter presents simple solutions and short R scripts for interacting with specific web services and for calculating physicochemical properties or molecular formulae.
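As an illustration of the kind of task described above, the following is a minimal sketch of an exact (monoisotopic) mass calculation from a molecular formula. It is not taken from the chapter, whose scripts are written in R; it is written in Python, and the names `parse_formula` and `monoisotopic_mass` are hypothetical. It assumes flat formulae without brackets, charges, or isotope labels, and the mass table covers only a few elements.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
# Assumption: only a few elements common in metabolites are listed here.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
}

# One element symbol (e.g. "C" or "Cl") followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Return element counts for a flat formula such as 'C6H12O6'."""
    counts = {}
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, number = match.groups()
        if element not in MONOISOTOPIC_MASS:
            raise ValueError(f"no mass available for element {element!r}")
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
        pos = match.end()
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in a flat formula."""
    counts = parse_formula(formula)
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in counts.items())
```

For glucose, `monoisotopic_mass("C6H12O6")` gives roughly 180.0634 u. Keeping the parser separate from the mass sum means the same element counts could be reused for other properties.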

Key words

R, Isotope pattern, Formula calculation, Physicochemical properties, Command line, Web service, Identifier conversion



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München – German Research Center for Environmental Health, Neuherberg, Germany
  2. Chair of Analytical Food Chemistry, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Technische Universität München, Freising, Germany
