The Golm Metabolome Database: a database for GC-MS based metabolite profiling

  • Jan Hummel
  • Joachim Selbig
  • Dirk Walther
  • Joachim Kopka (Email author)
Part of the Topics in Current Genetics book series (TCG, volume 18)


In the post-genomic era, biological science continues a transition from a predominantly qualitative towards an increasingly quantitative science. Genomic, transcriptomic, proteomic, and now metabolomic technologies significantly contribute to the generation of huge amounts of data. These data, which typically describe changes in gene expression or changes in protein and metabolite pools, cannot effectively be analysed and interpreted by computer based programming if access is only provided through traditional publication schemes. Therefore ‘-omics’ data sets require formalised representation and access through databases. Otherwise important information will be lost which may serve as reference data for current and future science. Transcript and protein profiling is dominated by few almost comprehensive technologies. In contrast, the metabolomic field will require multiple analytical profiling approaches to cover the chemical multitude of primary and secondary metabolism. As a consequence, technology-oriented metabolomics databases start to emerge. We will use GC-TOF-MS-based metabolite profiling as an example for the prototypical design of central database objects and structures. The focus will be on the required detailed information for the archiving of metabolite fingerprinting and profiling data sets. Special consideration is given to aspects of maintaining information sufficient and necessary for the experimental reproduction of metabolite identification and quantification results. Both aspects are essential for the sustainable use of GC-TOF-MS-based metabolite profiling and for the comparison to other metabolomics technologies.


Resource Description Framework; Metabolite Profile; Mass Fragment; Reference Substance; Metabolite Identification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Jan Hummel (1)
  • Joachim Selbig (2)
  • Dirk Walther (1)
  • Joachim Kopka (1, Email author)

  1. Max Planck Institute of Molecular Plant Physiology (MPI-MP), Potsdam-Golm, Germany
  2. University of Potsdam, Institute of Biochemistry and Biology, c/o MPI-MP, Potsdam-Golm, Germany
