MetaboLights: towards a new COSMOS of metabolomics data management
Exciting funding initiatives are emerging in Europe and the US for metabolomics data production, storage, dissemination and analysis. They build on a rich ecosystem of resources around the world, which has been built during the past ten years, including but not limited to resources such as MassBank in Japan and the Human Metabolome Database in Canada. Now, the European Bioinformatics Institute has launched MetaboLights, a database for metabolomics experiments and the associated metadata (http://www.ebi.ac.uk/metabolights). It is the first comprehensive, cross-species, cross-platform metabolomics database maintained by one of the major open access data providers in molecular biology. In October, the European COSMOS consortium will start its work on metabolomics data standardization, publication and dissemination workflows. The NIH in the US is establishing 6–8 metabolomics services cores as well as a national metabolomics repository. This communication reports on MetaboLights as a new resource for metabolomics research, summarises the related developments and outlines how they may consolidate knowledge management in this third large omics field next to proteomics and genomics.
Keywords: Metabolomics, Databases, ISA-Tab, ISA commons
1 Introduction
Metabolomics has become an important phenotyping technique for molecular biology and medicine. It assesses the molecular state of an organism or collections of organisms through the comprehensive quantitative and qualitative analysis of all small molecules in cells, tissues, and body fluids. Metabolic processes are at the core of physiology. Consequently, metabolomics is ideally suited as a medical tool to characterize disease states in organisms, and as a tool for assessing the suitability of organisms for, for example, renewable energy production or biotechnological applications in general. In addition, the application of metabolomics in environmental science, toxicology, and the food and medical industries is well established, well documented and growing. Metabolomics studies generate large amounts of analytical data (gigabytes to terabytes, depending on the size of the study) and therefore pose significant challenges for biomedical and life science e-infrastructures, which must cope with such data volumes and ensure that the data are captured, stored and disseminated based on open and widely accepted community standards. Years after the first standardisation exercises (Fiehn et al. 2007; Taylor et al. 2008), metabolomics is now reaching the state of a mature analytical technique, as indicated by the establishment of 6–8 Regional Comprehensive Metabolomics Resource Cores (RCMRCs) by the NIH in the United States (http://grants.nih.gov/grants/guide/rfa-files/RFA-RM-11-016.html). In addition, there is now a rich ecosystem of specialised metabolomics databases (Wishart et al. 2007; Kopka et al. 2005; Smith et al. 2005; Skogerson et al. 2011), and the first general metabolomics repositories and databases, such as MetaboLights (http://www.ebi.ac.uk/metabolights), are emerging. In Europe, the COSMOS consortium of 14 leading laboratories in metabolomics will begin its work on standards, data management and dissemination in metabolomics.
Here, we outline these developments and show how they may consolidate the knowledge management in this third large omics field next to proteomics and genomics.
2 MetaboLights: a cross-species repository for metabolomics experiments
The European Bioinformatics Institute (EMBL-EBI) has recently launched MetaboLights, a database for metabolomics experiments and the associated metadata. It aims to become the first comprehensive, cross-species, cross-platform metabolomics database maintained by one of the major open access data providers in molecular biology. The EBI ensures long-term stability and maintenance of the resource. Deposited datasets are assigned a stable identifier of the form MTBLS1 (the first dataset ever deposited in MetaboLights). These identifiers, like other stable identifiers in bioinformatics, can be used to mark datasets in publications or merge data in systems biology applications.
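As an illustration of how such identifiers might be used to mark datasets in text, the following minimal Python sketch validates an identifier and extracts identifiers from a passage. It assumes that every identifier consists of the prefix MTBLS followed by a positive integer without leading zeros; only MTBLS1 is confirmed above. The function names parse_mtbls_id and find_mtbls_ids are hypothetical and not part of any MetaboLights software.

```python
import re
from typing import List

# Assumption: identifiers are the literal prefix "MTBLS" followed by a
# positive integer without leading zeros. Only "MTBLS1" is confirmed by
# the text; the general pattern is an extrapolation.
_MTBLS_FULL = re.compile(r"MTBLS([1-9][0-9]*)")
_MTBLS_IN_TEXT = re.compile(r"\bMTBLS[1-9][0-9]*\b")


def parse_mtbls_id(identifier: str) -> int:
    """Return the numeric part of an identifier such as 'MTBLS1'.

    Raises ValueError if the string does not match the assumed pattern.
    """
    match = _MTBLS_FULL.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a MetaboLights-style identifier: {identifier!r}")
    return int(match.group(1))


def find_mtbls_ids(text: str) -> List[str]:
    """Return the distinct identifiers mentioned in text, in numeric order."""
    unique_ids = set(_MTBLS_IN_TEXT.findall(text))
    return sorted(unique_ids, key=parse_mtbls_id)


if __name__ == "__main__":
    sentence = "Raw data are available under MTBLS1."
    for dataset_id in find_mtbls_ids(sentence):
        print(dataset_id, parse_mtbls_id(dataset_id))
```

Sorting by the numeric part rather than alphabetically keeps, for example, MTBLS2 ahead of MTBLS12 when datasets from several studies are merged.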
2.1 Call for submitting data
MetaboLights is now ready to receive metabolomics datasets. We have, for example, recently received the dataset measured by O’Callaghan et al. to validate their PyMS software (O’Callaghan et al. 2012). We think that this is the way forward for sharing gold standard datasets for validating metabolomics software. More generally, we hope, and will work with journal editors to ensure, that datasets used to justify findings in publications will be submitted to MetaboLights or to one of the emerging collaborating repositories. Interested readers are encouraged to go to http://www.ebi.ac.uk/metabolights/presubmit and submit their data. The MetaboLights team is happy to assist in this process.
3 Conclusion and outlook
- Encoded in open standards to allow barrier-free and widespread analysis.
- Tagged with a community-agreed, complete set of metadata (minimum information standard).
- Supported by a communally developed set of open source data management and capturing tools.
- Disseminated in open-access databases adhering to the above standards.
- Supported by vendors and publishers, who require deposition upon publication.
- Properly interfaced with data in other biomedical and life science e-infrastructures (such as ELIXIR, BioMedBridges, EU-OPENSCREEN and BBMRI).
COSMOS will also strive to harmonize the European agenda with efforts in the US, where the NIH is establishing 6–8 metabolomics services cores as well as a national metabolomics repository. Together with similar initiatives in Australia and Japan and, we hope, others that emerge over time, this opens the door to a global network of metabolomics data collection, exchange and dissemination.
The authors gratefully acknowledge funding of this work by the BBSRC MetaboLights Grant BB/I000933/1 and the European Commission COSMOS Grant EC312941. The authors are also extremely grateful to the participants of the initial metabolites planning workshops at the European Bioinformatics Institute (EMBL-EBI) as well as to those collaborators who contributed metabolomics datasets in the early stages of the MetaboLights launch.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
- O’Callaghan, S., DeSouza, D. P., Isaac, A., Wang, Q., Hodkinson, L., Olshansky, M., Erwin, T., Appelbe, B., Tull, D. L., Roessner, U., Bacic, A., McConville, M. J., Likic, V. A. (2012). PyMS: a Python toolkit for processing of gas chromatography–mass spectrometry (GC–MS) data. Application and comparative study of selected tools. BMC Bioinformatics, 13(1), 115.
- Wishart, D. S., Tzur, D., Knox, C., Eisner, R., Guo, A. C., Young, N., Cheng, D., Jewell, K., Arndt, D., Sawhney, S., Fung, C., Nikolai, L., Lewis, M., Coutouly, M.-A., Forsythe, I., Tang, P., Shrivastava, S., Jeroncic, K., Stothard, P., Amegbey, G., Block, D., Hau, D. D., Wagner, J., Miniaci, J., Clements, M., Gebremedhin, M., Guo, N., Zhang, Y., Duggan, G. E., Macinnis, G. D., Weljie, A. M., Dowlatabadi, R., Bamforth, F., Clive, D., Greiner, R., Li, L., Marrie, T., Sykes, B. D., Vogel, H. J., Querengesser, L. (2007). HMDB: The human metabolome database. Nucleic Acids Research, 35(Database), D521–D526.