14:7

Critical review of reporting of the data analysis step in metabolomics

  • E. C. Considine
  • G. Thomas
  • A. L. Boulesteix
  • A. S. Khashan
  • L. C. Kenny
Review Article



We present the first study to critically appraise the quality of reporting of the data analysis step in metabolomics studies since the publication of minimum reporting guidelines in 2007.


The aim of this study was to assess the standard of reporting of the data analysis step in metabolomics biomarker discovery studies and to investigate whether the level of detail supplied allows basic understanding of the steps employed and/or reuse of the protocol. For the purposes of this review we define the data analysis step to include the data pretreatment step and the actual data analysis step, which covers algorithm selection, univariate analysis and multivariate analysis.


We reviewed the literature to identify metabolomic studies of biomarker discovery that were published between January 2008 and December 2014. Studies were examined for completeness in reporting the various steps of the data pretreatment phase and data analysis phase and also for clarity of the workflow of these sections.


We analysed 27 papers on biomarker discovery in serum metabolomics, published between the start of 2008 and the end of 2014. The results of this review showed that the data analysis step in metabolomics biomarker discovery studies is plagued by unclear and incomplete reporting. Major omissions and a lack of logical flow render the data analysis workflows in these studies impossible to follow, and therefore impossible to replicate or even imitate.


While we wait for the holy grail of computational reproducibility in data analysis to become standard, we propose that, at a minimum, the data analysis section of a metabolomics study should be readable and interpretable without omissions, such that a data analysis workflow diagram could be extrapolated from the study and the data analysis protocol could therefore be reused by the reader. That inconsistent and patchy reporting obfuscates reproducibility is a given. However, even basic understanding and reuse of protocols are hampered by the low level of detail supplied in the data analysis sections of the studies that we reviewed.


Data analysis, Reporting, Minimum standards, Biomarker discovery, Metabolomics, Guidelines



This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI).

Compliance with ethical standards

Conflict of interest

We declare no competing financial interests.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  • E. C. Considine (1)
  • G. Thomas (2)
  • A. L. Boulesteix (3)
  • A. S. Khashan (1, 4)
  • L. C. Kenny (1)

  1. The Irish Centre for Fetal and Neonatal Translational Research (INFANT), Department of Obstetrics and Gynaecology, University College Cork, Cork, Ireland
  2. SQU4RE, Roeselare, Belgium
  3. Department of Medical Informatics, Biometry and Epidemiology, LMU Munich, Munich, Germany
  4. Department of Epidemiology and Public Health, University College Cork, Cork, Ireland
