Metabolomics, Volume 8, Supplement 1, pp 161–174

Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline

Original Article

DOI: 10.1007/s11306-011-0366-4

Cite this article as:
Hrydziuszko, O. & Viant, M.R. Metabolomics (2012) 8: 161. doi:10.1007/s11306-011-0366-4

Abstract

Missing values occur widely in mass spectrometry metabolomic datasets and can originate from a number of sources, both technical and biological. Currently, little is known about these data, i.e. about their distributions across datasets, the need (or not) to consider them in the data processing pipeline, and, most importantly, the optimal way of assigning them values prior to univariate or multivariate data analysis. Here, we address all of these issues using direct infusion Fourier transform ion cyclotron resonance mass spectrometry data. We have shown that missing data are widespread, accounting for ca. 20% of data and affecting up to 80% of all variables, and that they do not occur randomly but rather as a function of signal intensity and mass-to-charge ratio. We have demonstrated that missing data estimation algorithms have a major effect on the outcome of data analysis when comparing biological sample groups, including by t test, ANOVA and principal component analysis. Furthermore, results varied significantly across the eight algorithms that we assessed for their ability to impute entries whose values were known but had been labelled as missing. Based on all of our findings, we identified the k-nearest neighbour imputation method (KNN) as the optimal missing value estimation approach for our direct infusion mass spectrometry datasets. However, we believe the wider significance of this study is that it highlights the importance of missing metabolite levels in the data processing pipeline and offers an approach to identify optimal ways of treating missing data in metabolomics experiments.
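As an illustration of the two ideas named in the abstract, the following minimal Python sketch shows one possible form of k-nearest neighbour imputation, together with the evaluation strategy of hiding known entries and scoring how well they are recovered. It is not the authors' implementation. The function names (`knn_impute`, `masked_rmse`), the choice of rows as neighbours, the root-mean-square distance over shared columns, the RMSE score and the toy intensities are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows' values.

    Assumption: neighbours are rows, and distance is the root-mean-square
    difference over the columns observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common columns, so not comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows where column j is observed.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:  # leave the entry missing if nothing is comparable
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def masked_rmse(rows, hidden, k=2):
    """Hide known entries at the (row, col) positions in `hidden`,
    impute them, and return the root-mean-square error of the estimates."""
    masked = [list(r) for r in rows]
    for i, j in hidden:
        masked[i][j] = None
    imputed = knn_impute(masked, k)
    errors = [(imputed[i][j] - rows[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical toy intensities, one row per sample, one column per peak.
peaks = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [5.0, 6.0, 7.0],
    [5.1, 6.1, None],
]
filled = knn_impute(peaks, k=1)

complete = [row[:] for row in peaks]
complete[3][2] = 7.1  # pretend the true value is known
score = masked_rmse(complete, hidden=[(3, 2)], k=1)
```

With k=1 the missing entry in the last row is taken from the third row, its closest neighbour on the two observed peaks. Repeating `masked_rmse` for several candidate algorithms and several sets of hidden positions is one way to compare imputation methods in the spirit described above.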

Keywords

FT-ICR, Metabolic profiling, Missing data, Missing entries, Signal processing

Supplementary material

11306_2011_366_MOESM1_ESM.doc (2.9 mb)
Supplementary material 1 (DOC 2993 kb)

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Centre for Systems Biology, University of Birmingham, Edgbaston, Birmingham, UK
  2. School of Biosciences, University of Birmingham, Edgbaston, Birmingham, UK