Abstract
From data acquisition to statistical analysis, metabolomics data need to undergo several processing steps, which are crucial for the data quality and interpretation of the results. In this chapter, methods for preprocessing, normalization, and pretreatment of metabolomics data generated from nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS) are presented and discussed. Preprocessing is reported for both NMR and MS analysis. The challenges in preprocessing such complex data are highlighted. Subsequently, normalization methods such as total area normalization, probabilistic quotient normalization, and quantile normalization are explained. Finally, several scaling and data transformation methods are discussed for metabolomics data pretreatment, which is an important step prior to statistical analysis.
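As a rough illustration of the three normalization methods named above, the following is a minimal NumPy sketch, not the chapter's own implementation. It assumes a data matrix with samples in rows and features (bins or peaks) in columns, non-negative intensities, and no missing values. The function names are hypothetical. Using the median spectrum as the PQN reference, and applying total area normalization before PQN, are assumptions made here for illustration.

```python
import numpy as np

def total_area_normalize(X):
    """Divide each sample (row) by its total intensity so every row sums to 1."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)

def pqn_normalize(X, reference=None):
    """Probabilistic quotient normalization.

    Assumption: rows are first total-area normalized, and the reference
    spectrum defaults to the feature-wise median across samples.
    """
    X = total_area_normalize(X)
    if reference is None:
        reference = np.median(X, axis=0)
    reference = np.asarray(reference, dtype=float)
    keep = reference > 0                      # avoid division by zero
    quotients = X[:, keep] / reference[keep]  # per-feature ratio to reference
    dilution = np.median(quotients, axis=1, keepdims=True)
    return X / dilution                       # one dilution factor per sample

def quantile_normalize(X):
    """Force every sample (row) to share the same intensity distribution.

    Ties are broken by position, which is a simplification.
    """
    X = np.asarray(X, dtype=float)
    ranks = np.argsort(np.argsort(X, axis=1), axis=1)  # rank of each value in its row
    mean_sorted = np.sort(X, axis=1).mean(axis=0)      # mean of k-th smallest values
    return mean_sorted[ranks]

# Toy data: the second sample is a twofold "concentrated" copy of the first.
X_demo = np.array([[1.0, 2.0, 3.0, 4.0],
                   [2.0, 4.0, 6.0, 8.0]])
ta_demo = total_area_normalize(X_demo)
pqn_demo = pqn_normalize(X_demo)
qn_demo = quantile_normalize(np.array([[4.0, 1.0, 3.0, 2.0],
                                       [2.0, 4.0, 6.0, 8.0]]))
```

In the toy data, total area normalization and PQN both map the two samples onto the same profile, because they differ only by a global dilution factor. Quantile normalization instead replaces each value with the mean of the values sharing its rank across samples.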
Abbreviations
- ANOVA: Analysis of variance
- CPMG: Carr-Purcell-Meiboom-Gill
- GC: Gas chromatography
- glog: Generalized log
- LC: Liquid chromatography
- LOESS: Locally estimated smoothing
- m/z: Mass-to-charge ratio
- MS: Mass spectrometry
- NMR: Nuclear magnetic resonance spectroscopy
- PCA: Principal component analysis
- PLSR: Partial least squares regression
- R2: Linear regression coefficient
- RSD: Relative standard deviation
- RT: Retention time
- QCs: Quality control samples
- TSP: 3-trimethylsilylpropionic acid
Acknowledgements
The author thanks Rui Pinto for helpful discussions in the preparation of this book chapter.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Karaman, I. (2017). Preprocessing and Pretreatment of Metabolomics Data for Statistical Analysis. In: Sussulini, A. (eds) Metabolomics: From Fundamentals to Clinical Applications. Advances in Experimental Medicine and Biology, vol 965. Springer, Cham. https://doi.org/10.1007/978-3-319-47656-8_6
DOI: https://doi.org/10.1007/978-3-319-47656-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47655-1
Online ISBN: 978-3-319-47656-8
eBook Packages: Biomedical and Life Sciences (R0)