Preprocessing and Pretreatment of Metabolomics Data for Statistical Analysis

Metabolomics: From Fundamentals to Clinical Applications

Part of the book series: Advances in Experimental Medicine and Biology (PMISB, volume 965)

Abstract

From data acquisition to statistical analysis, metabolomics data need to undergo several processing steps, which are crucial for the data quality and interpretation of the results. In this chapter, methods for preprocessing, normalization, and pretreatment of metabolomics data generated from nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS) are presented and discussed. Preprocessing is reported for both NMR and MS analysis. The challenges in preprocessing such complex data are highlighted. Subsequently, normalization methods such as total area normalization, probabilistic quotient normalization, and quantile normalization are explained. Finally, several scaling and data transformation methods are discussed for metabolomics data pretreatment, which is an important step prior to statistical analysis.

Abbreviations

ANOVA: Analysis of variance
CPMG: Carr-Purcell-Meiboom-Gill
GC: Gas chromatography
glog: Generalized log
LC: Liquid chromatography
LOESS: Locally estimated scatterplot smoothing
m/z: Mass-to-charge ratio
MS: Mass spectrometry
NMR: Nuclear magnetic resonance spectroscopy
PCA: Principal component analysis
PLSR: Partial least squares regression
R²: Coefficient of determination of a linear regression
RSD: Relative standard deviation
RT: Retention time
QCs: Quality control samples
TSP: 3-trimethylsilylpropionic acid

Acknowledgements

The author thanks Rui Pinto for helpful discussions in the preparation of this book chapter.

Author information

Corresponding author

Correspondence to Ibrahim Karaman.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Karaman, I. (2017). Preprocessing and Pretreatment of Metabolomics Data for Statistical Analysis. In: Sussulini, A. (ed.) Metabolomics: From Fundamentals to Clinical Applications. Advances in Experimental Medicine and Biology, vol 965. Springer, Cham. https://doi.org/10.1007/978-3-319-47656-8_6
