Pre-analytic Considerations for Mass Spectrometry-Based Untargeted Metabolomics Data

  • Protocol
High-Throughput Metabolomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1978)

Abstract

Metabolomics is the science of characterizing and quantifying small molecule metabolites in biological systems. These metabolites give organisms their biochemical characteristics, providing a link between genotype, environment, and phenotype. With these opportunities also come data challenges, such as compound annotation, missing values, and batch effects. We present the steps of a general pipeline for processing untargeted mass spectrometry data that alleviates the latter two challenges. We assume the data are given as a matrix of metabolite abundances, with metabolites in rows and samples in columns. The steps in the pipeline are summarizing technical replicates (if available), filtering, imputing, transforming, and normalizing the data. In each of these steps, a method and its parameters should be chosen based on the assumptions one is willing to make, the question of interest, and diagnostic tools. Besides giving a general pipeline that can be adapted by the reader, our goal is to review diagnostic tools and criteria that are helpful when making decisions in each step of the pipeline and when assessing the effectiveness of normalization and batch correction. We conclude with a list of useful packages and a discussion of alternative approaches that might be more appropriate for the reader’s data.
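
As a rough illustration of how the steps named above chain together, the following Python sketch runs a toy abundance matrix (metabolites in rows, samples in columns, `None` for missing values) through each stage. The specific choices here are assumptions made for the example and are not taken from the chapter: replicates are averaged, metabolites missing in more than half of the samples are dropped, missing values are filled with half the metabolite's minimum observed value, abundances are log2-transformed, and samples are median-centered. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, median


def summarize_replicates(matrix, replicate_groups):
    """Average technical-replicate columns; stay missing only if all replicates are missing."""
    out = []
    for row in matrix:
        new_row = []
        for cols in replicate_groups:
            observed = [row[c] for c in cols if row[c] is not None]
            new_row.append(mean(observed) if observed else None)
        out.append(new_row)
    return out


def filter_metabolites(matrix, max_missing_fraction=0.5):
    """Keep metabolites whose fraction of missing samples is at most the threshold."""
    return [
        row for row in matrix
        if sum(v is None for v in row) / len(row) <= max_missing_fraction
    ]


def impute_half_minimum(matrix):
    """Fill missing values with half the metabolite's minimum observed value.

    Assumes every row has at least one observed value (true after filtering
    with a threshold below 1).
    """
    out = []
    for row in matrix:
        fill = min(v for v in row if v is not None) / 2
        out.append([fill if v is None else v for v in row])
    return out


def log2_transform(matrix):
    """Log2-transform strictly positive abundances."""
    return [[math.log2(v) for v in row] for row in matrix]


def median_center_samples(matrix):
    """Subtract each sample's (column's) median on the log scale."""
    n_samples = len(matrix[0])
    medians = [median(row[j] for row in matrix) for j in range(n_samples)]
    return [[row[j] - medians[j] for j in range(n_samples)] for row in matrix]


def run_pipeline(matrix, replicate_groups, max_missing_fraction=0.5):
    """Chain the five steps in the order listed in the abstract."""
    data = summarize_replicates(matrix, replicate_groups)
    data = filter_metabolites(data, max_missing_fraction)
    data = impute_half_minimum(data)
    data = log2_transform(data)
    return median_center_samples(data)


# Toy data: 4 metabolites, 3 samples, 2 technical replicates per sample.
raw = [
    [100.0, 120.0, 400.0, None, 210.0, 230.0],
    [None, None, None, None, 50.0, None],   # missing in 2 of 3 samples
    [8.0, 8.0, None, None, 32.0, 32.0],     # missing in 1 of 3 samples
    [16.0, 16.0, 64.0, 64.0, 32.0, 32.0],
]
replicate_groups = [[0, 1], [2, 3], [4, 5]]

processed = run_pipeline(raw, replicate_groups)
for row in processed:
    print([round(v, 3) for v in row])
```

Each step is a separate function so that any one of them can be swapped for a different method without touching the rest, which mirrors the point that the choice at every stage depends on the assumptions and the question of interest.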

Acknowledgments

D.R., S.J., and K.K. were supported by NIH/NHLBI Grant Number P20 HL113445. D.R., D.G., and K.K. were supported by NIH/NCATS Colorado CTSA Grant Number UL1 TR002535. H.P.-L. was supported by training grant T15 LM009451. Contents are the authors’ sole responsibility and do not necessarily represent official NIH views.

Correspondence to Katerina Kechris.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Reinhold, D., Pielke-Lombardo, H., Jacobson, S., Ghosh, D., Kechris, K. (2019). Pre-analytic Considerations for Mass Spectrometry-Based Untargeted Metabolomics Data. In: D'Alessandro, A. (eds) High-Throughput Metabolomics. Methods in Molecular Biology, vol 1978. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9236-2_20

  • DOI: https://doi.org/10.1007/978-1-4939-9236-2_20

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9235-5

  • Online ISBN: 978-1-4939-9236-2

  • eBook Packages: Springer Protocols
