Normalization of Microbiome Profiling Data

Part of the Methods in Molecular Biology book series (MIMB, volume 1849)

Abstract

Normalization is a term that is often used but rarely defined and poorly understood. The number of available normalization procedures is large, some of them inappropriate or inadmissible, and each is narrowly relevant to a specific analysis that depends on both the nature of the data and the question being asked. This chapter describes key definitions of normalization as they apply in metagenomics, mainly for taxonomic profiling data, and demonstrates specific, reproducible examples of normalization procedures in the context of the analysis techniques for which they were intended. The analysis and graphics code is distributed as a supplemental companion to this chapter so that the motivated reader can reuse it on new data.

Key words

  • Normalization
  • Microbiome
  • Metagenomics
  • DNA sequencing
  • Statistics

Chapter length: 26 pages

Author information

Corresponding author

Correspondence to Paul J. McMurdie.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

McMurdie, P.J. (2018). Normalization of Microbiome Profiling Data. In: Beiko, R., Hsiao, W., Parkinson, J. (eds) Microbiome Analysis. Methods in Molecular Biology, vol 1849. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8728-3_10

  • DOI: https://doi.org/10.1007/978-1-4939-8728-3_10

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8726-9

  • Online ISBN: 978-1-4939-8728-3

  • eBook Packages: Springer Protocols