Statistical Tools for Data Analysis

  • Protocol
Hydrocarbon and Lipid Microbiology Protocols

Part of the book series: Springer Protocols Handbooks (SPH)

Abstract

Microbial communities are complex, and so are the data we use to describe them. In this chapter, we analyze a 16S amplicon data set from a marine time series using the open-source QIIME software package. We first summarize complex, multivariate community composition data using ordination techniques. We then use the insights gained from ordination and multivariate statistical techniques to characterize the dominant relationships between environmental parameters and the microbial community in this marine ecosystem. Next, we identify taxa that show significant changes in relative abundance across seasons and describe the relationship between seasonality and community diversity. We present several data visualization techniques that help us interpret our results. Analysis notes, QIIME commands, and sample data are provided. Finally, we discuss caveats regarding correlations and relative abundance data and describe how the statistical tools used in our 16S amplicon example can be applied to other types of “omics” data sets.
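
As a rough illustration of why correlations computed on relative abundance data call for caution, the sketch below uses a toy example in plain Python. It is not taken from the chapter: the taxon names, the counts, and the helper functions `pearson` and `to_relative` are all hypothetical. Two taxa whose absolute counts are uncorrelated can look strongly correlated once every sample is normalized to proportions, because a third, highly variable taxon changes the denominator for both of them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def to_relative(counts_by_taxon):
    """Convert per-sample absolute counts to per-sample proportions."""
    taxa = list(counts_by_taxon)
    n_samples = len(counts_by_taxon[taxa[0]])
    totals = [sum(counts_by_taxon[t][i] for t in taxa) for i in range(n_samples)]
    return {
        t: [counts_by_taxon[t][i] / totals[i] for i in range(n_samples)]
        for t in taxa
    }


# Hypothetical absolute counts for three taxa across four samples.
# taxon_A and taxon_B are constructed to be uncorrelated;
# taxon_C blooms in the second and third samples.
absolute = {
    "taxon_A": [10, 20, 10, 20],
    "taxon_B": [10, 10, 20, 20],
    "taxon_C": [10, 1000, 1000, 10],
}
relative = to_relative(absolute)

r_absolute = pearson(absolute["taxon_A"], absolute["taxon_B"])
r_relative = pearson(relative["taxon_A"], relative["taxon_B"])

print(f"absolute counts:     r = {r_absolute:.3f}")
print(f"relative abundances: r = {r_relative:.3f}")
```

In this toy data set the correlation between taxon_A and taxon_B is zero for the absolute counts but close to one for the proportions, even though neither taxon responds to the other. Real amplicon tables are noisier, so the effect is usually less extreme than this constructed case.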

Author information

Corresponding author

Correspondence to Sean M. Gibbons.

Coding Resources

  1. http://www.codecademy.com/

  2. http://rosalind.info/problems/locations/

  3. http://learnpythonthehardway.org/book/index.html

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this protocol

Cite this protocol

Gibbons, S.M. (2015). Statistical Tools for Data Analysis. In: McGenity, T., Timmis, K., Nogales Fernández, B. (eds) Hydrocarbon and Lipid Microbiology Protocols. Springer Protocols Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/8623_2015_50

  • DOI: https://doi.org/10.1007/8623_2015_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49309-0

  • Online ISBN: 978-3-662-49310-6

  • eBook Packages: Springer Protocols
