Abstract
Microbial communities are complex, and so are the data we use to describe them. In this chapter, we analyze a 16S rRNA amplicon data set from a marine time series using the open-source QIIME software package. We first summarize complex, multivariate community composition data using ordination techniques. We then use the insights gained from ordination and multivariate statistical techniques to characterize the dominant relationships between environmental parameters and the microbial community in this marine ecosystem. Next, we identify taxa that show significant changes in relative abundance across seasons and describe the relationship between seasonality and community diversity. Along the way, we present several data visualization techniques that help us interpret these results. Analysis notes, QIIME commands, and sample data are provided. Finally, we discuss caveats regarding correlations and relative abundance data and describe how the statistical tools used in our 16S amplicon example can be applied to other types of “omics” data sets.
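Finding taxa whose relative abundance changes across seasons typically means running one test per taxon (for example, a Kruskal-Wallis test across seasons), so the resulting p-values should be corrected for multiple testing before a list of "significant" taxa is reported. The sketch below implements the Benjamini-Hochberg false discovery rate procedure in plain Python. The function name `benjamini_hochberg`, the taxon names, and the p-values are hypothetical placeholders for illustration, not results from the chapter's data set or QIIME's own implementation.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg FDR correction (hypothetical helper for illustration).

    Returns (q_values, rejected), both in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest rank down so each q-value is the minimum of
    # p * m / rank over all ranks at or above its own (keeps q monotone).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    rejected = [q <= alpha for q in q_values]
    return q_values, rejected


# Hypothetical per-taxon p-values, e.g. from one seasonal test per taxon.
taxa = ["taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_E"]
p = [0.001, 0.008, 0.039, 0.041, 0.27]

q, rejected = benjamini_hochberg(p)
for name, qi, sig in zip(taxa, q, rejected):
    print(f"{name}\tq={qi:.4f}\t{'significant' if sig else 'n.s.'}")
```

With these placeholder values, only taxon_A and taxon_B stay significant at a 5% FDR, even though taxon_C and taxon_D have raw p-values below 0.05. In R, `p.adjust(p, method = "BH")` should return the same adjusted values.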
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this protocol
Cite this protocol
Gibbons, S.M. (2015). Statistical Tools for Data Analysis. In: McGenity, T., Timmis, K., Nogales Fernández, B. (eds) Hydrocarbon and Lipid Microbiology Protocols. Springer Protocols Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/8623_2015_50
DOI: https://doi.org/10.1007/8623_2015_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49309-0
Online ISBN: 978-3-662-49310-6
eBook Packages: Springer Protocols