Statistical Models and Analysis of Microbiome Data from Mice and Humans
After the initiation of the Human Microbiome Project in 2007, numerous statistical and bioinformatic tools and computational methods for data analysis were developed and applied to meet the needs of microbiome studies. A popular way to distribute newly developed statistical and bioinformatic methods and models is to implement them as R packages.
In this chapter, we introduce widely used and newly developed statistical methods and models from the ecology and microbiome fields. We show readers how to analyze microbiome data with currently available statistical tools based on the R programming language. Our purpose is to provide analytical steps and tools that can be applied by microbiome researchers who may not have advanced knowledge of statistical models or the R programming language. Specifically, this chapter covers frequently used univariate and multivariate statistical models and visualization tools, as well as alpha and beta diversity metrics and R programming skills, using real data from mouse and human microbiome studies.
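As a minimal sketch of the kind of alpha and beta diversity calculation mentioned above, the following base R code computes the Shannon index for each sample and the Bray-Curtis dissimilarity between samples. The count table, the sample and taxon names, and the helper functions `shannon` and `bray_curtis` are hypothetical and chosen only for illustration; they are not taken from the chapter's data. In practice, the vegan package provides `diversity()` and `vegdist()` for these quantities.

```r
# Hypothetical toy count table: rows are samples, columns are taxa.
counts <- matrix(
  c(10, 10,  0,  0,
     5,  5,  5,  5,
     0,  0, 10, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"),
                  c("taxA", "taxB", "taxC", "taxD"))
)

# Alpha diversity: Shannon index H = -sum(p * log(p)),
# taken over taxa with non-zero relative abundance p.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}
alpha <- apply(counts, 1, shannon)

# Beta diversity: Bray-Curtis dissimilarity between two samples,
# sum(|x - y|) / sum(x + y), ranging from 0 (identical) to 1 (no shared taxa).
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a symmetric sample-by-sample dissimilarity matrix.
n <- nrow(counts)
beta <- matrix(0, n, n,
               dimnames = list(rownames(counts), rownames(counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    beta[i, j] <- bray_curtis(counts[i, ], counts[j, ])
  }
}

print(round(alpha, 3))
print(beta)
```

In this toy table, sample S2 spreads its counts evenly over all four taxa and therefore has the highest Shannon index, while S1 and S3 share no taxa and so have the maximum Bray-Curtis dissimilarity.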
Keywords: Gut microbiome; Statistical methods; Statistical analysis; R package
We gratefully acknowledge NIDDK/National Institutes of Health grant R01 DK105118 and DOD grant BC160450P1 to Jun Sun. We thank the two anonymous reviewers whose comments and suggestions helped to improve and clarify this manuscript.