Statistical Models and Analysis of Microbiome Data from Mice and Humans

  • Yinglin Xia
  • Jun Sun
Part of the Physiology in Health and Disease book series (PIHD)


After the Human Microbiome Project was launched in 2007, numerous statistical and bioinformatic tools and computational methods were developed and applied to meet the needs of microbiome studies. A popular way to implement these newly developed methods and models is through R packages.

In this chapter, we introduce widely used and newly developed statistical methods and models from the ecology and microbiome fields. We show readers how to use currently available statistical tools based on the R programming language to analyze microbiome data. Our purpose is to provide analytical steps and tools that can be implemented by microbiome researchers who may not have advanced knowledge of statistical models or the R programming language. Specifically, this chapter covers frequently used univariate and multivariate statistical models and visualization tools, as well as alpha and beta diversity metrics and R programming skills, using real data from mouse and human microbiome studies.


Gut microbiome, Statistical methods, Statistical analysis, R package



We would like to acknowledge NIDDK/National Institutes of Health grant R01 DK105118 and DOD grant BC160450P1 to Jun Sun. We thank the two anonymous reviewers whose comments and suggestions helped to improve and clarify this manuscript.



Copyright information

© The American Physiological Society 2018

Authors and Affiliations

  1. Division of Academic Internal Medicine and Geriatrics, Department of Medicine, University of Illinois at Chicago, Chicago, USA
  2. Division of Gastroenterology and Hepatology, Department of Medicine, University of Illinois at Chicago, Chicago, USA
