Robust and Powerful Differential Composition Tests for Clustered Microbiome Data

  • Zheng-Zheng Tang (corresponding author)
  • Guanhua Chen


Thanks to advances in high-throughput sequencing technologies, the importance of the microbiome to human health and disease has been increasingly recognized. Analyzing microbiome data from sequencing experiments is challenging because of their distinctive features, such as compositionality, excessive zero observations, overdispersion, and complex relations among microbial taxa. Clustered microbiome data have become prevalent in recent years, arising from designs such as longitudinal studies, family studies, and matched case–control studies. Within-cluster dependence compounds the challenge of microbiome data analysis. Methods that properly accommodate both intra-cluster correlation and the features of microbiome data are needed. We develop robust and powerful differential composition tests for clustered microbiome data. The methods do not rely on any distributional assumptions on the microbial compositions, which provides flexibility to model various correlation structures among taxa and among samples within a cluster. By leveraging the adjusted sandwich covariance estimate, the methods properly accommodate sample dependence within a cluster. The two-part version of the test can further improve power in the presence of excessive zero observations. Different types of confounding variables can be easily adjusted for in the methods. We perform extensive simulation studies under commonly adopted clustered data designs to evaluate the methods. We demonstrate that the methods properly control the type I error under all designs and are more powerful than existing methods in many scenarios. The usefulness of the proposed methods is further demonstrated with two real datasets from longitudinal microbiome studies on pregnant women and inflammatory bowel disease patients. The methods have been incorporated into the R package “miLineage” publicly available at
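
To illustrate the general idea of a sandwich-type variance that respects within-cluster dependence, the following is a minimal Python sketch of a cluster-robust score test for a single outcome (for example, the relative abundance of one taxon) against one covariate. It is not the authors' method and not the miLineage implementation: the function name `cluster_robust_score_test`, its arguments, and the toy data are assumptions made for illustration, and the small-sample adjustment, the multi-taxon composition, the two-part extension, and confounder adjustment described in the abstract are all omitted.

```python
import math
from collections import defaultdict


def cluster_robust_score_test(y, x, cluster):
    """Score test of H0: no association between y and x.

    Hypothetical illustration only. Observations in the same cluster may be
    correlated, so the variance of the score is estimated from cluster-level
    sums (a sandwich-type estimate) rather than from individual observations.
    Returns (statistic, p_value) using a chi-square reference with 1 df.
    """
    n = len(y)
    y_bar = sum(y) / n  # fitted mean under the intercept-only null model
    x_bar = sum(x) / n  # centering x removes the intercept from the score

    # Sum the score contributions within each cluster.
    cluster_scores = defaultdict(float)
    for yi, xi, ci in zip(y, x, cluster):
        cluster_scores[ci] += (xi - x_bar) * (yi - y_bar)

    score = sum(cluster_scores.values())
    # Empirical variance: sum of squared cluster-level scores.
    variance = sum(u * u for u in cluster_scores.values())
    if variance == 0.0:
        raise ValueError("score variance is zero; the test is undefined")

    stat = score * score / variance
    # Upper tail of a chi-square(1) variable: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy data (made up): two samples per subject, x is a subject-level group.
    y = [0.1, 0.2, 0.4, 0.5, 0.1, 0.3, 0.6, 0.4]
    x = [0, 0, 1, 1, 0, 0, 1, 1]
    subject = [1, 1, 2, 2, 3, 3, 4, 4]
    print(cluster_robust_score_test(y, x, subject))
```

Summing scores within a cluster before squaring is what lets the variance estimate absorb arbitrary within-cluster correlation without specifying its form. With few clusters this plain estimate can be unreliable, which is one reason adjusted versions of the sandwich estimate exist.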


Microbiome composition, Clustered data, Association tests, Zero-inflation, Distribution-free



We are grateful to the associate editor and the two anonymous reviewers for their helpful comments.

Supplementary material

Supplementary material 1: 12561_2019_9251_MOESM1_ESM.pdf (PDF, 96 KB)



Copyright information

© International Chinese Statistical Association 2019

Authors and Affiliations

  1. Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, and Wisconsin Institute for Discovery, Madison, USA
  2. Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, USA
