Robust and Powerful Differential Composition Tests for Clustered Microbiome Data
Thanks to advances in high-throughput sequencing technologies, the importance of the microbiome to human health and disease has been increasingly recognized. Analyzing microbiome data from sequencing experiments is challenging because of features such as compositionality, excessive zero observations, overdispersion, and complex relations among microbial taxa. Clustered microbiome data, arising from designs such as longitudinal studies, family studies, and matched case–control studies, have become prevalent in recent years. The within-cluster dependence compounds the challenge of microbiome data analysis, so methods are needed that properly accommodate both intra-cluster correlation and the features of microbiome data. We develop robust and powerful differential composition tests for clustered microbiome data. The methods do not rely on any distributional assumptions on the microbial compositions, which provides flexibility to model various correlation structures among taxa and among samples within a cluster. By leveraging the adjusted sandwich covariance estimate, the methods properly accommodate sample dependence within a cluster. The two-part version of the test can further improve power in the presence of excessive zero observations. Different types of confounding variables can be easily adjusted for in the methods. We perform extensive simulation studies under commonly adopted clustered data designs to evaluate the methods. We demonstrate that the methods properly control the type I error under all designs and are more powerful than existing methods in many scenarios. The usefulness of the proposed methods is further demonstrated with two real datasets from longitudinal microbiome studies on pregnant women and inflammatory bowel disease patients. The methods have been incorporated into the R package “miLineage”, which is publicly available at https://tangzheng1.github.io/tanglab/software.html.
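To illustrate why a sandwich covariance estimate matters for clustered samples, the following is a minimal Python sketch for the simplest possible case: the mean relative abundance of a single taxon, with repeated samples per subject. It is not the test statistic of the paper and does not use the miLineage package; the function name, the toy data, and the omission of any small-sample adjustment are all assumptions made for illustration. The only point it shows is that residuals are summed within each cluster before being squared, so within-cluster correlation enters the variance estimate.

```python
from collections import defaultdict
from math import sqrt


def mean_with_cluster_robust_se(values, cluster_ids):
    """Hypothetical helper: mean of `values` with naive and cluster-robust SEs.

    For an intercept-only model the sandwich form is bread * meat * bread,
    with bread = 1/n and meat = sum of squared within-cluster residual sums.
    No small-sample adjustment is applied (an assumption of this sketch).
    """
    n = len(values)
    if n == 0 or n != len(cluster_ids):
        raise ValueError("values and cluster_ids must be non-empty and equal length")

    mean = sum(values) / n
    residuals = [v - mean for v in values]

    # Naive variance: every sample is treated as independent.
    naive_var = sum(r * r for r in residuals) / n**2

    # Cluster-robust variance: add up residuals within a cluster first,
    # then square, so correlated samples do not count as independent.
    cluster_sums = defaultdict(float)
    for r, c in zip(residuals, cluster_ids):
        cluster_sums[c] += r
    robust_var = sum(s * s for s in cluster_sums.values()) / n**2

    return mean, sqrt(naive_var), sqrt(robust_var)


# Toy data (made up): relative abundance of one taxon,
# three subjects with two samples each.
abundance = [0.10, 0.12, 0.30, 0.28, 0.50, 0.50]
subject = ["A", "A", "B", "B", "C", "C"]

mean, naive_se, robust_se = mean_with_cluster_robust_se(abundance, subject)
print(f"mean={mean:.3f} naive_se={naive_se:.4f} robust_se={robust_se:.4f}")
```

In this toy example the samples within a subject are very similar, so the cluster-robust standard error is larger than the naive one; ignoring the clustering would overstate the precision of the estimate. When every sample forms its own cluster, the two estimates coincide.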
Keywords: Microbiome composition, Clustered data, Association tests, Zero-inflation, Distribution-free
We are grateful to the associate editor and the two anonymous reviewers for their helpful comments.