Abstract
The human microbiome plays critical roles in human health and has been linked to many diseases. Although advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, it remains challenging to disentangle the interplay between the human microbiome and disease risk factors because of the complicated nature of microbiome data. Excessive zeros, high dimensionality, the hierarchical phylogenetic tree structure, and the compositional structure occur together, and existing methods do not adequately address them jointly. We propose a multivariate two-part zero-inflated logistic-normal model to analyze the association of disease risk factors with individual microbial taxa and with overall microbial community composition. The discrete part of the model handles the excess zeros, and the logistic-normal part handles the compositional data structure. For parameter estimation, we use an estimating equations approach, which lets us address the complex inter-taxa correlation structure induced by the hierarchical phylogenetic tree and the compositional data structure. The model can incorporate standard regularization approaches to deal with high dimensionality. Simulations show that our model outperforms existing methods. We also compare our approach with others in an analysis of real microbiome data.
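The two-part idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the exact parametrization of the model is not given in this excerpt. The function name `two_part_decompose`, the choice of the last taxon as the reference, and the assumption that the reference taxon is present in every sample are all hypothetical choices made for illustration. The sketch splits a matrix of relative abundances into a discrete part (presence or absence of each taxon) and a continuous part (log-ratios of the nonzero abundances to the reference taxon), which is the kind of quantity a logistic-normal model would treat as multivariate normal.

```python
import numpy as np

def two_part_decompose(rel_abund, ref=-1):
    """Split a samples-by-taxa relative abundance matrix into two parts.

    Returns (present, log_ratio):
      present   - 0/1 indicators for each non-reference taxon (discrete part)
      log_ratio - log(x_j / x_ref) where taxon j is present, NaN where absent
    The reference column `ref` is an illustrative assumption and must be
    strictly positive in every sample.
    """
    x = np.asarray(rel_abund, dtype=float)
    if np.any(x[:, ref] <= 0):
        raise ValueError("reference taxon must be present in every sample")

    ref_col = x[:, [ref]]                 # shape (n_samples, 1)
    others = np.delete(x, ref, axis=1)    # drop the reference column
    present = others > 0                  # discrete part: which taxa are nonzero

    ratio = others / ref_col              # ratios to the reference taxon
    log_ratio = np.full(others.shape, np.nan)
    log_ratio[present] = np.log(ratio[present])  # continuous part, nonzeros only
    return present.astype(int), log_ratio

# Toy data: two samples, three taxa, each row sums to 1 (made-up values).
x = np.array([[0.5, 0.0, 0.5],
              [0.2, 0.3, 0.5]])
present, log_ratio = two_part_decompose(x)
```

In a regression setting, the indicators in `present` would be modeled with a binary-outcome component and the entries of `log_ratio` with a normal component, each depending on the covariates of interest. How the paper links the two parts and estimates their correlation is described in the full text, not here.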

Funding
This work was supported by NIH grants R01GM123014, R01GM123056, P01ES022832, UG3OD023275, R01CA127334, P20GM104416, K01LM011985, and R01LM012723, and by EPA grant RD-83544201.
Ethics declarations
Conflict of Interest
None declared.
Cite this article
Li, Z., Lee, K., Karagas, M.R. et al. Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data. Stat Biosci 10, 587–608 (2018). https://doi.org/10.1007/s12561-018-9219-2
DOI: https://doi.org/10.1007/s12561-018-9219-2
Keywords
- Microbiome data analysis
- High dimension
- Zero-inflated
- Multivariate logistic normal
- Relative abundance
- Estimating equation