Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data

Abstract

The human microbiome plays critical roles in human health and has been linked to many diseases. Although advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, disentangling the interplay between the human microbiome and disease risk factors remains challenging because of the nature of microbiome data. Excess zeros, high dimensionality, the hierarchical phylogenetic tree structure, and the compositional structure of the data compound one another, and existing methods do not adequately address them together. We propose a multivariate two-part zero-inflated logistic-normal model to analyze the association of disease risk factors with individual microbial taxa and with overall microbial community composition. The discrete part of the model handles the excess zeros, and the logistic-normal part handles the compositional data structure. For parameter estimation, we employ an estimating equations approach that accommodates the complex inter-taxa correlation induced by the hierarchical phylogenetic tree and the compositional data structure. The model can incorporate standard regularization approaches to deal with high dimensionality. Simulations show that our model outperforms existing methods. We also compare our approach with others in an analysis of real microbiome data.
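As a rough illustration of the two-part idea described in the abstract, the sketch below splits one sample's relative abundances into a discrete part (presence/absence indicators) and a continuous part (log-ratios of the non-zero abundances against a reference taxon). It is a toy under stated assumptions, not the authors' estimating-equations procedure: the function name `two_part_transform`, the toy sample, and the choice of a plain additive log-ratio against a single reference taxon are all illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple


def two_part_transform(
    rel_abund: List[float], ref_index: int
) -> Tuple[List[int], List[Optional[float]]]:
    """Split one sample into presence indicators and log-ratios.

    Hypothetical helper for illustration only.
    rel_abund: relative abundances of the taxa in one sample.
    ref_index: position of the reference taxon (assumed non-zero).
    """
    ref = rel_abund[ref_index]
    if ref <= 0:
        raise ValueError("reference taxon must be non-zero in the sample")

    # Discrete part: 1 if the taxon is observed, 0 if it is a zero.
    present = [1 if x > 0 else 0 for x in rel_abund]

    # Continuous part: log-ratio against the reference taxon,
    # defined only for taxa that are present (None marks a zero).
    log_ratio = [math.log(x / ref) if x > 0 else None for x in rel_abund]
    return present, log_ratio


# Toy sample with four taxa; the last one serves as the reference.
sample = [0.5, 0.0, 0.25, 0.25]
present, log_ratio = two_part_transform(sample, ref_index=3)
print(present)
print(log_ratio)
```

In a regression setting, the indicators would feed a binary-outcome component and the log-ratios a normal component. How the two parts are linked and estimated jointly is specific to the paper and is not reproduced here.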

Funding

This work was supported by NIH grants R01GM123014, R01GM123056, P01ES022832, UG3OD023275, R01CA127334, P20GM104416, K01LM011985, and R01LM012723, and by EPA grant RD-83544201.

Author information

Corresponding author

Correspondence to Zhigang Li.

Ethics declarations

Conflict of Interest

None declared.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 483 kb)

Appendix

Comparison of selected genera under two randomly selected reference genera: Akkermansia and Anoxybacillus. Results with Akkermansia as the reference genus are also presented in Sect. 4.

See Table 4.

Cite this article

Li, Z., Lee, K., Karagas, M.R. et al. Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data. Stat Biosci 10, 587–608 (2018). https://doi.org/10.1007/s12561-018-9219-2

Keywords

  • Microbiome data analysis
  • High dimension
  • Zero-inflated
  • Multivariate logistic normal
  • Relative abundance
  • Estimating equation