ENIGMA: an enterotype-like unigram mixture model for microbial association analysis
Abstract
Background
One of the major challenges in microbial studies is detecting associations between microbial communities and a specific disease. A distinctive feature of microbiome count data is that intestinal bacterial communities form clusters called “enterotypes”, which are characterized by differences in specific bacterial taxa. This makes it difficult to compare such data between health and disease conditions. Traditional probabilistic modeling cannot distinguish between the bacterial differences derived from enterotypes and those related to a specific disease.
Results
We propose a new probabilistic model, ENIGMA (Enterotype-like uNIGram mixture model for Microbial Association analysis), to address these problems. ENIGMA enables simultaneous estimation of enterotype-like clusters characterized by the abundances of signature bacterial genera and the parameters of environmental effects associated with the disease.
Conclusion
In the simulation study, we evaluated the accuracy of parameter estimation. Furthermore, by analyzing real-world data, we detected bacteria related to Parkinson’s disease. ENIGMA is implemented in R and is available from GitHub (https://github.com/abikoushi/enigma).
Keywords
Enterotype, Topic model, Unigram mixture, Bayesian inference, Metagenomics
Abbreviations
CH: Calinski-Harabasz
CO: Control
CP: Coverage probability
JSD: Jensen-Shannon divergence
MAP: Maximum a posteriori
OTU: Operational taxonomic unit
PD: Parkinson’s disease
RMSE: Root mean squared error
SE: Standard error
Background
More than 100 trillion microbes live on and within human beings and form complex microbial communities (microbiota). Most microbes cannot be cultured in laboratories, making it difficult to understand how individual microorganisms mediate vital microbiome-host interactions under health and disease conditions. However, recent advances in high-throughput sequencing technology have enabled observation of the composition of these intestinal microbes. For each sample drawn from an ecosystem, the number of occurrences of each operational taxonomic unit (OTU) is measured, and the resulting OTU abundance can be summarized at any level of the bacterial phylogeny. Discovering recurrent microbial compositional patterns that are related to a specific disease is a significant challenge, as individuals with the same disease typically harbor different microbial community structures.
Recent large-scale sequencing surveys of the human intestinal microbiome, such as the US NIH Human Microbiome Project (HMP) and the European Metagenomics of the Human Intestinal Tract project (MetaHIT), have revealed considerable variations in microbiota composition among individuals [1, 2]. In particular, community clusters characterized by differences in the abundance of signature taxa, referred to as enterotypes, were first reported in humans [3]. Later, other studies identified enterotype-like clusters that may reflect features of the host-microbial physiology and homeostasis in different species [4, 5] or at different human body sites [6, 7, 8, 9]. This microbial stratification has motivated the development of methods for examining unknown clusters of microbial communities.
Probabilistic modeling of microbial metagenomics data often provides a powerful framework for characterizing microbial community structures [10, 11, 12]. For example, Knights et al. [10] applied a Dirichlet prior to a single-level hierarchy and proposed a Bayesian approach for estimating the proportion of microbial communities. Holmes et al. [11] extended the Dirichlet prior to Dirichlet multinomial mixtures to facilitate clustering of microbiome samples. Shafiei et al. [12] proposed a hierarchical model for Bayesian inference of microbial communities (BioMiCo) to identify clusters of OTUs related to environmental factors of interest.
However, such models are not suitable for discovering enterotype-like clusters of microbial communities and associations between microbes and a specific disease, for the following two reasons. First, the frameworks of Knights et al. [10] and Holmes et al. [11] do not explicitly address the association between microbial compositional patterns and the environmental factors of interest. Second, the framework of Shafiei et al. [12] models the structure of each sample using a hierarchical mixture of multinomial distributions that depend on the factors of interest. Individual host properties such as body mass index, age, or gender cannot explain the observed enterotypes [3]. Thus, such enterotype-like clusters, which describe inter-individual variability among humans, do not always directly affect host properties such as diseases ranging from localized gastroenterologic disorders to neurologic, respiratory, metabolic, hepatic, and cardiovascular illnesses.
1. ENIGMA uses OTU abundances as input and models each sample by an underlying unigram mixture whose parameters are represented by unknown group effects and known effects of interest. The group effects are represented by baseline parameters that change with a latent group of microbial communities. One of the most important features of our model is that the group effects are independent of the effects of interest. This enables the separation of inter-individual variability and fixed effects of the host properties related to disease risk.
2. ENIGMA can be regarded as Bayesian learning for detecting associations between a community structure and factors of interest. Our model can be used to simultaneously learn how enterotype-like clusters of OTUs contribute to the microbial structure and how microbial compositional patterns may be related to known features of the sample.
3. We provide an efficient learning procedure for ENIGMA by using a Laplace approximation to integrate latent variables and estimate the evidence of the complete model and credible intervals of the parameters. The software package that implements ENIGMA in the R environment is available from https://github.com/abikoushi/enigma.
We describe our proposed framework and algorithm in the “Methods” section. In the “Simulation study” section, we use simulated data to evaluate how accurately ENIGMA estimates parameters and identifies clusters. In the “Results on real data” section, we apply ENIGMA to clinical metagenomics data and demonstrate how it simultaneously identifies enterotype-like clusters and gut microbiota related to Parkinson’s disease (PD).
Methods
Suppose that we observe microbiome count data of K taxa for N samples with M individual host properties, (y_{nk},x_{nm}) (n=1,…,N;k=1,…,K;m=1,…,M), where \(y_{nk}\in \mathbb {N}\) represents the abundance of the kth taxon in the nth sample and x_{nm} is a binary variable such that x_{nm}=1 if the nth sample has the mth host property and x_{nm}=0 otherwise. Here the word taxa can refer to any level of the bacterial phylogeny, e.g., species, genus, family, or order.
Model
where γ_{l} is a baseline parameter (K-dimensional vector) that changes with the latent class, the M×K matrix \(\boldsymbol {B}=\left (\beta _{mk}\right)\) contains the effects of environmental factors common to all enterotype-like clusters, β_{m} is the mth row vector of B, \(\boldsymbol {\pi }=\left (\pi _{1},\ldots,\pi _{L}\right)\) is the mixing ratio of the components, O_{K} is a K-dimensional zero matrix, and I_{K} is the K-dimensional identity matrix. Here, the softmax function is defined by \(\text {softmax}(\boldsymbol {x})=\frac {\exp (\boldsymbol {x})}{\sum _{k=1}^{K}{\exp (x_{k})}}\) for a vector x=(x_{1},…,x_{K})^{⊤} using an elementwise exponential function, and the probability function of the categorical distribution is parameterized as \(\Pr (z=l \mid \boldsymbol {\pi }) = \pi _{l}\), l∈{1,…,L}. In a Bayesian approach, the prior distributions for π, β, and γ_{l} must be defined. We set a prior based on the Dirichlet distribution for π, and flat priors on the hyperparameters σ and τ for β and γ, respectively. For convenience in later sections, let p′_{l}=softmax(γ_{l}) be the probabilities of occurrence of bacteria in latent class l.
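As a minimal sketch of the generative process implied by these definitions, the following Python code draws a latent class from the categorical distribution and then draws counts from a multinomial distribution with softmax probabilities. The additive linear predictor (class baseline plus the effects of the active host properties), the function names, and the toy parameter values are assumptions made for illustration; this is not the authors' R/Stan implementation.

```python
import math
import random


def softmax(v):
    """Numerically stable softmax of a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def sample_counts(rng, total, gamma, beta, pi, x):
    """Draw one sample (z, y) from the assumed unigram mixture.

    gamma: L lists of length K (class baselines)
    beta:  M lists of length K (effects shared by all classes)
    pi:    mixing ratios of the L classes
    x:     M binary host properties of this sample
    """
    n_classes, n_taxa = len(gamma), len(gamma[0])
    z = rng.choices(range(n_classes), weights=pi, k=1)[0]
    # Assumed linear predictor: class baseline plus effects of active properties.
    eta = [gamma[z][k] + sum(x[m] * beta[m][k] for m in range(len(x)))
           for k in range(n_taxa)]
    p = softmax(eta)
    # Multinomial draw: `total` reads assigned to taxa with probabilities p.
    y = [0] * n_taxa
    for k in rng.choices(range(n_taxa), weights=p, k=total):
        y[k] += 1
    return z, y


rng = random.Random(0)
gamma = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # toy values, L=2, K=3
beta = [[0.0, 0.0, 1.0]]                     # toy values, M=1
z, y = sample_counts(rng, 1000, gamma, beta, [0.5, 0.5], [1])
```

Because β enters the predictor in the same way for every class, the class baselines and the effects of interest stay separate, which is the property the model relies on.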
Parameter estimation
Let \(\boldsymbol {\hat \theta }\) be the MAP estimator of θ, found by maximizing \(\log p(\boldsymbol {\theta },\boldsymbol {Y},\boldsymbol {X})\).
where C is a normalizing constant. This relationship shows that p(θ∣Y,X) can be approximated by the normal distribution \(N\left (\boldsymbol {\hat \theta }, H^{-1}\left (\boldsymbol {\hat \theta }\right)\right)\). Credible intervals can be calculated from this multivariate normal distribution.
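A sketch of the standard second-order (Laplace) expansion that leads to this normal approximation is given below. Taking H to be the negative Hessian of the log posterior evaluated at the MAP estimate is an assumption, chosen so that it agrees with the covariance of the normal distribution stated above; the first-order term vanishes because the gradient is zero at the maximum.

```latex
\log p(\boldsymbol{\theta} \mid \boldsymbol{Y}, \boldsymbol{X})
  \approx \log p(\hat{\boldsymbol{\theta}} \mid \boldsymbol{Y}, \boldsymbol{X})
  - \frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^{\top}
    H(\hat{\boldsymbol{\theta}}) (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}),
\qquad
H(\hat{\boldsymbol{\theta}})
  = - \nabla \nabla \log p(\boldsymbol{\theta} \mid \boldsymbol{Y}, \boldsymbol{X})
    \Big|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}}

p(\boldsymbol{\theta} \mid \boldsymbol{Y}, \boldsymbol{X})
  \approx C \exp \left\{ - \frac{1}{2}
    (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^{\top}
    H(\hat{\boldsymbol{\theta}}) (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}) \right\}
```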
We used the probabilistic programming language Stan (http://mc-stan.org/) for the implementation. The MAP estimators were obtained by the L-BFGS method. Credible intervals were computed using a Stan function that evaluates the Hessian at the MAP estimates.
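The same recipe (find the MAP estimate, evaluate the Hessian there, read off an interval from the normal approximation) can be illustrated on a one-parameter toy problem. The sketch below is not the ENIGMA model: it assumes a binomial likelihood with a logit parameter and a normal prior, uses Newton's method in place of L-BFGS, and uses the analytic second derivative in place of Stan's Hessian function. All names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def laplace_interval(y, n, prior_sd=10.0, z=1.96, tol=1e-10):
    """MAP estimate and approximate 95% credible interval for a logit.

    Toy model (an assumption): y ~ Binomial(n, sigmoid(theta)),
    theta ~ Normal(0, prior_sd^2).
    """
    theta = 0.0
    for _ in range(100):
        p = sigmoid(theta)
        grad = y - n * p - theta / prior_sd ** 2        # d/dtheta of log posterior
        hess = n * p * (1.0 - p) + 1.0 / prior_sd ** 2  # negative second derivative
        step = grad / hess
        theta += step                                   # Newton update
        if abs(step) < tol:
            break
    p = sigmoid(theta)
    hess = n * p * (1.0 - p) + 1.0 / prior_sd ** 2
    sd = math.sqrt(1.0 / hess)                          # square root of H^{-1}
    return theta, (theta - z * sd, theta + z * sd)


theta_hat, (lower, upper) = laplace_interval(y=30, n=100)
```

In the multivariate case the scalar `1.0 / hess` becomes the inverse Hessian matrix, and the interval for each parameter uses the corresponding diagonal element.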
This is the probability that the nth sample belongs to cluster l. The nth sample is then classified into the cluster l that maximizes the conditional probability given by Eq. 5.
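A minimal sketch of this classification step follows. It assumes that the class probability is proportional to π_l times the multinomial likelihood of the sample under class l, with the linear predictor γ_l + Σ_m x_{nm} β_m. The additive form and all names below are illustrative assumptions, not the API of the ENIGMA package.

```python
import math


def log_softmax(v):
    """Return v minus log-sum-exp(v), computed stably."""
    m = max(v)
    lse = m + math.log(sum(math.exp(x - m) for x in v))
    return [x - lse for x in v]


def class_posterior(y, x, gamma, beta, pi):
    """Posterior probability of each latent class for one sample."""
    n_props, n_taxa = len(beta), len(y)
    log_w = []
    for l, g in enumerate(gamma):
        eta = [g[k] + sum(x[m] * beta[m][k] for m in range(n_props))
               for k in range(n_taxa)]
        log_p = log_softmax(eta)
        # log pi_l plus the multinomial log-likelihood; the multinomial
        # coefficient is identical for every class and cancels on normalizing.
        log_w.append(math.log(pi[l]) + sum(y[k] * log_p[k] for k in range(n_taxa)))
    return [math.exp(w) for w in log_softmax(log_w)]


gamma = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # toy values, L=2, K=3
beta = [[0.0, 0.0, 1.0]]                     # toy values, M=1
post = class_posterior([70, 10, 20], [1], gamma, beta, [0.5, 0.5])
cluster = max(range(len(post)), key=post.__getitem__)
```

Working in log space avoids underflow, since the likelihood of a sample with thousands of reads is far too small to represent directly.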
Model selection
where D is the number of free parameters. In model comparison, we choose the model with the larger log marginal likelihood.
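For reference, a sketch of the standard Laplace approximation to the log marginal likelihood of a model \(\mathcal{M}\) is given below. That this is exactly the form used here is an assumption; it is written so that D and the MAP estimate play the roles described in the text, with H the negative Hessian of the log joint density at the MAP estimate.

```latex
\log p(\boldsymbol{Y} \mid \boldsymbol{X}, \mathcal{M})
  \approx \log p(\boldsymbol{Y}, \hat{\boldsymbol{\theta}} \mid \boldsymbol{X}, \mathcal{M})
  + \frac{D}{2} \log (2\pi)
  - \frac{1}{2} \log \left| H(\hat{\boldsymbol{\theta}}) \right|
```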
Simulation study

Coverage probability (CP): The coverage probability is the proportion of times that the interval contains the true value. A discrepancy between the actual coverage probability and the nominal coverage probability frequently occurs. When the actual coverage is greater than the nominal coverage, the interval is referred to as conservative. If the interval is conservative, there is no inconsistency in interpretation.

Bias: The bias of \(\hat {\boldsymbol {B}}\) is defined as the difference between the expected value of the estimate and the true value, \(E[\hat {\boldsymbol {B}}] - \boldsymbol {B}\).

Standard error (SE): The standard error is the standard deviation of the estimate. A smaller standard error indicates higher accuracy of estimation.

Root mean squared error (RMSE): The RMSE is defined by \(\sqrt {E\left [\left (\boldsymbol {\hat {B}} - \boldsymbol {B}\right)^{2}\right ]}\). A smaller RMSE indicates higher accuracy of the estimation.

Accuracy: The accuracy is the percentage of samples correctly classified into their original groups.
To calculate these metrics, we computed the sample means and standard deviations of \(\hat {\boldsymbol {B}}\) and \(\left (\hat {\boldsymbol {B}} - \boldsymbol {B}\right)^{2}\) from the 10,000 synthetic datasets.
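The metrics defined above can be computed for a single scalar parameter as in the following sketch. The function name, the input layout (one estimate and one interval per synthetic dataset), and the toy replicates are assumptions made for illustration; they are not taken from the simulation itself.

```python
import math
import statistics


def summarize(true_beta, estimates, intervals):
    """Coverage probability, bias, SE and RMSE for one scalar parameter.

    estimates: one point estimate per synthetic dataset
    intervals: matching (lower, upper) credible-interval bounds
    """
    covered = sum(lo <= true_beta <= hi for lo, hi in intervals)
    cp = covered / len(intervals)
    bias = statistics.fmean(estimates) - true_beta
    se = statistics.stdev(estimates)  # sample standard deviation of the estimates
    rmse = math.sqrt(statistics.fmean([(b - true_beta) ** 2 for b in estimates]))
    return {"CP": cp, "Bias": bias, "SE": se, "RMSE": rmse}


# Toy replicates, invented purely to exercise the function.
est = [0.9, 1.1, 1.0, 1.2, 0.8]
ivl = [(b - 0.25, b + 0.25) for b in est]
result = summarize(1.0, est, ivl)
```

When the bias is close to zero, SE and RMSE nearly coincide, which is the pattern seen in most rows of the table below.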
Coverage probability (CP), bias, standard error (SE), and RMSE of \(\boldsymbol {\hat {B}}\)
β  CP  Bias  SE  RMSE  β  CP  Bias  SE  RMSE 

3.40  0.97  0.08  0.15  0.17  0.04  1.00  0.01  0.05  0.05 
2.65  0.97  0.06  0.15  0.16  0.04  1.00  0.01  0.05  0.05 
2.34  0.99  0.04  0.12  0.13  0.01  1.00  0.01  0.05  0.05 
2.32  0.99  0.03  0.12  0.12  0.01  1.00  0.01  0.04  0.04 
1.83  0.98  0.03  0.14  0.15  0.02  1.00  0.01  0.06  0.06 
1.59  0.99  0.02  0.13  0.13  0.02  1.00  0.01  0.04  0.05 
1.58  0.99  0.03  0.13  0.13  0.03  1.00  0.01  0.04  0.04 
1.51  0.99  0.02  0.14  0.14  0.10  1.00  0.00  0.08  0.08 
1.51  0.99  0.02  0.13  0.13  0.13  1.00  0.01  0.03  0.03 
1.29  0.99  0.02  0.11  0.11  0.14  1.00  0.01  0.03  0.03 
1.14  0.99  0.01  0.11  0.11  0.21  1.00  0.01  0.06  0.06 
0.95  1.00  0.01  0.09  0.09  0.23  1.00  0.00  0.08  0.08 
0.95  0.99  0.01  0.12  0.12  0.29  1.00  0.01  0.04  0.04 
0.92  1.00  0.01  0.09  0.09  0.31  1.00  0.01  0.05  0.05 
0.88  0.99  0.01  0.12  0.12  0.32  1.00  0.00  0.08  0.08 
0.84  1.00  0.01  0.05  0.05  0.33  1.00  0.01  0.04  0.04 
0.82  1.00  0.01  0.08  0.08  0.44  0.99  0.02  0.10  0.10 
0.78  0.99  0.01  0.13  0.13  0.46  1.00  0.01  0.05  0.05 
0.78  1.00  0.01  0.07  0.07  0.50  1.00  0.01  0.08  0.08 
0.76  1.00  0.01  0.08  0.08  0.53  1.00  0.00  0.06  0.06 
0.72  0.99  0.00  0.12  0.12  0.54  1.00  0.00  0.08  0.08 
0.68  1.00  0.01  0.10  0.10  0.55  1.00  0.01  0.04  0.04 
0.65  0.99  0.01  0.11  0.11  0.55  1.00  0.01  0.03  0.03 
0.65  0.99  0.01  0.11  0.11  0.56  1.00  0.01  0.05  0.05 
0.65  1.00  0.01  0.06  0.06  0.76  1.00  0.00  0.07  0.07 
0.61  1.00  0.01  0.06  0.06  0.79  1.00  0.00  0.06  0.06 
0.58  1.00  0.01  0.06  0.06  0.84  1.00  0.00  0.05  0.05 
0.58  1.00  0.01  0.07  0.07  0.90  1.00  0.01  0.04  0.04 
0.56  1.00  0.01  0.05  0.05  0.93  1.00  0.00  0.05  0.05 
0.52  1.00  0.01  0.06  0.06  0.96  1.00  0.01  0.08  0.08 
0.52  1.00  0.01  0.07  0.07  0.98  1.00  0.01  0.04  0.04 
0.51  1.00  0.01  0.04  0.05  1.01  1.00  0.01  0.08  0.08 
0.50  1.00  0.01  0.05  0.05  1.08  1.00  0.00  0.05  0.06 
0.50  1.00  0.01  0.04  0.04  1.10  1.00  0.00  0.05  0.05 
0.49  0.99  0.00  0.11  0.11  1.13  1.00  0.01  0.04  0.04 
0.47  1.00  0.01  0.05  0.05  1.14  1.00  0.01  0.04  0.04 
0.45  1.00  0.01  0.09  0.09  1.16  1.00  0.01  0.07  0.07 
0.42  0.99  0.01  0.13  0.13  1.22  1.00  0.01  0.04  0.04 
0.33  1.00  0.01  0.07  0.07  1.23  1.00  0.02  0.09  0.09 
0.28  1.00  0.00  0.09  0.09  1.43  1.00  0.00  0.04  0.04 
0.27  1.00  0.01  0.07  0.07  1.45  1.00  0.01  0.04  0.04 
0.23  1.00  0.00  0.09  0.09  1.47  1.00  0.00  0.04  0.04 
0.21  1.00  0.01  0.07  0.07  1.55  1.00  0.01  0.07  0.08 
0.18  1.00  0.00  0.10  0.10  1.60  1.00  0.01  0.03  0.03 
0.15  0.99  0.01  0.11  0.11  1.61  1.00  0.00  0.05  0.05 
0.11  1.00  0.01  0.06  0.06  1.89  1.00  0.01  0.03  0.03 
0.09  1.00  0.00  0.09  0.09  1.91  1.00  0.01  0.03  0.03 
0.05  1.00  0.01  0.04  0.04  1.95  1.00  0.01  0.02  0.02 
0.05  1.00  0.01  0.04  0.04  2.25  1.00  0.00  0.04  0.04 
Results on real data
Arumugam et al. (2011)’s data
We demonstrated that enterotype-like clusters can be estimated using the data of Arumugam et al. [3]. This dataset has N=33 samples and K=55 taxa. Because the total read counts are not disclosed for this dataset, we used the relative abundance multiplied by 10,000 as y_{nk}. Based on the result of Arumugam et al. [3], the number of latent classes in ENIGMA was set to L=3. We estimated the parameters using ENIGMA with all β_{mk}=0 in Eq. 1. We set the hyperparameters of the Dirichlet prior to α=(1,…,1)^{⊤}, which is equivalent to a noninformative prior.
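The conversion from relative abundances to count-like inputs can be sketched as follows. The text only states that relative abundances were multiplied by 10,000; treating them as fractions that sum to one and rounding to the nearest integer are assumptions, as are the function name and the example values.

```python
def pseudo_counts(rel_abundance, scale=10_000):
    """Turn relative abundances (fractions summing to about 1) into integer counts."""
    # Rounding to the nearest integer is an assumption; the text only says
    # that the relative abundance was multiplied by 10,000.
    return [round(r * scale) for r in rel_abundance]


# Hypothetical relative abundances for one sample with three taxa.
y_n = pseudo_counts([0.5123, 0.30001, 0.18769])
```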
Parkinson’s disease data
Data summary
PD  CO  

Finland  74  74 
Germany  55  64 
USA  207  139 
Cross-tabulation of gender and cluster
Class  1  2  3 

Female  22  31  21 
Male  21  27  26 
Comparison of marginal likelihoods
Finland  Germany  USA  

\(\mathcal {M}_{0}\)  442734.62  5913441.14  3010279.35 
\(\mathcal {M}_{1}\)  355079.50  3807297.76  2063932.02 
Bacteria significantly associated with PD in more than two countries
Finland  Germany  USA  

Family  \(\hat {\beta }\)  Lower bound  Upper bound  \(\hat {\beta }\)  Lower bound  Upper bound  \(\hat {\beta }\)  Lower bound  Upper bound 
Anaeroplasmataceae  0.87  1.28  0.45  1.69  2.03  1.35       
Bacteroidales S24-7 group  0.52  0.93  0.11  0.22  0.12  0.56  0.80  1.16  0.44 
Bradyrhizobiaceae        0.82  1.17  0.47  1.44  2.21  0.66 
Brevibacteriaceae        1.02  1.38  0.66  0.65  1.05  0.25 
Brucellaceae        1.69  2.50  0.87  1.34  1.75  0.92 
Clostridiaceae 1  0.54  0.96  0.13  0.08  0.42  0.26  0.52  0.88  0.16 
Comamonadaceae  0.85  1.35  0.35  1.27  1.61  0.93  0.21  0.57  0.15 
Elusimicrobiaceae  4.17  5.60  2.74  2.11  2.54  1.68  2.52  1.03  4.01 
Intrasporangiaceae        3.47  4.86  2.07  3.00  4.72  1.28 
Leuconostocaceae  2.66  4.30  1.02  0.50  0.13  0.86  1.74  2.22  1.25 
Moraxellaceae        1.58  1.92  1.24  0.92  1.28  0.56 
Pasteurellaceae  1.62  2.07  1.17  0.30  0.04  0.64  1.88  2.25  1.51 
Prevotellaceae  2.46  2.87  2.05  0.03  0.37  0.30  0.53  0.89  0.17 
Rhodocyclaceae        3.53  4.93  2.13  0.75  1.18  0.32 
Actinomycetaceae  0.11  0.78  1.01  0.42  0.07  0.78  0.91  0.54  1.28 
Bacillaceae  1.72  0.34  3.11  2.35  2.72  1.99  0.80  0.43  1.17 
Bdellovibrionaceae        1.43  0.40  2.46  3.07  1.78  4.36 
Bifidobacteriaceae  1.34  0.82  1.86  0.54  0.20  0.88  0.01  0.35  0.37 
Campylobacteraceae  0.36  0.31  1.03  4.90  4.48  5.33  0.83  0.46  1.21 
Cytophagaceae        2.45  1.56  3.34  1.70  0.27  3.13 
Enterococcaceae  3.87  2.70  5.05  0.74  0.40  1.08  0.09  0.28  0.45 
Lactobacillaceae  3.00  2.56  3.43  0.51  0.85  0.18  1.73  1.36  2.09 
Leptotrichiaceae  0.90  1.89  0.09  2.57  1.88  3.26  0.82  0.36  1.27 
Methanobacteriaceae        0.93  0.59  1.27  0.67  0.30  1.04 
Mitochondria  0.60  1.27  2.46  0.73  0.11  1.36  1.57  0.95  2.20 
Paenibacillaceae        2.19  1.28  3.10  1.71  1.30  2.12 
Planococcaceae        1.06  0.72  1.41  3.26  2.67  3.85 
Rhizobiaceae        0.64  0.24  1.03  1.52  1.08  1.95 
Streptococcaceae  0.44  0.03  0.86  0.84  0.50  1.17  0.26  0.10  0.62 
Succinivibrionaceae  0.32  0.76  0.11  0.74  0.40  1.08  4.31  3.76  4.86 
Synergistaceae  1.26  0.80  1.71  0.25  0.10  0.61  1.44  1.06  1.82 
Verrucomicrobiaceae  1.71  1.23  2.19  1.62  1.29  1.96  0.06  0.42  0.30 
Victivallaceae  0.42  0.00  0.85  0.68  0.34  1.02  0.93  0.54  1.32 
p-values of the Wilcoxon test
Finland  Germany  USA  

Lachnospiraceae  0.009371  0.719014  0.002839 
Lactobacillaceae  0.030404  0.077771  0.000002 
Pasteurellaceae  0.006493  0.495315  0.004232 
Prevotellaceae  0.001303  0.030892  0.194592 
The analyses using real-world data thus show that ENIGMA can identify enterotype-like clusters and the associations between the gut microbiota and PD. Some of the results were strongly supported by those of previous studies.
Conclusion
We proposed a novel hierarchical Bayesian model, ENIGMA, for discovering the underlying microbial community structures and the associations between microbiota and their environmental factors from microbial metagenome data. ENIGMA is based on a probabilistic model of microbial community structures and is supplied with labels for one or more environmental factors of interest for each sample. The structure of each sample is modeled by a multinomial distribution whose parameters are represented independently by group and environmental effects, which prevents mixing of individual differences and the effects of interest. This framework enables the model to simultaneously learn (i) how microbes contribute to an underlying community structure (cluster) and (ii) how microbial compositional patterns are explained by environmental factors of interest. The effectiveness of ENIGMA was evaluated through experiments involving both synthetic and real-world datasets. These newly discovered clusters and associations estimated using ENIGMA can provide insight into the mechanisms of microbial communities.
The major limitation of ENIGMA is its scalability and efficiency, as the number of parameters in the model increases proportionally with the number of taxa when the number of environmental factors of interest is large. Further studies should focus on developing a scalable probabilistic model of microbial compositions to analyze underlying microbial structures with a large number of these effects by using sparse parameter estimation [19]. We are also interested in developing a dynamic probabilistic model similar to that reported by Blei and Lafferty [20] for analyzing time-varying bacterial compositions during disease progression.
Notes
Acknowledgements
Not applicable.
Funding
This work was supported by Grants-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT); Ministry of Health, Labour and Welfare of Japan (MHLW); Japan Agency for Medical Research and Development (AMED); and the Hori Sciences and Arts Foundation. Publication of this article was sponsored by AMED CREST JP18gm1010002.
Availability of materials
ENIGMA is implemented with R and is available from GitHub (https://github.com/abikoushi/enigma).
About this supplement
This article has been published as part of BMC Genomics Volume 20 Supplement 2, 2019: Selected articles from the 17th Asia Pacific Bioinformatics Conference (APBC 2019): genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-20-supplement-2.
Authors’ contributions
KA and TS designed the proposed algorithm. KO and MH designed the experiments. All authors have read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 1. Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012; 486:207–14.
 2. Le Chatelier E, Nielsen T, Qin J, Prifti E, Hildebrand F, Falony G, et al. Richness of human gut microbiome correlates with metabolic markers. Nature. 2013; 500:541–6.
 3. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, et al. Enterotypes of the human gut microbiome. Nature. 2011; 473(7346):174.
 4. Moeller AH, Degnan PH, Pusey AE, Wilson ML, Hahn BH, Ochman H. Chimpanzees and humans harbour compositionally similar gut enterotypes. Nat Commun; 3:1179.
 5. Hildebrand F, Nguyen TL, Brinkman B, Yunta RG, Cauwe B, Vandenabeele P, et al. Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice. Genome Biol. 2013; 14(1):R4.
 6. Ravel J. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci USA. 2011; 108(Supplement 1):4680–7.
 7. Koren O, Knights D, Gonzalez A, Waldron L, Segata N, Huttenhower C, et al. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS Comput Biol. 2013; 9(1):e1002863.
 8. Ding T, Schloss PD. Dynamics and associations of microbial community types across the human body. Nature. 2014; 509:357–60.
 9. Zhou Y, Mihindukulasuriya KA, Gao H, La Rose PS, Wylie KM, Martin JC, et al. Exploration of bacterial community classes in major human habitats. Genome Biol. 2014; 15:R66.
 10. Knights D, Costello EK, Knight R. Supervised classification of human microbiota. FEMS Microbiol Rev. 2011; 35(2):343–59.
 11. Holmes I, Harris K, Quince C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS ONE. 2012; 7(2):e30126.
 12. Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh DA, Gu H, et al. BioMiCo: a supervised Bayesian model for inference of microbial community structure. Microbiome. 2015; 3(1):8.
 13. Bishop C. Pattern recognition and machine learning. New York: Springer-Verlag; 2006.
 14. Scheperjans F, Aho V, Pereira PA, Koskinen K, Paulin L, Pekkonen E, et al. Gut microbiota are related to Parkinson’s disease and clinical phenotype. Mov Disord. 2015; 30(3):350–8.
 15. Hill-Burns EM, Debelius JW, Morton JT, Wissemann WT, Lewis MR, Wallen ZD, et al. Parkinson’s disease and Parkinson’s disease medications have distinct signatures of the gut microbiome. Mov Disord. 2017; 32(5):739–49.
 16. Heintz-Buschart A, Pandey U, Wicke T, Sixel-Döring F, Janzen A, Sittig-Wiegand E, et al. The nasal and gut microbiome in Parkinson’s disease and idiopathic rapid eye movement sleep behavior disorder. Mov Disord. 2018; 33(1):88–98.
 17. Hopfner F, Künstner A, Müller SH, Künzel S, Zeuner KE, Margraf NG, et al. Gut microbiota in Parkinson disease in a northern German cohort. Brain Res; 1667:41–5.
 18. Langille MG, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol. 2013; 31(9):814–21.
 19. Yang Y, Chen N, Chen T. mLDM: a new hierarchical Bayesian statistical model for sparse microbial association discovery. bioRxiv. 2016:042630. https://doi.org/10.1101/042630.
 20. Blei DM, Lafferty JD. Dynamic topic models. In: Proceedings of the 23rd international conference on Machine learning. New York: ACM; 2006. p. 113–20.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.