Background

The complex microbial ecosystem of the gastrointestinal (GI) tract is important in human health and development [1]. One important step towards a systems-level understanding of the GI microbiota is characterization of biotic interactions. Different microbial taxa do not influence ecosystem processes equally, and determining the ecological roles of community members and the impact of specific taxa on overall biodiversity is of fundamental importance [2]. As we have previously presented [3], community time series analysis, an approach based on statistical analysis of changing population abundances within a single system over time, is needed in order to say with confidence that one population is affecting the dynamics of another. With a few exceptions [4, 5], there is still a relative paucity of longitudinal studies of appropriate sampling frequency and duration to allow for this approach, and until recently, it has been difficult to apply basic ecological theory to complex microbial communities.

Two of the most influential concepts in modern ecology are foundation and keystone species. As originally defined, a foundation species is ‘a single species that defines much of the structure of a community by creating locally stable conditions for other species, and by modulating and stabilizing fundamental ecosystem processes’ [6]. Further, foundation species are numerically dominant and form close biotic interactions with other community members [7]. Keystone species are similar to foundation species in that they are critical for maintaining the organization and diversity of their ecological communities through biotic interactions with other community members [8]. However, a keystone species is per definition of relatively low abundance, and thus represents a vulnerable point in an ecosystem, the removal of which has strong destabilizing effects, resulting in loss of biodiversity [9, 10]. More diverse communities have enhanced ecosystem function, stability and resistance to invasion [11], and identification of keystone and foundation taxa in the GI microbiota is important as they are potential entrance points for novel diagnostic strategies and therapeutic modulation.

A recently developed method for estimating biotic interactions in complex microbial communities, dubbed reverse ecology (RE) [12], uses genome sequences of species pairs in order to compute indices of metabolic overlap (i.e. exogenous compounds utilized by both species) and complementarity (i.e. compounds produced endogenously by one species which can be utilized by the other). These indices are taken as proxies for competition and facilitation, respectively. A major premise of RE is the concept of limiting similarity. This hypothesis states that competition is stronger between more closely related organisms, as proposed by Darwin in On the Origin of Species [13].

Here, we analyse published data from two longitudinal studies of the GI microbiota that used cutting-edge metagenomics methods [4, 5]. These studies are unique in that they are of long duration (185–443 days) and of sufficient temporal resolution for statistical time series analysis. We present comprehensive mapping of biotic interactions of the GI microbiota in four healthy adults, and we identify putative foundation and keystone taxa. We then compare these results to those obtained with an RE approach and evaluate the limiting similarity hypothesis in this context. Finally, we examine the data through the lens of enterotyping [14], demonstrating both temporal stability and instability within individuals.

Results

Mapping biotic interactions

Summary statistics of the data used for time series modelling are presented in Table 1. The analysis identified many genus level interactions (Fig. 1, Additional file 1: Figures S1–S7), with considerable variation among subjects (Table 2). Figures presented in the main text are of I1 while I2–4 are discussed in the text, and the corresponding figures are presented in supplementary information. The distributions of the number of data points used to compute the models for each subject are presented in Additional file 1: Figure S8. No relationship was observed between the number of data points used to compute a model and the chance of observing a significant relationship at the 99 % confidence level (Additional file 1: Table S1). When reducing the number of data points used to compute the models, the results were qualitatively very similar to the full models (Table 2), with correlations between regression coefficients at 0.9 or more when reducing the amount of data by 50 % (Additional file 1: Figure S9). However, for the same models, the number of observed significant interactions relative to the total number of potential interactions declined in a near linear fashion as the amount of data was reduced (Additional file 1: Figure S10). In this case, slopes among individuals were relatively consistent with the chance of discovering a significant interaction reduced by 0.6–0.9 % when reducing the data amount by 10 %.

Table 1 Summary statistics of data analysed in this study
Fig. 1
figure 1

Genus level bacterial interactions for I1. The heat map describes the strength and direction (β i,j in Eq. 1) of highly significant interactions. Dependent variables are along the y axis and independent variables along the x axis, i.e. if you follow the column of given genus upward from the x axis until you reach a coloured cell, that cell indicates the effect of the given genus (dependent) on the genus indicated on the y axis (independent). The colour key on the right-hand side indicates the sign and magnitude of interactions that were significant at the 99 % confidence level. Cells representing non-significant relationships are black. Genus names are coloured according to phylum provenance: black Actinobacteria, red Bacteroidetes, green Firmicutes, blue Proteobacteria, light blue Tenericutes, and pink Verrucomicrobia. Note that according to the NCBI taxonomy database, the family Erysipelotrichaceae and the genus Holdemania are classified as Firmicutes. Here, and throughout, we have remained consistent with the Greengenes classification of these taxa as Tenericutes

Table 2 Summary of time series modelling results

In line with general expectations [15], there was a predominance of negative interactions (Figs. 1 and 2; Table 2; Additional file 1: Figures S1–S7 and S13–S15). I1 and I3 had more positive interactions relative to I2 and I4 (Table 2). We determined the prevalence of pairwise interactions of varying signs in order to assess the frequencies of apparent cooperation (+/+), competition (−/−), exploitation (+/−), commensalism (+/0), with zero being a non-significant interaction and amensalism (−/0). Although there was considerable variability between individuals, pairwise interactions were dominated by competitive and amensal relationships (Fig. 3, Additional file 1: Figure S11). We did not observe a single instance of exploitation in any of the individuals. Overall, biotic interactions were more negative among members of a phylum than between members of different phyla when intra-genus interactions were included in the analysis (p < 0.001 for I1–I4). When intra-genus interactions were excluded, the test remained significant for I1, I3 and I4 (p < 0.001) but was only marginally significant for I2 (p = 0.06).

Fig. 2
figure 2

Ranked genus level degree of connectedness in I1. The bar charts show the total number of significant negative and positive genus level interactions. Dependent (acted upon) interactions are shown in a while independent (acting on) interactions are in b. Genus names are coloured according to phylum provenance: black Actinobacteria, red Bacteroidetes, green Firmicutes, blue Proteobacteria, light blue Tenericutes and pink Verrucomicrobia

Fig. 3
figure 3

Prevalence of observed pairwise interaction categories. The categories are indicated on the x axis: cooperation (+/+), competition (−/−), commensalism (+/0, with zero being a non-significant interaction) and amensalism (−/0). The y axis indicates the number of significant interactions in the specified categories (Additional file 1: Figure S11) relative to the total number of pairwise interactions identified in an individual (i.e. the number of pairwise interactions in which at least one taxon in a given pair interacts with the other)

Variability in ecological network connectedness

The biotic interactions described above were categorized as positive or negative (i.e. the βs in 1 were larger or smaller than zero) and whether they represent a taxa acting upon other taxa (independent variable, right hand term in Eq. 1) or taxa being acted upon (dependent variable, left hand term in Eq. 1). These categories are referred to as positive independent (PI), positive dependent (PD), negative independent (NI) and negative dependent (ND). The sum of all four categories thus represents the degree of connectedness of a taxon.

We observe a large variation in the degree of connectedness across the different genera (Fig. 2, Additional file 1: Figures S13–S15). We examined whether the ratio between positive and negative interactions in which a genus is involved is independent of the genus’ total degree of connectedness and found that more connected genera had disproportionately large numbers of positive interactions relative to less-connected genera in I1 and I3 (p < <0.001 for both, linear model), but not I2 and I4 (p = 0.845 and 0.974, respectively).

Bacteriodetes as a candidate foundation taxon

The degree of connectedness in an ecological network can be seen as a proxy for a taxon’s importance in structuring the dynamics of the ecosystem, i.e. its influentialness. Bacteroidetes is an abundant taxon (48.8 % mean relative abundance) that is highly connected to the other ecosystem members, in particular through positive interactions (Fig. 2, Additional file 1: Figures S13–S15). When we normalize the degree of connectedness to the number of genera in each phylum, Bacteroidetes emerge as the main contributors of positive interactions (Fig. 4, Additional file 1: Figure S16). This observation is supported by linear models testing for differences in the mean number of positive interactions between the different phyla in three of the four individuals (I1, p = 0.025; I3, p < 0.001; I4, p < 0.001). For I2, even though Bacteroidetes had the highest mean number of positive interactions, the low total number of positive interactions identified in this individual limits the statistical power to detect significant differences. Thus, the Bacteroidetes fit the description of a foundation taxon sensu [6].

Fig. 4
figure 4

Total and mean connectedness of six main phyla in I1-4. ad show counts of positive and negative interactions significant at the 99 % confidence level for I1-4, respectively. eh show the mean connectedness of the phyla, i.e. the counts in ad divided by the number of observed genera within the respective phyla in each individual. Each bar combines interactions viewed both as dependent and independent. Dependent and independent interactions can be seen separately in Additional file 1: Figure S16

Actinobacteria as a candidate keystone taxon

Actinobacteria constitute on average 1.8 % of the GI microbiotas of the four individuals, comprising comparatively few genera. When the degree of connectedness is normalized to relative abundances, it becomes apparent that this group is very influential in the community even though it is relatively scarce (Fig. 5). This pattern is particularly striking in I1, I2 and I4. I3 has a much higher mean abundance of Actinobacteria (6.1 %), making it less apparent. The central role of the Actinobacteria in the biotic interaction networks, due to large number of both negative and positive interactions, suggests that it may constitute a keystone taxon. This contrasts with the Bacteroidetes that are characterized both by high relative abundances and many positive interactions.

Fig. 5
figure 5

Phylum level degree of connectedness divided by relative abundances. The bars show the number of positive dependent (PD), positive independent (PI), negative dependent (ND) and negative independent (NI) interactions observed within each of the six main phyla divided by the mean relative abundance of each phylum (horizontal bars below the panels). The ratios have been converted to percentages. ad show data for I1-4, respectively

Relative abundances of Actinobacteria are strongly linked to community beta diversity when viewed across individuals (R 2 = 0.99, p << 0.001; Fig. 6). Proteobacteria and Tenericutes occur at abundances comparable to those of Actinobacteria, but no such relationship was observed for these groups. Within individuals, we also observe that variation in diversity is linked with Actinobacterial abundances (p << 0.001 for all four individuals; Additional file 1: Figure S17). These results further strengthen the position of Actinobacteria as a keystone taxon.

Fig. 6
figure 6

Between-individual relationship between Actinobacteria relative abundance and total community diversity (Shannon index). On the log scale, the relationship is linear (R 2 = 0.99, p << 0.001). This relationship was not observed for Tenericutes or Proteobacteria

Concordance with interaction maps from a RE approach

The RE approach [12, 16], although fundamentally different from the time series analysis approach taken here, produces results that are qualitatively similar, i.e., they estimate biotic interactions that can be categorized in the same way as our results into PI, PD, NI and ND. Comparison finds concordance between our results, and RE results in that more negative than positive interactions were observed (Additional file 1: Figure S16 E, J) and that more competition was observed within than between groups (p << 0.001; Additional file 1: Figure S18). This follows the general expectation that stronger competition would stem from a higher degree of metabolic overlap, which is linked with phylogenetic relatedness.

We also used the complementary and competitive indices as proxies for ecological connectivity (higher value = more connected), summing the indices of species within each of the six main phyla for comparison with time series analysis results (Fig. 4, Additional file 1: Figure S16). The results from RE are similar to the results from time series analysis in that Firmicutes and Bacteroidetes are the most highly connected groups (Additional file 1: Figure S16A–E). However, in contrast to time series analysis results, the six phyla are roughly equal in terms of mean connectedness estimated from the RE approach (Additional file 1: Figure S16 J).

Comparison of co-occurrence modelling and time series analysis

Co-occurrence modelling, sometimes used in cross-sectional community studies for inference of biotic interactions, is based on the rationale that negatively and positively correlated occurrence patterns arise from negative and positive interactions, respectively [17]. Here, we analysed co-occurrence of taxa within each of the individuals and compared coefficients of contemporaneous correlation to regression coefficients from time series analysis. In each individual, we found a negative correlation (−0.68, −0.48, −0.70 and −025 for I1–I4, respectively; Spearman correlations) between co-occurrence and time series modelling results. A more detailed analysis (generalized additive models) showed these relationships to be highly significant and mostly non-linear (Additional file 1: Figure S12).

Using the lens of enterotyping to view longitudinal data

In order to see if enterotypes are stable over time, we looked at temporal patterns of relative abundances of Bacteriodes, Ruminococcus, and Prevotella as well as a species rich fourth group of Clostridiales belonging to the Lachnospiraceae family and the genus Blautia (Additional file 1: Figure S19). These are the main bacterial groups associated with distinct enterotypes in previous publications [14, 18]. For I2, I3 and I4, there was no strong evidence of enterotype clustering (average silhouette widths of 0.23, 0.18 and 0.29, respectively). In addition, for these individuals, there was generally poor agreement between silhouette widths and Calinski-Harabasz indices for determining optimal cluster numbers, further indicating a poor fit to the data. For I1, however, there was support for two distinct clusters (Fig. 7a, b). This is in agreement with previously reported results [19].

Fig. 7
figure 7

Enterotype switching in I1. a Average silhouette widths of different cluster numbers indicate that two clusters provide a reasonable fit to the data. b PCoA plot of samples from I1 showing two distinct clusters. c Relative abundance of Prevotella in samples corresponding to enterotype 1 (red) and 2 (green). d Relative abundance of Bacteroides in samples corresponding to enterotype 1 (red) and 2 (green)

Enterotype switching was related to the Bacteroides/Prevotella balance (Fig. 7c, d). Enterotype 1, characterized by high Bacteroides abundances, was predominant (278 days; 84 %) with type 2 observed for 54 days (16 %). There was some evidence of temporal clustering of enterotype 2 observations (p = 0.015, Wald-Wolfowitz test; Additional file 1: Figure S20), but sporadic switching at single time points occurred throughout the time series. Interestingly, in the case of I1, there appears to be a strong dynamic coupling between Bacteroides and Prevotella (Fig. 1, Additional file 1: Figure S1) with Bacteroides exerting a strong positive influence on Prevotella and a less pronounced yet still significant positive reciprocal interaction. This relationship was not observed in the three other individuals and may be related to the switching phenotype observed for I1.

Discussion

Evaluation of the limiting similarity hypothesis

As species of the same genus have usually, though by no means invariably, some similarity in habits and constitution, and always in structure, the struggle will generally be more severe between species of the same genus, when they come into competition with each other, than between species of different genera [13].

Darwin’s limiting similarity hypothesis has been hotly debated with evidence both in favour of [2022] and against [2325]. The results presented here, as well a data from RE modelling, suggest that Darwin might have been right in the case of the GI microbiota. Our results agree with the general assumption of RE that there is a positive relationship between strength of competition and niche overlap, as measured by metabolic similarity inferred from full genome sequences. We can see this from the elevated intensity of negative interactions within relative to between phyla (Fig. 1, Additional file 1: Figures S1–S7 and S18). From RE, it is evident that the degree of within phylum metabolic overlap is generally higher than between members of different phyla, and our results corroborate the assertion that this results in stronger competition.

Levy and Borenstein [16] used co-occurrence patterns obtained from cross-sectional data along with RE to discriminate between habitat filtering and species assortment as community assembly rules for the human GI microbiota, finding that strongly competing species tend to co-occur more often than expected. Our results are in general agreement with the conclusion that habitat filtering is the main assembly rule in the GI microbiota since we found significant negative relationships between co-occurrence and interaction coefficients estimated by time series analysis (Additional file 1: Figure S12), demonstrating that competing taxa also tend to co-occur within individuals.

Differences between time series analysis and RE

While our results are similar to RE in some respects, there are also some fundamental differences. RE assumes that all community members compete with each other since there is a degree of metabolic and genomic overlap in all bacteria. This leads to perfect connectedness between all community members, making it difficult to identify taxa that are particularly important in structuring the community. Our approach found that most genera only interact with a few others (Table 2). This could be the result of stratification within the gastrointestinal tract [26], sequestering sub-populations of bacteria into separate regions. RE also limits interactions to known metabolic pathways and will inevitably fail to identify indirect interactions. RE, while undoubtedly producing important insight, estimates the potential for competition and cross-feeding rather than actual biotic interactions, without considering the environment. It is well known that the nature of pairwise biotic interactions is highly dependent on the environmental context [27, 28].

RE may be expected to improve in the future as more genomes and metabolic pathways are characterized. The time series approach, however, being based on observational population data, does not require some of the strong assumptions of RE.

Ecological roles: foundation and keystone species

Identification of ecological roles of GI community members is an important step towards understanding how stable, diverse and healthy GI microbial ecosystems are maintained. Current approaches to GI microbiota modulation, usually probiotic or prebiotic, are not based on mechanistic concepts, and current evidence has been insufficient to substantiate health benefits. However, these approaches are attractive as agents of intervention against, and prevention of, a number of maladies. In 2013, the global market value for probiotics was estimated at US$32 billion, and this number is expected to grow to US$52 billion by 2020 [29].

The concepts of foundation and keystone species are used in conservation biology in order to identify and focus efforts on taxa that are particularly important in maintaining the structure, diversity and ultimately the function of ecological systems. Taxa fulfilling these criteria could also be considered prime targets for maintenance of intestinal health through manipulation of the GI microbiota. The GI microbiota has been described as a functionally redundant ecosystem [30] that may be less reliant on particular species, and the role of specific taxa as community stabilizers has hitherto not been thoroughly addressed. Here, we expand foundation and keystone species concepts to apply to higher taxa, analogous to guilds or functional groups, in order to better fit the community structures observed in the GI tract. Species level characterization of complex bacterial communities is very difficult. Indeed, the entire species concept for bacteria is subject to much debate [31]. By grouping bacteria that are phylogenetically and functionally closely related into more informative units, we can begin to assign general ecological roles.

In our analyses, the Bacteroidetes stood out as an abundant group that is highly connected in the ecological network. In particular, they were involved in many positive interactions, which are known to be important ecosystem stabilizers because of their potential to cascade through the community with major effects on the structure and function [7, 32, 33]. Bacteriodetes spp. can break down complex polysaccharides that would otherwise be inaccessible to most other gut-adapted bacteria [3336], in particular host-derived and plant glycans [37]. Thus, by making additional nutrients available to other community members, Bacteroidetes could act as a foundation taxon through facilitation. Indeed, a previous study found species of Bacteroides to be highly connected in the ecological network of the GI microbiota [38], and reduced Bacteroidetes abundances have been associated with obesity, indicating a fundamental role for these bacteria in maintaining a healthy GI microbiota [39].

The keystone species concept has previously been used in the context of the GI microbiota for describing bacteria that are instrumental to the biodegradation of resistant starch [40]. There is also evidence that certain Actinobacteria are particularly adept at degrading this class of carbohydrates [35, 41, 42], suggesting that they could play a similar role. In our analyses, the Actinobacteria fit the definition of a keystone taxon since they are characterized by an influentialness in the community that is disproportional to their low relative abundance, due to the high number of biotic interactions in which they are involved. High abundances of Actinobacteria are also clearly associated with increased bacterial diversity in the GI (Fig. 6, Additional file 1: Figure S17), although our analyses cannot establish whether this group of bacteria is a cause or a result of this observation. Actinobacteria strains have long been used as probiotic agents, in particular Bifidobacteria spp. [43]. Inflammatory bowel disease (IBD) patients have been found to be deficient for Collinsella spp. [44, 45] suggesting a role for this genus in the prevention of dysbiosis.

Temporal stability of enterotypes and links to biotic interaction structure

Clustering of a set of GI microbial community profiles according to a distance metric has been used to categorize individual microbiotas in large cross-sectional studies [14] and to link GI microbiota profiles with dietary patterns [18]. Recently, however, the concept has been drawn into question as being overly simplistic, and it has been proposed that GI microbiota differences may be better described by continuous gradients rather than sharply bounded enterotypes [46]. Although controversial, potential benefits from classification of complex microbial communities by projection onto a lower dimensional space, e.g. for diagnostic purposes, mean that enterotyping and similar approaches should be carefully considered. A recent study re-analysing data from I1 found that the enterotype of a single individual can be unstable over time [19]. We observed the same changes in I1 but found no evidence for multiple clusters in the other three individuals examined. From our analyses, there is some indication that enterotype switching could be associated with a strong positive but asymmetric interaction between Bacteroides and Prevotella observed in I1 (Fig. 7), although it is hard to see why this would make the individual prone to switching. Relationships between community stability and structural properties, including mutualistic interactions, is an enduring problem in ecology [47] and larger longitudinal studies will be needed to further evaluate the preponderance of, as well as mechanisms behind, enterotype switching.

Conclusions

Although the time series data compiled in the two studies [4, 5] that supplied data for our analyses are impressive in terms of temporal scope and sampling frequency, they produced data for only four individuals. To our knowledge, these are presently the only two studies of this kind, and a sample size of four is rather small for drawing general conclusions. Although we observed large individual differences, both in taxon composition and biotic interaction structure, some of our observations, i.e., limiting similarity and highly connected taxa, are consistent across individuals. Our results are also in broad agreement with RE. Although logistically challenging, large longitudinal studies with many participants will be needed to further validate the results presented here.

Interestingly, although methodologically very similar, the two studies we examine show variability in their results that appears to be systematic. In particular, I3 and I4 [5] were found to have expanded diversity within the Firmicutes compared with I1 and I2 [4]. Whether these represent actual differences in community structures or are due to technical differences in sample/data processing is unclear. Our filtering procedures were standardized throughout and we tried to apply conservative cutoffs to reduce the possibility of discovering spurious interactions. It is well known that metagenomic amplicon sequencing can be extremely sensitive to even small variations in protocol, and efforts towards standardization in order to ensure comparability across studies should be a future priority.

Methods

Time series data

The data used in this study were operational taxonomic unit (OTU) tables from two longitudinal studies of the healthy adult GI microbiota [4, 5]. Each study investigated two individuals that we subsequently refer to as I1, I2, I3 and I4 (Table 1). In each study, the V4 region of the 16S rRNA gene was sequenced on an Illumina GAIIx using the same protocol [48]. Sequence reads were processed with the QIIME [49] software suite, and 97 % OTUs were picked against the Greengenes database using uclust [50]. Prior to our analysis, the following additional processing steps were applied: (1) Only the sample with the largest library size was kept if multiple samples were taken on a single day. (2) Singletons were removed. (3) Samples with a library size of less than 20,000 reads were removed. (4) A common scaling procedure was applied as recommended by McMurdie and Holmes [51]. This entails multiplying every OTU count in a given library with a factor that is the ratio of the smallest library size in the data set to the library size of the sample in question. This replaces rarefying (i.e. random sub-sampling to the lowest number of reads) as it effectively results in the library scaling one would achieve by averaging an infinite number of repeated sub-samplings. (5) OTU tables were collapsed to the genus level by merging counts of species within the same genus. (6) OTUs with a mean relative abundance of less than 0.01 % were removed in order to reduce noise.

Reverse ecology data

Competition and complementarity indices were taken from Levy and Borenstein [16], where pairwise competition indices were estimated from genome sequences based on the number of exogenous metabolites utilized by both species. These values were used as proxies for the intensity of competition. This was then normalized to a value between 0 and 1 where 0 means no competition and 1 signifies complete metabolic overlap, i.e. intra-specific competition. Similarly, the complementarity index between two species was estimated as the number of compounds produced by one species that can be used by the second. This served as a proxy for positive interactions. For our analysis, we reduced the table of 154 species to 97 species corresponding to the 6 phyla represented in the time series data.

Time series modelling

The dynamics of each genus level OTU was modelled according to the function,

$$ {x}_{i,t+\mathsf{1}}\hbox{-} {x}_{i,t}={\alpha}_{i,j}+{\beta}_{i,j}{x}_{j,t} $$
(1)

where x i,t is the log relative abundance of taxon i at time = t, α i,j are intercept terms, β i,j are linear regression coefficients and x j,t are log relative abundances of taxon j at time = t. If i = j, the estimated interaction is within a given genus which is expected to be strongly negative due to density-dependent competition. The total number of equations in a system is equal to n 2, where n is the total number of taxa. This approach does not capture relationships that are strongly non-linear, that could be modelled, e.g. by generalized additive models, but previous work has shown linear regression to be a good approximation [3]. OTUs observed in fewer than 50 % of samples for a given subject were excluded from regression analysis. Prior to model computation, samples that did not have another sample taken the day directly after it were dropped. Only non-zero data points were used for modelling, and models based on fewer than 50 non-zero points were excluded from further analysis. A 99 % confidence level (p ≤ 0.01) was used for a model to be considered significant. For computing models with data amounts reduced relative to the full models (Table 2), data pairs (x i,t+1 and x j,t ) were sampled randomly to a series of maximal numbers for each individual before model computation. The maximal numbers of samples for I1 and I3 were 250, 200, 150, 125, 100, 75, 50 and 25. For I2, they were 100, 75, 50 and 25. For I4, they were 125, 100, 75, 50 and 25. For each maximal number of data points, models were computed 100 times for each individual, using the filtering criteria stated above (except in the case of 25 data points where the minimal number was reduced from 50 to 25). In order to assess the correspondence between the full and reduced models, we used two metrics. First, we looked at the correlation between regression coefficients between full and reduced models. The coefficients were found to be roughly normally distributed by visual inspection, so we used the standard Pearson correlation for this metric. We also looked at the number of interactions significant at the 99 % confidence level relative to the total number of potential interactions (square of the interaction matrix dimension) for each individual for full and reduced models in order to assess the power of detection. Both correlations and power of detection values are reported as means of 100 model computations. The interaction matrix diagonals (inter-taxon interactions) were omitted from these analyses. All computations were carried out using R [52].

Tests for differences in interaction structure within and between groups

Tests were carried out by comparing the regression coefficients (βs) of significant models describing interactions within phyla with the coefficients of models describing interactions between members of different phyla. For the full test, this entailed pooling values from all within phylum models and all values from between phylum models. Comparisons were then carried out using Wilcoxon rank-sum tests (a.k.a. Mann-Whitney U tests). Since the βs from models describing intra-genus interactions were always highly negative, tests were carried out both with and without these values. For the RE data, the same test was carried out on a matrix containing the differences between the complementarity and competition indices, excluding the matrix diagonal elements.

Comparison of co-occurrence and time series analysis

For determining co-occurrence patterns, we first computed pairwise contemporaneous Spearman correlation coefficients between the different taxa within each individual. In order to ensure that the taxa included in the analysis were the same as those included in the time series models, OTUs observed in fewer than 50 % of samples for a given subject were excluded from the analyses. Since the distribution correlation coefficients from an individual sometimes deviated from normality, Spearman correlation was again employed in order to assess the relationship with regression coefficients obtained by time series regression analyses. This relationship was also assessed using generalize additive models as implemented in the R package ‘mgcv’, using 3 degrees of freedom in order to accommodate non-linear relationships.

Enterotyping

Enterotyping was carried out according to Arumugam et al. [14] using the square root of Jensen-Shannon divergences as a distance metric. Abundance profiles were clustered using partitioning around medioids as implemented in the R-package ‘cluster’. Optimal cluster numbers were determined using both the Calinski-Harabasz index [53] implemented in the ‘clusterSim’ R-package and silhouette widths [54] in the ‘cluster’ package. Principal coordinates analysis (PCoA) and visualization of results were carried out using the ‘ade4’ R-package. Testing for temporal enterotype clustering was done using the Wald-Wolfowitz test for randomness of the distribution of values in a vector [55] implemented in the R-package ‘adehabitatLT’.

Availability of supporting data

Supplementary information is included with the article and available on the Microbiome website.