The mammalian gut harbors a microbiota that consists of hundreds of microbial species whose relative abundances vary considerably among individuals [1]-[3]. At some extremes of this variation, composition and function of the microbiota show associations with complex diseases and these abnormal microbial assemblages may even contribute to the disease process [4]-[7]. Despite the growing catalogue of known gut microbes and an increasing understanding of their distributions in populations, the fundamental principles that guide assembly and define structure of the microbiome are largely unknown.

Ecological theory predicts that community assembly is governed by a combination of deterministic, historic, and neutral factors [8]. Evidence now exists that gut microbiota is structured by host-defined deterministic factors specified by the genotype (which relate directly to physiology and immune functions), deterministic environmental factors such as diet, and stochastic factors such as colonization order and history of antibiotic exposure [9]. Though the relative contribution of several of these factors have begun to be estimated individually, systematic studies are needed to understand how these factors converge to shape individualized microbiomes that show stability and resilience.

That natural genetic variation can indeed account for variation in the abundances of taxa of the gut microbiota has been demonstrated in mouse model systems, subsequently leading to the identification of quantitative trait loci (QTL) that affect the relative abundances of specific microbial taxa and groups of taxa in the gut [10]-[12]. Among the 18 QTLs initially mapped by Benson et al. [10], at least three of the microbiota QTL overlapped QTLs for complex diseases, suggesting that genetic predisposition to complex diseases may be attributable, in part, to assembly of abnormal microbiomes. Indeed, variation in several innate response genes is associated with inflammatory and metabolic diseases in humans and these diseases also manifest dysbiosis [13]-[18]. Although the causal relationships between genetic variation, dysbiosis, and disease are still largely unknown, work in experimental animal models shows that null mutations in innate response genes give rise to dysbiotic microbiota that can bring about disease characteristics when transferred into naïve animals [19]-[23].

In contrast to innate response genes, it is unclear how genetic variation in adaptive immune genes affects the microbiome. Rag -/- mice, which entirely lack an adaptive immune system, have significant abnormalities in composition of the gut microbiota [24]. However, the innate and adaptive responses have overlapping roles in gut function and innate responses dominate these roles when an adaptive response such as IgA production is abrogated [25]-[27]. These confounding effects have begun to be untangled, with recent studies showing that signaling through TLR5 can influence immunoglobulin production against flagellar antigens of the gut microbiome [28] and signaling through FoxP3+ T cells plays a role in stimulating IgA production in Peyer’s patches that modulates members of the Lachnospiraceae [29].

Though host factors can contribute measurably to fecal microbiota composition, these differences do not appear to explain the majority of the variation contributing to individuality. Thus, environmental and stochastic factors must also play significant roles. Several studies show measurable influences of dietary modulation on gut microbiota composition [7],[30]-[32], with short-term changes in diet resulting in relatively rapid responses in the relative abundances of taxa within the gut microbiota [33]. Even relatively minor short-term changes such as inclusion of whole grains or prebiotic oligosaccharides can translate into significant, albeit temporary, changes in microbiome composition [34],[35]. Relationships between microbiome composition and long-term diet are poorly understood but seem to be reversible in mice [36]. Nonetheless, some associations of long-term diet with overall microbiota composition have been reported in humans [37], making it still unclear if diet on its own is a significant contributor to the individuality of the gut microbiota.

Collectively, each of these deterministic factors (diet, immune function, and host genotype) can have measurable effects when studied independently, but it is unknown how these factors converge to ultimately shape composition of the microbiota. To provide insight into the interactions of these factors, we conducted a genome scan to search for QTL controlling composition of the microbiota and QTL controlling variable region utilization among expressed IgA in a mouse intercross model with a dietary variable (high-fat versus conventional diet). The mouse population was developed as an advanced intercross population produced from crosses of mice with a genetic predisposition to dietary-induced obesity (C57BL/6J) with those in a strain selected for high voluntary wheel running. At weaning, the population was randomly assigned to normal or high-fat diets for 6 to 8 weeks and sampled for microbiota composition with tissue from the ileum of the same animals sampled at necropsy for RNA extraction and measurement of mRNA from expressed IgA.


Basic statistics and variance components of the generation 10 microbiota

As we have reported previously [10], a large proportion of the taxa detected by pyrosequencing show a sparse distribution across the animal population; and of the 472 mice in this G10 population of mice, 203 taxa (OTUs at 97%) were detected in at least 75% of the animals. The mean relative abundances of these 203 consistently-detected taxa across all animals were quite broad, in the range of 0.045 for dominant taxa such as Alistipes OTU15 to 0.00027 for low abundance taxa such as OTU76601. There was also little relationship between the mean abundance of taxa and the range as some dominant taxa such as Parabacteroides OTU3 ranged nearly 1,000-fold across the animals (from abundances of 0.222 to 0.000226) while some lower abundance taxa such as Odoribacter OTU1 showed a tighter distribution (abundances of 0.006 to 0.00011). For statistical analyses, the relative abundances were log-transformed to reduce the effects of skewness, and the means and standard deviations of these log-transformed abundances are given in Additional file 1.

Estimates of the variance components (Additional file 2) for the microbiota taxa abundances vary considerably among the 203 taxa. Differences among the cohorts account for an average of 9.7% of the total variation, although these percentage values are in the range of 0 (in 7 taxa) to as much as 43.2%. Contributions from family differences average about one-half of that for cohort (4.8%), with 49 taxa showing no differences. Litter differences contribute on average 6.1% to the total variation, although again with a number of taxa (N = 29) showing no differences between litters. Residual variation contributes by far the largest amount to the total variation, averaging 79.4% and varying from 43.8% to 97.9% among the 203 taxa. Thus excluding the environmental cohort and litter contributions, an unknown fraction of the remaining 84.2% contributed by family and residual differences is genetic in origin.

Compositional features of the G4 and G10gut microbiota

The G10 population showed several major differences in composition of the microbiota when compared to the population mapped at G4 [10], many of which could be observed even at high taxonomic ranks. As illustrated in Figure 1, the G10 population had significantly higher levels of taxa belonging to the Bacteriodetes, Delta Proteobacteria, Epsilon Proteobacteria, Mollicutes, and Deferribacteres. This was offset by decreased levels of Clostridia, Bacilli, Beta Proteobacteria, Gamma Proteobacteria, and Flavobacteria. This same pattern could also be detected at the genus level (Figure 1B), with the G10 mice showing substantial elevation in members of the Bacteriodetes (Bacteriodes, Parabacteriodes, Rikenella, Allistipes), Epsilon Proteobacteria (Helicobacter), Delta Proteobacteria (Mucispirillum), and Mollicutes (Ureaplasma) that are offset by decreases in members of the Clostridia (Lachnobacterium, Roseburia, Dorea) and Bacilli (Lactobacillus, Lactococcus, Weissella), and Beta Proteobacteria (Variovorax). Phylogeny-based analysis of the 200 most abundant OTUs from a random selection of 100,000 pooled sequences of the G4 and the G10 animals (balanced for cohort in G4 and G10 and diet in G10) also showed many of these same differences (Additional file 3), with expansion of the diversity in taxa attributable to the Bacteridetes that was offset by a reduction in diversity of taxa attributable to the Firmicutes and the Proteobacteria. Estimates of alpha diversity using these same 100,000 sequences from the G4 and G10 populations (based on Shannon and Inverse Simpson indices) showed slightly higher diversity in the G4 animals, but the differences were not statistically significant (P <0.09). Thus, despite the dramatic changes in taxonomic composition of the microbiota between generations G4 and G10, there was little change in the overall levels of diversity.

Figure 1
figure 1

Comparison of microbiota composition between G 4 and G 10 populations. (A, B) Box and whisker plots for the Log10 relative abundances of taxa ((A) Class, (B) Genus) that are shared between the G4 (green) and G10 (red) populations. The boxes represent 75% of the data and whiskers indicate the range. (C) The relative abundances of the 16 shared genera between the G4 and G10 populations. G10 mice are shown on the left by cohort and G4 mice are on the right ordered by cohort.

Even though compositional differences in the microbiota emerged at G10, several general features were still conserved in the G4 and G10 populations. In both studies, only a small portion of the taxa were measurable across a significant proportion (75%) of the mice. Because genus-level processing of the pyrosequencing data by the CLASSIFIER algorithm is common to both studies, we examined the genera comprising the Core Measurable Microbiota (CMM) in both studies. All of the 16 genera of the G10 CMM were found among the 19 genera comprising the G4 CMM. Collectively, these 16 CMM genera comprise 40% to 50% of the total microbiota across all mice (Figure 1C). Though shared, these 16 CMM were distributed quite differently in the G4 and G10 populations. For example, members of the genera Alistipes, Bacteriodes, and Parabacteriodes dominate nearly all of the G10 mice but are only dominant in groups of G4 mice corresponding to individual cohorts.

Effect of high-fat diet on the G10 microbiota

To examine the effects of diet on compositional features of the microbiota, we first compared estimates of alpha diversity in the microbiota across animals fed control or high-fat diets. As shown in Figure 2A, the inverse Simpson’s index (1/D) showed modest, but statistically significant differences between the animals fed control versus high-fat diets, with animals on the high-fat diet displaying reduced levels of diversity. ANOVA identified 54 taxa showing significant effect of diet (P <0.05) but the Bonferonni-corrected significance level (P <0.00000483), left only eight of these 54 taxa passing the stringent multiple-testing threshold for significance. Even these eight taxa showed only modest differences in their distributions (Figure 2B). Likewise, Linear Discriminant Analysis (LDA) of the log-transformed abundances of the 34 taxa with smallest P values (from ANOVA) also showed a small, but measurable effect of diet (Figure 2C), with microbiota from animals fed control or high-fat diet displaying partial separation, almost exclusively in the first (X-axis) dimension.

Figure 2
figure 2

Effect of diet on the G 10 microbiota. (A) Box and whisker plots of the Inverse Simpson’s index for animals on the control or high-fat diet. The microbiota from animals fed the control (blue) or high-fat (red) diets show a significant effect of diet (one-sided t-test: P <0.045). (B) The distributions of eight taxa showing a statistically significant effect of diet (Bonferonni-adjusted P <0.00000483) are depicted in box and whisker plots. The boxes define 75% of the data points. Blue = control diet, Red = high-fat diet. (C) The log-transformed abundances of 34 taxa with the smallest, non-adjusted ANOVA P values were used as variables for Linear Discriminant Analysis (LDA) with diet and parity as factors. The first two LDA functions account for >60% (X axis) and >30% (Y-axis) of the variation in these taxa.

Collectively, despite the difference in fat content of the diets, diet-based differentiation of the microbiota in our population was minimal and likely due to the cumulative effects of small differences across multiple taxa, as opposed to large shifts in small numbers of taxa.

QTLs affecting relative abundances of G10gut microbial taxa

Table 1 gives a summary of all QTLs affecting the relative abundances of taxa from the gut microbiota of the G10 mice. Over all 203 taxa, a total of 42 QTLs were discovered, including 22 that had LOD scores reaching the 5% genomewide level of significance. Their confidence intervals average 9.85 Mb with a standard deviation of 5.25 Mb. These QTLs affect 39 of the 203 different taxa (19%), 36 of which are affected by a single QTL. Three taxa, OTU29627, OTU17740, and OTU30840, each are affected by two QTLs on different chromosomes. Using a strict approach to correct for multiple testing, FDR values were calculated from the probabilities associated with the LOD scores of the G10 QTLs (Table 1), and these are in the range of 0.002 to 0.513. Only a single QTL on Chr 9, which controls the abundance of the Alistipes OTU41353, exceeds this strict experiment-wide tthreshold. Overall, the FDR procedure suggests roughly one-half of the 42 total G10 QTLs could represent false positive results.

Table 1 QTL statistics for the microbiome traits in the G 10 mouse population

If the strict experiment-wise threshold is relaxed, multiple examples of overlap are observed (Table 1) among the genomic sites of significant (genome-wide P <0.05) and suggestive (genome-wide P <0.1) QTLs. Such overlap implies pleiotropy and underlying covariation of microbial taxa. The most obvious example of this is seen for six QTLs on Chr 9 at 37.3 Mb to 40.7 Mb that each affects a different taxon. All six of these QTLs, especially the four mapping to 40.7 Mb, may represent a single gene or set of closely linked genes with independent effects on these traits. Notably, these six QTLs have the lowest FDR values and thus the highest probability of being true positives. The traits affected by these six QTLs show three different patterns of covariation across the animals, suggesting alleles from two or more closely linked genes may be contributing to the phenotypic segregation. Altogether, 27 of the 42 total QTLs have non-overlapping confidence intervals, suggesting that there may be as few as 27 unique detectable QTLs affecting these traits. Over half (N = 19) of the 27 unique QTLs affect only one trait, however, so putative pleiotropy is not extensive.

Thirty-three of the 42 microbiota QTLs exhibit significant additive effects with an average absolute a value of 0.161. Of these 33, most (24) are negative in sign, indicating that the HR allele at these loci generally acts to increase the abundance of the affected taxa. The number of QTLs showing significant dominance genotypic effects is 29, nearly as many as those exhibiting additive genotype effects. Further, the mean of the absolute values of these significant dominance effects (d values) is 0.186, slightly greater than that for the additive effects. The d/a ratios (not shown) suggest that 13 of these 39 QTLs show dominance whereas 10 exhibit overdominance and six exhibit underdominance. An example of overdominance (heterozygote greater than either homozygote) is shown by the QTL on Chr 6 (54 Mb) affecting OTU32093 with a dominance value of 0.22 that is over five times greater than its additive value of 0.04.

The percentage of the total phenotypic variation explained by the microbiota QTLs is in the range of about 3% to over 8%, averaging 4.6%. The highest percentages explained are seen for the QTL on Chr 9 mentioned above although a QTL on Chr 16 affecting Mucispirillum (and Mucispirillium schaedleri which accounts for most of the Mucispirllum), and one on Chr 14 affecting OTU30840, account for over 6% of the variation in their abundances. In the G4 population, the percentage contributions of the microbiota QTLs were quite comparable, varying from 1.5% to 9.0% and averaging 4.7% [10].

QTL replication in the G10population

The high number of potentially false positive QTLs from multiple testing led us to search for validation through potentially overlapping QTLs mapped previously in the G4 population. The initial comparison revealed no overlap, but given the differences in taxonomic composition and given that the G4 QTLs were originally mapped only at the taxonomic rank of genus and higher, it seemed possible that the lack of overlap was partially due to the different levels of taxonomic resolution used for mapping. To overcome this confounder, the taxonomic resolution was normalized by processing the G4 microbiota data set using the same OTU pipeline used for the G10 data. This generated 331 species-level OTUs and 23 genera (Additional file 4) that met our trait distribution threshold (at least 5 reads per taxon across 75% of the animals). These taxa were then mapped using the G4 genotyping data as done previously [10] with the robust permutation-based thresholds of the GRAIP algorithm to account for structural relatedness among families [38]. A total of 21 significant QTLs were detected among the G4 OTUs (Additional file 5) and with equivalent levels of taxonomic resolution, four of these G4 QTLs now shared overlapping peaks with G10 QTLs or were immediately adjacent to a G10 QTL (Figure 3). These included two different G4 QTLs on Chr 1, and one each on Chr 3 and Chr 9. Notably, the G10 QTLs on Chr 9 had the highest degree of statistical support. In addition to overlapping peaks, three of these four QTLs affect organisms that are taxonomically related in the G4 and G10 animals. For example, the G4/G10 QTLs around 170 Mb of Chr 1 control OTUs belonging to the genera Bacteriodes (G4) and Prevotella (G10), both of which belong to the taxonomic order Bacteriodales. Likewise, the QTL peaks on Chr 3 and Chr 9 control OTUs belonging to the order Clostridiales (a member of the family Ruminococcaceae in the G4 population and an OTU belonging to the genus Clostridium in the G10 population). These overlapping QTLs controlling taxonomically related organisms in separate populations are strongly suggestive of replicated QTLs. In addition, the capacity of these QTLs to influence distinct, but taxonomically related organisms, further illustrates how some host genomic loci can exert pleiotropic effects across cross-sections of phylogenetic space in the microbiome.

Figure 3
figure 3

Relative positions of QTLs controlling gut microbial taxa from the G 4 and G 10 populations. In the graph on the top right (Whole Genome), peak positions of the QTLs from the G4 (green dots) and G10 (blue dots) generations are plotted along the length of the 19 mouse autosomes. The relative positions of each chromosome are indicated above the X axis. Regions where confidence intervals overlap from G4 and G10 QTLs are highlighted in gray and are plotted in the corresponding graphs to the left and below. Plots for the overlaps on the individual chromosomes show the chromosome positions (X axis) and naïve/GRAIP-adjusted LOD scores (Y axis) for the significant G4 QTLs with the confidence intervals of the G10 QTLs plotted as a single line at its corresponding position and peak LOD score. Individual taxa are color-coded below each graph.

QTL interactions in the G10population

QTLs were tested for potential interactions with sex and with diet by calculating the -2 ln (likelihood) for a model containing all terms, but with the interactions of the a and d effects with sex or diet. Likelihoods obtained from this model were then compared with the null model lacking the interaction terms. Differences between likelihoods from the two models were further evaluated by chi-square tests. Using a probability cutoff of P <0.05 for significance, two microbiota QTLs in the G10 population showed significant interactions with sex, and eight QTLs interacted with diet (Table 1). The sex interactions involve QTLs on Chr 2 (at 172.5 Mb) and Chr 16 (at 44.8 Mb), and in both cases, significant effects were seen only in the male mice. Among the 8 QTLs showing interactions with the dietary environment, only four separate genomic sites are represented. However, despite the small number of loci influenced by diet, we note that some of these loci are quite complex.

Particularly noticeable is the set of four QTLs mapping to the same position on Chr 9 that show different effects depending on the dietary environment (Figure 4). QTLs for OTU17740 (Figure 4A), OTU25269 (Figure 4B), and OTU25438 (Figure 4C) show significant QTL effects only in mice fed the control diet while the QTL for OTU13989 (Figure 4D) shows significant effects only for mice fed the high-fat diet. Not surprisingly, the abundances of OTU17740, OTU25269, and OTU25438 show high degrees of correlation across the G10 mice but no correlation with OTU13989. The QTLs for OTU29084 (Figure 4E) and OTU41353 (Figure 4F), which also map to a similar position, show no interaction with diet.

Figure 4
figure 4

QTL on Chr 9 show complex interactions with diet. (A-F) Plots of genotypic values at the QTL peak (X axis) are shown for six different OTUs that have peaks at the same or similar location on Chr 9 and which show differential effects depending on whether the G10 mice were fed the control or the high-fat diet. Relative abundances of the OTUs for animals having the HR/HR parental, HR/B6 heterozygote, or B6/B6 parental genotypes at QTL peak are plotted on the Y-axis. (A) OTU17740. (B) OTU25269. (C) OTU25483. (D) OTU13989. (E) OTU29084. (F) OTU41353. Error bars based on standard error.

A second set of overlapping QTLs showing significant interactions with diet are found on Chr 1 (Figure 5). Figure 5A shows that a Chr 1 QTL at 49.1 Mb clearly has a greater effect on the relative abundance of OTU15028 in mice fed the control rather than the high-fat diet whereas the reverse is true for the effects of a different QTL on Chr 1 (127.2) on Butyricicoccus (Figure 5B). Two additional examples are illustrated of QTLs with greater effects in mice fed the control (Figure 5C) or the high-fat diet (Figure 5D). The HR/HR genotype for the Chr 4 QTL (Figure 5C) increases the abundance of OTU17889 in mice fed the control diet, but the reverse is true for the B6/B6 genotype. The Chr 9 QTL affecting OTU13989 (Figure 5D) showed a different pattern in which the HR/B6 genotype increases the abundance of this taxon more so in mice fed the high-fat rather than the control diet.

Figure 5
figure 5

QTL on Chr 1, Chr 4, and Chr 9 show complex patterns of gene X diet interactions. (A-D) Plots of genotypic values at the QTL peak (X axis) are shown for four different OTUs where the QTL shows differential effects depending on whether the G10 mice were fed the control or the high-fat diet. Relative abundances of the OTUs for animals having the HR/HR parental, HR/B6 heterozygote, or B6/B6 parental genotypes at QTL peak are plotted on the Y-axis. (A) QTL for OTU15028 at 49.1 Mb on Chr 1. (B) QTL for Butyricoccus at 127 Mb on Chr 1. (C) QTL for OTU17889 at 12.5 Mb on Chr 4. (D) QTL for OTU13989 at 40.7 Mb on Chr 9. Error bars based on standard error.

QTL analysis of IghV utilization patterns

The availability of ileal tissue from a large subset of the animals across both diets provided a unique opportunity to examine the role of expressed IgA on microbiota composition and to determine if host genetic influence on IgA rearrangements or their expression played any role in shaping microbiota composition. The abundance of transcripts from rearranged and expressed IgA receptors among B-cells resident in the ileal tissue were measured by pyrosequencing of amplicons generated from cDNA using a primer immediately upstream of the IgA constant region (IgAC) in combination with the universal variable region (Vh) primer. The resulting sequences were then binned initially by BLAST analysis of each read against the VH region repertoire and each bin was subsequently normalized by the total number of reads per animal. The means of the log-transformed abundances of these 67 IgA traits (Additional file 6) varied from -3.38 (B196, IghV3-4) to -1.226 (B59, IghV1-53) and accounted for >90% of the total reads across most mice. Over all individuals, however, the minimum value for the IgA abundances was -5.082 (B218, IghV5-6-2), and the maximum individual value was 0.015 (B79, IghV1-72). ANOVA showed no significant effect of diet on the IgA abundance.

Estimates of the variance components (Additional file 7) differ considerably among the 67 IgA expression traits, with values for the cohorts varying from 0 to 19.1% and being the least important (mean = 2.8%) of the four components. Parity (differences between successive litters) was slightly more important (mean = 3.6%), but these two sources of environmental variation (cohorts and parity) jointly contribute just 6.4% of the total variation. Family differences vary from 0 to 32.4% and average 12.1%, greater than that for the microbiome traits. Again, however, the largest contribution is from residual (within family) variation, the estimates in the range of 59.7% to 99.4%, and averaging 81.6%.

As shown in Table 2, QTL analysis of IghV utilization patterns identified a total of 56 QTLs that had LOD scores reaching at least the 10% experimentwise threshold. Only one QTL (affecting B81, IghV1-75) showed a marginally significant interaction with sex, and none significantly interacted with diet, which was expected given the lack of dietary effects on the individual IghV abundance. Remarkably, 36 of the 56 QTLs had LOD scores >10, with 34 of these highly significant QTLs localized to a segment on Chr 12 that encompasses the IgH region. In addition, two highly significant QTLs mapped to segments on Chr 17. While this 7 Mb confidence interval spans >400 genes/pseudogenes, it includes the mouse Major Histocompatibility locus (MHC) with 74 class I, II, and III genes. Given the known involvement of MHC in controlling immunoglobulin production, it seems reasonable that diversity in one of the class II genes could give rise to variation in the IgH V-region utilization patterns. With highly significant QTLs overlapping well-known sites contributing to regulation of immunoglobulin production, the IgA-specific IghV region utilization patterns appear to be robust phenotypes. However, none of the 56 overall QTL for IgA overlapped with any of the QTL for microbiota. Also, correlations between the abundances of each of these 67 IgA variable regions and the microbiota comprising the CMM, all were non-significant (P >0.05). Importantly, the lack of QTLs with pleiotropic effects on IgA and the microbiota implies that genetic variation influencing IghV region utilization and expression has little effect on broad compositional features of the microbiota in the mouse population that was studied.

Table 2 QTL statistics for the IgA traits in the G 10 mouse population

Because antibody specificity is generated through recombination of the V, D, and J regions, along with hypermutation, it was possible that IghV region utilization alone did not provide the specificity necessary to detect association with the microbiota. To test this possibility, we employed a higher resolution approach to bin the IgA sequences, using the K-mer based strategy in CD-hit [39] to cluster predicted protein sequences from the 2,644,330 quality-filtered IgA reads. With a 99% identity cutoff for clustering, this yielded 4,505 different clusters. The vast majority of these clusters exhibited low abundances and were often present in only a single animal, or were sparsely distributed. However, 71 clusters were observed across the majority of the animals (>5 reads shared across 75% of the mice), presumably arising from convergent clones expanded across multiple animals. Some correlations of the abundances of these 71 clusters with the 300 most abundant species and OTUs from the 16S rRNA-derived microbiota data (Figure 6) were significant (r >0.5) among several of the IgA clusters themselves and among several of the individual microbial taxa, but no significant associations were observed between any of the IgA clusters and microbial taxa. The lack of overlapping QTLs and the absence of correlation collectively imply that genetic variation influencing the immunoglobulin repertoire plays little role in the individuality of microbiome composition.

Figure 6
figure 6

Correlation matrix between microbiota composition and IghV region expression. For each of 308 mice, the relative abundance of 67 IghV regions shared among the expressed IgA population of >75% of the animals and the relative abundance of the most abundant taxon of the microbiota were used to calculate the correlation coefficient (r). Red and blue coloring corresponds to the r value for each pair-wise comparison according to the color gradient.


Our study population presented a unique opportunity to examine a combination of deterministic factors that shape composition of the gut microbiome in G10 descendants of an advanced intercross population that had previously been studied at G4. Several aspects of the overall microbiome composition were notably different between the G4 and G10 animals. While the overall species compositions differed significantly (substantially higher in members of the Bacteriodetes at G10 versus G4), the most striking difference was the variation between breeding cohorts, accounting for an average 26% of the total variation across taxa in the G4 but only 9.6% of the total variation in the G10. It is possible that changes in the pyrosequencing reagent stream that were introduced by the supplier during the 18 months between the G4 and G10 populations contributed to the unique compositional features, but these changes would most likely manifest as biases in taxonomic compositions and not their distributions across the populations. Resequencing a small number of G4 samples with similar reagents used for G10 samples showed quite similar taxonomic content, suggesting this was not a factor. Second, it is possible that population-specific characteristics of the microbiota were brought about by phenotypic and/or genotypic drift or they reflect the degree to which recombination has dispersed the variation from parental lines across the progeny. For the latter case, the dispersal of parental genomic variation through accumulating recombinations by G10 could result in a more evenly distributed microbiota. The increased dispersion of genomic variation could also be augmented independently by ‘maturation’ of the microbiome, going from more chaotic distributions during the first few generations in the facility to more stable configurations after 10 generations of breeding in the same facility.

Effect of high-fat diet on the gut microbiota of the G10population

A high-fat diet was incorporated into the experimental design to test for interactions between genotype and diet. This design also provided an opportunity to examine closely effects of the high-fat diet alone across an intercross population, in contrast to studies using a single inbred line. Single line studies often show substantial changes in the microbiota [31] marked by blooms of related taxa, whereas the effects of a high-fat diet across the large numbers of animals from our intercross population showed a modest effect on alpha-diversity and small, but statistically significant differences across a large number of taxa. Whether the magnitude of the diet effect was muted in our study because of the genetic diversity from the intercross or some other factor is not clear. Recent studies across 100 different mouse lines showed dietary effects dispersed across several taxa and these effects were unique to certain lines [40]. Clearly, understanding the effects of diet on the microbiome will require much more study in different types of populations to understand these complex interactions.

Microbiota QTLs

The results of our analysis defined 42 QTLs that affected the relative abundances of 39 of the 203 taxa. We were conservative in using only 5% and 10% genome-wide thresholds rather than chromosome-wide thresholds to determine significance. Because we analyzed so many traits, however, it was not surprising that the FDR procedure suggested that as many as roughly 1/2 of the QTLs affecting these traits may be false positives. On the other hand, this also means that about 20 QTLs reflect true underlying genetic variation affecting the microbiota composition. The greatest support (lowest FDR values) was for QTLs on Chr 9, especially one at 37.3 Mb affecting OTU41353.

The mapping precision of the QTLs we discovered was enhanced in our advanced intercross population at G10 compared to that in the G4 population. Thus the mean confidence interval of 9.85 Mb calculated for these QTLs is considerably less than that of 20.7 Mb found by Benson et al. [10] in the G4 mice, and would have dropped to about 7 Mb if we had used a 1-LOD (rather than 1.5-LOD) drop criterion as was done in the G4 study. It therefore is clear that the additional mapping precision expected from the accumulated recombinations in the G10 population was in fact achieved.

QTL replication

Despite the population-specific features of the G4 and G10 microbiota, normalizing the levels of taxonomic inquiry in the G4 and G10 microbiotas produced four different genomic segments where QTLs overlapped from the two studies. At three of these loci, the taxa controlled in the G4 or G10 populations shared taxonomic relatedness at the family or order level. In addition to the replication we observed in these populations, recent QTL analyses of the skin microbiome identified two out of the 14 QTLs controlling taxa of the skin microbiome that overlap those we had previously found in the G4 population [41]. Of the two shared QTLs, only the QTLs on Chr 14 appear to control taxonomically related organisms (G4 QTL controls Lactococcus whereas the skin microbiome QTL controls an OTU belonging to the Firmicutes).

From a broad perspective, the ability of host loci to control a variety of microbial taxa would support multiple possible outcomes of microbiome assembly, with each assemblage potentially sharing a common core of metabolic and functional niches despite the diversity in taxonomic composition. From the host perspective, the ability to support multiple possible assemblages would be advantageous and allow the assembly process to work upon the microbial capital it happens to encounter early in life.

Pleiotropic patterns

A prominent feature of several QTLs we discovered was that they affected more than one taxon. While it is possible that some of this apparent pleiotropy is due to linkage disequilibrium, it seems unlikely that this would explain all of the pleiotropic loci. Correlated traits are often related by their contribution to similar pathways or functions, but in the case of microbial traits, correlated microbial taxa could be controlled by the same QTL due to common physiological characteristics (for example, common sensitivity to defensins secreted in the mucosa), or common metabolic traits (for example, ability to degrade mucins). One could even envision pleiotropy occurring indirectly, whereby host genetic factors favor colonization by a given taxon and this initial event sets the stage for colonization by a second taxon (for example, metabolic end product of one taxon serving as a substrate for a second taxon). This could be the case at the complex of overlapping QTLs we identified on Chroe 9. Here we observed distinct effects on two different sets of correlated taxa (Figure 3). While both sets of correlated taxa (colored red or blue in the matrix) comprise OTUs belonging to the class Bacteriodetes, early colonization by OTU17740 (blue cluster), may favor subsequent colonization by OTUs 25269 and 25483 whereas colonization by OTU41353 favors subsequent colonization by 29084. Colonization by Peptococcus (OTU13989) may actually favor a third pathway in which strains belonging to the red or blue correlated clusters are tolerated. Defining the underlying basis of these QTLs will therefore provide clues to important characteristics of gut microbes and the niches that they occupy.

Microbiota QTLs, obesity, and diet

Given the known association of gut microbes with obesity and various metabolic disorders [42], it is reasonable to expect that some of the microbiota QTLs might exhibit pleiotropic effects on body weight or composition. To examine this possibility, we compared the locations of the microbiota QTLs (Table 1) with QTLs previously found for body weight and the percentage of fat and lean tissue in these same (8-week-old) mice [33]. This comparison revealed four instances of overlaps for QTLs on Chr 5, Chr 9, Chr 11, and Chr 18, details of which are summarized in Table 3. Several potential candidate genes for these QTLs are listed in the Table, but it will require additional effort to discover whether these or other genes underlying the QTLs actually affect both kinds of traits, and if so, what pathways might be involved.

Table 3 Possible candidate genes for QTLs affecting microbiome and body weight/composition traits in the G 10 mice

Regardless of which candidate genes contribute to these phenotypes, our discovery of putative pleiotropic effects of QTLs on microbiome composition and body weight/fatness/leanness illustrates the theoretical potential for genetic predisposition to obesity to be manifested in part by susceptibility to aberrant colonization of the gut.

Perhaps the most significant finding in our study was the identification of several microbiota QTLs exhibiting interactions with diet. While only eight of the 42 total microbiota QTLs (19%) showed these interactions, this low proportion is identical to that for QTLs affecting body weight or the percentage of fat or lean tissue in this same mouse population [32]. Because of the apparent pleiotropy of these QTLs, however, as few as four different genes (two on Chr 1, one on Chr 4, and one on Chr 9) may be involved.

Among the microbiota QTLs showing interactions with the dietary environment, the four on Chr 9 each affecting a different taxon were most impressive. These QTLs all mapped to the same precise position (40.7 Mb), and thus likely represent the same underlying gene. The QTL affecting OTU13989 showed the most restricted confidence interval of just 1.9 Mb that according to the Mouse Genome Informatics database [43] contains 11 protein coding genes. Of these 11, Bsx, brain specific homeobox (at 40.9 Mb), would seem to represent an outstanding candidate for the QTLs. Bsx mutants exhibit increased fat mass, decreased food intake after fasting, and reduced locomotor activity [44].

From a broader perspective, our discovery of gene X diet interactions on microbiota composition supports the idea that dietary modifications can potentially modify or even overcome allelic effects on microbiome composition. In fact, recent studies on the microbiome of infants show that dietary modulation of microbiome composition and function can influence expression patterns of innate response genes [45]; and in adults, dietary modulation can also affect metabolic and inflammatory markers in the blood [35]. Combined, these findings are of special significance to human health because they suggest that dietary intervention could overcome heritable components of disease predisposition that are manifest through the gut microbiome. Similarly, with respect to animal agriculture, our discovery implies that dietary modulation could overcome the effects of undesirable genotypes associated with weight gain or even with colonization by zoonotic pathogens.

The microbiota and IgA

Secretory IgA (SIgA) plays important roles in barrier defense against enteric pathogens by binding to cell surface molecules of the pathogen and precluding attachment [46],[47]. Such a barrier defense would not necessarily be limited to pathogens and could play a role in homeostasis by limiting exposure of the epithelial layer to the mass of microbial cells in the microbiota. Indeed null mutations that block class switching to IgA have significant effects on microbiota composition [48],[49]. More recently, FoxP3+ Tcell-dependent production of high-affinity IgA was found to be associated with shaping the microbiota, specifically by enriching for members of the Firmicutes [29]. Remarkably, this IgA-mediated enrichment seems to be mediated through a positive influence of the IgA on the microbiota as opposed to the removal of potential competitors.

Unlike studies in isogenic derivatives of a single line, our study provided a unique opportunity to examine specificity of the expressed IgA repertoire with respect to the microbiome across a population with genetic diversity dispersed randomly across the progeny. Genetic variation had a significant outcome on variable region utilization patterns but it did not affect composition of the gut microbiome. Likewise, we could not detect association between VDJ rearrangements and composition of the contemporary microbiota. Of course, it is quite possible that specificity of IgA-microbe interaction is below our level of sensitivity. While we can approximate species-level resolution with our OTU-pipeline, specificity of the interaction may be dictated at the strain-level. The IgA-mediated enrichment of the microbiota observed by Kawamoto et al. [29] was detected by sequencing of antibody-bound taxa, implying that the high-affinity IgA responsible for shaping the microbiota in their studies was directed toward cell surface molecules. Indeed, cell surface molecules such as teichoic acids, extracellular polysaccharides, and surface proteins tend to be some of the most highly variable and strain-specific traits of a bacterial species, making it unlikely that we would have detected such interactions. In the absence of strong associations between the microbiome composition and expressed IgA molecules, the correlation among Vh usage patterns and convergent VDJ rearrangements that we observed across individuals becomes even more intriguing. Convergence among expressed VDJ regions between individuals has been observed in antibody repertoires of zebrafish [50] and mice [51] and it can be observed in vaccine responses as well as anamnestic sera from patients recovering from epidemics, implying that microbes may be capable of eliciting specific signatures of IgH rearrangements. If so, then the convergent responses observed in our animal population could either reflect signatures of strain-level interactions between the contemporary microbiota and the mucosal immune system or, they could reflect interactions with the microbiota early in life, prior to contemporary microbiota we measured in the mature animals.


Detailed analysis of the taxonomic abundance of the gut microbiota at G4 and G10 of the C57BL/6J X HR intercross have provided insight into the impact of host factors, dietary factors, and stochastic factors on gut microbiota composition. Major differences in dominant taxa of the gut microbiota occurred over time between G4 and G10. This was particularly the case for the distributions of these taxa, which were highly cohort-dependent and variable (wide ranges) in G4 animals but less cohort-driven with modest ranges at G10, suggesting that the microbiome may have progressed from a more to less chaotic assembly over time. Despite these differences, four overlapping QTLs were still detected among both G4 and G10 mice.

A high-fat diet in one-half of the G10 animals brought about a modest impact on the microbiota that resulted from cumulative incremental changes in many taxa as opposed to large swings in taxonomic abundance. The genomic region at 40.7 Mb on Chr 9 had overlapping G4/G10 QTLs and many of the G10 QTLs in this region showed significant interactions with diet, as did additional QTLs on Chr 1 and Chr 4. Detection of these gene X diet interactions implies that it may be possible to modify the heritability of microbiota composition via dietary modulation.

Quantitative analysis of the patterns of Vh utilization in the expressed IgA transcripts of G10 animals showed a remarkable number of convergent VDJ rearrangements that were shared between individuals. The convergence could reflect common exposure of earlier assemblages of the microbiota as no associations were detected among the Vh utilization patterns and any of the microbiota that were measured contemporary with the Vh patterns. On the other hand, very high degrees of association were detected in the Vh utilization patterns and genetic variation in regions of Chr 12 and Chr 17 that overlap with the IgH and MHC loci. Although genetic variation in these major drivers of immunoglobulin responses had expected effects on variation in VDJ rearrangements, none of this variation accounted for variation in the contemporary microbiota and correspondingly, no overlapping microbiota/Vh were detected. Collectively, we conclude that host genetics and diet converge to shape microbiota composition, but the effects of host genetic variation are not manifest through Vh utilization patterns for immunoglobulin A.

Materials and methods

The population

The population of mice used in this study was generated from original crosses of inbred C57BL/6J (B6) female mice with male mice from a strain (HR) selected for a high level of voluntary wheel running [52]. The mice were reared through the ninth generation following a previously-described protocol [53], at which time single-pair matings were made that produced up to two litters each in the G10 generation. All G10 pups were weaned at 3 weeks and by 4 weeks, randomly allocated into either a group fed a high-fat diet or a group fed a control diet (see Table 1 in [53]). When the mice were approximately 8 weeks of age, fecal pellets were collected for DNA extraction and subsequent pyrosequencing. Mice then were given access to running wheels during each of 6 consecutive days, with exercise traits measured for all individuals in one of 13 different sequential cohorts as previously described [53]. All G10 mice were sacrificed shortly after the exercise period (between age 53 to 59 days), tail clips were taken for genotyping and segments of the ileum were removed for RNA extraction (described below). All procedures were approved by the Institutional Animal Care and Use Committee at the University of North Carolina at Chapel Hill.

SNP genotyping

We used the Mouse Universal Genotyping Array, MUGA [54], to yield genotypes for 2,058 fully informative SNPs (average spacing = 1,223 kb). SNPs were checked for significant segregation distortion, and for errors using Merlin [55], with extremely unlikely calls dropped from the analysis. A list of these SNP markers with their locations (in Mb) is given in an Appendix in Leamy et al. [53]. Genotypes of the individual animals are available at the CAGE microbiome analysis database [56].

Pyrosequencing of microbiota

DNA extraction from fecal pellets and pyrosequencing analyses were performed as previously described [10],[57]. Composition of the microbiota was assessed by deep pyrosequencing of PCR products originating from the V1-V2 region of the 16S rRNA gene with bar-coded fusion primers containing Roche-454 A or B Titanium sequencing, followed by a unique 8-base barcode sequence (B) and, finally, the 5′ ends of primer A-8FM (5′-CCATCTCATCCCTGCGTGTCTCCGACTCAGBBBBBBBBAGAGTTTGATCMTGGCTCAG) and of primer B-357R (5′-CCTATCCCCTGTGTGCCTT-GGCAGTCTCAGBBBBBBBBCTGCTGCCTYCCGTA-3′). All PCR reactions were quality controlled for amplicon saturation by quantifying and comparing band intensities of the PCR products after gel electrophoresis with standards using GeneTools software (Syngene). Amplicons from 48 individual samples were pooled in equal amounts, gel-purified, quantified by Pico Green analysis, and used for emulsion PCR (emPCR). After recovery and enrichment for DNA-containing beads, the emPCR products from the 48-sample pools were sequenced on individual regions of 2-region Picotitre plates on a Roche-GS-FLX machine using Titanium sequencing chemistry.

Pyrosequencing data processing pipelines

Raw data from the Roche-454 GS-FLX machine were first processed through specialized scripts that filtered the data on the basis of the following criteria, with sequences not meeting these criteria being removed from further analysis: (1) a complete forward primer sequence and barcode; (2) ≤2 ‘N’ in a sequence read, where N is equivalent to an interrupted and resumed sequencing signal from sequential flows; (3) a sequence of >200 NT and <500 NT; and (4) an average quality score ≥20 across the entire length of the sequence.

After filtering, reads were trimmed to remove 5′ and 3′ adapter and primer sequences, parsed by barcode into corresponding sample files, automatically associated with a matching .QUAL file containing the quality scores, and uploaded into a MySQL database and associated with sample information. MySQL database tables are stored on a database server and available to the public through the CAGE microbiome analysis database login [58]. The raw read and .QUAL files are also available at the NCBI Sequence Read Archive under Bioproject Accession PRJNA265870. To help normalize taxonomic assignment and phylogenetic distance estimates of individual sequence reads, the entire data set was initially processed through the Multi-CLASSIFIER algorithm, which assigns hierarchical taxonomic status to each sequence read based on a covariance model developed from a training set [59],[60]. This algorithm is capable of processing very large data sets and was recently shown to provide adequate taxonomic assignments to pyrosequencing data [61]. After processing through the Multi-CLASSIFIER, sequences were parsed into ‘classified’ and ‘unclassified’ sets based on meeting threshold limits of 0.8 at the genus level against the Multi-CLASSIFIER model.

Classified reads were then assigned species-level status using a BLAST pipeline that associated the read with species-level taxonomic assignment using a curated database developed from RDP and SILVA databases of curated 16S ribosomal RNA sequences [59],[62]. Sequences were considered a species match if they achieved 97% identity with a reference sequence over a minimum of 200 bases of contiguous BLAST alignment. Sequence reads that failed to meet the 0.8 scoring threshold at the genus level from the Multi-CLASSIFIER algorithm (‘unclassified’ reads) were further processed into Operational Taxonomic Units (OTUs) using CD-Hit to estimate phylogenetic distances and cluster at 97% cutoff [63]. Taxonomic status of these OTUs was approximated by BLAST against the curated database. For QTL mapping, only dominant taxonomic/OTU bins containing at least five sequences in >75% of the mice were used. This reduced the total number of taxonomic/OTU bins from >18,000 to 203 bins that were log-normally distributed and referred to herein as the Core Measurable Microbiota (CMM). In addition to removing sparse data, this threshold step also had the important function of removing bins that result from chimeric sequences, artifacts of aggressive clustering, or sequencing errors. Reads from each bin from the combined ‘classified’ and ‘unclassified’ portions of the pipeline were then normalized relative to the total number of reads for each sample. For mapping and statistical analyses, the abundances were subjected to log10 transformation to reduce the effects of extensive variation in values across multiple mice. Microbiota data were available for a total of 472 mice. Raw data are available at the database server [58] and at the NCBI Sequence read Archive under Bioproject Accession PRJNA265870.

Pyrosequencing of expressed IgA transcripts

RNA was extracted from flash-frozen segments of the ileum using the Biosprint One-for-all Vet Kits (Qiagen). Ileum segments were suspended in 1 mL of Trizol in 2 mL Cryovials along with a single 3 mm sterile tungsten carbide bead (Qiagen). Samples were homogenized for 4 min at 30 cycles/s in a Tissue Lyzer and immediately placed on ice. After a 3-min centrifugation at 14,000 rpm, 300 uL of the supernatant was transferred to individual wells of the One-for-all Vet kit 96 deep well plates and the remainder was archived at -80°C. The deep well plates were then loaded onto the Biosprint 96 and automated RNA extraction performed according to the manufacturer’s instructions and purified RNA was eluted into RNAse-free water. After quantification, cDNA was prepared from 5 ug of total RNA using oligo-dT(12-18) primers (Invitrogen) and the Superscript III protocol (Invitrogen). The resulting cDNA was diluted 1:10 into 50 uL PCR reactions containing 10% DMSO along with 0. 6 μM of PCR primers for the IgA constant region (IgAC) [64] and a universal primer for the Igh variable region (Universal Vh) [65]. The IgAC and Universal Vh primers also contained the Roche A and B Titanium adapter sequences (bold) at their 5′ ends. Primer sequence for the Roche B adapter- IgAC primer is CCTATCCCCTGTGTGCCTTGGCAGTCTCAGCTCAGGCCATTCAGAGTACA. The primer sequences for the Roche A-universal Vh primers also contained a sample-specific 8-base barcode (b) immediately upstream of the Vh region. The primer sequences for the Roche A-barcode-Universal Vh primers were: CCATCTCATCCCTGCGTGTCTCCGACTCAGbbbbbbbbAGGTSMARCTGCAGSAGTCWGG. PCR amplification was performed in 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 1.5 mM MgCl2, 2.5 U TaqDNA polymerase (Invitrogen Life Technologies), and 0.2 mM each of dGTP, dATP, dTTP, and dCTP. The PCR amplification program consisted of 30 cycles of 30 s at 94°C (2 min in first cycle), 1 min at 58°C, and 1 min at 72°C. The program was followed by 10 min at 72°C to allow extension of all products. After PCR amplification and quality control check by gel electrophoresis, the amplicons were quantified by Pico-Green and pooled at a 1:1 ratio in pools of 48 samples each followed by two cycles of cleanup using Ampure beads. Each pool was then subject to pyrosequencing on the Roche-454 FLX Titanium platform. Raw data are available at the database server [58] and at the NCBI Sequence read Archive under Bioproject Accession PRJNA265870.

To process the IgA sequence data for QTL analysis, the data were first filtered to remove low quality reads as for 16S rRNA sequencing. For each read, the predicted amino acid sequence of the appropriate reading frame was subsequently mapped by BLAST analysis against the 268 mouse Vh region genes from the ImMunoGene Tics web resource (IMGT) repertoire [66]. This yielded 67 IghV regions that were detected across 75% of the animals. For mapping, the relative abundance of transcripts from each IghV region bin for each animal was normalized by the total reads in each sample and log10-transformed.

Preliminary statistical analysis

The log-transformed values for all microbiome and IgA traits first were subjected to a multivariate analysis of variance that showed overall significance (P <0.05) for sex, diet, cohort, parity, and litter size at birth. We therefore adjusted for the effects of these factors and examined the distributions of the abundances of the residuals for each trait. Using an alpha level of 0.01, and the false discovery rate [67] to adjust the probabilities from Kolmorgorov-Smirnov tests, all traits were found to be normally distributed. We therefore calculated means and standard deviations for all taxa to provide a basic description of their distributions.

It also was of interest to estimate variance components for families, parity, and cohorts to determine the contribution of each of these random factors to the total variance of each trait. Cohort and parity (differences between first and second litters in each family) effects are due to environmental and/or epigenetic factors whereas differences among families and within litters (residual) are produced by both genetic and environmental factors. We estimated cohort, family, parity, and residual components and tested them for significance via a mixed model that also included sex, diet, and litter size as fixed factors. Once calculated, we also expressed each of the four components as a percentage of the total variance.

QTL mapping

G4 data were mapped as described [10] with R-QTL and adjusted for familial structure using the GRAIP algorithm to adjust the significance thresholds. To map QTLs in the G10 for the microbiota and IgA expression traits, we used the newly developed QTLRel program implemented in R [68],[69] with an approach previously described [53]. This program was specifically developed to account for family structure and relatedness among individuals, as occurs in advanced intercross populations, and obviated the need for GRAIP-adjustments to the significance thresholds. We used the Haley-Knott interval mapping [70] option in QTLRel to impute genotypic values between any of the 2,058 total SNPs separated by more than 1 centiMorgan (cM), effectively increasing the total number of markers to 3,023. At each of these markers, QTLRel evaluated the phenotypic values of each trait with a model that included additive and dominance genetic effects as well as sex, diet, litter size, parity, and cohort to adjust for any effects of these covariates. The program produced likelihood ratio values at each of the markers throughout the genome that were converted into LOD scores.

To evaluate all of these LOD scores for each trait, we estimated both 5% (significant) and 10% (suggestive) genomewide thresholds with the traditional permutation method [71] available in QTLRel. For both the microbiota and the IgA expression traits, we ran the permutation procedure with 1,000 iterations on each taxon and recorded the 95th and 90th percentile LOD values in each of these runs. In the QTL scans for each trait, the highest LOD score on each Chr that met or exceeded the suggestive threshold was considered to represent the site of a putative QTL. Where the LOD score distributions exhibited multiple peaks exceeding this value, each peak was considered to represent the position of an individual QTL if it was separated by a drop of at least 1.5 LOD units from other peaks. Confidence intervals for each of the QTLs also were defined by 1.5 LOD drops on either side of the peak position [72].

Because we performed multiple (203) QTL scans, we expected a number of false positive QTL results by chance alone. To assess how probable this was for each of the putative QTLs found, therefore, we subjected the probabilities (estimated from permutations) associated with their LOD scores to the false discovery rate procedure [67]. We used an n = 203 in this procedure, and it yielded a false discovery rate (FDR) for each QTL that was useful in indicating its probability of being a false positive result.

QTLRel also computed additive (a) and dominance genotypic values (d) at the site of each QTL, and tested these values for significance (P <0.05) via individual t-tests. An additive genotypic value estimates one-half of the difference between the phenotypic values for the two homozygotes, which if positive in sign, indicates that the HR allele increases the mean of the trait (if negative, it decreases the mean). A dominance genotypic value estimates the difference between the mid-homozygous and the heterozygous values, and if significant, indicates that the QTL exhibits dominance [73]. To determine the extent and type of dominance, it is useful to divide d by a. Thus a d/a ratio of approximately +1 or -1 indicates complete dominance, a ratio well over +1 (>1.5) indicates overdominance (heterozygote greater than either homozygote), and a ratio well less than -1 (<-1.5) indicates underdominance (heterozygote less than either homozygote [74]. Besides a and d values, QTLRel also estimated the percentage of the total phenotypic variation of the trait explained by each QTL.

Once QTL locations were determined, we used an option in QTLRel to test for potential interactions of the QTLs with sex and with diet. At each of the sites of the QTLs discovered, QTLRel calculated the -2 ln (likelihood) for a model containing all terms described above, but in addition, the interactions of the a and d effects with sex (or diet). Each likelihood value generated from this model was compared with that generated in the null model that did not include the interaction terms, and the differences between these likelihoods were evaluated using a chi-square test. Probabilities from these tests were evaluated using the conventional level (0.05) of significance [53],[74]. We interpreted significant QTL by sex (or diet) interactions as indicating different genotypic effects on the trait depending on the level of sex (males or females) or diet (control or high-fat). Where these interactions occurred, we tested the effect of the QTL in the separate sexes or diets and used the suggestive threshold values to assess significance.

Data availability

Sequencing data and associated sample metadata are available at the NCBI archive under Bioproject Accession number PRJNA265870. Raw and processed sequencing data and metadata are also available at ( Complete instructions for using this database are available on the login page. Links to Excel files containing the processed microbiota data and processed genotype data are also available directly on the login page (

Additional files