Background

Aquaculture is one of the fastest-growing industries and plays a vital role in fulfilling the global requirements for human protein consumption [1]. Growth rate and muscle yield are key traits affecting the profitability of aquaculture. Understanding the mechanisms for fast and efficient muscle growth is beneficial for developing strategies that improve these characteristics. Muscle growth in farmed fish is influenced by host genetics and factors such as nutrition and environmental condition [2]. Traditional phenotype-based genetic selection is used to improve fish production traits. However, it is not possible to apply this muscle-yield selection strategy on potential breeding candidates since measuring this trait requires sacrificing the fish prior to sexual maturation [3]. Family-based selection procedures have been undertaken by the United States Department of Agriculture at the National Center for Cool and Cold Water Aquaculture (NCCCWA) to improve growth rate and muscle yield in rainbow trout. The 5th-generation fast-growth line families were the base population for the fillet yield selection lines [4]. A family-based selection for muscle yield in a closed, pedigreed population was used to develop high- muscle yield (ARS-FY-H), randomly mated control (ARS-FY-C), and low- muscle yield (ARS-FY-L) genetic lines.

The gastrointestinal compartments of fish contain large microbial communities that play an essential role in homeostasis, physiology, and gut development [5,6,7]. Microbiota residing in the host gut act as a barrier for the colonization of pathogenic bacteria [8]. These bacteria can produce vitamins B and K, short-chain fatty acid, butyric acid, and different antimicrobial metabolites, which may improve the host growth rate and muscle percentage [9]. Host genetics also play a crucial role in shaping the gut microbiome [10]. In addition to host genotype, diet alteration (plant- and animal-based meal) can change the population of the host microbiota as fish subsequently obtain their microbiota from the first-feed they eat [11, 12]. In humans, previous studies showed an influence of gut microbiota on muscle fitness and degradation [13, 14]. Symbiotic microbial communities residing in humans supply short-chain fatty acids (SCFAs) to the skeletal muscle resulting in improved muscle percentage and fitness, whereas, dysbiosis (imbalance in microbiota) results in muscle degradation due to increased intestinal permeability and liberation of endotoxin into circulation [14,15,16]. Muscle constitutes about 50–60% of the fish body weight [17] and plays a significant role in the regulation of nutrient metabolism, growth, and inflammation in humans and fish [18,19,20]. Similarly, Lahiri et al. reported a correlation between the gut microbiota and the skeletal muscle mass in mice. Mice lacking gut microbiota showed muscle atrophy, decreased expression of insulin-like growth factor 1, and reduced transcription of genes associated with skeletal muscle growth and mitochondrial function [21], suggesting a potential role of the gut microbiota in improving muscle yield and reducing muscle atrophy.

Microbiota transmit nutrient signals to their hosts, which might shape the gut microbiome in every stage of life based on diet intake, behavioral change, and environmental influence [22]. Research had shown that transplantation of gut microbes in an animal may improve the muscle mass percentage, function, and reduction in muscle atrophy markers [21]. Few studies have investigated the correlation of gut microbial composition in muscle development and metabolic profile in fish. Therefore, the overall objective of our research was to study the gut microbiota in two genetic lines of rainbow trout selected for high- and low-muscle yield. We postulated that microbial diversity is associated with genetic selection for improved muscle yield.

Results and discussion

Divergent selection was practiced for fillet yield to develop high (ARS-FY-H) and low (ARS-FY-L) yield genetic lines of rainbow trout. The two fish groups used in this study were collected after two generations of selection and were statistically different in their average muscle yield as indicated by a one-way Mann-Whitney U test (p < 0.05; Fig. 1). The mean muscle yield of the high (ARS-FY-H) genetic was 0.53 ± 0.01%, and that of the low (ARS-FY-L) genetic line was 0.51 ± 0.02%.

Fig. 1
figure 1

Significant differences in the muscle yield among samples collected from the ARS-FY-H (high yield) and ARS-FY-L (low yield) rainbow trout genetic lines. The statistical significance of the differences in muscle yield between the two groups was tested by a one-way Mann-Whitney U test (p = 0.0029)

Comparison of gut assemblages in high-(ARS-FY-H) and low-(ARS-FY-L) muscle yield genetic lines

Fish were reared and harvested under identical conditions, however, there was a significant difference in gut microbes between the two harvest days in the high- (ARS-FY-H; F1,15 = 8.24, p < 0.05, R2 = 37.06%) but not low-muscle yield genetic lines (ARS-FY-L; F1,17 = 0.85, p > 0.05). Therefore, harvest day was treated as a random effect in all models to test for the main effect of genetic line. Using a linear mixed model, we tested for differences in gut alpha diversity between fish genetic lines and found that diversity was greater in high (ARS-FY-H) genetic line (LMM, χ2(1) = 14.11, p < 0.05, Fig. 2) when controlling for the harvest day effect. Both nMDS ordination and PERMANOVA results (F1,36 = 4.7, p < 0.05, R2 = 11.9%) indicated that the muscle-yield genetic line was predictive of gut microbial assemblages in rainbow trout (Fig. 3a). There were no significant differences in multivariate dispersion between gut assemblages of low (ARS-FY-L) and high (ARS-FY-H) genetic line samples. A total of 468 OTUs were shared between the two genetic lines (Fig. 3b). The high (ARS-FY-H) muscle-yield genetic line samples had almost double the number of unique OTUs compared to the low (ARS-FY-L) muscle-yield genetic line.

Fig. 2
figure 2

Alpha diversity of ARS-FY-H (high) and ARS-FY-L (low) genetic lines. The box plots indicate higher microbial diversity in the ARS-FY-H samples (p < 0.05). Boxplots show the median value as a bold black bar, the upper and lower limits of the box being the third and first quartile of the data, the whiskers extend up to 1.5 times the interquartile range, and open circles are outlier points

Fig. 3
figure 3

A. nMDS ordination of microbial assemblage structure in the ARS-FY-H (high) and ARS-FY-L (low) genetic lines. B. Venn-diagram showing numbers of shared and unique OTUs between the two genetic lines

Together, these results indicate that the muscle-yield genetic lines are predictive of gut microbial assemblages and suggest that host genetic selective breeding might help curate a particular gut microbial assemblage. This notion is supported by recent studies in tilapia, showing host genetic selection for cold thermal tolerance has an effect on the microbiome [23]. Similarly, studies in stickleback fish identified an association between gut microbial differences and host genetic divergence [24]. Previous work from our lab group revealed significant variation in beta diversity of the bacterial communities of rainbow trout families showing variation in growth rate [25]. Together, these studies indicate a substantial impact of host selection or genetics in predicting host-associated microbial assemblages.

Taxonomy and functional diversity correlate with selection for fish muscle yield

A total of eight phyla, 12 classes, 36 families, and 64 genera had significant differences in abundance between the two genetic lines (Kruskal-Wallis test; p < 0.05, Additional file 1). Phyla Bacteroidetes, Fusobacteria, Deniococcus, Acidobacteria, Patescibacteria, and Nitrospora had higher abundance in the high (ARS-FY-H) muscle yield genetic line, whereas, the phylum Tenericutes had higher relative abundance in the low (ARS-FY-L) muscle yield genetic line (Fig. 4). Using a genus-level comparison, some unclassified genera belonging to family Burkholderiaceae and Gammaproteobacteria had higher abundance in high (ARS-FY-H) muscle yield genetic line. The genera Bacteroides, Deniococcus, Lutelibacter, Nitrosomonas, Pasteurella, and Negativibacillus were present only in the high (ARS-FY-H) muscle yield genetic line.

Fig. 4
figure 4

Comparison of the bacterial phyla with > 5% abundances between the ARS-FY-H (high) and ARS-FY-L (low) genetic lines. Boxplots show the median value as a bold black bar, the upper and lower limits of the box being the third and first quartile of the data, the whiskers extend up to 1.5 times the interquartile range, and the closed colored circles are outlier points

Higher abundances of the phyla Bacteroidetes, Fusobacteria, Deniococcus might be associated with higher muscle percentage, as these taxa are known symbionts and produce metabolites such as SCFAs that are beneficial to the host [26,27,28,29]. For example, genera in the phylum Bacteroidetes are associated with degradation of complex protein polymers and these are responsible for the formation of SCFAs like succinic acid, propionic acid, and acetic acid as the end products [28]. Similarly, genera in the phylum Fusobacteria, a phylum reported to be abundant in freshwater fish guts [30, 31], may produce butyrate, which supplies energy to gastrointestinal cells and inhibits pathogens in freshwater fish [32]. Fusobacteria are known to colonize the gut of zebrafish, synthesize vitamins, excrete butyrate, and metabolites associated with improving fish health [33]. Similarly, bacteria in the phylum Deinococcus can metabolize glucose [34]. Conversely, phylum Tenericutes had higher abundance in the low (ARS-FY-L) muscle yield genetic line samples. This phylum is found in the gut of Fathead minnows [35], however, the functional role of this phylum is not well studied in fish. A study on crabs showed that the Tenericutes phylum is correlated with Hepatopancreatic necrosis disease [36].

There are 64 taxonomic groups with significant differential abundances between the two muscle yield lines. Among them, 21 groups have > 10% and 4 had > 15% differential abundances. These 21 taxonomic groups include Arcicella, unclassified Clostridiaceae, unclassified Burkholderiaceae, Pedobacter, Absconditabacteriales, Arsenicibacter, Chiinophagaceae, Cloacibacterium, Hydrogenophaga, Rhizorhapis, Rhodoferax, unclassified Sphingomonadaceae, Sphingorhabdus, unclassified Spirosomaceae, Thermomonas, Thiothrix, Undibacterium, unclassified Veillonellaceae, Chryseobacterium. All of these taxonomic groups have greater abundance in the high (ARS-FY-H) muscle yield genetic line, whereas, the genus Mycoplasma (with differential abundance 16.8%), and unclassified Firmicutes (with differential abundance 7.9%) have higher abundance in the low (ARS-FY-L) muscle yield genetic line (Additional file 1).

A study in rainbow trout showed that the genus Clostridium butyricum in the family Clostridiaceae, enhances the disease resistance in the host against the pathogen Vibrio by increasing the phagocytic activity of leucocytes [37]. Inhibiting pathogenic bacteria from colonizing the host might help to improve host health, including growth rate and metabolism. In addition, these genera have been used as a probiotic to improve immune response and survival in Paralichthys olivaceus fish [38]. Arcicella, belonging to phylum Bacteroidetes has been identified in freshwater environments, and these bacteria can ferment carbohydrates [39]. Similarly, bacteria from the genus Pedobacter is dominant in the gut of healthy Atlantic Salmon [40].

The genus Bacteroides is an important group of bacteria colonizing the intestine of a wide variety of hosts, including humans [41, 42], mice [43], and tilapia fish [44]. These bacteria ferment carbohydrates and produce short-chain fatty acids (SCFAs) like acetate, propionate and butyrate. These SCFAs are key regulators of skeletal muscle metabolism and function [45]. A recent study showed that species of Cloacibacterium isolated from the Abalone intestine can hydrolyze starch and ferment sugars like glucose, galactose, fructose, maltose, mannose, and produce fatty acids [46]. Both the Bacteroides and Cloacibacterium have greater abundance in high (ARS-FY-H) muscle yield genetic lines and are associated with digestion, fatty acid metabolism, pathogen inhibition, which, are linked to host health and digestion. Family Burkholderiaceae had a higher abundance in high (ARS-FY-H), and genus Burkholderia belonging to this family were reported as the most abundant genus in the fish gut [47]. However, their role in muscle yield and or host health in fish has not been reported before.

Taxa with significantly higher abundance in the low (ARS-FY-L) muscle yield genetic line were Mycoplasma (16.8%) and the unclassified phylum Firmicutes (7.9%). A previous study showed that the genus Mycoplasma is the most abundant taxon in adult Atlantic salmon [48]. Bacteria belonging to this genus have also been described as pathogenic in gills of the fish Tinca tinca [49]. The lesser abundance of unclassified Firmicutes in the low (ARS-FY-L) muscle yield genetic line might be associated with decreased body weight or correlated with a decrease in muscle percentage in fish. The ratio of Firmicutes to Bacteroides has been shown to correlate with weight gain in humans [50] and this trend could exist in fish as well. A previous study in our laboratory showed that body weight of rainbow trout is moderately correlated with muscle yield, regression coefficient (R2) values of 0.56 [51].

Tax4Fun analyses were used in this study to enumerate differential functional capabilities of microbial communities in high (ARS-FY-H) and low (ARS-FY-L) genetic lines (Fig. 5). Bacterial functional pathways related to calcium signaling, pentose and glucuronate interconversions, synthesis and degradation of ketone bodies, linoleic acid metabolism, lysine degradation, and arachidonic acid metabolism were enriched in most of the high (ARS-FY-H) genetic line samples. Microbial pathways involved in fatty acid metabolism are known to supply energy to muscle cells, which is essential for muscle growth [52], and was more abundant in the high (ARS-FY-H) genetic line. Genus Bacteroides belonging to phylum Bacteroidetes showed higher abundance in the high (ARS-FY-H) genetic line, and is associated with fatty acid metabolism thus producing SCFAs [53]. A study in mice revealed that SCFAs produced by microbes in the gut supported muscle function by preventing muscle atrophy and boosting muscle strength [21].

Fig. 5
figure 5

Heatmap showing metabolic pathways that statistically differed between the ARS-FY-H (high) and ARS-FY-L (low) genetic lines. Samples and pathways are clustered based on Euclidean distances. The abundance of each pathway was scaled to a range (− 4, 4) with red and blue colors representing higher and lower pathway abundance, respectively

Similarly, microbe mediated lysine degradation increased production of SCFAs (butyrate and acetate) in the human gut [54]. The genus Fusobacterium was more abundant in the high (ARS-FY-H) muscle yield genetic line and known to be involved in lysine degradation and production of SCFAs [55]. Microbial synthesis and degradation of ketone bodies (KB), identified in the ARS-FY-H samples, were reported as associated with increased muscle mass in humans [56, 57]. Ketone bodies make an energy substrate that supplies energy to the brain and muscles, contributing to the maintenance of energy homeostasis through regulation of lipogenesis [56]. Arachidonic acid metabolism is essential for the functions of skeletal muscle and the immune system, which might be associated with increased muscle mass and host health [58, 59]. The family Clostridiaceae had greater abundance in high (ARS-FY-H) muscle yield genetic line fish and was reported as correlated with enriched pentose and glucuronate interconversions [60]. Bacteria associated with this pathway are involved in the breakdown of complex substrates in pig gut microbiomes and improved carbon and energy uptake in the host [60].

Fish with low (ARS-FY-L) muscle yield had unique functional profiles that differed from high (ARS-FY-H) muscle yield samples. For example, pyruvate metabolism, amino acid metabolism, folate biosynthesis, glycosphingolipid biosynthesis, glyoxylate and dicarboxylate metabolism, adipocytokine signaling pathway, and two-complement system were enriched in most of the low (ARS-FY-L) genetic line samples (Fig. 5). Glycosphingolipids act as negative regulators of skeletal muscle differentiation and growth in rats [61,62,63]. Similarly, bile secretion is associated with lipid digestive functions [64, 65] and may reduce adiposity in the host, which might result in lower muscle mass. In spite of the differential enrichment of pathways between the muscle yield groups, further investigation should be done to validate the role of these microbial pathways in the host.

Conclusion

In this study, the gut microbial assemblage (alpha and beta) diversity correlated with selectively-bred muscle yield genetic lines. Microbial differences between the two genetic lines could be observed as a host genetic selection signature on the gut microbiota. Both differences in taxonomic groups of microbes and their functional predictions correlated with muscle yield. Our results suggest a role of specific microbial taxa in improving host muscle growth and metabolism.

Methods

Fish husbandry and harvest

Fish were produced at the NCCCWA as a part of our ongoing selective breeding program. Briefly, full-sibling families (ARS-FY-H = 99; ARS-FY-L = 23) were produced from single-sire × single-dam mating events. Each family was reared separately from hatch through approximately 30 g (4 months post-hatch) when 8 fish per family were anesthetized (100 mg/L tricaine methanesulfonate, M-222) and uniquely tagged by inserting a passive integrated transponder (Avid Identification Systems Inc., Norco, CA) into the peritoneal cavity. After tagging, fish were comingled and reared in a single, 1800-L tank that also housed contemporary fish (n = 118) from the ARS-FY-C line. The tank received identical water from a partial re-use system and water temperature was maintained at approximately 13 °C for the entirety of the grow-out period. Fish were split at random into a total of two 1800-L and two 3800-L tanks as they grew to maintain biomass densities below 100 kg/cubic meter. At approximately 13 months of age, fish used in the current study were split into three replicate 800-L tanks (n = 46 fish per tank). One week before harvest, the fish were split at random into four 800-L tanks (34–35 fish per tank) to allow the harvest of two complete tanks on each of two successive days and thus minimize netting-associated stress associated with harvesting of a partial tank. Fish were fed a commercial diet (Finfish G, 42% protein, 16% fat; Ziegler Bros Inc., Gardners, PA) using automatic feeders (Arvotec, Huutokoski, Finland) that provided feed at a daily ration that was considered as slightly below satiation.

At approximately 15 months post-hatch, fish were euthanized using an overdose of anesthetic (300 mg/L MS-222) and processed for analysis of the muscle yield trait. Fish were not fed the day prior to and the day of harvest. Twenty families were pre-selected from the high (ARS-FY-H) and low (ARS-FY-L) genetic lines (40 families total within four tanks; n = 1 fish per family) for fecal collection at harvest; selection was based on divergent mid-parent breeding values and to maximize genetic diversity within each line. Due to a mortality of a high (ARS-FY-H) genetic line fish, 39 fecal samples were collected for this study, 19 from the high (ARS-FY-H) muscle yield genetic line (11 and 8 samples from the first and second harvest dates, respectively) and 20 representing the low (ARS-FY-L) genetic line (13 and 7 samples from the first and second harvest dates, respectively). Fecal samples were stripped manually into sterile Eppendorf tubes (Eppendorf, Hauppauge, NY), then stored in a − 80 °C freezer until analysis. Fish were eviscerated and the carcasses were placed on ice and held in a 4 °C refrigerator overnight for analysis of muscle yield the following day.

DNA extraction, library preparation and sequencing

To extract DNA, fecal samples from 19 high (ARS-FY-H) and 20 low (ARS-FY-L) genetic line fish were subjected to DNA isolation using a Promega Maxwell DNA Isolation Kit (Promega Corporation, Madison, WI), as we previously described [25] with a minor modification where 20 μL of lysozyme was added in samples to facilitate cell wall lysis. Briefly, 200 mg of fecal sample was added to a microtube containing 160 μL of incubation buffer, 20 μL proteinase k solution, and 20 μL lysozyme. The mixture was incubated at 70 °C overnight, and after incubation, 400 μL of lysis buffer was added to the mixture and the sample was vortexed briefly. The samples were then subjected to the Maxwell 16 Automated DNA purification machine and the DNA was collected in a 50-μL elution buffer.

Library preparations and sequencing were done based on 16S rRNA sequencing strategy using the Illumina 16S Metagenomic Sequencing Library Preparation Guide. Briefly, 10 μM of 515F and 10 μM of 806R primers amplifying V4 regions were used to target 16S rRNA gene using McLAB HiFi master mix using polymerase chain reaction (PCR). The final PCR reaction consisted of 12.5 μL 2x HiFi, 1 μL of 10 μM 515F primer and 1 μL of 10 μM 806R primer, 5 μL DNA and 5.5 μL sterile nuclease-free water. The PCR product was then subjected to size selection using a magnetic bead capture kit (Ampure; Agencourt). After the first PCR clean up, dual indexed primers were used to amplify the V4 region as described by Kozich et al. [66]. After indexing, samples were again size selected using a magnetic bead capture kit (Ampure; Agencourt). PCR products were quantified after amplification and indexing using a Qubit fluorometer (Invitrogen, Carlsbard, CA) and fragment size (approximately 450 bp) was visualized on a 1.5% gel electrophoresis stained with SYBR safe, then normalized to 4 nM. Samples were loaded onto an Illumina MiSeq flow cell and sequencing was done using 250 bp-paired end sequencing using a 500 cycle V2 reagent cartridge (Illumina, Inc., San Diego, CA) according to the manufacturer’s instructions [67].

Bioinformatics analysis

A total of 28,518,046 paired-end raw sequences were obtained during the Miseq run. Sequencing data were analyzed using Mothur (v.1.40.2, www.mothur.org) according to the Mothur Illumina Miseq standard operating procedure (SOP) [66, 68] with several modifications. After forming contigs, the total number of sequences was 11,020,368, and pcr.seqs command was used to trim primers and adaptors to the V4 region. The median length of the sequences was determined as 253 by using the summary.seqs command [69]. Screen.seqs command was used to remove sequences with length > 254 bp and < 251 bp containing homopolymers of > 8, and with ambiguous base calls. The split.abund command was used to keep Operational Taxonomic Units (OTUs) with more than two reads [70]. The SILVA v123 database [71] was used to align the sequences and those that failed to align, or classified as Archaea, chloroplast, Eukaryote, mitochondrial, or unknown were excluded from the analysis. Chimeric sequences were detected by chimera.vsearch and removed from the analysis. The remaining sequences were clustered using cluster.split [72] at a threshold of > 97% sequence similarity. Operational Taxonomic Units with relative abundance < 10 across all samples were removed from the analysis by using the remove.rare command [73, 74]. The final data set was subsampled at 2420 sequences to normalize the data set for statistical analyses. DNA extraction and library preparation blanks were included during sequencing and bioinformatics, and all OTUs within these samples were removed from the final analysis. The code used during bioinformatics analysis, the taxonomy file, and the shared sample × OTU matrix are all included in Additional files 2, 3, and 4, respectively.

Statistical analysis

To test for the statistically significant differences of the muscle yield between the two groups, a one-way Mann-Whitney U test (Prism, GraphPad Software, Inc., La Jolla, CA) was performed. Statistical analyses (alpha diversity, beta diversity, microbial functional profiling pathways) were performed in R version 3.5.2 using the packages vegan [75], plyr [76], dplyr [77], ggplot2 [78], lmerTest [79], pheatmap, MuMIn [80], lme4 [81], Tax4Fun [82], DEseq2 [83], rcompanion [84], grid [85], and TidyVerse [86].

Alpha and Beta diversity analysis of fecal samples between high (ARS-FY-H) and low (ARS-FY-L) muscle yield genetic lines

Sixteen fecal samples (that passed QC during bioinformatics analysis) from the high (ARS-FY-H) and 18 fecal samples (that passed QC) from the low (ARS-FY-L) muscle yield line were used for this analysis. A Tukey’s ladder of power transformation was performed to fit inverse Simpson values to a Gaussian distribution. Alpha diversity between the genetic lines was compared using a linear mixed-effects model (LMMs) with the genetic line as a fixed effect and harvest day set as a random effect (package lme4) [87].

Beta diversity was calculated to test if muscle yield genetic line was predictive of the gut microbiota. To do this, a Bray-Curtis dissimilarity matrix was generated using the vegdist function in the Vegan package [88]. The betadisper function was used to test for the homogeneity of multivariate dispersion between gut assemblages from high (ARS-FY-H) and low (ARS-FY-L) muscle yield genetic lines. The metaMDS function in Vegan was used to generate a non-metric multidimensional scaling ordination (nMDS), which was then plotted using ggplot2 [89]. The adonis function in Vegan was used to perform PERMANOVA on Bray-Curtis dissimilarity values to determine if the genetic line was predictive of gut assemblages, while controlling for harvest day effect (harvest day as strata, 999 permutations). An indicator species analysis was performed in Mothur to determine the microbial assemblages that were characteristic of muscle-yield genetic lines [29]. A Kruskal-Wallis test was used to assess differences in relative abundances of taxa between the genetic lines. The nMDS ordination showed a pattern suggesting a ‘harvest day’ based effect; therefore, we subset our samples into two data frames based on independent harvest days. Both data frames had nearly equal numbers of gut microbial samples from the two genetic lines, including 10 samples from high (ARS-FY-H) and 11 samples from low (ARS-FY-L) - in harvest day 1, and 6 from high (ARS-FY-H) and 7 from low (ARS-FY-L) - in harvest day 2. Separate Bray-Curtis dissimilarity matrices were generated for each data frame, followed by nMDS ordination values calculated and plotted in ggplot2. PERMANOVA was used to test for differences in microbial assemblages with genetic line set as a fixed effect.

Beta diversity analysis of the fecal samples, feed and water

Feed and water samples were sequenced to determine whether the fecal microbial assemblages of trout differed from the environment. Beta diversity was calculated based on a Bray-Curtis dissimilarity matrix representing sample-to-sample pairwise distances using the vegdist function, and non-metric multidimensional scaling (nMDS) ordination was used for visualization using the metaMDS function and plotting in ggplot2. The adonis function was performed to determine if sample type (gut, feed, water), set as a fixed effect, was descriptive of microbial assemblages. The detailed methodology and results for this experiment have been included in Additional file 5.

Functional annotation of 16S rRNA sequence data

Phylotype based OTU clustering and classification was performed using the phylotype command in Mothur to investigate microbial functional and metabolic capacities of OTUs. The shared file was then converted to the biome format using the make.biom command in Mothur. The Tax4FUN package in R was used to predict the microbial functional and metabolic capacities by linking 16S rRNA gene-based taxonomic profiles to KEGG reference profiles [82]. The normalized KEGG pathway output was used to investigate the enrichment of microbial pathways between the genetic line samples using DESeq2. Informative pathways associated with host-microbiome interactions with an average FTU score of 0.55 and an adjusted p-value less than 0.001 were selected for heatmap visualization using the pheatmap R package [90]. The R code used during the analysis has been included in Additional file 6, and statistical results for all analyses are included in Additional file 7.