Background

Genetic diversity is an important resource in animal production and conservation. Loss of genetic diversity can impact a species or a breed’s ability to adapt to changing environmental or production pressures [1] and can lead to the accumulation of lethal or deleterious alleles with detrimental effects on health and production [2]. Adaptability of animal production requires the preservation of genetic diversity, and the identification of signatures of selection can provide essential insights that can be useful for conservation and breed improvement objectives [3, 4].

Artificial and natural selection both result in changes in allele frequencies that can lead to the fixation of certain alleles [5]. The neutral regions that surround alleles under selection tend to lose genetic variation due to the hitchhiking effect, which increases linkage disequilibrium (LD) and can result in the formation of selective sweeps [6,7,8,9]. Analyses of runs of homozygosity (ROH) can be used to detect regions that have experienced loss of heterozygosity due to the presence of selective pressures [10]. Analysis based on Wright’s fixation index (FST) identifies differences in allele frequencies between populations and is one of the most commonly used methods to identify single nucleotide polymorphisms (SNPs) under selection [11,12,13]. In sheep, these analyses have been used to understand the genomic regions under selection for traits such as environmental adaption [14,15,16,17], parasite resistance and susceptibility [18], morphological traits [19,20,21], wool quality [22], and production traits [23] in many breeds.

The purpose of this study was to evaluate measures of genetic diversity and signatures of selection in three popular U.S. sheep breeds (Rambouillet, Katahdin, and Dorper) with diverse characteristics. The U.S. Rambouillet was first established in 1840 with the import of fine-wool sheep from France [24]; today, Rambouillet is a multi-purpose breed that is noted for wool and carcass quality. Compared to the other breeds of this study, Rambouillet is a larger framed, later-maturing breed [25] and is prominent in the semi-arid western states where much of the U.S. sheep production is concentrated [26]. The Katahdin is a composite hair breed that was developed in the 1950s from St. Croix hair sheep and wool breeds including the Suffolk and Wiltshire Horn [27, 28]. Katahdin is a fast-growing and prolific breed that is regarded for parasite resistance and suitability to warm, tropical environments [29, 30]. Dorper sheep are early-maturing and produce heavily muscled carcasses with many favorable palatability and tenderness characteristics [31]. To develop the Dorper breed, Dorset Horn and fat-rumped Blackhead Persian breeds were crossed to combine maternal traits with wool-shedding ability and adaptability to harsh environmental conditions [32].

Genotype data of Rambouillet, Katahdin, and Dorper animals were analyzed for inbreeding, effective population size (Ne), and signatures of selection (through FST and ROH) in order to gain understanding of the selection pressures that affect these breeds. Genes present in regions under selection were further evaluated to identify the biological pathways that are most likely affected. The results of this study provide insights into the genetic variation present in breeds commonly raised in the U.S. for their meat and wool quality (Rambouillet), parasite resistance (Katahdin), and environmental hardiness (Dorper).

Methods

Animal sampling and DNA genotyping

Sheep belonging to the Rambouillet, Katahdin, and Dorper breeds were selected for analyses of inbreeding and genetic diversity. These breeds were chosen to facilitate analyses of sheep that have been selected for diverse purposes, including fine wool (Rambouillet), parasite resistance (Katahdin), and environmental hardiness (Dorper). In total, 745 Rambouillet sheep were sampled from the Texas A&M AgriLife Research flock (TAMU; N = 403) and from central performance ram tests held at the North Dakota State University (NDSU; N = 159) and University of Wyoming (UWY; 183), representing more than 30 seedstock producers located in Colorado, Montana, North Dakota, South Dakota, and Wyoming. The Katahdin sheep (N = 581) were sampled from 20 flocks located across the U.S. (Arkansas, Georgia, Idaho, Indiana, Missouri, New York, Ohio, Oregon, Texas, Virgina, West Virginia, and Wisconsin). The Dorper sheep were sampled from the TAMU research flock (N = 265), which was founded from Dorper or White Dorper rams and ewes incorporated from over 20 producers throughout the U.S. in the early 2000s [33]. Over the last two decades, this flock has been managed as one cohort. In total, 1591 sheep were analyzed in the current study.

Sample collection and genotyping of these sheep have been described previously [34,35,36]. Briefly, DNA extraction and genotyping of Katahdin sheep were conducted at Neogen Corporation-GeneSeek Operations, Lincoln, NE, USA, whereas DNA from the Rambouillet and Dorper animals was extracted from whole blood samples using the phenol–chloroform method or from tissue samples by AgResearch (GenomNZ, AgResearch, New Zealand) [37]. Genotyping was conducted with the high-density (HD) Illumina 600K SNP BeadChip (Illumina Inc., San Diego, CA, USA) that includes 606,006 SNPs, the Applied Biosystems™ Axiom™ Ovine Genotyping Array (50K) that includes 51,572 SNPs (Thermo Fisher Scientific, catalog number 550898), or the AgResearch Sheep Genomics 60K SNP chip that includes 68,848 SNPs (GenomNZ, AgResearch, New Zealand). In total, 132 Dorper and 243 Rambouillet samples were genotyped on the AgResearch chip, 133 Dorper and 502 Rambouillet samples were genotyped on the Applied Biosystems array, and the 581 Katahdin samples were genotyped on the HD Illumina chip.

Duplicate markers within a panel were filtered to retain the SNPs with the highest call rate (CR) at each unique position. Compatible SNPs across panels were merged using the Plink v1.9 software [38, 39]. Non-autosomal SNPs and SNPs with a CR lower than 90% were removed, which resulted in a dataset of 36,113 SNPs. The 1591 sheep had a genotype CR ≥ 90%. Removing SNPs with a low minor allele frequency (MAF) or that are in high LD can hinder the detection of ROH [40] and MAF thresholds can affect the comparison of structure between populations [41]. For these reasons, no further filtering was applied to the data prior to principal component analysis (PCA), ROH analyses, Wright’s fixation index (FST) outlier approach, and Ne estimation.

The Rambouillet and Katahdin sheep analyzed in this study were sampled from geographically distant flocks and can be considered representative of their breed in the context of the U.S. sheep industry. Due to a limited representation of the current breed diversity in the sampled Dorper animals, the allele frequencies reported in these sheep are not anticipated to be representative of the U.S. Thus, for the Dorper breed, although analyses of these animals are still valuable and worth reporting, care should be taken to avoid over interpretation. For this reason, the Dorper ROH and Ne results are shared as supplementary data. The Dorper data were used in the pairwise FST signatures of selection analyses to facilitate interpretation of outlier SNPs that are potentially under selection in the Rambouillet and Katahdin breeds.

Population structure, inbreeding, and effective population size

The population structure between breeds was visualized through PCA that was conducted with the Plink v1.9 software. The results were visualized with the package ggplot2 in R version 4.2.3, with the first principal component (PC1) plotted on the x-axis and PC2 plotted on the y-axis [42,43,44]. The proportions of variance explained (PVE) by PC1 and PC2 were calculated by dividing the first and second eigenvalues, respectively, by the sum of all eigenvalues.

To understand the level of inbreeding within each breed, homozygosity was evaluated through the proportion of observed and expected homozygous SNPs and through two methods of inbreeding calculations. The number of expected and observed homozygous SNPs and the method-of-moments inbreeding coefficient (F) were calculated using Plink v1.9 (–het) for each animal [45]. The F statistic is calculated as ([observed homozygous count] − [expected count])/([total observations] − [expected count]). ROH-based inbreeding coefficients were calculated with the package detectRuns using the parameters described for the ROH analysis. The genome-wide ROH-based inbreeding coefficient, FROH, was calculated as the sum of the individual’s ROH length over the length of the genome [46]. The distribution of FROH by breed was visualized with ggplot2. The Kruskal–Wallis test was used as a non-parametric alternative to the one-way analysis of variance (ANOVA) to determine whether there were statistical differences in F or FROH between breeds [47]. The Kruskal–Wallis test was followed by the Dunn’s test to calculate P-values for pairwise breed comparisons [48].

The Ne of each breed was estimated using three methods. Historic Ne was estimated using the LD-based method in the SNeP software version 1.11 for each breed [49]. Default parameters were set for each analysis. LD was measured using r2, the squared correlation coefficient between pairs of SNPs [50]. The rate of the decline in Ne was calculated between each consecutive reported generation and for the overall distribution. The NeEstimator v2.1 software was used to estimate current Ne for each breed using the LD method within the random mating model [51]. For comparison, current and historic Ne were also estimated using the GONe software [52], with default parameters. Results of SNeP, NeEstimator, and GONe were visualized in R with ggplot2.

Runs of homozygosity and FST analyses

Analysis of ROH was conducted to identify regions which may be under selection pressure in the Rambouillet, Katahdin, and Dorper breeds. The ROH were identified using the sliding window method with the R package detectRuns [46]. Each window comprised 15 SNPs with a maximum of two opposite SNP genotypes (heterozygous/homozygous) and one missing SNP allowed per window. A ROH was required to have a minimum length of 250,000 bp and to contain at least 30 SNPs. There was a minimum density of one SNP per every 10,000 kb and a maximum gap between SNPs of 10,000 kb. A homozygous window threshold of 0.05 was used to determine SNP inclusion in a ROH. Results were visualized using the CMPlot, ggplot2, and patchwork packages in R [53, 54]. The regions that contained a ROH in more than 50% of the individuals of a breed were selected for further analyses.

The average length of ROH called in Katahdin and Rambouillet were investigated for significant differences between breeds. All data were evaluated for normality with the Shapiro-Wilks test prior to model selection with the ‘shapiro.test’ function in R [44, 55]. The Wilcoxon unpaired two-sample test was performed using the ‘compare_means’ function in the ggpubr package of R [43, 56].

To better understand the genetic differences between breeds, Wright’s fixation index (FST) was calculated in Plink v2.0 using the method proposed by Weir and Cockerham [39, 57]. Analysis with Plink v2.0 was preferred over Plink v1.9 because of its ability to generate pairwise FST output files simultaneously for all comparisons. Per-variant FST estimates were generated for each SNP and genome-wide FST estimates were reported as ratio-of-averages between each pair of breeds, which were calculated from the ratio of the average variance components [58]. Fisher’s exact test in R was used to estimate p-values from contingency tables constructed with genotype counts for each SNP and each pair of breeds. Outlier SNPs with FST estimates greater than three standard deviations above the mean and with significant Fisher’s test P-values (< 0.05) were considered to differ greatly between breeds. For each pairwise breed analysis, FST regions of interest were defined by significant outlier SNPs located within 200,000 bp of each other, ± 100,000 bp before and after the flanking markers. For the purpose of comparison, FST was also estimated through the BayeScan 2.1 software with default parameters [59]. Genotype files were prepared for import into BayesScan using the ‘genomic_converter’ tool in the R package radiator [60].

The identified ROH and FST regions were queried through the National Center for Biotechnology Information (NCBI) genome browser tool [61, 62]. The ARS UI_Ramb_v2.0 genome assembly [63] was used for all SNP positions and gene region analyses. Characterized genes falling within a ROH or FST region were recorded for pathway enrichment analyses. Where possible, the predicted human ortholog of Ovis aries genes containing the LOC symbol (pseudo or uncharacterized genes) were recorded.

Pathway analyses

Genes of interest from ROH and FST analyses were investigated to identify associated biological pathways. Pathway enrichment analyses were conducted through the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Mapper search tools [64,65,66,67]. Due to reference database availability, queries through GO for biological process, molecular function, and cellular component were made against the Bos taurus reference database and queries of KEGG Mapper were made against the Homo sapiens reference. Pathway analyses were conducted for genes located within Rambouillet, Katahdin, or Dorper ROH islands and for genes located within Katahdin-Rambouillet, Katahdin-Dorper, or Rambouillet-Dorper FST regions. In addition, pathway analyses were conducted for genes which were in common between both FST analyses of Rambouillet (Katahdin-Rambouillet and Rambouillet-Dorper) or both FST analyses of Katahdin (Katahdin-Rambouillet and Katahdin-Dorper). To better understand the biological implications of genes within ROH islands and FST outlier regions, the STRING database was queried for protein–protein interaction networks and functional enrichment analysis of these genes [68].

Results

Population structure, inbreeding, and effective population size

The population structure of the studied animals was first evaluated through PCA. The analysis revealed three clearly distinct clusters, with Rambouillet, Katahdin, and Dorper sheep clustering more closely with individuals of the same breed than individuals of other breeds (see Additional file 1: Fig. S1). The model’s PC1 had an eigenvalue of 179.82 and was estimated to explain 35.99% of all the variation while PC2 had an eigenvalue of 116.82 and was estimated to explain 23.38% of the variation. The largest principal component separated the Katahdin and Rambouillet breeds, while PC2 separated Dorper from both Katahdin and Rambouillet. The placement of these breeds is similar to the results of previous PCA, which have shown the separation of breeds originating from West Africa, South Africa, and the Iberian Peninsula [69, 70].

The average proportion of expected and observed homozygous SNPs for Rambouillet, Katahdin, and Dorper are reported in Table 1. The observed homozygosity was similar to previous estimates for Rambouillet [71] and was slightly higher than previous estimates for Katahdin and Dorper [72, 73]. For each breed, the average FROH estimate was higher than the average F estimate. The Kruskal–Wallis P-value for ‘F ~ Breed’ was 4.90e−04, and for ‘FROH ~ Breed’ was 2.89e−25. Based on the Dunn’s test, Katahdin had significantly higher F inbreeding coefficients than Rambouillet; for FROH, the Rambouillet breed had significantly lower inbreeding coefficients than both Dorper and Katahdin (Fig. 1). Overall, FROH estimates were higher than previously reported estimates for worldwide sheep populations [74, 75]. Previous studies have sampled fewer individuals per breed, which may contribute to these observations.

Table 1 Homozygosity and inbreeding coefficients for the Dorper, Katahdin, and Rambouillet breeds
Fig. 1
figure 1

Distribution of F and FROH inbreeding coefficients by breed. Boxplots of the distribution of F and FROH for Dorper, Katahdin, and Rambouillet breeds. The horizontal black line represents the overall mean inbreeding coefficient calculated from both F and FROH estimates

Estimates of Ne were calculated using the SNeP, NeEstimator, and GONe software. The Ne of the Rambouillet and Katahdin breeds were estimated from 759 generations ago to 13 generations ago using SNeP, from 741 generations ago to one generation ago using GONe, and for the current generation (given the notation 0) with NeEstimator (Fig. 2). Estimations with GONe were nonlinear, while estimations with SNeP showed a consistent decline in Ne from the furthest generation to the most recent. These general observations of SNeP and GONe trajectories are similar to the Ne results reported in the Master thesis work of D Adepoju with cattle data [76]. The GONe and SNeP estimates began to show more agreement approximately 30 generations ago (Fig. 2a). In the most recent generation, Katahdin were estimated to have an Ne of 161.4 by NeEstimator or 436.1 by GONe, while Rambouillet had Ne estimates of 56.9 by NeEstimator or 111.8 by GONe (Fig. 2b).

Fig. 2
figure 2

Recent and historic Ne estimated through three methods. a Ne estimates from 30 generations ago to the current one for Katahdin and Rambouillet, estimated with GONe, NeEstimator, and SNeP; b Historic Ne estimates for Katahdin and Rambouillet estimated with GONe and SNeP

The rate of change (m) of Ne size over generational time can give insight into the rate of diversity loss. These calculations were made with the results of SNeP, as these estimates followed a linear pattern. The overall rate of change was m = 3.02 for Rambouillet and m = 2.76 for Katahdin, suggesting that the Rambouillet breed has lost genetic diversity at a slightly more rapid rate than the Katahdin breed. The most rapid decrease in Ne occurred between 80 and 98 generations ago for the Katahdin (m = 5.11; Table 2) and between 15 and 17 generations ago for the Rambouillet breed (m = 9.00; Table 3). Katahdin also had a high rate of change 15 to 17 generations ago (m = 5, Table 3). In addition, both Rambouillet and Katahdin showed higher rates of change between 20 and 23 generations ago, with Rambouillet having an m of 7.33, and Katahdin having an m of 5.00 between these intervals. Each subsequent generation had a lower Ne estimate than the preceding generation, and in each generation the estimate for Rambouillet was greater than that for Katahdin. The Ne estimates for these breeds became more similar as the generations became closer to the current one. Analysis results for the Dorper sheep are reported as supplementary data (see Additional file 2: Table S1). The greatest rate of change for Dorper was estimated from 17 to 20 generations ago, with m = 4.00, and the overall rate of change was m = 2.05.

Table 2 Effective population size (Ne) for Katahdin sheep
Table 3 Effective population size (Ne) for Rambouillet sheep

Runs of homozygosity and Wright’s FST analyses

There were 51,992 and 72,946 ROH called for Katahdin and Rambouillet, respectively. The ROH were categorized into five classes by length, with classes defined by 0–6 Mb, 6–12 Mb, 12–24 Mb, 24–48 Mb, and > 48 Mb. The majority of the ROH identified were shorter than 6 Mb in length; 86% of the Rambouillet total ROH and 75% of the Katahdin total ROH were within this class (see Additional file 3: Table S2). The percentages of 6–12 Mb and 12–24 Mb long ROH were greater in Katahdin than in Rambouillet (see Additional file 4: Fig. S2). The average lengths of ROH called in Katahdin and Rambouillet were significantly different (Wilcoxon P-value = 1.29e−82), with Rambouillet having significantly shorter mean ROH than Katahdin (Fig. 3). The differences observed in mean ROH length and in percentage of ROH by class may reflect differences in LD decay at signatures of selection between these breeds. As LD breaks down rapidly over distance, short ROH can indicate more ancient inbreeding or selection events while long ROH are indicative of more recent selection [77, 78].

Fig. 3
figure 3

Length of ROH called by breed. a Plot of the average length of ROH (x-axis) by the total number of ROH called per individual (y-axis), with the Katahdin sheep plotted in blue and the Rambouillet sheep in purple; b Average length of ROH by breed, with the overall mean indicated by the horizontal line and breed means indicated by black points. The Wilcoxon P-value of 'average length ~ breed' is 1.29e−82

A ROH island was defined by the presence of SNPs called within a ROH in 50% or more of the animals of a breed (Fig. 4). The ROH analysis identified three ROH islands in Rambouillet and two ROH islands in Katahdin (Table 4). The ROH island identified in Rambouillet on chromosome 6 had both the largest number of SNPs called in an ROH island (202 SNPs) and the highest percentage of animals called in an ROH at the same SNP (94.23%). ROH islands in Rambouillet were identified on chromosome 3 (Fig. 5a), chromosome 6 (Fig. 5b), and chromosome 7 (Fig. 5c), and those in Katahdin on chromosome 23 (Fig. 6a) and chromosome 25 (Fig. 6b). There was some overlap between ROH islands called in Rambouillet and Dorper sheep on chromosome 6, although the ROH called in Rambouillet were both longer and more highly conserved within the breed (see Additional file 5: Table S3).

Fig. 4
figure 4

Manhattan plot of ROH and FST results for Katahdin and Rambouillet sheep. The top of the plot (scale 0 to 100) is the percentage of SNPs called in an ROH in Katahdin (blue) and Rambouillet (purple) breeds. The bottom of the plot (scale 0 to 1.0) contains the results of the Katahdin-Rambouillet pairwise FST of each SNP (gray)

Table 4 Regions comprised of SNPs called within a ROH in 50% or more of the animals of a breed
Fig. 5
figure 5

Chromosomes with ROH called in 50% or more of the Rambouillet sheep. a ROH called on chromosome 3; b ROH called on chromosome 6; c ROH called on chromosome 7. The ROH called in Katahdin are plotted in blue and the ROH in Rambouillet are plotted in purple

Fig. 6
figure 6

Chromosomes with ROH called in 50% or more of the Katahdin sheep. a ROH called on chromosome 23; b ROH called on chromosome 25. The ROH called in Katahdin are plotted in blue and the ROH in Rambouillet are plotted in purple

The genomic regions of ROH islands were evaluated for the presence of known and predicted genes. Seventy-six genes were identified in the Katahdin ROH islands, including 20 uncharacterized LOC genes without predicted Homo sapiens orthologs, and two copy number variants (CNV) of the TRNAC-GCA gene. Fifty-five unique genes were used for pathway enrichment queries (see Additional file 6: Table S4). For Rambouillet, 117 unique genes, multiple copies of tRNA genes TRNAS-GGA, TRNAH-AUG, TRNAW-CCA, and TRNAC-GCA, and 37 uncharacterized LOC genes were identified between the ROH islands on chromosomes 3 and 6 (see Additional file 7: Table S5). The ROH island identified on chromosome 7 was intergenic. Known or predicted protein–protein interactions of genes located within ROH islands were identified through query of the STRING database. For genes within the Rambouillet ROH island on chromosome 6, the largest number of interactions was found for FAM184B (9 proteins), FAM13A (7 proteins), and for CCSER1, LAP3, LCORL, MED28, and NCAPG, all of which had interactions with six proteins. Within the ROH island on chromosome 3, the largest number of interactions was found for TMEM117 (7 proteins), ZCRB1 (6 proteins), and HNRNPA1 (5 proteins). Analysis of protein–protein interactions for the Katahdin ROH island on chromosome 23 identified four interactions with NDUFV2, and analysis of the ROH island on chromosome 25 contained only interactions of ARID4B with TOMM20 and RBM34 (see Additional file 8: Table S6).

Pairwise FST of 0.140, 0.156, and 0.161 were estimated between Katahdin and Rambouillet, between Rambouillet and Dorper, and between Katahdin and Dorper, respectively. Thresholds of + 3 standard deviations above the average were calculated for each breed comparison to identify outlier SNPs that showed the greatest amount of differentiation. As expected from the pairwise estimates, the Katahdin-Rambouillet SNP comparisons showed the lowest average FST and smallest standard deviation. The FST threshold for Katahdin-Rambouillet outlier SNPs was 0.52, and for both Rambouillet-Dorper and Katahdin-Dorper comparisons, the threshold was 0.58 (Table 5). The pairwise FST statistics for each SNP in the Katahdin-Rambouillet analysis were compared against ROH island results (Fig. 4).

Table 5 Average FST estimates between Katahdin, Dorper, and Rambouillet breeds

The nearest gene of the ten SNPs with the highest FST estimates from each pairwise FST analysis was determined (Table 6). The SNP OAR2_231739122.1 had the highest FST (0.9069) in the Katahdin-Rambouillet analysis and was positioned 935 bp downstream of the CXCR2 gene. For the Katahdin-Dorper analysis, the SNP OAR2_88734520.1 had an FST value of 0.9735 and was within the CCDC171 gene, and for Rambouillet-Dorper, the SNP OAR1_292866363.1 had an FST value of 0.8829 and was positioned within the METTL6 gene. Among the top ten SNPs identified in the Rambouillet-Dorper analysis, four were located within or near the LCORL gene. In addition, all pairwise FST comparisons identified outlier SNPs within the region of the FRY and RXFP2 genes (Table 7 and Fig. 7).

Table 6 Genomic information for 10 SNPs with the highest FST in each pairwise breed comparison
Table 7 Fixation index (FST) scores for SNPs in the region harboring the FRY and RXFP2 genes
Fig. 7
figure 7

SNPs with high FST in the region of the FRY and RXFP2 genes. a Stacked bar chart representing genotype frequencies at SNP s24045.1; b Stacked bar chart representing genotype frequencies at SNP OAR10_29065568.1; c Stacked bar chart representing genotype frequencies at SNP OAR10_29223007.1; d Stacked bar chart representing genotype frequencies at SNP s02289.1; e Stacked bar chart representing genotype frequencies at SNP OAR10_29274817.1; f Stacked bar chart representing genotype frequencies at SNP OAR10_29341212.1; g Stacked bar chart representing genotype frequencies at SNP OAR10_29448537.1; h Stacked bar chart representing genotype frequencies at SNP OAR10_29469450.1; i Stacked bar chart representing genotype frequencies at SNP OAR10_29654158.1; j Stacked bar chart representing genotype frequencies at SNP OAR10_29737372.1; k Stacked bar chart representing genotype frequencies at SNP s18834.1. D: Dorper; K: Katahdin; R: Rambouillet

In total, 554 outlier SNPs were identified in the Katahdin-Rambouillet FST analysis. Of these 554 SNPs, 157 were positioned within 200,000 bp of at least one other outlier SNP, which together defined 63 FST regions containing 233 unique and characterized genes (see Additional file 9: Table S7). These genes included the tRNA genes TRNAC-GCA (four CNV), TRNAW-CCA (three CNV), TRNAE-CUC (two CNV), and TRNAS-GGA (one CNV). The FST regions were in part consistent with the ROH results, with seven SNPs being identified through both the Katahdin ROH and Katahdin-Rambouillet outlier FST analyses, and 18 SNPs being identified through both the Rambouillet ROH and Katahdin-Rambouillet outlier FST analyses (see Additional file 10: Table S8). The GO pathway enrichment analysis of these genes identified significant enrichment for biological processes including regulation of cellular glucuronidation and glucuronosyltransferase activity, and significant enrichment for the molecular function pathway UDP-glycosyltransferase activity (see Additional file 11: Table S9). KEGG Mapper pathway analysis revealed genes that are involved in many pathways, including, among others, B cell receptor signaling, taste transduction, circadian rhythm, cytokine-cytokine receptor interaction, endocytosis, growth hormone synthesis/secretion/action, hematopoietic cell lineage, HIF-1 signaling, IL-17 signaling, and longevity regulation pathways. In addition, there were a number of pathways related to viral infection, such as herpes simplex virus 1, human immunodeficiency virus 1, and human cytomegalovirus (see Additional file 12: Table S10).

Analyses with the Dorper breed were used to better understand allelic differentiation in the Katahdin and Rambouillet breeds. All FST regions were analysed to identify the genes that were present in both pairwise comparisons of a breed: for example, genes identified in both the Rambouillet-Dorper and Katahdin-Rambouillet FST analyses were further explored to give context to the FST signals associated with the Rambouillet. Fifty-one genes were in common for the Rambouillet, 31 for the Katahdin, and 21 for the Dorper comparisons (Table 8). Many of these genes were previously identified in studies on signatures of selection or are candidate genes for production and/or health traits in sheep. The genes reported in Table 8 for each breed were queried through KEGG. These analyses identified pathways for metabolic pathways, olfactory transduction, and cytokine-cytokine receptor interaction (Table 9). In addition, genes in common between ROH islands and FST regions were evaluated for Katahdin and Rambouillet breeds (Table 10).

Table 8 Genes that were in common between FST comparisons
Table 9 KEGG results for genes which were in common between both comparisons of Rambouillet or Katahdin
Table 10 Genes that were called within the ROH islands and pairwise FST comparison regions

The Katahdin-Dorper analysis identified 513 outlier SNPs, including 109 SNPs located near to each other, which defined 49 regions. The 144 genes contained within these regions were used for pathway analysis. Genes in these regions were part of KEGG pathway terms, including chemokine signaling and cytokine-cytokine receptor interaction (see Additional file 13: Table S11). In the Rambouillet-Dorper analysis, 569 outlier SNPs were identified, with 167 SNPs defining 73 FST regions. In total, 264 genes were located within these regions. Some of the pathways identified by KEGG Mapper analysis included biosynthesis of unsaturated fatty acids, neutrophil extracellular trap formation, olfactory transduction (with nine related genes), and prion disease (see Additional file 14: Table S12). The pathway analyses of the Dorper ROH results are provided as supplementary material (see Additional file 15: Table S13; Additional file 16: Table S14; Additional file 17: Table S15; Additional file 18: Table S16). In addition, the outlier FST results described here from the Weir and Cockerham model implemented in Plink were compared against the results of a BayeScan FST analysis (see Additional file 19: Table S17).

Protein–protein interactions were identified for genes implicated by pairwise FST analyses with STRING. Query with the genes from the Katahdin-Rambouillet FST outlier regions revealed the largest number of interactions with ribosomal proteins RPL23A (19 proteins), RPL5 (18 proteins), and RPL10A, RPL6, and RPL7, each with 15 protein–protein interactions. Query with the genes from the Rambouillet-Dorper FST outlier regions identified 19 protein–protein interactions with RPL8, 18 interactions with RPL6, and 17 interactions each with RECQL4 and RPL10A. The largest number of interactions in the Katahdin-Dorper FST outlier regions were with EIF4A3, KCNQ1, NPM1, and TSSC4, all of which had seven protein–protein interactions (see Additional file 8: Table S6).

Discussion

In this study, we analyzed the genetic diversity and signatures of selection of Rambouillet, Katahdin, and Dorper sheep through within-breed (ROH approach) and pairwise breed (FST outlier approach) comparisons. The results reported here concern genomic regions that are under selection in the considered U.S. sheep. In addition, this study identified commonalities with previously identified signatures of selection from a diverse range of breeds, which allows our findings to contribute to a broad conversation about the signatures of selection and breed diversity of worldwide sheep.

A landmark paper in sheep genetic diversity research was published in 2012 by Kijas and colleagues [115]. Their study analyzed population structure and signatures of selection of 2819 sheep sampled from 74 breeds, including 102 Rambouillet sheep. The inbreeding coefficient (F) and Ne reported for these Rambouillet sheep were 0.14 and 709, respectively. These statistics are quite different from those of the Rambouillet analyzed in the current study. While a larger number of Rambouillet (N = 745) were sampled in this paper, the average inbreeding coefficient (FROH) was found to be higher (0.169) and the current Ne was smaller (NeEstimator: 56.9, GONe: 111.8) than those reported previously [115]. It is possible that these differences result from differences between the F statistic and Ne calculation methods. A previous study found that two genotype-based inbreeding coefficients (FSNP and FROH) had correlation coefficients ranging from 0.78 to 0.88, indicating that while related, the degree of inbreeding is not entirely comparable between subcategories of F statistics [116]. The results of the current study found that F and FROH showed different levels of inbreeding, with pairwise breed differences being most significant in FROH estimates. The Ne reported in our study are estimated for current and historic generations while the Ne estimates reported previously by Kijas and colleagues [115] are not described in terms of generational distance and are therefore difficult to compare directly.

A previous study was conducted with genotype data from 4884 Katahdin sheep [72]. The 581 Katahdin sheep used in our study were also analyzed as part of this larger population, although the genotype data used between these publications were collected from separate platforms. In spite of the difference in sample sizes between these studies, the SNeP Ne estimations 13 generations ago were very similar (Ne of 172 or 178 estimated from 4884 or 581 sheep, respectively). This agreement between Ne estimations supports the validity of these estimates.

The Ne estimates reported between the SNeP, GONe, and NeEstimator approaches varied. Based on SNeP, Rambouillet had consistently larger Ne than Katahdin, and both breeds showed a decline in Ne between each subsequent generation. This ranking of the Rambouillet and Katahdin breeds was also observed in the historic Ne reported through GONe, although in more recent generations there were multiple reranking events in which the Ne of Katahdin exceeded that of Rambouillet, and in the current generation, both NeEstimator and GONe calculated a larger Ne for Katahdin than for Rambouillet. However, Katahdin had significantly higher average F and FROH compared to Rambouillet, which suggests comparatively less current breed diversity. While the largest and most conserved ROH was identified in Rambouillet, Katahdin had a greater proportion of ROH within the larger 6–12 Mb class. The presence of more ROH of greater length suggests recent inbreeding events in the history of the Katahdin breed. Taken together, these results support the importance of evaluating evidence from multiple inbreeding and Ne models, since relying on a single estimate might bias interpretation of genetic diversity.

The largest ROH Island was positioned between 32.8 and 48.0 Mb on chromosome 6 and encompassed 64 unique genes. This region was previously reported to be under selection in sheep and cattle [23, 117,118,119,120] and many of the associated genes were suggested as candidates for production traits. In cattle, the ABCG2, IBSP, PIGY, PKD2, and MEPE genes have been associated with yearling weight [121]; NCAPG, LCORL, and LAP3 with body weights and calving ease [122]; and ABCG2, LAP3, NCAPG, DCAF16, and LCORL with milk total solid percentage [123]. Similarly in sheep, quantitative trait loci (QTL) associated with the DCAF16, LAP3, LCORL, NCAPG, NAP1L5, and PPARGC1A genes have been reported for somatic cell score or milk yield in dairy sheep [124]; DCAF16, LCORL, and NCAPG genes associated with body weight [125]; SPP1 and LAP3 genes associated with weight [126]; LCORL associated with meat productivity [127]; SLIT2 and ABCG2 genes associated with fat deposition and milk production, respectively [115, 128]; and genes HERC3, HERC5, HERC6, IBSP, and SPP1 associated with parasite resistance [129]. In the current study, the ROH island identified in this region was highly conserved, with more than 94% of the Rambouillet animals having a ROH called. Rambouillet sheep are raised for their carcass and wool characteristics and, as a breed, are generally considered to be susceptible to parasite infection [130]. The ROH island in this region may be the result of past selection for weight or carcass traits and the homozygosity at genes relating to milk yield or parasite resistance may have occurred through hitchhiking [131].

Similarly, the ROH island identified on chromosome 3 contains genes that were previously described within signatures of selection in goats and cattle, including ADAMTS20, GXYLT1, IRAK4, PRICKLE1, PUS7L, TWF1, YAF2, and ZCRB1 [132,133,134]. Expression of ADAMTS20 has been reported to be upregulated in ewes with endometritis [135], and members of the ADAMTS family have been associated with divergent prolificacy of sheep and goats [136]. In goats, the ADAMTS20 gene has also been associated with coat color [137]. The IRAK4 gene encodes interleukin-1 receptor-associated kinase 4, a key regulator of innate immune signaling responses [138]. In addition, the GXYLT1, PRICKLE1, YAF2, and ZCRB1 genes have been associated with resistance/susceptibility to Mycobacterium avium spp. paratuberculosis infection (responsible for Johne’s disease) in Holstein cattle [139]. An association study in dogs identified TWF1 as a candidate gene for deafness, and this gene has potential roles in hair-bundle development and melanocyte dendricity [140]. Both IRAK4 and TWF1 were implicated as genes of interest by Tajima’s D analysis with two strains of Qinchuan cattle [133]. The occurrence of these genes within signatures of selection in other species and the associations with reproduction, coat color, and animal health suggest potential traits which may be under selection in the Rambouillet breed.

Genes that were previously reported in signatures of selection in worldwide sheep breeds were identified through FST analysis of the U.S. Rambouillet, Katahdin, and Dorper sheep in the current study. These genes include NF1, a negative regulator of the ras pathway (identified in the Katahdin-Rambouillet and Rambouillet-Dorper FST comparison), OR2AG1 (identified in the Katahdin-Rambouillet and Rambouillet-Dorper FST comparison, and other olfactory receptor genes identified in all FST comparisons); and RXFP2, associated with horn phenotypes in sheep (present in all FST comparisons), which were reported by Kijas et al. [115]. A number of the genes identified in both FST comparisons with Katahdin (Katahdin-Rambouillet and Katahdin-Dorper) have been associated with milk production or resistance and susceptibility for mastitis. These genes included CCNT1, previously associated with milk and cheese-making traits [91]; LALBA, associated with milk production traits [141]; MRPL42, a candidate gene for mastitis resistance [105]; and the gene suppressor of cytokine signaling 2 (SOCS2) associated with mastitis susceptibility [106, 142]. Genes present in both Rambouillet FST comparisons (Katahdin-Rambouillet and Rambouillet-Dorper) have been shown to be connected with wool growth, including: FRY, involved in wool development through a previous FST analysis [78]; NR1D1, found to be differentially expressed and differentially methylated at different stages of hair follicle development [90]; TNFSF18, a candidate gene for staple length and fiber diameter traits [98]; and RXFP2, which may be linked to hair follicle growth through the cAMP synthesis pathway [143]. The NF1, EVI2A, EVI2B, and OMG genes were also identified in both of the Rambouillet FST analyses and were previously associated with adaptive response to physical exhaustion and fat deposition [85].

Copies of the EIF2S2 gene were identified in FST regions on chromosomes 13 and 25 in the Rambouillet-Dorper and Katahdin-Rambouillet FST analyses, respectively. The copy-number variant of EIF2S2 on chromosome 25 is caused by retrotransposition (retroCNV), which results in an insertion of the EIF2S2 retrogene into the 3′ UTR of IRF2BP2. This insertion has been found to be responsible for the “wooly” fleece phenotype of modern sheep [84, 144]. As a hair breed, the coats of Katahdin sheep are more similar to those of ancestral sheep breeds, in which an inner coat of fine wooly fibers lays below an outer coat comprised of hair fibers. Modern wooly sheep, such as the Rambouillet, lack the double coat of hair and ancestral sheep breeds, and instead possess a single coat comprised of wooly fibers of mostly uniform dimension [84]. It is unclear what is the biological mechanism that is responsible for Rambouillet having differentiation in the regions of both the retroCNV and the EIF2S2 gene, but the high FST in the region on chromosome 25 is likely related to differences between hair and wool coat types.

The ROH island identified on chromosome 25 in the Katahdin sheep overlapped partly with the Katahdin-Rambouillet FST region containing the IRF2BP2 gene. The FST region ranged from 6.6 to 7.0 Mb while the ROH island ranged from 6.9 to 7.8 Mb, creating an overall signature of selection encompassing a region from 6.6 to 7.8 Mb on chromosome 25. Three SNPs positioned at, respectively, 6.94 Mbp, 7.67 Mbp, and 7.73 Mbp, were identified by both the ROH and FST outlier analyses. This region contains the TOMM20, RBM34, ARID4B, GGPS1, TBCE, B3GALNT2, ATMIN, GNG4, and LYST genes, which were previously implicated in tail fat deposition, the ancestral-like coarse wool phenotype, coat color regulation, and response to Brucella ovis infection [145,146,147,148,149]. The identification of this signature of selection through both FST and ROH analyses, which encompasses genes that are important for both immune pathways and hair type, suggests a potential relationship between the genetic control of these traits in Katahdin sheep.

The ROH island identified on chromosome 23 in Katahdin sheep harbors genes compiled in KEGG immune pathways, including the LAMA1 gene in the ECM-receptor interaction, toxoplasmosis, and viral myocarditis pathways; TUBB6 in the phagosome, Salmonella infection and pathogenic E. coli infection pathways; and PTPRM in the adherens junction and cell adhesion molecules pathways. In a study with Scottish Blackface lambs, the LAMA1 gene was found to be divergently expressed between lambs with low versus high fecal egg count phenotypes [150]. The PTPRM gene has been described to have a role in the telogen phase of hair follicle growth in Dorper sheep [151], and in studies with humans and biomedical models, it has been shown to be expressed by T cells and to have a dysregulated expression in patients with immune-mediated skin disease [152, 153]. While there is no current evidence for a specific immune role for TUBB6 in sheep, transcription of TUBB6 and other phagosome-associated genes were found to be downregulated in the pituitary gland of nutrient-restricted ewes during late gestation [154], and enhanced transcription was identified during the late phase of Eimeria bovis infection in culture with host bovine cells [155].

The SNP with the highest FST score for the Katahdin-Rambouillet analysis was OAR2_231739122.1, which is positioned 935 bp downstream of the CXCR2 gene. This gene encodes the principal membrane-bound chemokine receptor that is responsible for mediating neutrophil recruitment [156]. In sheep, CXCR2 has been implicated in clinical mastitis and resistance to gastrointestinal nematode infection [157, 158]. A number of immune-related pathways were revealed through KEGG Mapper pathway analysis of the genes present within the Katahdin-Rambouillet FST regions. These pathways included leukocyte transendothelial migration (AFDN gene), B cell receptor signaling (CD79B gene), cytokine-cytokine receptor interactions (CSF3, GH1, BMPR1A, GDF5, TNFSF18 genes), IL-17 signaling (CSF3 gene), inflammatory mediator regulation of TRP channels (ADCY6, NTRK1, TRPM8 genes), and NF-kappa B signaling (BCL2L1, MAP3K14 genes), among others. The high level of FST associated with these genes and pathways may reflect differences in allele frequencies contributing to the more robust immune response attributed to Katahdin sheep [33, 157].

Through KEGG Mapper pathway analysis for Katahdin-Dorper FST regions, pathways including chemokine signaling (ADCY6) and cytokine-cytokine receptor interaction (TNFRSF13B), NOD-like receptor signaling (NLRP3 and PANX1), B cell receptor signaling (CD81), and phagosome pathways (STX7) were identified. These results are of particular interest in light of the divergent immune abilities of Dorper and Katahdin sheep [159, 160]. In addition, KEGG cytokine-cytokine receptor interaction pathways were identified in all FST comparisons, although the genes involved differed between analyses (see Additional file 20: Fig. S3). In the Katahdin-Rambouillet comparison, the CSF3, GH1, BMPR1A, GDF5, and TNFSF18 genes were identified; in the Rambouillet-Dorper comparison, the IL31RA, CSF3, IL17RA, GHR, IL6ST, BMPR1A, and TNFSF18 were identified; and in the Katahdin-Dorper comparison, only the TNFRSF13B was identified, as previously stated. About half of the pathway genes overlapped between the Rambouillet comparisons (CSF3, BMPR1A, and TNFSF18), suggesting that these FST regions were most strongly associated with differentiation of allele frequencies in Rambouillet sheep. There were no overlapping genes between the two Katahdin comparisons, which suggests that the greatest amount of differentiation between Katahdin and Rambouillet and between Katahdin and Dorper occur in different regions of the cytokine-cytokine receptor interaction pathways. These FST results may have a role in the overall immune response of Katahdin sheep compared to the Rambouillet and Dorper breeds that are susceptible to parasites.

The sheep used in this study were not phenotyped for coat color or the presence or absence of horns or scurs. However, Rambouillet are well-known for their light-colored coat [161], with variation being observed in different propensities for yellowing rather than pigment [162]. Dorper sheep have a white body and a black head and neck, while White Dorper sheep have an entirely white coat [163]. The Dorper analyzed in this study were from a flock founded by both Dorper and White Dorper sheep and showed a variety of color patterns, including some animals with black heads and/or black spotting and others with solid white coats. Katahdin sheep do not have a definite coat color and may have a variety of colors and patterns [164]. These breed-specific differences in coat color may have influenced the identification of signatures of selection that contain genes associated with coat pigment in the present study. Horn phenotypes are also expected to differ between these breeds. Many Katahdin producers prefer their sheep to be polled, although scurred and horned animals are permitted by breed standards [164]. The horn/polled phenotype in Rambouillet has been described as strictly sex-linked [165]. Rams in the African Dorper breed are variable and can have horns, scurs, or be polled, while females are scurred or polled [165]. These known differences may explain why multiple FST outlier SNPs were detected near the RXFP2 gene in all pairwise comparisons of this study.

Conclusions

This study provides insights into the signatures of selection and genetic diversity of three popular U.S. sheep breeds. The results described here support previous reports on genes that underlie signatures of selection in sheep and provide additional insights into the biological differences between Rambouillet, Katahdin, and Dorper sheep. The results of the FST analyses indicated strong population differentiation associated with genes relevant to milk production in Katahdin sheep and wool growth in Rambouillet sheep. A large and highly conserved ROH island was identified in Rambouillet sheep that contained genes of known importance for growth and carcass traits, including LCORL and NCAPG. Signatures of selection were identified in genes relevant to ancestral versus wooly coat types (IRF2BP2, retroCNV EIF2S2) and horn/polled phenotypes (RXFP2) in addition to many genes involved in immune-related pathways. These findings likely have relevance to the variations in physical appearance and parasite resistance ability of these breeds. Further analysis of these FST and ROH signatures of selection may provide greater insight into the selection pressures being exerted on these important breeds in the U.S. sheep industry.