Introduction

Small ruminants (sheep and goats) make immense socio-economic and cultural contributions across the globe. However, gastrointestinal nematode (GIN) infections pose the main constraint to grazing small ruminants1, 2. The control and treatment of GIN infections is estimated to cost tens of billions of dollars worldwide3 and has traditionally relied on chemotherapeutics but, their extended use has resulted in economic losses and raised concerns of environmental health, food safety, and the development of resistance in parasites for all the major groups of anthelminthic drugs1, 4, 5. The evolution of anthelmintic resistance and changes in climate, land-use and farming practices are likely to alter the geographic distribution and infection patterns of parasites and their impacts, calling for the development of sustainable control strategies2, 6.

Evidence for host genetic variability for GIN resistance has been observed in ruminant livestock suggesting selective breeding is a feasible control option7. Indicator traits, such as faecal egg count (FEC), antibody assays, packed cell volume (PCV) and FAMACHA scores have been used to identify resistant and resilient animals8, 9. However, the mechanisms underlying genetic differences in resistance to GIN infections remain poorly understood, with resistance being a physiologically complex trait that develops over time, and indicator traits measured at specific time points may fail to represent all the pathways involved. Most often, quantifying resistance has been through artificial challenge using variable doses of larvae with which the rate, time and specificity of infection are controlled. However, traditional communal grazing management practised in smallholder mixed farming, nomadic and pastoral systems predominate in the (sub)tropics. In these systems, multiple nematode species account for natural infections which occur gradually, and findings from single infection studies may not mirror the patterns of infection encountered in these traditional systems10. There are reports showing between- as well as within-breed variation in resistance to GIN in small ruminants11,12,13. These observations and the genetic fragmentation observed in most breeds, implies that information derived from one breed cannot be extrapolated to another but needs to be validated for individual breeds5.

Advances in genomic technologies offer the opportunity to investigate the nature of genetic variation underlying complex traits14. The investigation and discovery of putative candidate genes, genomic and regulatory variants underpinning GIN resistance may provide a better understanding of the underlying molecular mechanisms and accelerate genetic gains in breeding programs. Genome-wide scans using single nucleotide polymorphisms (SNPs) have identified candidate regions, genes and QTLs putatively associated with GIN resistance on almost all ovine chromosomes11, 15,16,17,18,19. Most of the regions lie within and/or in proximity to MHC II and the interferon family of genes including the multitude of genes activated following interferon exposure20, 21, that are components of the immune response. Other QTLs for GIN resistance implicate other mechanisms, including innate and acquired immune response, gastric mucosal protection and haemostasis pathways18. Here, we utilise genome-wide autosomal genotype data generated with the Illumina Ovine 600 K SNP BeadChip and bioinformatic analysis, to conduct genome-wide screens in the hope of identifying regions and loci, linked to individual animal variability to GIN resistance in traditionally managed autochthonous Tunisian sheep.

Materials and methods

Study cohorts, sample collection, DNA extraction and genotyping

This study required no ethical approvals as all the samples were procured from slaughterhouses and in the presence of a veterinarian. In total, 309 blood samples were collected from indigenous sheep brought for slaughter at eight commercial slaughterhouses (Tunis (Tunis abattoir), Ariana (Ariana abattoir), Bizerte (Bizerte, Mateur and Sajnène abattoirs), Béja (Béja abattoir) and Jendouba (Jendouba and Tabarka abattoirs)) in northern Tunisia where GIN infections present one of the main constraints to small ruminant production22. The blood samples were collected using EDTA vacutainers by puncture of the jugular vein. DNA was extracted from each blood sample using WIZARD Genomic DNA Purification Kit (Promega, Madison, WI, USA) and then stored at − 80 °C. Out of the 309 samples, 101 were selected and assigned to two extreme groups based on GIN infection rates i.e., Non-infected (n = 58) and Infected (n = 43), based on faecal egg counts (FEC). FEC were estimated from 5 grams of stool by concentration floatation followed by the McMaster technique and the eggs per gram calculated to estimate the degree of infection23. The FEC readings ranged from zero (0) to 3800 eggs per gram. All individuals reporting an FEC = 0 were classified as non-infected and those with FEC > 100 were classified as infected.

Information on the grazing pattern and history of anthelmintic use was obtained from animal owners before slaughter. From faecal examinations, eggs of gastro-intestinal helminths (including nematodes and cestodes), Eimeria spp., Trichuris spp. and Nematodirus spp., were identified in the study individuals23 suggesting infections from multiple GIN species. The DNA samples were genotyped with the Illumina 600 K SNP BeadChip (Illumina Inc., San Diego, CA, USA) at GeneSeek Neogen Genomics, Lincoln NE, USA. The BeadChip comprises 606,006 probes that target genome-wide SNPs, among which 577,401 are autosomal, 27,314 are on the X chromosome and 1291 are unassigned.

Data quality control and screening

The 606,006 raw SNP genotypes were processed for quality control with PLINK1.924. The following criteria was used: (1) one individual was randomly selected from one pair of highly related animals with an identity-by-state score (IBS) of greater than 0.99, (2) SNPs with minor allele frequencies (MAF) of no less than 0.01 were retained, (3) individuals and SNPs with call rates lower than 90% and 95%, respectively were discarded and (4) all unmapped SNPs and those on the sex chromosomes were excluded. This generated a dataset of 540,528 autosomal SNPs and 92 samples (Infected = 41; Non-infected = 51). This dataset was subjected to LD pruning using the parameters 50 5 0.5 representing window size, step size and r2 threshold, respectively resulting in 335,070 SNPs that were used for population structure and phylogenetic analysis.

Estimation of genetic diversity

Expected (HE) and observed (HO) heterozygosity, effective population size (NE), and patterns of LD decay were investigated for the two cohorts (infected and non-infected). HE and HO were calculated using PLINK v1.9. Pair-wise LD was evaluated using the correlation coefficient (r2) between alleles at two separate SNP loci using PLINK v1.9 with default settings. Following Sved25, NE over generation time was estimated with the equation NEt = (1/4c) (1/r2 − 1), where NEt is the effective population size t generations ago (t = 1/2c); r2 is the LD between pairwise SNPs; and c is the genetic distance in Morgan between pairs of SNPs.

For each cohort, two measures of inbreeding were calculated; (1) SNP based inbreeding coefficient (F) calculated with PLINK v1.9 and (2) runs of homozygosity (ROH) based inbreeding coefficient (FROH). For the latter, ROH streatches were first computed using the “detectRUNS” package in R26. The FROH, was computed as the ratio of the total length of ROH to the length of autosomes (2.45 Gb)27.

Three estimates of ROH were calculated taking into account three genomic distance categories, ROH < 5 Mb, 5 Mb < ROH < 10 Mb, and ROH > 10 Mb, indicative of ancestral (more than ten generations), middle (5–10 generations), and recent (within five generations) inbreeding, respectively28. For the calculation, ROHs were identified for each individual using PLINK v 1.9 with the following parameters: (1) the minimum number of SNPs in a sliding window was 50; (2) the minimum ROH length was set to 1 Mb to eliminate the impact of strong LD; (3) each ROH spanned a minimum of 80 consecutive SNPs; (4) one heterozygous and five missing calls per window were allowed to avoid false negatives that may arise due to occasional genotyping errors or missing genotypes; (5) the minimum SNP density was set at one SNP every 100 kb, and the maximum gap between consecutive SNPs was set to 1 Mb.

Population structure analysis

The 335,070 SNPs were used to infer genetic structure and divergence using three methods. Principal component analysis (PCA) was performed using PLINK v1.9 and the first two PC’s were plotted to visualize individual relationships. The proportion of shared ancestry between individuals was inferred with the unsupervised mode of ADMIXTURE tool v1.3029. This mode does not assume any background information on the number and frequency of alleles in ancestral populations. The ADMIXTURE tool was run with values of K increasing gradually from 1 to 6, to derive cross-validation (CV) errors. The lowest value of the CV error indicates the most likely number of ancestral populations. Five runs were performed for each K. To test for consistency in the ADMIXTURE results, we repeated the ADMIXTURE runs using three datasets of 6,000 SNPs, each drawn at random without replacement from the 335,070 SNPs. The pairwise allele sharing distance (ASD)30 was computed using the program “asd”. A distance matrix of all pairwise ASD dissimilarities, calculated as 1-ASD, was generated and subjected to hierarchical clustering with the Neighbour-Joining algorithm to yield an individual clustering dendrogram. ASD calculation does not require estimates of allele/genotype frequencies making it valid when sample sizes are small. It is also suitable for detecting outliers and is robust to high LD among SNPs.

Detection of selection signatures and association analysis

To investigate the molecular genetic basis underlying natural variation in GIN infection, we investigated genome-wide distributions of ROH in each individual in the infected and non-infected cohorts using PLINK v1.9. For this analysis, we used the 540,528 SNPs that passed the quality thresholds. Using the “detectRUNS”, we counted the number of times a given SNP occurred in the identified ROH in each cohort and presented a Manhattan plot of all the tested SNPs against their autosomal positions. The most frequently observed SNPs in ROHs occurring at the top 25% of the empirical distribution were taken as the most significant loci underlying an ROH under selection. To identify regions of ROH that are likely associated with variations in GIN infection, we compared the ROH regions between the infected and non-infected cohorts and identified the ones that were specific to the non-infected cohort.

We performed the logistic regression (LR) GWAS, FST and XP-EHH analyses to investigate further the genome regions associated with variation in GIN infection. LR-GWAS was performed with PLINK v1.9 using the non-infected cohort as the test sample and the infected cohort as the control. To obtain the 99% confidence intervals for the estimated parameters, the “-ci 0.99” and “-covar” options were invoked and Fisher’s exact test was used to generate the p-values considering location, breed, sex, and age as covariates. The raw p-values were subjected to Bonferroni correction to control the likelihood of any false positive results among the markers identified to be under selection. The corrected p-values were standardized and the − Log 10 (p-value) of 4.25 (the top 0.001) was set as the cut-off threshold to identify candidate regions. The estimations were summarized in 200 Kb window sizes and the genes falling within the candidate regions were identified. The package ’qqman’ in R version 3.5.1 was used to generate the Manhattan plot.

The population differentiation statistic, FST, was used to investigate regions of the genome that have diverged between the two cohorts. The unbiased pairwise FST31 was computed using the HIERFSTAT Package of R32 using a window size of 200 Kb and a window-step size of 100 Kb. Windows with less than five SNPs were excluded from the analysis. The FST values were then standardized by Z transformation following Ahbara et al.33. Windows falling within the top 0.001% of the FST values in each chromosome were considered the putative candidates under divergent selection.

The cross-population extended haplotype homozygosity (XP-EHH)34 was used to compare expected haplotype homozygosity (EHH) and integrated haplotype score (iHS) between the two cohorts to detect selection and its direction. XP-EHH scans SNPs that are homozygous in one population but polymorphic in the other through pairwise comparison of EHH scores. Positive XP-EHH scores indicate selection in the test sample, while negative scores indicate selection in the control. The XP-EHH scores were estimated as:

$${\text{XP}}-{\text{EHH}}=ln\left({\frac{{I}_{A}}{{I}_{B}}}\right)$$

where IA is the integrated EHH value of the test population and IB is the integrated EHH value of the reference population. Haplotype phasing was inferred for each cohort simultaneously on all SNPs using BEAGLE v3.3.135. The XP-EHH test was performed with the “rehh” package of R36 and the raw XP-EHH scores were standardized to a distribution with zero mean and unit variance. Selection candidates were considered as the regions contained in any of the 200 Kb windows with a significance threshold of p < 0.001; this equates to an XP-EHH value of 4 at the default settings of “rehh” estimation.

Functional annotation of candidate regions

The candidate regions identified by ROH were analysed and the ones that were specific to the non-infected cohort identified. We also analysed the ROH regions of the non-infected cohort, LR-GWAS, FST and XP-EHH candidate regions and the ones that overlapped between at least two approaches were identified and merged using Bedtools v.2.28.037. Genes that were spanned by the ROH (non-infected cohort) and overlapping candidate regions were retrieved using the Biomart/Ensembl (http://www.ensembl.org/biomart) tool based on the Ovine v3.1 reference genome assembly. The set of genes identified in the candidate regions were assessed for biological enrichment gene ontology (GO) and KEGG Pathway (www.kegg.jp/kegg/kegg1.html) terms compared to the full list of Ovis aries autosomal protein-coding genes with the functional annotation tool in DAVID v6.838 using O. aries as the background species. We also mapped the ROH (non-infected) and overlapping candidate regions with those reported in the sheep quantitative trait loci (QTL) database Release 42 (QTLdb; https://www.animalgenome.org/cgi-bin/QTLdb/OA/summary) to identify overlapping QTLs, which may suggest associations with response to parasite infections. To provide further biological interpretation, gene functions were determined from the NCBI database (http://www.ncbi.nlm.nih.gov/gene/) and review of literature.

Results

Genetic diversity estimates

The average estimates of HO, HE, F, FROH and ROH size did not differ significantly (p = 0.05) between the two cohorts (Table 1). However, the non-infected cohort showed marginally higher values of F and ROH size while the infected animals had marginally higher values of HO, HE and FROH. The number and length of ROH estimated for the 0–5 Mb, 5–10 Mb, > 10 Mb genome length categories are shown in Table 2. The most and least frequent ROH length was observed for the 0–5 Mb and > 10 Mb length categories, respectively. The average length of ROH per animal was highest, in the 5–10 Mb length category for non-infected cohort and > 10 Mb length category for the infected cohort. The two cohorts show similar patterns of LD decay over genomic distance although the non-infected cohort showed overall lower LD (r2) (Fig. 1a). In general, the pattern of LD decay shows higher LD at shorter distances which declines rapidly and plateaus off from 0.4 Mb for both cohorts (Fig. 1b). The trend in NE over generation time was the same for both cohorts (Fig. 1c). They are characterised by an increase in NE from 1000 generations ago, which attains maximum value at approximately 330 generations ago, and then declines to the present time. Generally, the non-infected cohort had higher NE across all generations.

Table 1 Estimates of genetic diversity parameters for the infected and non-infected cohorts of Tunisian sheep.
Table 2 Number and length of ROH for each cohort of autochthonous Tunisian sheep for each ROH length category.
Figure 1
figure 1

Trends in LD decay (a, b) and NE across 1000 generations (c) in non-infected and infected cohorts of autochthonous Tunisian sheep. INF infected cohort; NINF non-infected cohort.

Population structure analysis

Population structure and relationship was investigated using PCA (Fig. 2a), ADMIXTURE tool (Fig. 2b,c) and ASD phylogeny (Fig. 2d). The first and second PC’s of the PCA explained, respectively 1.74% and 1.45% of the total genetic variation. The study animals did not differentiate into distinct genetic groups/clusters that correspond to their infection status. Following ADMIXTURE analysis, the lowest CV error was at K = 1 (Fig. 2b) which suggests no genetic differentiation. The ADMIXTURE plot reveals that a similar pattern of genomic composition characterizes the two cohorts at 2 ≤ K ≤ 6 (Fig. 2c). This pattern was replicated when ADMIXTURE was performed using three datasets comprising 6000 SNPs each, drawn at random without replacement from the 335,070 SNPs. The ASD phylogeny also showed no genetic stratification (Fig. 2d).

Figure 2
figure 2

Population genetic structure and phylogenetic analysis of the two cohorts of autochthonous Tunisian sheep (a) PCA cluster analysis showing PC1 and PC2; (b) Cross-validation plot for admixture analysis; (c) Admixture analysis plot showing the genetic backgrounds present in the study cohorts for 2 ≤ K ≤ 6; (d) ASD phylogenetic tree of individuals of the study population. INF infected cohort; NINF non-infected cohort.

Genome-wide selection signature analysis

The ROH analysis identified 60 ROH regions in the two cohorts (Fig. 3a,b) that spanned 311 genes. The LR-GWAS, FST and XP-EHH identified 346, 32, and 68 regions (Fig. 4a–c), respectively which spanned 673, 152, and 295 genes. These 446 candidate regions overlapped with 645 genes (Supplementary Table S1) of which 71 were found in candidate regions that were identified by at least two methods (Fig. 5).

Figure 3
figure 3

Manhattan plot showing genome-wide distribution frequency of SNPs in stretches of ROH regions. The dashed lines indicate the 25% threshold for each cohort. INF infected cohort; NINF non-infected cohort.

Figure 4
figure 4

Manhattan plots showing the genome-wide distribution of SNPs following (a) LR-GWAS (b) FST and (c) XP-EHH analysis using the non-infected and infected cohorts of autochthonous Tunisian sheep. INF infected cohort; NINF non-infected cohort.

Figure 5
figure 5

Venn diagram showing the number of genes that were specific and common to the four selection signature methods performed in this study.

We considered the ROH analysis as a method that identifies selection signatures within a cohort. We therefore compared the ROH results of infected and non-infected cohorts (Fig. 3a,b). This identified 23 ROH regions that were specific to the non-infected cohort and which spanned 80 putative genes (Table 3) of which 30 remain uncharacterised/unannotated (prefixed with “ENSOARG”). Six of the 23 candidate regions occurred on OAR 1, 2, 3, 6, 10 and 14 and overlapped with FECGEN, LATRICH_2 and NFEC QTLs (Table 3) that are associated with health traits and in particular, parasite resistance.

Table 3 Candidate regions that are specific to the non-infected cohort as identified by ROH analysis.

The LR-GWAS was used to identify candidate regions and possible SNPs associated with GIN infection, while FST, and XP-EHH were also used to investigate selection signatures by contrasting the non-infected and infected cohorts. These three approaches identified 30 candidate regions that overlapped between at least two methods across 17 autosomes (Table 4). When the ROH regions that are specific to the non-infected cohort are considered, the four methods identify 35 candidate regions overlapping between at least two approaches across 19 autosomes and span 121 genes including 11 which remain uncharacterized (Table 4). Of the 35 candidate regions, five that were identified by ROH to be specific to the non-infected cohort overlap with at least one region that was identified by either LR-GWAS, FST and/or XP-EHH and span 13 genes (Table 4) while one region found on OAR 5 (region number 14) was identified by all the four methods (Table 4). This region spans four genes (LOC101104745, PCDHGA1, PCDHGA2, ENSOARG00000000218) and one QTL trait (BIRTH_WT, Body weight (birth)). None of the four genes are associated, directly or indirectly, with endoparasite resistance. Nineteen regions found across OAR 1, 3, 6, 8, 11, 12, 13, 14, 17, 18 and 21 span FECGEN, TFEC_1, HFEC, NFEC, LATRICH_2, IGA, SAOS, WORMCT, PEPSL and CEOSIN QTLs that are linked to response to GIN infection (Table 4). Ten regions overlapped with production (growth) QTL traits (BW, BIRTH_WT, BONE_WT) across OAR 2, 4, 5 and 6, four regions overlapped with meat and carcass traits associated with fatness QTLs (HCWT, FAT_WT, LMYP) across OAR 2 and OAR 10, and anatomy QTLs (BONE_WT) on OAR 24 (Table 4).

Table 4 Candidate regions, associated genes and QTLs that overlapped between at least two methods of detecting selection signatures.

Functional enrichment analysis was first tested in the pool of 51 genes, excluding uncharacterized genes, that are present in the 23 ROH candidate regions that were specific to the non-infected cohort (Table 3). A second functional enrichment analysis was performed with the set of 110 genes, excluding 11 uncharacterized ones that were present in the 35 candidate regions that overlapped between at least one ROH, LR-GWAS, FST and XP-EHH regions (Table 4). We found two functional term clusters that were significantly (enrichment score > 1.5) enriched for the genes present in the ROH regions of the non-infected cohort (Table 5). These were associated with “immune system process” and “cytokine receptor interaction”. The second analysis resulted in three significantly (enrichment score > 1.5) enriched clusters. The first cluster was associated with the GO biological terms “Carboxylesterase type B” and “Hydrolase activity”. The “Apical plasma membrane” term, which has a role in intestinal innate immunity, and “Integrin signalling” that functions in immune cells were among the most significant GO biological terms in the second and third clusters, respectively.

Table 5 Enriched functional term clusters following DAVID analysis of genes identified by ROH analysis in candidate regions that were specific to the non-infected cohort and those that overlapped between at least two methods of detecting selection signatures.

Discussion

GIN infections and associated gastroenteritis impact negatively the production efficiency of ruminant livestock, and their management is essential to meet future demand for animal source foods. The observation of large variations in GIN infection suggests variability at the genome level that underpins inter-animal variability in susceptibility39, 40. Rouatbi et al.23 observed inter-individual variation in GIN infection in autochthonous Tunisian sheep leading to the suggestion that it could have a genetic basis. We therefore generated and analysed Ovine 600 K SNP BeadChip genotype data, from 92 samples that comprised GIN infected and uninfected animals under field challenge from Rouatbi et al.23 study. The aim was to investigate signatures of variability in GIN infection and resistance in autochthonous Tunisian sheep that are managed under natural grazing and no history of anthelmintic intervention. We applied four methods to detect genomic regions associated with GIN resistance. Although our study yielded some interesting findings, we acknowledge that studies that use naturally acquired field exposure to deliver a challenge always run the risk of uninfected animals being a combination of truly highly resistant animals, animals that never saw infection and animals that have had infection cleared due to chemical treatment. Additionally, producer provided information is at times not always accurate. These factors may serve to dilute the certainty that the two groups are in fact functionally dissimilar.

Although we expected the two cohorts to show differences in genetic diversity and structure, this was not the case; they showed similar levels of genetic diversity and inbreeding and the three clustering algorithms provided corroborating evidence of lack of stratification that was consistent with infection status. The lack of genetic differentiation and structure was also reported between prolific and non-prolific cohorts of Bonga sheep from Ethiopia41 and in the Brazilian Santa Inês breed which shows variability in resistance to GIN infection19. The absence of genetic stratification appears to be a characteristic of sheep from the Maghreb as it has also been observed in sheep populations from Algeria42 and Morocco43 implicating extensive intermixing and cross-mating. There are four possible explanations for the lack of genomic stratification corresponding to infection status. (1) The selection pressure driving the differences in GIN infection is/has not been stringent or long enough to result in differentiation at the genome level; (2) the level of parasite infection may be too low to result in meaningful genomic variation; (3) it may point to a lack of farmer-driven selection that is biased towards the use of uninfected animals for breeding; and/or (4) there could be a high natural selection pressure whereby all animals effectively have the resistance genotype and therefore rather than a distribution, spanning susceptible to resistant, there is a skewed distribution that spans “moderately resistant” to “highly resistant” individuals. One or all of these reasons may explain the lack of genetic differentiation between the two cohorts. The level of genetic diversity in the two cohorts is higher than that observed in commercial breeds, but it falls within the range of values reported in other sheep found in Africa and China44,45,46.

We assessed the average length and number of ROH per individual at three genomic distance categories and the trends in NE over generation time for each cohort. The two cohorts had a high frequency of ROH in the shorter (0–5 Mb) length category reflecting older evolutionary events such as past or ancestral inbreeding. The two cohorts showed similar patterns and trends in LD decay and NE. However, the non-infected cohort had higher values of NE across all generations. This difference is difficult to explain but we presume it may be the result of higher mortalities in the infected cohort. Both cohorts show a gradual increase in NE from 1000 generations ago. This is followed by a drastic decline from around 330 generations ago to the present time suggesting a bottleneck event. A similar trend was observed in Ethiopian and Sudanese local sheep47. Assuming a generation time of three years for traditionally managed local sheep, it can be inferred that the increase in NE started around 3000 years ago. The start of the decline in NE 330 generations back translates to around 990 years ago. Between 240 and 1140 years ago, three favourable climatic periods interspersed with short extreme dry spells were experienced in the continent48. We suggest that the sheep populations most likely thrived when conditions were optimal but shrunk during subsequent droughts. The footprints of these demographic perturbations appear to have been retained in the genomes of the indigenous sheep.

Our analysis revealed 35 candidate regions, spanning 110 genes, that overlapped between at least two out of the four methods we used to detect genomic regions associated with differences in endo-parasite infections. The identification of overlapping candidate regions that are under selection by different approaches suggests plausible evidence for selective influences at the genome49. The convergence of our results points to the reliability of our findings and suggest that they are unlikely to be the outcome of chance events or analytical artefacts.

Both the within (ROH) and between-population (LR-GWAS, FST, XP-EHH) approaches identified genomic regions that could be driving GIN resistance in autochthonous Tunisian sheep. At least two or more methods identified the same candidate regions that colocalized with the FECGEN, TFEC_1, HFEC, NFEC, LATRICH_2, IGA, SAOS, WORMCT, PEPSL and CEOSIN QTLs that have been associated with health traits and in particular parasite resistance, immune capacity and disease susceptibility in sheep showing resistance to GIN50,51,52. This result indicates that some common QTLs underly parasite resistance traits in sheep and support a role for convergent evolution in driving host GIN resistance. For instance, the FECGEN QTL which results in reduced FEC in unmanaged, naturally-parasitized domestic sheep53 and in the primitive Soay sheep20, has been associated with a microsatellite allele (o(IFN)-γ126) found in the first intron of the interferon gamma (IFN)-γ gene and with increased titre of Teladorsagia circumcincta-specific IgA. It has been shown that the effects of the o(IFN)-γ126 allele and IgA on FEC are not independent, and that IgA may mediate the (IFN)-γ effect on FEC20. Furthermore, the IGA QTL could be related to early response to incoming larvae, whereas the FECGEN QTL may be associated with the ability to avoid the development of adult parasites52, 54. The fact that the candidate regions spanned QTLs associated with different QTL traits was expected because the animals analysed in this study are grazed in communal pastures where challenge from multiple GIN parasite species is common and could present different aspects of host-parasite interaction during infection. Indeed, eight nematode species, T. circumcincta, T. trifurcate, Haemonchus contortus, Marshallagia marshalli, Trichostrongylus vitrines, T. axei, Ostertagia lyrate, O. ostertagi and O. occidentalis, were found to colonize the abomasum of the study individuals23.

QTL and GWAS studies suggest that GIN infections can evoke several host responses that enhance innate and acquired immune responses, gastric mucosal protection, haemostasis pathways, delay in parasite development and reduction in number of eggs produced by GIN16. These manifestations depend on the nematode species, parasite exposure, and host-specific factors including age, sex, genetic make-up, hormonal and nutritional status55. The animals analysed in our study are grazed under natural open pastures and are exposed to a wide range of GIN. Our expectation, therefore, was that they would exhibit a large repertoire of host defence mechanisms which may be reflected in the functions of the putative genes present in the candidate regions. We observed two organic cation transporters, SLC22A4 (OCTN1) and SLC22A5 (OCTN2) which are carriers of hydroxyurea, in an overlapping candidate region on OAR 5 (Table 4). Hydroxyurea can inhibit ribonucleotide reductase (RR) and is retained at high concentrations in tissues of various mammalian species including sheep56. Compared to viral or bacterial ribonucleotide reductase, the mammalian RR is less susceptible to inhibition by hydroxyurea56. This action can be critical in inhibiting viral and bacterial replication without affecting mammalian cellular growth. Hydroxyurea can therefore offer a level of immunity against GIN infection as part of the innate immune defence system which may inhibit endoparasites in the gastrointestinal tract of non-infected animals. Some variants present in SLC22A4 and SLC22A5 have also been associated with Inflammatory Bowel and Crohn’s disease’s in humans57, 58.

We observed BMPR2 (Bone Morphogenetic Protein Receptor Type 2) in an overlapping candidate region on OAR 2 (Table 4). Ligands of this gene are members of the TGFβ superfamily whose homologues are key players in inducing immunological tolerance59. Elevated expression of TGFβ have been observed in mammalian hosts that mount an effective immune response against GIN59,60,61,62. BMPR2 was found to be highly expressed in mesenteric lymph nodes of cattle that were selected for resistance to intestinal nematodes63 suggesting association with parasite resistance. BMPR2 also occurred in a cluster of genes that were upregulated following Eimeria acervulina infection in chicken and associated with the functional term “Inflammatory Response” in the “Disease and Disorders” category64. Several mechanisms may operate to increase the levels of TGFβ under parasite infection including, host homeostasis to minimize immunopathology under chronic infection; pathogens triggering TGFβ production or activation; or parasite mimicry of the host cytokine to drive the same pathway as the hosts TGFβ59. Examples of all three have been reported in diverse sheep breeds59 and the one that operates in autochthonous Tunisian sheep remains unknown calling for further investigation.

The IL-4 (Interleukin-4) and IL-13 (Interleukin-13) occurred in a candidate region on OAR 5 that was specific to the non-infected cohort. IL-4 plays a crucial role in the differentiation of naive T helper (Th) cells into Th2 effector cells which promote humoral immunity and provide protection against intestinal helminths65. In sheep, Jacobs et al.66 observed that impaired IL-4 signalling promoted the establishment of H. contortus and increased larval burden. An increase in IL-13 in intestinal lymph cells was observed in sheep selected for increased resistance to nematodes67. Two genes RUFY4 (RUN and FYVE Domain Containing Protein 4) and VIL1 (Villin 1) and two IL-8 receptors (CXCR1 and CXCR2) occurred in a strong candidate region on OAR 2 that was also identified by ROH to be specific to the non-infected cohort. RUFY4 is a positive regulator of autophagy and is expressed in a cell-specific manner or under specific immunological conditions associated with IL-4 expression68. VIL1, has been shown to protect gut epithelial cells by decreasing epithelial damage69. It was among the top 30 proteins that were differentially regulated between resistant and susceptible sheep and may play an important role in maintaining the epithelial integrity of abomasal mucosa in response to haemonchosis70. It can therefore be hypothesized that the interaction between RUFY4 and VIL1 with IL-4 may enhance long-term endoparasite resistance in sheep. The two IL-8 receptors were found to be expressed and distributed in normal and morphologically damaged large intestines, suggesting that IL‐8 may play an important role in mediating immune response in gastrointestinal tract, beyond that of potentiating neutrophil recruitment and inflammation71 following intestinal epithelial damage by endoparasites.

Protective immunity against GIN and subsequent parasite expulsion is known to be mediated by Th2 immune response that is orchestrated by the secretion of cytokines such as IL-4, IL-5, IL-10 and IL-13, which promote the recruitment of eosinophils, basophils, mast cells, goblet cell hyperplasia and concurrent mucus and antibody production72,73,74. Nematode expulsion is known to rely on smooth muscle contraction and increased mucus production75. The latter was found to be mediated by intelectin 1 (ITLN), a calcium dependent lectin, that is expressed by Paneth cells in the small intestine of mouse76 and mucus neck cells in the abomasum of sheep77. The expression of ITLN in sheep goblet cells was found to be upregulated by IL-477. High expression of ITLN was also found to be induced during nematode expulsion in Trichinella spiralis and Nocardia brasiliensis in rodents78, 79 and T. circumcincta infections in sheep77, 80. Th2 response has also been described as a mediator for acute wound healing during helminth infection81. We therefore suggest that the upregulation of ITLN by IL-4 may play a role in enhancing GIN resistance in autochthonous Tunisian sheep through parasite expulsion and enhancing wound healing.

Other possible candidate genes that have been associated with GIN resistance and occur in our overlapping candidate regions include FGF2, FAM78B, SPATA5, SPPL2B and FAM96B. FGF2 also known as basic fibroblast growth factor (bFGF and/or FGF-β), is a growth factor and signaling protein that can synergize with IL-17 in the gut to activate the ERK pathway and induce genes for repairing damaged intestinal epithelium82. The gut epithelium is essentially the first line of defence against microbiota and pathogens and therefore, it plays a critical role in enhancing mucosal immunity. FAM78B (Family with Sequence Similarity 78-Member B) was in an overlapping region on OAR 1. In a genome-wide association study of endo-parasite phenotypes, FAM78B was found in a region on bovine chromosome 3 that spanned SNPs that were most strongly associated with antibody response to O. ostertagi40. SPATA5 (Spermatogenesis Associated 5) was among five genes that were included in the most significant functional term cluster linked with immunity-related and cell-proliferation processes in an investigation of genomic regions and genes for gastrointestinal parasite resistance in Djallonke sheep83. SPPL2B (Signal peptide peptidase-like 2B), a member of the signal peptide peptidase-like protease (SPPL) family, localizes to endosomes, lysosomes and plasma membrane and plays a role in enhancing innate and adaptive immunity by cleaving TNFα in activated dendritic cells that triggers IL-12 production84. In humans and chicken, there is strong evidence linking polymorphisms in FAM96B (family with sequence similarity 96 member B) with gastrointestinal and metabolic diseases, and with the development of the digestive system and disorder networks85, 86. These findings suggest that several mechanisms may be employed to trigger and sustain GIN resistance in sheep under traditional grazing and exposure to multiple GIN species.

Our analysis revealed several candidate genomic regions spanning a number of production (growth), and meat and carcass (fatness and anatomy) QTLs. This result was unexpected given our analytical strategy. Kipper et al.87 reported a reduction of 5% and 31% in average daily feed intake and average daily weight gain, respectively in parasitized pigs. Endoparasites tend to limit host nutrient availability by reducing host food intake, digestion, absorption and nutrient assimilation, resulting in nutritional deprivation and destabilization of host growth and development88, 89. To counter against these negative effects, we suggest that natural selection may be acting on the regions spanning growth QTLs to ensure growth and developmental stability of the study populations under GIN challenge. The parallel selection signatures overlapping the growth QTLs may therefore be adaptive strategy that ensures the survival of the study populations.

In conclusion, we assessed the diversity and population structure of two cohorts (infected and non-infected) of autochthonous sheep from Tunisia that were classified based on FEC levels. The two cohorts were characterised by similar levels of genetic diversity and the same genome background suggesting common history and genetic admixture. Four methods of detecting selection signatures identified regions of the genome that were most likely associated with GIN resistance suggesting that the animals have established a certain level of immunity under natural challenge. The functions of the putative genes and the overlapped QTLs suggest multiple strategies, including host immune response, intestinal epithelium damage repair, mucus production and parasite expulsion, play a role in GIN resistance in sheep. This may be due in part to the fact that GIN resistance is the net outcome of many physiological and immunological pathways, and thus resistance in animals could be owing to variation at a large number of loci. Furthermore, natural selection is also acting concurrently on regions spanning growth related QTLs to ensure developmental stability under GIN challenge. Although the data used in our study is relatively small, we believe this does not compromise the integrity of our findings as it is compensated for by the high density of the marker loci used. We have also limited our discussion to candidate regions that were specific to the non-infected cohort and those that overlap between at least two analytical approaches. This research confirms the importance of obtaining data from local sheep populations managed in their production environments to gather information on genomic regions of functional significance in GIN resistance.