Background

Gastrointestinal nematodes are one of the most serious causes of disease in domestic ruminants worldwide [1, 2]. Losses due to parasitism are two-fold: the direct cost of anthelmintic treatment, and production losses due to ill-thrift and, in extreme cases, death [3]. In the face of the increasing incidence of anthelmintic resistance and the need to minimise drench residues in animal products, new strategies for control are required [4].

Breeding for host resistance has been shown to be a viable method of nematode control [5]. Host resistance is heritable, with wide variability among individuals, and rapid genetic progress has been demonstrated in both research and commercial flocks [6, 7]. Moreover, computer simulation models have shown that selection for host resistance, using the phenotype low faecal worm egg count, should be stable over a short time frame such as 20 years [8]. This is supported by field data, where it was shown that when gastrointestinal nematodes were exposed to genetically resistant or susceptible sheep over a sustained period of time they showed no evidence of adaptation to their host [9]. These findings support the hypothesis that resistance is determined by many genes each with a relatively small effect [10] and that selection for parasite resistance based on faecal egg count (FEC) is sustainable in the medium to long-term.

With sheep it is possible to manipulate breeding lines to produce strong phenotypic differences, within well-defined pedigrees, in a relatively short time. The reduction of variation in genomic regions surrounding a beneficial mutation, caused by strong and recent selection, is known as a “selective sweep”; identifying regions that have undergone selective sweeps can help to reveal the genes underlying phenotypic differences. Different statistics detect different patterns of variation left by selection on a beneficial mutation. Wright’s fixation index (FST) is a single-marker test that detects highly differentiated alleles: positive selection in one region causes larger allele frequency differences between populations than are seen for neutrally evolving alleles. Peddrift [11] is a program that also uses single markers; it calculates exact probabilities of allele frequency differences, using the recorded pedigree structure to take into account minor allele frequencies, genetic drift, and founder and sampling effects. Evidence of selection is shown by divergence from the expected distribution (given by a P-value). Unlike FST and Peddrift, tests based on linkage disequilibrium, such as the extended haplotype homozygosity (EHH) statistic and its derivatives, are multi-marker tests and therefore depend on SNP spacing and frequency. The integrated haplotype score (iHS) [12] and cross-population EHH (XP-EHH) [13] tests are both based on extended haplotype homozygosity. While iHS detects partial selective sweeps at moderate frequency (~50-80%), XP-EHH detects alleles that have risen to near fixation in one population (>80% frequency) but remain polymorphic in the population as a whole. Studies that search for signatures of selective sweeps tend to use multiple tests because the tests are largely complementary; iHS and XP-EHH have been used to search for recent positive selection in humans [13, 14], as well as in other species such as cattle [15, 16].

Recent advances in genomic technologies have provided new opportunities to detect regions of the sheep genome that have undergone selection. The advent of the SNP50 BeadChip provided 54,241 evenly spaced single nucleotide polymorphisms (SNP) across the sheep genome for association analysis. The chip has already been used to map causal mutations for traits controlled by a single locus [17-21] and to detect signatures of selection among sheep breeds [22, 23]. The identification of genes or linked markers that have a significant association with parasite resistance could accelerate the genetic improvement of resistance to internal nematodes through marker-assisted selection [24]. Additionally, identification of genes under selection in animals selected for resistance or susceptibility to gastrointestinal parasites will help our understanding of the biological processes underlying this trait.

The SNP50 BeadChip provides a rapid way to detect regions under selection, which can then be fine-mapped using Sequenom® or other technologies. To this end, lines of sheep that have been selected for resistance, resilience or susceptibility, coupled with high-density genetic maps, are a key resource that would enable future marker-assisted selection of animals without the need for parasite challenge. Here we use data from Romney and Perendale parasite selection lines to conduct whole-genome screens for selection, with the aim of identifying loci, within and between the two breeds, that affect individual host resistance or susceptibility to nematode parasites.

Results

Quality control

After quality control (see methods) the final data set consisted of 46,736 SNP for the Romney data set and 48,436 SNP for the Perendale data set. In total 177 Romney (82 high FEC and 95 low FEC) and 146 Perendale (72 high FEC and 74 low FEC) animals passed the quality control. The average MAF of the remaining SNP over all samples was 0.24 (SD = 0.16) in the Romney data set and 0.26 (SD = 0.15) in the Perendale data set.

Genome-wide analysis

Two analytical methods, FST [25] and Peddrift [11], were used to detect differentiation between resistant and susceptible animals based on SNP allele frequencies. While FST is a generic population differentiation statistic, Peddrift is tailored to this design, in that it was designed to account for the pedigree structure within the populations surveyed.

As individual SNP may not show a strong signal, a 5-SNP moving average (WIN5) was used to identify regions with strong signatures of selection over multiple SNP, which also reduces noise [26]. The average WIN5 FST value in the Romney selection lines (Figure 1A) was 0.0567 (SD = 0.0386), while differentiation was lower in the Perendale selection lines (Figure 1B) with an average WIN5 FST of 0.0299 (SD = 0.0388). A total of 16 genomic regions contained the top 0.1% of markers (Table 1) ranked using WIN5 –log10(combined Peddrift P-values) (Figure 1C), with four regions containing genes that have previously been implicated or are candidates for resistance or susceptibility to gastrointestinal nematodes. The first region, on chromosome 1 (region 2), contains the leukocyte surface antigen CD53, as well as DENND2D and three genes from the chitinase family, acidic mammalian chitinase (CHIA), chitinase 3-like 2 (CHI3L2) and oviduct-specific glycoprotein (OVGP1). Selection was also observed on chromosome 4 (region 5), chromosome 16 (region 14) and chromosome 19 (region 15), containing genes previously implicated in resistance to gastrointestinal nematodes.

Figure 1

Genome-wide signatures of selection. A moving window of 5 FST values between the resistant and susceptible Romney (A) and Perendale (B) lines. (C) A moving average (of 5 SNP) of the one-tailed probability from the chi-squared distribution of the combined Romney and Perendale Peddrift P-values. Results are expressed as –log10 (significance probability). Regions of interest as defined by WIN5 –log10 (combined Peddrift P-values) (Table 1) are shown in red.

Table 1 Genomic regions containing the top 0.1% of SNP, ranked using a moving (5 SNP) average (WIN5) of –log10 (combined Peddrift P-values)

Investigation of selection sweep on OAR1

In total, 41 extra SNPs were genotyped in region 2; after quality control using the same criteria applied to the SNP50 BeadChip data, 15 of these SNP were used for further analyses. As a consequence of genotyping extra SNP, the peak FST value in the region increased slightly from 0.3475 (Table 1) to 0.3895.

The LD correlation coefficient r² in region 2 was calculated for each of the selection lines separately (Additional files 1, 2, 3 and 4). All four analyses showed a haplotype block spanning 12 SNP (Table 2) in region 2. The Romney selection lines showed high linkage disequilibrium (r² > 0.8) within the haplotype block, consistent with selection being imposed on the locus [27].

Table 2 The 11 SNP core haplotype shown by Sweep (v1.1) to be in LD in region 2 (Table 1) on chromosome 1

Analysis using Sweep (v1.1) showed that 11 of these 12 SNP formed a core haplotype. In the Romney lines, two contrasting haplotypes were observed, denoted 4 and 8 (Table 2). Haplotype 4 was present in 88.4% of the susceptible Romney population and 32.1% of the resistant Romney population. In the Perendale animals, the frequency of haplotype 4 was higher in the resistant animals (65.5%) than in the susceptible animals (32.5%). Six additional haplotypes were observed in the Perendale selection lines, although they were less frequent (2-8% of the population).

The integrated haplotype score (iHS; Figure 2B) [12] and the cross-population extended haplotype homozygosity test (XP-EHH; Figure 2C) [13] are designed to uncover selected alleles that are at a higher frequency than expected given the length of their haplotypes. The most significant results were in the Romney susceptible animals, where iHS values approached significance (P = 0.0518).

Figure 2

Signatures of selection observed in region 2 (Table 1) in Romney and Perendale selection lines. (A) FST between resistant and susceptible lines; –log P-values from standardised |iHS| (B) and |XP-EHH| (C) analyses.

Sequencing

After examination of the signals of selection in region 2 (Figure 2), the candidate gene CHIA (chitinase, acidic) was chosen for exon sequencing, as CHIA has previously been associated with the control of helminth infection [28]. Other genes in the region include CD53, CHI3L2, OVGP1 and DENND2D. Sequencing the CHIA exons of animals homozygous for each haplotype revealed several SNP (Additional file 5); however, none distinguished animals of different haplotype or selection line. One SNP at base 1174 of the ovine CHIA mRNA could potentially differentiate animals homozygous for haplotype 4 or 8; however, this would require genotyping in further animals to validate.

Discussion

Using single-marker tests for differentiation between selection lines, multiple regions were discovered where allele frequencies differed between resistant and susceptible lines (Figure 1). This was expected, as variation in complex traits such as resistance to parasites is understood to be controlled by many polymorphisms, each of small effect [10]. The classic model of a selective sweep involves a beneficial allele being rapidly driven to fixation (a ‘hard sweep’). However, for complex traits, selection may occur through polygenic adaptation, in which adaptation occurs by simultaneous selection on variants at many loci. Selection under a polygenic adaptation model would result in modest allele frequency changes across the genome, which may be undetectable using current methods for detecting selection [29]. Nevertheless, the ‘hard sweep’ and polygenic models are not mutually exclusive, and the alleles with the largest effect sizes may sweep to fixation [30]. Polygenic traits would therefore be expected to show elevated FST across the genome, with alleles of large effect showing increased FST in their particular regions.

Divergent lines of Romney [6, 31] and Perendale [7] sheep were selectively bred for high and low FEC by AgResearch Ltd from 1978 and 1986, respectively (Table 3). One of the aims of this study was to discover if the Romney and Perendale selection lines have utilised the same genetic strategy in developing resistance or susceptibility to internal parasites. Combined Peddrift values were used to define the regions to examine for candidate genes as the test was designed to account for structure within each of the populations surveyed. While peaks were observed in both lines, these were better defined when smoothing, via a 5-SNP window, was applied (Figure 1C).

Table 3 Summary data from Romney and Perendale selection lines[6, 7]

It must be noted that the strongest signals of selection were observed in the Romney selection lines, and the strength of the selection would have affected the combined data. As expected, the most extreme values for all statistics in the Romney selection lines were larger than those observed in the Perendale selection lines.

The Perendale lines have not been selected for as long (23 versus 31 years), and the genetic divergence in the selected trait is only half as large (Table 3). The effective population size of the foundation animals is also likely to have had a strong bearing on the differences between the breeds, although this is difficult to determine because the Romney selection lines were formed by combining two separate lines. The effective population size (Ne) five generations ago was estimated as 226 for NZ industry Romneys and 109 for Perendales, derived from an extensive analysis of more than 10,000 New Zealand animals genotyped with the SNP50 BeadChip [32]. As a cross of Cheviot and Romney [33], the Perendale is also likely to have a higher Ne [34, 35] than a pure breed, which may contribute to the observed pattern in the data. However, since its establishment the Perendale breed has had a smaller population, which may contribute to low LD between closely spaced markers but greater LD between distant markers.

In the regions showing signatures of selection, candidate genes were defined as those with a previously reported role in immunity. We recognise that by examining in detail only those genes with obvious functional links to immunity we have eliminated some genes that could have novel and unexpected effects on the trait concerned, or may contain as yet unidentified features that have an effect separate from the gene itself. However, we believe our approach is a tractable solution with the data and annotation currently available, and there will be potential to extend this analysis in the future. For example, several regions that appeared to be under selection in our analyses appear to contain no underlying genes (Table 1; Additional file 6). The current annotation of the sheep genome is not as comprehensive as that of humans or even cattle, and these regions cannot be completely dismissed as containing no genes or regulatory elements. This situation should improve following the recent annotation of version 3.1 of the sheep genome by Ensembl (Ensembl release 74). It has also been observed that while some proposed candidates for selection have strong support in the form of a functional mutation with an identified phenotypic effect, often the functional target is unknown [36].

The discovery that the same core haplotype (haplotype 4) in region 2 (Table 2) is observed in both susceptible Romney and resistant Perendale animals does not have an obvious explanation, but could be due to epistatic effects or a recent novel mutation. There was no correlation between haplotype and average estimated FEC breeding value. Following this, there are several possible reasons for the observed differences. It appears that selection in region 2 is primarily occurring in the Romney susceptible line. This is supported by the greater number of haplotypes that were observed in the Perendale selection lines in the Sweep (v1.1) analysis. Sequencing the CHIA exons of animals homozygous for both haplotypes showed the presence of several SNP; however none were responsible for the observed haplotype. The observed effect could also be driven by a regulatory element, such as a transcription factor, that could be interacting with a locus or loci in other parts of the genome [37]. In addition, while perhaps not the most likely scenario, a causal mutation in the region could have occurred separately in Perendale and Romneys, on the opposite haplotype block, which would explain the differences observed. Unravelling this, however, is complicated by the fact that the Perendale breed was formed in 1956 by crossing a Cheviot over a Romney, thus half of the Perendale genome is in effect of Romney origin.

Comparison with other studies showed that only two of the regions identified using Peddrift values (Table 1) were contained within a previously identified QTL (Sheep QTLdb [38]). Region 8 overlaps a QTL located on chromosome 7 (CSAP35E–MCM149; OAR7:44,018,971-81,694,614) for resistance to Haemonchus contortus infestation in Merino sheep [39]. The authors did not consider this QTL a good candidate for fine-mapping because evidence for the QTL decreased with confirmatory mapping. Region 16 is contained within two suggestive QTL detected on chromosome 25 in a genome scan for resistance to Haemonchus contortus in Romane x Martinik Black Belly backcross lambs [40]. The suggestive QTL, for sex ratio in the adult worm population (0.4-40.7 Mb; OARv2.0) and packed cell volume after a second challenge (6.6-44 Mb; OARv2.0), were discovered by linkage analysis of SNP data.

Previously, many studies have focussed specifically on chromosomes 3 and 20, which contain interferon gamma (IFNG) and the major histocompatibility complex (MHC) region, respectively. The SNP50 BeadChip contains four SNP within the IFNG locus (OAR3:151,528,059-151,532,204); the maximum WIN5 –log10 (combined Peddrift P-values) in the region was 0.62, only slightly higher than the genome-wide average of 0.42. The Romney and Perendale FST peaks were 0.0505 and 0.0377 respectively, which is fairly low compared with the genome-wide distributions (Figure 1A and B). Neither the Romney nor the Perendale selection lines showed obvious signals of selection in the other common candidate region, the MHC region on chromosome 20, with a chromosome-wide WIN5 –log10 (combined Peddrift P-values) peak of 1.56. While this value is reasonably high compared with the genome-wide distribution (Figure 1), the highest-ranked SNP in the region by WIN5 –log10 (combined Peddrift P-values) was only 167th (OAR20_1876702).

Four regions (Table 1) contained genes that have previously been implicated or are candidates for resistance or susceptibility to gastrointestinal nematodes; OAR 1 (CD53, CHI3L2, CHIA and DENND2D), OAR 4 (RELN), OAR 16 (NSUN2) and OAR 19 (HRH1).

The leukocyte surface antigen CD53 contributes to the transduction of CD2-generated signals in T cells and natural killer (NK) cells [41]. NK cells have been shown to produce significant amounts of IL-5, which contributes to eosinophil recruitment in an in vivo mouse model of allergic inflammation [42], and may also be involved in T-cell-independent eosinophil recruitment after helminth infections [43]. The CD53 protein forms functional interactions with prominent leukocyte receptors, including MHC molecules on the surface of B cells [44], and has been shown to be down-regulated upon stimulation of human neutrophils with TNF-α [45]. In humans, CD53 deficiency has been associated with recurrent infectious diseases caused by bacteria, fungi and viruses [46], and polymorphisms in the gene have been associated with regulation of TNF-α levels [47]. Up-regulation of TNF-α has been observed in the abomasal lymph node of sheep challenged with T. circumcincta 5 days after infection [48], and in the abomasal mucosa of resistant DRB1*1101 carrier lambs at 3 days post infection [49].

Chitinases are a group of digestive enzymes that break down glycosidic bonds in chitin, which is present in fungi and in the exoskeletal elements of some animals, including nematodes and arthropods [50]. Mammalian chitinases and chitinase-like proteins are known to be up-regulated and secreted in TH2-induced inflammatory responses, including nematode infection [51], suggesting that these genes are plausible candidates for mediating resistance status.

CHIA [52] has previously been associated with the development of the immune response in mammals and control of helminth infection [28]. Induction of CHIA is via a TH2-specific, IL-13-mediated pathway, and has been implicated in TH2-dominated disorders such as asthma [53]. In mice it has been shown that chitin is a recognition element for tissue infiltration by innate immune cells implicated in allergic and helminth immunity (including eosinophils and basophils), and this process can be negatively regulated by a vertebrate chitinase [54]. Despite this, there is no evidence in the literature that CHIA has previously been implicated in increased resistance or susceptibility to gastrointestinal parasites in ungulates.

Chitinase-like proteins can bind chitin; however, owing to mutations in their active domains, they lack chitinolytic enzyme activity [28]. The chitinase-like molecule CHI3L1 has been shown to be up-regulated in the abomasum of sheep in response to T. circumcincta challenge of previously infected animals [55]. CHIA expression levels were also examined in the same study, and while expression was observed, up-regulation of transcripts was not significant. Expression of CHI3L2 (UGID: 1230481; http://www.ncbi.nlm.nih.gov/UniGene) has been observed in the abomasum of 18- and 21-week-old steers exposed to Ostertagia ostertagi [56]. Expression has also been observed in the abomasal lymph node of resistant and susceptible Blackface lambs infected with T. circumcincta, in comparison with sham-infected controls [57]. In human macrophages, CHI3L2 has been found to be up-regulated by IL-4 and TGF-β [58].

While the TH1/TH2 dichotomy has not been proven in sheep, the components involved in the response to gastrointestinal parasite infection are typical of a TH2 pathway; immunity is associated with the production of TH2-associated cytokines, increased numbers of mast cells, peripheral and tissue eosinophilia, and elevated production of multiple antibody isotypes [59-62]. HRH1 is predominantly expressed on TH1 cells, in an IL3-upregulatable manner [63]. Mice lacking HRH1 had lower percentages of interferon-γ (IFNG)-producing cells, and produced higher levels of antigen-specific IgG1 and IgE. Mice lacking either HRH1 or HRH2 tended to have a higher frequency of IL4-producing cells. Jutel et al. [63] concluded that histamine secreted from mast cells and basophils potently influences TH1 and TH2 responses, as well as antibody isotypes, as a regulatory loop in inflammatory reactions. In Blackface lambs challenged over a period of three months with T. circumcincta, significantly increased levels of HRH1 expression in the abomasal lymph node were observed [57].

While the genes DENND2D, RELN and NSUN2 do not have obvious roles in immunity, they have previously been reported as being upregulated in susceptible animals. The DENND2D protein was found to be more abundant in genetically susceptible sheep following infection by H. contortus [64]. RELN was upregulated in Suffolk (susceptible) compared with Texel (resistant) animals three days post infection with T. circumcincta [65]. Finally, in a study comparing gene expression in the duodenum following natural infection of lambs from the Perendale selection lines used in this study, NSUN2 was found to be more highly expressed in susceptible animals [66].

For complex traits, where many loci of small to moderate effect are likely to influence the phenotype, the 50,000 SNP available for ovine analysis, which are also biased towards SNP with high MAF, may not provide enough information. In sheep, single markers were estimated to explain a maximum of 0.48% or 0.08% of the phenotypic variance in FEC following challenge with T. colubriformis or H. contortus, respectively [10]. It has been suggested in cattle, based on the FST difference between adjacent loci, that 150,000 evenly spaced SNP would be required to study selective signatures across the bovine genome [67].

In humans, the search for selective sweeps is aided by a large number of densely spaced SNP, with over 3.1 million SNP available from Phase II of the HapMap project (approximately one marker per kb) [68]. Densely spaced SNP give greater power for statistical tests that rely on linkage disequilibrium (LD), as signals of selection are less likely to be lost. The SNP50 BeadChip, while providing uniform genome-wide coverage, is estimated to have only one marker every 46 kb. Fine-mapping, in which additional SNP are genotyped in a region of interest using, for example, Sequenom® technology, allows further analysis of LD in regions that appear to be under selection. The additional SNP improve the definition of LD in the region, and hence the ability to localise causal variants using the statistical methods, such as iHS and XP-EHH, that have been developed to identify the signatures left in the genome by selection.

As previously discussed the SNP50 BeadChip has already been used to map causal mutations for traits controlled by a single locus, and furthermore used to validate and detect selection sweeps in sheep [22, 23]. While it is perhaps surprising that only two of the regions under selection correlated with a previously identified QTL, this lends support to the widely held theory that parasite resistance is under the control of many loci with a moderate effect. New genomic approaches, including the SNP50 BeadChip, and sequencing of whole genomes [69] and transcriptomes [70], provide an opportunity to rapidly look for and find genome-wide signals of selection [71, 72].

Conclusions

Genome wide analysis of selection signatures revealed 16 regions, which included genes involved in chitinase activity and the cytokine response. Many of the signals of selection found in this study are novel observations; further knowledge of the genes involved in gastrointestinal parasite resistance or susceptibility can only increase our understanding of the mechanisms involved.

Methods

Ethics statement

This study was carried out in strict accordance with the guidelines of the 1999 New Zealand Animal Welfare Act and was approved by AgResearch’s Invermay Animal Ethics Committees (permit numbers include 497, 551, 593, 636, 10441 and 10820).

Selection lines

Divergent lines of Romney [6, 31] and Perendale [7] sheep were selectively bred for high and low FEC by AgResearch Ltd from 1978 and 1986, respectively (Table 3). The Perendale selection flocks were established from an initial group of 111 rams, ranked for FEC, with the high and low FEC animals mated with 148 foundation dams. The number of foundation animals for the Romney selection lines is more difficult to define, because divergent lines from two separate locations were merged to make the final selection lines in 1993 [6]. The selection lines have since been discontinued. Animals were selected as lambs solely on the basis of FEC following a natural mixed-species nematode challenge. The predominant parasites were of the Trichostrongylus and Teladorsagia genera; however, Cooperia, Haemonchus and Nematodirus species were also present, with other genera less abundant [6, 7, 31]. In the 1984–89 lamb crops of the Perendale selection lines, the natural challenge was further augmented by an artificial challenge with H. contortus larvae.

Genotyping data

Animals were genotyped using the SNP50 BeadChip (Additional file 7), using high-concentration DNA obtained from heparinised blood [73]. In total, 180 Romney (83 high FEC and 97 low FEC) and 149 Perendale (74 high FEC and 75 low FEC) animals were genotyped. Using pedigree information, animals were chosen to be as unrelated as possible; however, 66 sires and dams were also included (17 sires and 10 dams from the Romney lines, and 3 sires and 36 dams from the Perendale lines).

SNP locations for version 3.1 of the sheep genome (OARv3.1) were obtained from CSIRO (http://www.livestockgenomics.csiro.au/sheep/oar3.1.php). Minor allele frequency (MAF) and call rate were calculated for each SNP. Quality control checks [74] excluded SNP with a call rate below 99% or a MAF (over all animals of a breed) below 2%. Individual animals were removed from the analysis if more than 1% of their genotypes failed. Additionally, SNP not in Hardy-Weinberg equilibrium (HWE; P < 10⁻⁶) within a selection line were excluded. The Bonferroni correction was used to address the problem of multiple comparisons [75]. An experiment-wise significance level of α = 0.05 was chosen and the number of tests was taken to be the number of SNP (n = 50,000), giving β = α/n = 1 × 10⁻⁶ as the test-wise significance level for HWE. This is conservative, as the Bonferroni correction assumes independent tests. Although departure from HWE may be caused by selection, extreme violations are more likely to indicate a poorly performing SNP [76].
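The QC filters above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: genotypes are assumed to be coded as 0/1/2 copies of one allele with None for a failed call, and the HWE test shown is a simple 1-df Pearson chi-squared test, because the text does not state which HWE test was applied. All function names (call_rate, minor_allele_frequency, hwe_chisq_pvalue, snp_passes, animal_passes) are hypothetical.

```python
import math

# Hypothetical QC sketch; genotypes coded 0/1/2 (copies of allele B), None = failed call.
ALPHA = 0.05                      # experiment-wise significance level
N_TESTS = 50_000                  # number of SNP taken as the number of tests
HWE_THRESHOLD = ALPHA / N_TESTS   # Bonferroni test-wise level, ~1e-6


def call_rate(calls):
    """Fraction of genotype calls that are not missing."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF computed from the non-missing calls."""
    observed = [g for g in calls if g is not None]
    p = sum(observed) / (2 * len(observed))
    return min(p, 1 - p)


def hwe_chisq_pvalue(calls):
    """Pearson chi-squared (1 df) test for Hardy-Weinberg equilibrium (assumed test)."""
    observed = [g for g in calls if g is not None]
    n = len(observed)
    counts = [observed.count(k) for k in (0, 1, 2)]
    p = (counts[1] + 2 * counts[2]) / (2 * n)   # frequency of allele B
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic in this group: no evidence against HWE
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))


def snp_passes(calls_breed, calls_by_line):
    """Call-rate and MAF filters per breed; HWE filter within each selection line."""
    if call_rate(calls_breed) < 0.99:
        return False
    if minor_allele_frequency(calls_breed) < 0.02:
        return False
    return all(hwe_chisq_pvalue(c) >= HWE_THRESHOLD for c in calls_by_line.values())


def animal_passes(animal_calls):
    """Keep an animal only if no more than 1% of its calls failed."""
    failed = sum(g is None for g in animal_calls)
    return failed / len(animal_calls) <= 0.01
```

For example, a SNP with no heterozygotes among 100 animals (50 of each homozygote) gives a chi-squared statistic of 100 and a P-value far below the 1 × 10⁻⁶ threshold, so it would be excluded as a likely poorly performing assay.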

Genome-wide analysis

Two single-marker tests for differentiation, FST and Peddrift, were used to distinguish signals of selection between selection lines from whole-genome data. FST, which describes the proportion of variance within a species that is due to population subdivision, was calculated using Fisher’s [25] method for each breed:

$$F_{ST} = \frac{\operatorname{Var}(p)}{\bar{p}\,(1-\bar{p})}$$

where the variance of p is computed across subpopulations and p̄ is the mean allele frequency across subpopulations, so that p̄(1 − p̄) is proportional to the expected frequency of heterozygotes (2p̄(1 − p̄) under Hardy-Weinberg equilibrium). The value of FST can theoretically range from 0 (no differentiation) to 1 (complete differentiation, in which subpopulations are fixed for different alleles).
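The FST formula above can be illustrated with a short Python sketch for a single SNP. It assumes that Var(p) is the unweighted variance of allele frequencies across subpopulations (here, the resistant and susceptible lines of one breed); the text does not say whether frequencies were weighted by sample size. The helper names allele_frequency and fst are hypothetical.

```python
from statistics import mean, pvariance


def allele_frequency(genotypes):
    """Frequency of the counted allele from 0/1/2 genotype codes."""
    return sum(genotypes) / (2 * len(genotypes))


def fst(subpop_freqs):
    """F_ST = Var(p) / (p_bar * (1 - p_bar)) for one SNP.

    Var(p) is the unweighted (population) variance of allele frequencies
    across subpopulations; sample-size weighting is an assumption left out.
    """
    p_bar = mean(subpop_freqs)
    denom = p_bar * (1 - p_bar)
    if denom == 0:
        return 0.0  # SNP monomorphic across all subpopulations
    return pvariance(subpop_freqs, mu=p_bar) / denom


# Usage: hypothetical genotypes for one SNP in the two selection lines
resistant = [0, 1, 0, 0, 1, 0]
susceptible = [2, 1, 2, 1, 2, 2]
print(fst([allele_frequency(resistant), allele_frequency(susceptible)]))
```

Two lines fixed for different alleles give FST = 1, and identical frequencies give 0, matching the range stated above.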

Allele frequency differences at each SNP were also compared using Peddrift [11]. Peddrift calculates exact probabilities of allele frequency differences, taking into account genetic drift, founder and sampling effects. The method simulates genotypes through the actual pedigree data. Evidence of selection is shown by divergence from the expected chi-squared (X²) distribution. Peddrift was run for the Romney and Perendale lines together using the known pedigrees (with 2,000,000 simulations) to estimate the distribution of X² under the null hypothesis of no selection. Results are expressed for each breed as a P-value for each marker.
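The idea of simulating genotypes through the pedigree under no selection can be illustrated with a generic gene-dropping sketch. This is not Peddrift's algorithm, whose internals are not described here; it only shows the principle. Founder alleles are drawn from an assumed founder allele frequency, each offspring receives one randomly chosen allele from each parent, and the observed allele-count chi-squared between the two lines is compared with its simulated null distribution. The pedigree format and all function names are hypothetical.

```python
import random

# Hypothetical gene-dropping sketch; NOT Peddrift's actual algorithm.


def allele_chisq(counts_a, counts_b):
    """Pearson chi-squared for a 2x2 table of allele counts.

    counts_x = (copies of allele 1, copies of allele 0) in line x.
    """
    table = (counts_a, counts_b)
    total = sum(counts_a) + sum(counts_b)
    col = (counts_a[0] + counts_b[0], counts_a[1] + counts_b[1])
    stat = 0.0
    for row in table:
        for j in range(2):
            expected = sum(row) * col[j] / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat


def gene_drop(pedigree, founder_freq, rng):
    """Drop alleles down the pedigree with Mendelian sampling and no selection.

    pedigree: list of (animal, sire, dam), parents listed before offspring;
    an unknown parent (None) contributes a gamete drawn from founder_freq.
    """
    genotype = {}

    def gamete(parent):
        if parent is None:
            return int(rng.random() < founder_freq)
        return rng.choice(genotype[parent])

    for animal, sire, dam in pedigree:
        genotype[animal] = (gamete(sire), gamete(dam))
    return genotype


def line_counts(genotype, animals):
    """(copies of allele 1, copies of allele 0) among the listed animals."""
    ones = sum(sum(genotype[a]) for a in animals)
    return (ones, 2 * len(animals) - ones)


def drift_pvalue(pedigree, line_a, line_b, observed_stat, founder_freq,
                 n_sims=10_000, seed=1):
    """Empirical P(chi-squared >= observed) under drift and sampling alone."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        g = gene_drop(pedigree, founder_freq, rng)
        stat = allele_chisq(line_counts(g, line_a), line_counts(g, line_b))
        if stat >= observed_stat:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)
```

Because the null distribution is generated through the actual pedigree, close relatives within a line inflate the simulated chi-squared values, so an allele frequency difference that looks extreme under a standard chi-squared table may be unremarkable once drift is accounted for.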

To explore regions under selection across both breeds, the Peddrift P-values for each SNP were combined; if they have the same overall hypothesis, results from two independent tests can be combined using Fisher’s method [25], using the formula:

$$X^2 = -2\sum_{i=1}^{k} \log_e p_i$$

where p_i is the P-value for the ith hypothesis test. The combined P-value was found by comparing X² to a χ² distribution with 2k degrees of freedom. To reduce noise, a 5-SNP moving average (WIN5) of –log10 of the combined P-values was used; signatures of selection are indicated by regions of SNP with low P-values. The concordance between Peddrift P-values for each SNP in Romney and Perendale was investigated by setting an upper P-value threshold of 0.01. There were 21 SNP below this threshold in both breeds, more than the 14 expected by chance if there were no association between the two breeds, suggesting that some regions had been selected in common and supporting the combined approach.
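Fisher's combination and the WIN5 smoothing can be sketched as follows. Because the degrees of freedom 2k are always even, the chi-squared upper tail has a closed form and no statistics library is needed. The exact WIN5 definition is not given in the text; this sketch assumes a centred 5-SNP window along each chromosome, truncated at chromosome ends. Function names are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: X^2 = -2 * sum(ln p_i), referred to chi-squared with 2k df."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)   # X^2 / 2
    # Closed-form upper tail for even df = 2k:
    # P(X^2 > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        tail += term
    return math.exp(-half) * tail


def win_average(values, window=5):
    """Centred moving average (assumed); windows are truncated at chromosome ends."""
    half_w = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_w), min(len(values), i + half_w + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def win5_combined(romney_p, perendale_p):
    """WIN5 of -log10(combined Peddrift P) for SNP ordered along one chromosome."""
    combined = [fisher_combine([a, b]) for a, b in zip(romney_p, perendale_p)]
    return win_average([-math.log10(p) for p in combined], window=5)
```

With k = 1 the function returns the input P-value unchanged, which is a convenient sanity check; with two breeds (k = 2) the tail is exp(-x/2)(1 + x/2).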

SNP were ranked using WIN5 –log10 (combined Peddrift P-values), and the top 0.1% of markers (n = 44) were used to determine regions under selection. The method of Kijas et al. [23] was used to define the boundaries of the regions; neighbouring markers were included until two consecutive markers ranked outside of the top 5%. The second marker that ranked outside of the top 5% was excluded and the position of the region determined using sheep genome assembly 3.1.
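The region-boundary rule can be sketched on WIN5 scores ordered along one chromosome. The genome-wide cut-offs for the top 0.1% and top 5% are passed in. One reading of the rule is assumed: when walking outwards from a top-ranked marker, a single marker outside the top 5% is tolerated, and extension stops at two consecutive such markers, the first of which is kept and the second excluded. Overlapping regions are merged. All names are hypothetical, and returned values are marker indices that would then be mapped to OARv3.1 positions.

```python
def quantile_cutoff(all_scores, frac):
    """Score of the lowest-ranked marker still inside the top `frac` genome-wide."""
    ranked = sorted(all_scores, reverse=True)
    n_top = max(1, round(len(ranked) * frac))
    return ranked[n_top - 1]


def extend(scores, start, step, ext_cut):
    """Walk from `start` in direction `step` (+1 or -1) and return the last index kept.

    Stops at two consecutive markers below `ext_cut`; the first is kept and
    the second excluded (assumed reading of the rule).
    """
    i = start
    while True:
        nxt = i + step
        if nxt < 0 or nxt >= len(scores):
            return i
        if scores[nxt] < ext_cut:
            nxt2 = nxt + step
            if nxt2 < 0 or nxt2 >= len(scores) or scores[nxt2] < ext_cut:
                return nxt
        i = nxt


def define_regions(scores, top_cut, ext_cut):
    """Seed at markers scoring >= top_cut, extend both ways, merge overlaps."""
    regions = []
    for i, s in enumerate(scores):
        if s >= top_cut:
            lo = extend(scores, i, -1, ext_cut)
            hi = extend(scores, i, +1, ext_cut)
            if regions and lo <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(hi, regions[-1][1]))
            else:
                regions.append((lo, hi))
    return regions
```

In practice top_cut and ext_cut would be computed once from all genome-wide WIN5 scores with quantile_cutoff(scores, 0.001) and quantile_cutoff(scores, 0.05), then define_regions applied chromosome by chromosome.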

Candidate regions were annotated using Ensembl release 74 (as of 1/2014), and gene function determined using Online Mendelian Inheritance in Man (OMIM) and a literature search. Candidate genes were defined as those with a known role in the immune response. Sheep QTL were obtained from the Sheep QTLdb [38].

Detecting signatures of selection

Fine-mapping allows further analysis of LD in regions that appear to be under selection; the additional SNP improve the definition of LD in the region and hence the ability to localise causal variants. One region on chromosome 1 (region 2) was chosen for fine-mapping with a denser set of SNP (Additional file 8), using the iPLEX™ genotyping assay for the Sequenom® MassARRAY® platform. This region was chosen because it contained multiple candidate genes. Selection sweep statistics were subsequently used to clarify the observed signals of selection.

All known SNP in region 2 were examined for their suitability for genotyping on the Sequenom® MassARRAY® platform; these included SNP discovered on both the Solexa and 454 platforms (http://www.sheephapmap.org/genseq.php). In total, 41 extra SNP were genotyped.

Linkage disequilibrium (LD) between pairs of loci was visualised with Haploview [77] using the correlation coefficient r² within each selection line, with regions of strong LD taken to indicate possible selection.
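The pairwise r² statistic can be sketched directly from phased haplotypes. Note the assumption: Haploview estimates haplotype frequencies from unphased genotypes, whereas this sketch takes already-phased 0/1 haplotypes (for example, output from a phasing step such as fastPHASE) and computes r² = D²/[pA(1 − pA)pB(1 − pB)] with D = pAB − pA·pB. The function names and the r² > 0.8 helper are hypothetical.

```python
import math


def ld_r2(haplotypes, i, j):
    """r^2 between SNP i and SNP j from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b                      # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan  # r^2 undefined when either SNP is monomorphic
    return d * d / denom


def strong_ld_pairs(haplotypes, threshold=0.8):
    """All SNP pairs (i < j) with r^2 above `threshold` (NaN pairs are skipped)."""
    m = len(haplotypes[0])
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if ld_r2(haplotypes, i, j) > threshold]
```

Two SNP whose alleles always travel together on the same haplotypes give r² = 1, while independent SNP give r² = 0; the 0.8 default mirrors the threshold used to describe the Romney haplotype block.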

Haplotype phase estimation was performed using fastPHASE [78]. Haplotypes were subsequently used to calculate the selection statistics EHH, XP-EHH and iHS. The EHH statistic was computed using Sweep v1.1 [79], while the iHS and XP-EHH statistics were calculated using scripts obtained from the Pritchard lab (http://hgdp.uchicago.edu/Software/). Standardized iHS (|iHS|) was calculated using the genome-wide empirical distributions, following the method of Voight et al. [12]. Ancestral alleles for the SNP50 BeadChip SNP were obtained from Dr Clare Gill of Texas A&M University (2009, pers. comm.), and were determined by running 11 outgroup bovid species on the SNP50 BeadChip. A cross-species megaBLAST of Sequenom® primers against Bos taurus, Sus scrofa, Canis familiaris, Equus caballus and Homo sapiens was used to discover ancestral alleles for the remaining SNP.
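The haplotype-based statistics can be sketched as below. This is an illustrative reimplementation, not the Sweep or Pritchard-lab code: EHH is the probability that two haplotypes carrying the core allele are identical from the core to a given SNP, iHH integrates EHH over physical distance by the trapezoid rule until EHH drops below 0.05 (a commonly used cut-off, assumed here), unstandardised iHS is ln(iHH_ancestral/iHH_derived), and unstandardised XP-EHH is ln(iHH_A/iHH_B) using EHH over all haplotypes in each population. Haplotypes are assumed to be phased 0/1 tuples with SNP positions in base pairs; function names are hypothetical.

```python
from collections import Counter
from math import comb, log
from statistics import mean, stdev


def ehh(haplotypes, core, target, allele=None):
    """EHH between SNP `core` and SNP `target`.

    With `allele` set, only haplotypes carrying it at the core are used
    (allele-specific EHH, as in iHS); with allele=None all haplotypes are
    used (site EHH, as in XP-EHH).
    """
    lo, hi = min(core, target), max(core, target)
    carriers = [h for h in haplotypes if allele is None or h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


def ihh(haplotypes, positions, core, allele=None, cutoff=0.05):
    """Integrate EHH over distance (trapezoid rule) on both sides of the core,
    stopping once EHH falls below `cutoff`."""
    total = 0.0
    start = ehh(haplotypes, core, core, allele)
    for step in (-1, 1):
        prev_ehh, prev_pos = start, positions[core]
        j = core + step
        while 0 <= j < len(positions):
            cur = ehh(haplotypes, core, j, allele)
            total += 0.5 * (prev_ehh + cur) * abs(positions[j] - prev_pos)
            if cur < cutoff:
                break
            prev_ehh, prev_pos = cur, positions[j]
            j += step
    return total


def unstandardised_ihs(haplotypes, positions, core, ancestral):
    """ln(iHH_ancestral / iHH_derived); both alleles need at least 2 copies."""
    derived = 1 - ancestral
    return log(ihh(haplotypes, positions, core, ancestral)
               / ihh(haplotypes, positions, core, derived))


def unstandardised_xpehh(haps_a, haps_b, positions, core):
    """ln(iHH_A / iHH_B) using site EHH in each population."""
    return log(ihh(haps_a, positions, core) / ihh(haps_b, positions, core))


def standardise(scores):
    """Centre and scale scores (for iHS this is done within allele-frequency bins)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]
```

A large negative unstandardised iHS means the derived allele sits on unusually long haplotypes relative to the ancestral allele; standardising across the genome-wide empirical distribution then allows scores at different SNP to be compared, and the absolute value |iHS| is used when sweeps on either allele are of interest.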

Sequencing

Four animals were chosen for sequencing using standard amplicon sequencing (Additional file 9) with BigDye technology on an AB3730XL (Applied Biosystems). The animals comprised one resistant and one susceptible animal from each breed. Animals were selected based on homozygosity for an 11 SNP core haplotype shown by Sweep (v1.1) to be in LD (Table 2). Forward and reverse sequences were combined into contigs using Vector NTI® (Invitrogen), and consensus sequences were BLASTed against Ovis aries CHIA mRNA (XM_004002314.1) to search for SNP.

Data availability

The data sets supporting the results of this article are included within the article (and its additional files).