Background

South Africa has a rich variety of cattle breeds, i.e. Sanga types (e.g. Afrikaner and Nguni), European Bos taurus breeds (e.g. Angus, Hereford and Holstein), those of unclear origin such as the Drakensberger breed, and some locally developed composite breeds (e.g. Bonsmara and Brangus). Nguni and Afrikaner cattle are indigenous breeds that have been farmed for centuries in South Africa [1]. During the mid-20th century, Afrikaner cattle were crossbred with Bos taurus breeds that originated from Europe such as Hereford and Shorthorn to develop the Bonsmara breed [1]. Afrikaner, Drakensberger and Bonsmara cattle are used for beef production, while the Nguni is a dual-purpose breed that is farmed for beef and milk production, particularly in traditional farming systems. Afrikaner cattle are well adapted to the veld conditions of the warm, arid and extensive grazing areas of South Africa, and are known to have a lower susceptibility to most of the country’s endemic diseases such as redwater, heartwater and gallsickness [2]. Nguni cattle are farmed in a variety of biomes in South Africa, which are characterized by periodic drought, seasonal dry periods and nutritional shortages in the natural veld, and this breed is also resistant to a variety of external and internal parasites and stock diseases [2]. Drakensberger cattle are concentrated in the sourveld regions of South Africa, and are used in extensive and intensive beef production systems. All these breeds have participated in animal recording systems since the early 1960s [3] and have been subjected to selection for traits of economic importance such as reproduction and growth. The process of domestication, subsequent breed formation and artificial selection, coupled with the recent rapid decrease in effective population size from a very large ancestral population, has left detectable signatures of selection in numerous regions of the cattle genome [4]. 
When selection acts on a mutation, it also affects linked sites and leaves a signature in the flanking chromosomal regions. Signals that can be observed on selected genes include: (1) a spectrum of allele frequencies among closely linked sites that is shifted towards extreme frequencies, (2) an excess of homozygous genotypes, and (3) a high frequency of long haplotypes [5].

The availability of high-density single nucleotide polymorphism (SNP) genotyping assays has made it possible to scan the cattle genome for positions that may have been targeted by selection [6]. The detection of signatures of selection is relevant since it may contribute to a better understanding of the mechanisms that underlie traits that have been exposed to intensive natural and artificial selection. Such information also provides important insights into the mechanisms of evolution [7] and the selection of loci for breeding and selection programs [8], and is useful for the annotation of significant functional genomic regions [9]. However, the detection of selection signatures is challenging for several reasons. First, the effects of selection on the distribution of genetic variation can be confounded with patterns of genetic variation that are caused by demographic events such as the size, structure and mating pattern of a population [10]. Adaptive hitchhiking, population expansion and population reduction (e.g. bottlenecks) can also result in an excess of rare alleles [11]. Second, most studies have been conducted using SNP assays that contain only common SNPs. Thus, the variability and distribution of allele frequencies and the levels of linkage disequilibrium (LD) are all strongly affected by this SNP ascertainment bias [9]. Despite these challenges, the detection of signatures of selection has been the focus of several theoretical (simulated) and empirical (observed) studies [8, 12, 13].

Several methods have been used to detect selection signatures, including those based on LD, spectra of allele frequencies and characteristics of haplotype structures in selected populations [14]. These methods have been used to infer genomic regions that were affected by domestication, breed formation and selection for specific production traits in livestock. In chickens, Rubin et al. [15] detected selective sweep regions that are potentially associated with domestication and the specialization of broiler and layer birds using sequence data. They also found a region that harboured the TSHR gene that is associated with metabolic regulation and photoperiod control of reproduction in vertebrates. In pigs, putative selective sweeps were reported on chromosomes 1 and 3 [16]. In addition, genomic regions that contain the IGF2, PRLR and GHR genes were shown to have been exposed to intensive selection in pigs [17]. Furthermore, genomic regions that are associated with behaviour, immune response and feed efficiency were detected based on F ST (fixation index) estimates of divergence in cattle using high-density SNP assays [4]. Using population differentiation (F ST) and Integrated Haplotype Score approaches, Qanbari et al. [18] identified 236 genomic regions that are potentially under selection in Holstein cattle. Both approaches suggested selection in the vicinity of the SIGLEC5 gene on Bos taurus chromosome (BTA) 18, a region that was shown to include a major quantitative trait locus (QTL) with large effects on productive life and fertility traits in Holstein cattle [18]. Studies based on sequence data do not suffer from SNP ascertainment bias as do studies that are performed using commercially available SNP assays.

The possibility that variants with large effects may underlie the adaptation of South African cattle breeds has prompted investigations on the genetic basis of adaptation to ticks, parasites, drought and diseases [19–21] and of their ability to produce good quality beef [22]. In a study by Makina et al. [23], some signals of admixture and genetic relatedness were detected between the Afrikaner, Nguni, Drakensberger and Bonsmara breeds. Allowing for six ancestral populations revealed that the Nguni breed shares ancestry with the Afrikaner breed, with approximately 8 % of its genome derived from the Afrikaner breed. The Bonsmara breed shares ancestry with both Nguni (3 %) and Afrikaner (5 %) breeds, while the Drakensberger breed shares 5 % of its genome with the Nguni and Bonsmara and only 3 % with the Afrikaner breed. In addition, the indigenous and locally-developed South African cattle breeds and European Bos taurus (Angus and Holstein) breeds have been shown to be clearly differentiated [23], which agrees with their separate histories of domestication and long divergence time periods [24]. However, little is known about the genetic variation that underlies traits of economic importance in cattle breeds of South Africa. Consequently, we conducted a genome-wide scan across six South African cattle breeds to identify genomic regions that have been exposed to strong selection during domestication, breed formation and creation of biological types.

Methods

Animal samples and quality control

A total of 249 animals representing the Afrikaner (n = 44), Nguni (n = 54), Drakensberger (n = 47), Bonsmara (n = 44), Angus (n = 31) and Holstein (n = 29) breeds were genotyped using the Illumina BovineSNP50 BeadChip v2, which features 54,609 SNPs distributed throughout the bovine genome with an average spacing of 47 kb [25]. The genotyped samples were derived from a previous study [23] and were approved for this research by the University of Pretoria Ethical Committee (E087-12). Blood, hair and semen were used to extract genomic DNA. Samples were chosen based on pedigree data to avoid full-sib and half-sib animals in order to maximize the genetic diversity represented within each sampled population. Furthermore, identity-by-descent analysis was performed in PLINK version 1.07 [26] using the BovineSNP50 BeadChip data, and only individuals with an identity score of less than 0.25 were retained. Only SNPs that were uniquely mapped to autosomes on the UMD3.1 assembly were included in the analyses. Samples with more than 10 % missing genotypes were excluded.

Two quality control procedures were applied, one for each analytical approach. The first analytical approach detected selective sweeps within each breed by searching for local reductions in genetic variation using minor allele frequencies (MAF). Thus, the BovineSNP50 data were first filtered to retain loci with a call rate per breed of at least 95 %, and 51,406 (Afrikaner), 50,870 (Nguni), 50,389 (Drakensberger), 51,242 (Bonsmara), 50,922 (Angus) and 52,294 (Holstein) SNPs remained. The second analytical approach targeted the identification of signatures of divergent selection between breeds using population differentiation (F ST). Thus, SNPs with a call rate less than 95 % and a MAF less than 2 % across all breeds [26] were removed, leaving 45,657 SNPs. Furthermore, SNPs that were in high LD were pruned using the --indep 50 5 2 option in PLINK version 1.07 [26]. A total of 21,290 SNPs remained after pruning and were used for the detection of signatures of selection using F ST. Pruning of SNPs that are in high LD has been shown to reduce the mean SNP heterozygosity within the European cattle breeds that were used to discover the common SNPs for the design of the BovineSNP50 assay and therefore it partially counters the effects of SNP ascertainment bias [27].
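The per-SNP filters described above can be illustrated with a short Python sketch. It is not the pipeline used in the study (filtering and pruning were done in PLINK); the genotype coding (0, 1 or 2 copies of one allele, None for a missing call) and the function names are assumptions made for illustration only.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of animals with a non-missing genotype at one SNP."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """MAF at one SNP, computed from the called genotypes only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_qc(genotypes: Sequence[Genotype],
              min_call_rate: float = 0.95,
              min_maf: float = 0.02) -> bool:
    """Keep a SNP when call rate >= 95 % and MAF >= 2 % (F ST analysis).

    Setting min_maf to 0.0 mimics the within-breed sweep analysis,
    where only the call-rate filter was applied.
    """
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)
```

For example, a monomorphic SNP with complete genotype data fails the default thresholds but passes when min_maf is set to 0.0, which is why monomorphic loci remain available for the fixed-haplotype analysis.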

Identification of selection signatures

Combining alternative approaches to detect selection signatures has been suggested as a means of increasing the reliability of these studies [5]. Thus, two methods were used to detect putative selection signatures. The first method searched for strong recent selection signatures, for which haplotypes have been driven to complete fixation within each breed [13]. This is based on the observation that intensive selection for variants ultimately leads to a complete loss of variation within the chromosomal region that surrounds the selected variant and results in the complete fixation of the haplotype that harbours the selected variant [13]. The second method searched for loci with exceptionally high F ST owing to differential selection histories between populations, which leads to distortions in allele frequencies between populations at loci that flank the selected variants [12]. This approach is based on the fact that local positive selection tends to reduce the heterozygosity of specific loci in a population by increasing the frequency of one allele in one breed, which results in a higher proportion of between-breed than within-breed genetic variation [10].

To identify signatures of intensive recent selection within South African cattle breeds, the BovineSNP50 data were analysed separately for each breed, taking into consideration that the total number of variable SNPs differed between breeds because of the ascertainment bias due to how SNP discovery was performed for the design of the BovineSNP50 assay [13]. To identify selective sweeps within each breed, a minimum number of five breed-specific contiguous monomorphic SNPs (Table 1) spanning 100 kb (UMD3.1 coordinates) and with a MAF of 0.01 or lower was required. To allow for the possibility of new mutations, genotyping errors and assembly errors, which may have incorrectly assigned a SNP to a sweep, SNPs with a MAF of up to 0.01 were treated as monomorphic [13].
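The run-based criterion above can be sketched as follows. This is an illustrative Python sketch and not the authors' code: the input format (one chromosome's SNPs as position-sorted pairs of position in bp and within-breed MAF), the function name and the reading of "spanning 100 kb" as a minimum span are all assumptions.

```python
from typing import List, Sequence, Tuple


def find_sweeps(snps: Sequence[Tuple[int, float]],
                min_snps: int = 5,
                max_maf: float = 0.01,
                min_span_bp: int = 100_000) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_snps) for each qualifying run.

    snps must hold the SNPs of ONE chromosome, sorted by position.
    A run is a maximal stretch of contiguous SNPs with MAF <= max_maf.
    """
    sweeps = []
    run: List[int] = []
    # A sentinel above the MAF threshold closes a run that reaches
    # the end of the chromosome.
    for pos, maf in list(snps) + [(-1, float("inf"))]:
        if maf <= max_maf:
            run.append(pos)
            continue
        if len(run) >= min_snps and run[-1] - run[0] >= min_span_bp:
            sweeps.append((run[0], run[-1], len(run)))
        run = []
    return sweeps
```

In practice min_snps would be set per breed, since the number of contiguous SNPs required to declare a sweep depends on the proportion of monomorphic SNPs in that breed.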

Table 1 Number of animals genotyped from six breeds

To determine the appropriate number of contiguous SNPs within each breed with a MAF ≤0.01 to declare a selective sweep, a trade-off between type 1 error and the size of the detected signature was required. According to Ramey et al. [13], if 15 % of the SNPs are monomorphic within a breed (Table 1), the probability that N contiguous SNPs are monomorphic is 0.15^N under the null hypothesis of no selective sweep in the genome. For example, assuming independence, and testing of 51,406 (Afrikaner), 50,870 (Nguni), 50,389 (Drakensberger), 51,242 (Bonsmara), 50,922 (Angus) and 52,294 (Holstein) SNPs on 29 autosomes, we would expect to find 0.15^N × (52,294 − 29 × (N − 1)) regions where N contiguous SNPs have fixed alleles. For N = 5, this corresponds to 4.0 false positives per breed but only 0.6 false positives when N = 6. While increasing the number of contiguous monomorphic SNPs decreases the number of type 1 errors, it also increases the size of the signature that can be detected to, on average, (N − 1) × 47 kb [13]. Therefore, an intermediate balance of these conflicting constraints was chosen (Table 1) based on the idea that signatures identified in two or more breeds or any sweep that overlaps with previously reported sweeps would provide strong evidence for the existence of the sweep and these should share a common haplotype.
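The arithmetic behind this trade-off can be checked with a few lines of Python. The function names are illustrative; the default values (52,294 SNPs, 29 autosomes, 15 % monomorphic SNPs, 47 kb average spacing) are those quoted in the text, and independence between SNPs is assumed, as it is there.

```python
def expected_false_positive_runs(n_contiguous: int,
                                 n_snps: int = 52_294,
                                 n_chromosomes: int = 29,
                                 p_monomorphic: float = 0.15) -> float:
    """Expected number of runs of n_contiguous monomorphic SNPs under
    the null hypothesis of no sweep, assuming independent SNPs."""
    # Each chromosome loses (n_contiguous - 1) possible window starts.
    n_windows = n_snps - n_chromosomes * (n_contiguous - 1)
    return p_monomorphic ** n_contiguous * n_windows


def average_sweep_size_kb(n_contiguous: int, spacing_kb: float = 47.0) -> float:
    """Average size of a signature spanning n_contiguous SNPs."""
    return (n_contiguous - 1) * spacing_kb


for n in (5, 6):
    print(n,
          round(expected_false_positive_runs(n), 1),
          average_sweep_size_kb(n))
```

Under these assumptions the expected counts round to 4.0 for N = 5 and 0.6 for N = 6, with average signature sizes of 188 kb and 235 kb, respectively.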

To identify genomic regions that have been subjected to local positive selection among South African cattle breeds, we identified regions of the genome that showed high levels of population subdivision between the breeds [10, 28] using population-specific F ST [29]. Unbiased estimates of F ST as described by Weir and Cockerham [29] were calculated using SNP Variation Suite (SVS) version 8 [30] for each of the SNPs between all (15) pairs of cattle breeds in this study. Values were interpreted using the qualitative guidelines proposed by Wright [31] where an F ST greater than 0.25 indicates very great differentiation, F ST ranging from 0.15 to 0.25 great differentiation, from 0.05 to 0.15 moderate differentiation and an F ST less than 0.05 little differentiation among the populations.
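As a small illustration of how the pairwise comparisons and Wright's guidelines fit together, the sketch below enumerates the 15 breed pairs and maps an F ST value to its qualitative category. The function names are hypothetical, and the handling of values that fall exactly on a boundary (0.05, 0.15 or 0.25) is an assumption, since the guidelines as quoted do not specify it.

```python
from itertools import combinations
from typing import List, Tuple

BREEDS = ["Afrikaner", "Nguni", "Drakensberger",
          "Bonsmara", "Angus", "Holstein"]


def breed_pairs(breeds: List[str] = BREEDS) -> List[Tuple[str, str]]:
    """All unordered pairs of breeds (6 breeds give 15 pairs)."""
    return list(combinations(breeds, 2))


def wright_category(fst: float) -> str:
    """Qualitative level of differentiation for one F ST value.

    Boundary values are assigned to the lower category except at
    0.05 and 0.15, which is an assumption of this sketch.
    """
    if fst > 0.25:
        return "very great"
    if fst >= 0.15:
        return "great"
    if fst >= 0.05:
        return "moderate"
    return "little"
```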

Unbiased estimates of F ST can assume negative values, which do not have a biological interpretation, thus all negative values were set to 0.0 [29]. To determine the variation in allele frequency between loci, an empirical genome distribution of F ST values for all autosomal SNPs was constructed across the breeds.

Based on the relationships between breed pairs, the most differentiated breed pairs were selected as candidate pairs for the detection of signatures of selection. Thus, the dairy Holstein was used as the control breed for the analyses on the other five beef breeds, while the Angus beef breed (British origin and less adapted to tropical regions) was used for all four tropically-adapted South African beef breeds to search for signatures of selection that may be associated with environmental adaptation.

A sliding window of five SNPs was used to compute averages for F ST and the resulting smoothed F ST values for each of the compared breed pairs were plotted against chromosomal coordinates for the central SNP in the window based on the UMD3.1 assembly using SNP Variation Suite (SVS) version 8.1 (SVS 8.1; Golden Helix Inc., Bozeman, Montana) [30]. The most differentiated regions representing the 2 % SNPs with the highest F ST (≥0.25) were identified and these were considered to be under selection.
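The smoothing and thresholding steps can be sketched in Python as follows. The analysis itself was run in SVS, so this is only an illustration under stated assumptions: the input is the list of per-SNP F ST values for one breed pair on one chromosome in map order, negative estimates are set to zero before averaging, windows that would extend past the chromosome ends are skipped, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def clamp_negative(values: Sequence[float]) -> List[float]:
    """Set negative F ST estimates to 0.0."""
    return [max(0.0, v) for v in values]


def smooth(values: Sequence[float], window: int = 5) -> List[Tuple[int, float]]:
    """Mean F ST in a sliding window of `window` SNPs.

    Returns (index_of_central_snp, window_mean) pairs; windows that
    would run past either end of the chromosome are not reported.
    """
    half = window // 2
    out = []
    for centre in range(half, len(values) - half):
        chunk = values[centre - half:centre + half + 1]
        out.append((centre, sum(chunk) / window))
    return out


def top_fraction_threshold(values: Sequence[float], fraction: float = 0.02) -> float:
    """Smallest value among the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]
```

A window would then be flagged as a candidate when its smoothed value is at or above both the empirical threshold returned by top_fraction_threshold and the fixed cut-off of 0.25 mentioned in the text.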

Annotation and functional analysis of identified genomic regions

Genomic coordinates for all identified selected regions were used for the annotation of genes that were fully or partially contained within each selected region using the University of California, Santa Cruz Genome Browser [32]. The functions and pathways in which these genes are involved were assessed using Panther [33]. In addition, the Bovine QTL database available online at http://www.animalgenome.org/cgi-bin/QTLdb/BT/search was searched to identify any overlap with previously published bovine QTL within the candidate regions.

Results

Fixed haplotypes

Descriptive data characteristics such as MAF, percentage of polymorphic SNPs and Hardy–Weinberg equilibrium for the breeds under study were previously reported [23]. Table 2 shows putative selective sweeps detected within each breed, identified by detecting haplotypes that showed complete fixation.

Table 2 Potential candidate genes and previously detected QTL within detected selective sweep regions within breeds

Twenty candidate genomic regions on 13 chromosomes were identified as harbouring putative selective sweeps (Table 2). Putative signatures of selection were identified for all six breeds i.e. ranging from one region (Nguni) to six regions (Holstein) per breed. Seventeen predicted putative signatures were breed-specific and three were shared between breeds with one shared between Drakensberger and Bonsmara (BTA5) and two between Angus and Holstein (BTA10 and 16) (Fig. 1). The average size of the breed-specific sweeps was 267.54 kb, ranging from 162.16 to 530.46 kb while the average size for the common signatures was 245.86 kb, ranging from 95.94 to 448.56 kb. No common sweeps were found between the Afrikaner, Nguni and Drakensberger breeds using the method for which haplotypes were fixed.

Fig. 1

Selective sweep regions shared between two breeds. a Bonsmara and Drakensberger. b Angus and Holstein. c Angus and Holstein

Highly differentiated genomic regions

The empirical genome-wide distribution of F ST values for all autosomal SNPs was constructed to examine variation in allele frequency between loci (Fig. 2). The distribution was highly skewed towards small F ST values. About 31 % of SNPs had an F ST less than or equal to 0.05 while only 2 % had an F ST greater than or equal to 0.25. This was consistent with other studies [28, 34, 35] that observed a skewed F ST distribution and agrees with the theory of selection on traits that are primarily governed by many loci of small effect [10].

Fig. 2

Genome-wide distribution of F ST across all autosomes for all 15 breed comparisons

Using the population differentiation approach, 27 candidate genomic regions were identified as potentially under divergent selection. These regions were distributed across 14 chromosomes (Table 3) indicating that about 8.5 Mb of the sequence in these South African cattle breeds is under strong divergent selection. The average size of the candidate genomic regions under selection was 328.88 kb, with the largest region observed between the Afrikaner and Holstein breeds on BTA16 (860.14 kb) between 73,143 and 933,282 bp and the smallest region observed between the Bonsmara-Holstein pair on BTA20 (85.52 kb) between 11,932,262 and 12,017,779 bp.

Table 3 Genomic regions identified as being under divergent selection in six cattle breeds in South Africa and their associated QTL

Figure 3 shows Manhattan plots of F ST values for the comparisons between the five breeds that generated the largest number of differentiated regions. The number of F ST peaks per chromosome varied from 0 to 2 across these comparisons. Nine of these differentiated regions (BTA3, 5, 9, 16, 18, 21 and 24) were shared among breed pairs, with the Afrikaner vs. Holstein and Nguni vs. Holstein pairs sharing the most differentiated regions. The Afrikaner vs. Holstein pair had the largest number of differentiated regions (8) while the Angus vs. Holstein pair had the smallest number (2). The most strongly differentiated region was observed between the Afrikaner and Holstein breeds on BTA9 between 105,263,583 and 105,587,941 bp. Comparisons of Angus vs. Afrikaner, Nguni, Drakensberger and Bonsmara revealed a differentiated genomic region on BTA24 between 54,571,696 and 54,964,769 bp (Fig. 4), which was shared by all of the South African cattle breeds.

Fig. 3

Smoothed F ST values for the four breed pair comparisons across the autosomal genome. a Nguni vs Holstein. b Drakensberger vs Holstein. c Bonsmara vs Holstein. d Angus vs Holstein

Fig. 4

Distribution of F ST values for four breed pair comparisons on BTA24. AFR Afrikaner, NGU Nguni, DRA Drakensberger, BON Bonsmara and ANG Angus

Functional annotation of genomic regions showing evidence of selection

Using the candidate genomic regions that were obtained from both the within- and between-breed analyses, 33 reference sequences were annotated to identify potentially expressed genes. Additional file 1: Table S1 provides full names for all annotated genes in this study. The number of candidate genes obtained per reference sequence varied from one to eight across the genomic regions. Using the Panther [33] website, several candidate genes were linked to important biological functions and pathways in cattle. For example, a region that includes the keratin gene family (KRT222, KRT24, KRT25, KRT26, and KRT27) and one heat shock protein gene (HSPB9) on BTA19 between 42,896,570 and 42,897,840 bp was found to be under selection in Nguni cattle and had previously been associated with tropical adaptation in Zebu cattle [36]. Other regions that included MTPN (Afrikaner), CYM (Afrikaner and Nguni), CDC6, CDK10, EBF1 and TNS4 (Nguni), NDUFA12, ALOX15B and ALOX12B (Bonsmara) and SLC25A48 and SERPINA3-8 (Drakensberger) may have been selected due to their association with immune response. Selected regions that contain ADIPOR2 (Afrikaner), PTGS (Nguni), HOXC12, HOXC13, WC13 and OVOS2 (Drakensberger and Bonsmara) may have been selected due to the effects of these genes on reproduction, while those that contain SLC6A17 and PREP may have been selected due to the effects of these genes on fatty acid biosynthesis.

Furthermore, candidate genes related to nervous system development were also identified, for example, WNT5B, FMOD, PRELP (Afrikaner), CCR7 (Nguni) and OVOS, SLC6A17 (Bonsmara) were localized in selected regions. Candidate genes involved in enzyme regulatory activities, e.g., MYO6, RBBP8 (Bonsmara), CYM, LAX1 (Afrikaner), ATP2B (Nguni) and SLC16A4 (Drakensberger) and genes involved in growth and metabolic processes, e.g., DDX19A (Afrikaner), KCNB1, IGFBP (Nguni), TGFB1 (Drakensberger), MYO6 (Bonsmara), AJAP1 (Angus) and ATOX1 (Holstein) were also identified within selected regions. Candidate genes involved in muscle organ development and skeletal development including KIAA1797, EFHD2 (Bonsmara) and MTPN, TMEM51 (Afrikaner) were also identified as being in regions under selection. Finally, MC1R on BTA18 (between 14,757,060 and 14,758,700 bp), which has previously been associated with coat colour in cattle [37], was detected as being under selection in Nguni cattle.

All genomic regions that showed evidence of selection were further analysed to determine whether any of these overlapped with previously reported QTL in cattle. The online database of published bovine QTL revealed that most of the genomic regions overlapped with previously reported regions harbouring QTL that affect milk, fat, carcass, body weight, stature, clinical mastitis, calving ease, tick resistance, gastrointestinal nematode burden and reproductive traits (Tables 2, 3). For example, a region on BTA24 that was detected for the Afrikaner, Nguni, Drakensberger and Bonsmara breeds overlapped with a QTL region that was previously associated with gastrointestinal nematode burden.

The putative signatures of selection that were identified in this study were compared to previously detected bovine sweeps (Table 4). Ten of these candidate genomic regions were supported by previously published data on signatures of selection and clearly harbour variants of large phenotypic effect in cattle.

Table 4 Overlapping regions possessing signatures of selection detected in previous studies in cattle

Discussion

This study used two approaches to identify putative selective sweeps that could be associated with phenotypes, which contribute to domesticability, biological types (adaptation, draught, meat and milk) and to desirable morphologies that might have impacted the extent and distribution of variability within the genomes of South African cattle breeds. The first approach detected complete sweeps that indicate fixation of long haplotypes within breeds as suggested by Ramey et al. [13]. However, the effects of selection on the distribution of genetic variation can be confounded with patterns of genetic variation caused by demographic events such as the size, structure and mating pattern of a population [10]. To distinguish between the effects of selection and those of demographic events, Hayes et al. [38] suggested that the location of the detected loci should be investigated. For instance, demographic events may alter patterns of allele frequencies across the entire genome while selection events are more likely to alter allele frequencies at the loci that are in close vicinity to the mutations that are under selection [38]. In addition, fixed long homozygous haplotypes can also occur due to strong inbreeding following a founder effect [38]; however, a study by Makina et al. [23] demonstrated that the level of inbreeding was relatively low within each of the breeds studied here. Long homozygous haplotypes in breeds that were not included in the design of the BovineSNP50 assay (e.g. Nguni and Afrikaner) could have been created by chance because of the SNP ascertainment bias which would lead to lower overall average MAF for the SNPs on the assay in these breeds. To partially counter this effect, the number of loci required to declare a selective sweep, N, was defined individually for each breed (Table 1) and a larger N was required for breeds with larger numbers of monomorphic and low MAF SNPs.

LD-based methods such as the long range haplotype, extended haplotype homozygosity and integrated haplotype score approaches can also be used to identify genomic regions with unusually long haplotypes that have a high frequency in the population [39]. These approaches are useful to identify variants that have undergone a partial or incomplete selective sweep, in which a new mutation has a frequency that has risen to a modest value in the population but has yet to reach fixation [40]; however, these approaches are somewhat sensitive to marker density, which was relatively low in this study. While the across-population extended haplotype homozygosity test can compare haplotype lengths between populations to control for local variation in recombination rate [41], signals of strong recent selection were analyzed within each breed.

The second approach detected genomic regions with high F ST between African and European breed pairs using sliding windows throughout the genome [14] to reveal differentiation that could result from different selection histories for production or adaptation to local environments. However, such differentiation could be caused by drift. In contrast to the first approach, the F ST approach can detect different types of selection signatures [40], which may explain why the two methods did not produce overlapping signals. One of the limitations associated with the first approach was the calibration relative to the size of the sweeps. While intensive selection in a small population can cause the rapid fixation of a long haplotype, weak selection in a large population would result in the fixation of only a short haplotype, which may not be identified with this approach [13]. Because of the requirement that each of the N contiguous loci should have a MAF less than α, for a small α, N was chosen to be sufficiently large so that the probability of observing N contiguous loci with a MAF less than α by chance alone would be very low and a sufficiently small chromosomal region was defined so that the targeted sweeps would not be smaller than 47 × (N − 1) kb, where 47 kb represents the median interval between SNPs on the BovineSNP50 assay [13]. Furthermore, the design of the BovineSNP50 assay led to lower average MAF and larger numbers of monomorphic SNPs for the Afrikaner and Nguni breeds, which are phylogenetically distant from the breeds that were used to discover the SNPs on the assay [25]. To adjust for this phylogenetic bias, N was individually defined for each breed (Table 1) and a larger N was required for breeds with larger numbers of monomorphic and low MAF SNPs. 
Finally, the ascertainment bias of common SNPs in the design of the BovineSNP50 assay might explain the inability to detect common sweeps among the Afrikaner, Nguni and Drakensberger breeds using the first analytical method.

Overall, this study detected 47 candidate genomic regions that are potentially either historically or currently under selection within and between six cattle breeds in South Africa. Twenty of these candidate genomic regions were detected within breeds and 27 were detected as regions that had diverged between breeds. In addition, 12 of these candidate genomic regions were shared between breeds and ten had previously been reported [13, 36, 42–44]. Furthermore, no putative selection signatures were predicted to be shared across the South African (indigenous and locally developed) and Bos taurus cattle breeds (Angus and Holstein), which is probably due to the different environmental and demographic forces to which these breeds were exposed during breed formation [2].

Domestication has caused considerable changes in the morphology and behaviour of livestock species, as has artificial selection for the specific traits that were selected during breed formation and subsequently for specific breeding objectives [17]. Coat colours are easily identifiable phenotypes that probably played an important role in selection before farmers gained access to objective measurements [17]. In certain breeds, such as Nguni, colour patterns have cultural connotations and coloured hides have different economic values [1]. The melanocyte stimulating hormone receptor gene (MC1R) on BTA18 between 14,757,060 and 14,758,700 bp, which influences the production of eumelanin and pheomelanin pigment and is responsible for the pigmentation of skin, eyes and hair [45], was found to be differentially selected between Holstein and Nguni cattle but not between the South African Afrikaner (red), Drakensberger (black) or Bonsmara (red) breeds. This could be due to specific alleles at the MC1R gene that are under selection in the Nguni breed. Ramey et al. [13] observed a sweep at MC1R in Hanwoo cattle, which are yellow. Furthermore, Stella et al. [43] and Flori et al. [46] reported that the MC1R gene was under selection in cattle. MC1R has been proposed to have three alleles, i.e. E^D for breeds with a black coat (e.g., Holstein, Angus and Murray Grey), e for breeds with recessive red coat (e.g., Limousin, Shorthorn and Hereford) and E^+, also called “wild type” for all other breeds except Hereford [47]. The dominant E^D allele is responsible for black coat colour, whereas the recessive e/e genotype results in red coats. However, wild type E^+E^+ homozygotes may display variable colour patterns, since other genes (e.g., Agouti) can influence the pigments produced [37].
The presence of a putative selection signature on MC1R in Nguni cattle, which are characterized by multi-coloured skin patterns that may present various forms (white, brown, golden yellow, black, dappled, or spotted), is of interest and suggests the existence of additional functional alleles at MC1R as was also suggested by the presence of a sweep at MC1R in yellow Hanwoo cattle [13]. Identifying the mutations that underlie these signals would allow a better understanding of the role of MC1R in coat colour patterning in cattle.

Behavioural changes such as reduction in fear and anti-predator responses and increase in sociability are believed to have been selected during domestication [48]. This study detected several putative selection signatures that could be related to the development of the nervous system as well as the regulation of a wide range of tissue and cell functions including behaviour, for example, regions harbouring WNT5B, FMOD, and PRELP (Afrikaner), CCR7 (Nguni) and OVOS, and SLC6A17 (Bonsmara). The Bovine HapMap Consortium [6] and Gautier et al. [44] also reported selection signatures in regions that contain genes associated with the nervous system of cattle.

South African cattle are farmed in regions that are characterized by periodic drought, seasonal dry periods, and nutritional shortages in the natural veld and are subjected to a variety of external and internal parasites and stock diseases [1]. A number of candidate genes and gene families that were previously associated with one or more performance attributes of tropical adaptation [36, 44] have been selected in Nguni cattle. For example, keratin genes (KRT222, KRT24, KRT25, KRT26 and KRT27) and one heat shock protein gene (HSPB9) on BTA19 between 42,896,570 and 42,897,840 bp were found to be under selection. Heat shock proteins are differentially expressed between indicine and taurine cattle in the tropical environments of Africa and are associated with tropical adaptation in Zebu cattle [36, 44]. Keratins (heteropolymeric structural proteins) form the basis of the structural constituent of the epidermis during epidermal development. Epidermal development occurs in response to adaptation to different climatic and environmental conditions, including tick exposure [49]. In addition, keratins play a role in the formation of the hair shaft [50]. Skin colour and the thickness of the hair directly influence the thermo-tolerance of cattle that live in the tropics [51]. Nguni cattle have a smoother and shinier hair coat than European cattle breeds. Due to these characteristics, Nguni cattle regulate their body temperature and maintain cellular functions more efficiently during heat [20] and are also more resistant to tick infestation [19]. The absence of such signals in other local cattle breeds such as Afrikaner, Drakensberger and Bonsmara, which also display some ability to survive under extreme conditions [19], may be explained by the fact that the method based on F ST is most efficient at detecting differentiation when the region is near fixation for alternate alleles in the breeds compared [39].
Thus, while these loci may be under selection in these breeds, the desirable alleles may still have intermediate frequencies. This agrees with the results of Muchenje et al. [19] and Marufu et al. [21] who reported that Nguni cattle were more resistant to ticks and could better survive to extreme conditions than other local South African breeds.
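
The sensitivity of FST to allele frequencies that are close to fixation for alternate alleles can be illustrated with a small numerical sketch. It uses the textbook heterozygosity-based definition of Wright's FST for a single biallelic locus in two equally weighted populations, which is an assumption made for illustration and not necessarily the estimator applied in this study. The function name and the allele frequencies are hypothetical.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        # Both populations are fixed for the same allele: no differentiation.
        return 0.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


# Hypothetical frequencies: breeds near fixation for alternate alleles ...
near_fixation = fst_two_pops(0.95, 0.05)
# ... versus breeds in which the alleles are still at intermediate frequencies.
intermediate = fst_two_pops(0.60, 0.40)

print(f"near fixation: FST = {near_fixation:.2f}")
print(f"intermediate:  FST = {intermediate:.2f}")
```

Under these assumed frequencies, the locus that is near fixation for alternate alleles yields a much larger FST than the locus at intermediate frequencies, although both could be under selection. This is consistent with the explanation given above for the absence of signals in Afrikaner, Drakensberger and Bonsmara cattle.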

Several candidate genes that are related to antigen recognition, which is a key process in the development of the immune response, were identified as being under selection in this study, and include MTPN (Afrikaner), CYM (Afrikaner and Nguni), CDC6, CDK10, KCNB1 and TNS4 (Nguni), NDUFA12, ALOX15B and ALOX12B (Bonsmara), and SLC25A48 and SERPINA3-8 (Drakensberger). The CD family of immune response genes was described by Meissener et al. [52] as being closely involved with molecular functions and pathways of the major histocompatibility complex (MHC). The TNFAIP8L2 gene has a major role in individual immune homeostasis [53], and the NDUFA12 gene, which has diverging allele frequencies between taurine and Zebu cattle, is associated with tick resistance. These observations are consistent with the tolerance of Afrikaner, Nguni, Drakensberger and Bonsmara cattle to various tick and parasitic diseases [19, 21]. Furthermore, candidate genomic regions that include the MTPN and PDPR (Afrikaner), DCC (Afrikaner, Nguni, Drakensberger and Bonsmara), OTX2 (Angus), DNAH2, TMEM88 and GUCY2D (Bonsmara), EBF1 (Nguni), and CXCL14 and SLC25A48 (Drakensberger) genes overlap with previously identified QTL that affect tick resistance and nematode tolerance in cattle.

Several candidate genes within the selected regions are indirectly or directly involved in reproductive pathways including spermatogenesis, ovulation rate, oestrus processes, testis development and prostaglandin development in cattle. These included OVOS2 (Bonsmara), ADIPOR2 (Afrikaner and Nguni), WC1 (Drakensberger and Bonsmara), RBBP8 (Bonsmara), SERPINA3-8, HOXC12 and HOXC13 (Drakensberger), and FBXL4 (Afrikaner and Nguni). All these breeds have been shown to reproduce under harsh environmental conditions and are considered to be excellent dam lines for crossbreeding, with few calving difficulties [1]. This supports the presence of putative selection signatures at loci involved in reproduction, which probably arose during the adaptation of these breeds to South African conditions. In addition, these regions overlap with previously reported QTL associated with reproduction in cattle.

Candidate genes related to growth and muscle development were also detected as being under selection, i.e. DDX19A, TMEM51 and MTPN (Afrikaner), IGFBP4 (Nguni), TGFB1 and KCNB1 (Drakensberger), MYO6, KIAA1797 and EFHD2 (Bonsmara), AJAP1 (Angus), and ATOX1 (Holstein). In addition, some of these regions overlap with previously identified QTL that are associated with stature, body weight and growth in cattle. Furthermore, some of the putative selection signatures detected in this study overlap with previously reported QTL that affect milk yield and quality (BTA3, 5, 10, 16 and 23), feed efficiency (BTA13, 16 and 18), fat thickness (BTA5, 18 and 19), marbling score and carcass weight (BTA3, 5, 16, 20 and 27) as well as somatic cell count (BTA3, 5, 7, 9, 18 and 22).
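
The overlap between a candidate region and previously reported QTL can be checked with a simple interval comparison. The sketch below is illustrative only. The candidate window is the BTA19 region reported above for the keratin genes and HSPB9, whereas the QTL names and coordinates are hypothetical placeholders and are not taken from the cattle QTL database.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals lie on the same chromosome and share at least one bp."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# Candidate window on BTA19 as reported in the text.
candidate = Interval("BTA19", 42_896_570, 42_897_840)

# Hypothetical QTL intervals, used only to demonstrate the check.
qtl = {
    "hypothetical_qtl_A": Interval("BTA19", 42_000_000, 43_500_000),
    "hypothetical_qtl_B": Interval("BTA5", 42_000_000, 43_500_000),
    "hypothetical_qtl_C": Interval("BTA19", 50_000_000, 51_000_000),
}

hits = [name for name, interval in qtl.items() if overlaps(candidate, interval)]
print(hits)
```

Only the placeholder QTL on the same chromosome that spans the candidate window is reported as a hit. An interval on another chromosome, or one on BTA19 that lies outside the window, is not.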

The overall goal of this study was to identify candidate genomic regions targeted by selection within and between the major cattle breeds of South Africa. The fact that 12 of the identified candidate genomic regions were shared among several of the breeds analysed in this study and that 10 were validated by previous studies reduces the probability that these regions are false positives [13]. False positives that could have been introduced by SNP ascertainment bias or by LD pruning in the FST analyses should be identified in future studies using the BovineHD BeadChip or sequence data. The results of this study provide insights into the genetic mechanisms that underlie traits of economic importance among cattle breeds in South Africa, in particular with regard to adaptation to tropical and subtropical environments via increased resistance to tick- and parasite-borne diseases and enhanced reproduction and production potential.
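
As a sketch of how candidates shared among breeds can be tallied, the example below inverts a breed-to-gene mapping and keeps the genes reported in at least two breeds. The mapping contains only a subset of the genes named in this discussion, chosen for illustration. It does not reproduce the 12 shared regions, and the variable names are hypothetical.

```python
from collections import defaultdict

# Subset of candidate genes per breed, as named in the discussion above.
breed_genes = {
    "Afrikaner": {"MTPN", "CYM", "DCC", "ADIPOR2", "FBXL4"},
    "Nguni": {"CYM", "DCC", "ADIPOR2", "FBXL4", "EBF1"},
    "Drakensberger": {"DCC", "WC1", "SLC25A48"},
    "Bonsmara": {"DCC", "WC1", "NDUFA12"},
}

# Invert the mapping: gene -> set of breeds in which it was reported.
gene_breeds = defaultdict(set)
for breed, genes in breed_genes.items():
    for gene in genes:
        gene_breeds[gene].add(breed)

# Keep only the genes reported in two or more breeds.
shared = {
    gene: sorted(breeds)
    for gene, breeds in gene_breeds.items()
    if len(breeds) >= 2
}

for gene in sorted(shared):
    print(gene, shared[gene])
```

In this subset, DCC is shared by all four breeds, whereas genes such as MTPN and EBF1 are breed-specific and are therefore excluded from the shared set.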

Conclusions

This study represents the first attempt to localize candidate genomic regions targeted by selection in breeds adapted to South African conditions. Several candidate genomic regions that are directly or indirectly involved in tropical adaptation, immune response activation, tick and parasite resistance, and production and reproduction performance were detected. Moreover, candidate selected regions that overlap with QTL reported in the cattle QTL database provide additional evidence for the significance of the detected regions under selection. This study identified candidate loci that are important for the development of South African cattle breeds and that should be prioritized for functional dissection.