Background

As a consequence of over 10,000 years of domestication, migration, and natural as well as artificial selection, a wide range of phenotypically distinct cattle populations has spread around the world. Several research initiatives have combined molecular marker datasets with historic and archaeological records to investigate the origin, history, genetic diversity, and differentiation of cattle populations (see Groeneveld et al., 2010 [1] for a review). The collected evidence suggests that domestic cattle descend from the extinct aurochs (Bos primigenius) and are divided into two distinct but interfertile species: the humpless taurine cattle (Bos taurus) and the humped indicine or zebu cattle (Bos indicus). It is widely accepted that taurine and zebu cattle arose from separate centres of domestication around 8,000 BC in the Fertile Crescent (the modern-day countries of Israel, Jordan, Lebanon, Cyprus and Syria, and parts of Egypt, Turkey, Iraq, Iran and Kuwait) and the Indus Valley (present-day Pakistan), respectively [2, 3]. From these regions, cattle spread throughout Europe, Asia and Africa with the expansion of agriculture [4, 5]. Taurine cattle were brought to the Americas after 1492, mainly from the Iberian Peninsula; in the early 20th century, Indian zebu cattle were introduced into Central and South America because of their adaptability to tropical environments.

Molecular markers have been essential to the investigation of the history and genetic differentiation of domestic cattle. Recent studies applying genome-wide single nucleotide polymorphism (SNP) markers to investigate genetic structure and differentiation in multiple cattle breeds (e.g., [6–9]) resolved hypotheses that could not be tested with sparse panels of molecular markers. However, the markers included in the most widely used SNP panel, the Illumina® BovineSNP50 BeadChip assay (50 k), were discovered in reduced representation libraries from pooled DNA samples of six taurine breeds [10], which leads to biased estimates of genetic structure and differentiation in zebu cattle [7].

As an alternative, amplified fragment length polymorphism (AFLP) markers [11] have been used for almost two decades. Owing to their random nature and high reproducibility, they have enabled ascertainment bias-free analysis of diversity in any species since before the advent of high-throughput genotyping and sequencing technologies [12]. AFLP markers are produced by digesting genomic DNA with both a rare-cutter and a frequent-cutter restriction enzyme, ligating synthetic adapters to the restriction fragments to serve as primer-binding sites, and selectively amplifying subsets of the restriction fragments with primers carrying additional nucleotides at their 3′ end [13].
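To make these three steps concrete, the following Python sketch performs a simplified in silico AFLP on a DNA string. It assumes EcoRI (G^AATTC) as the rare cutter and MseI (T^TAA) as the frequent cutter, a common pairing that the text does not specify; the function names, the one-base selective nucleotides and the 50 to 500 bp size window are hypothetical choices for illustration. It only reports which restriction fragments a given primer pair would amplify and does not model adapters, PCR efficiency or electrophoresis.

```python
import re

# Hypothetical enzyme pair (common in AFLP work, not stated in the text):
# EcoRI as the rare cutter (G^AATTC) and MseI as the frequent cutter (T^TAA).
# Both cut after the first base of their recognition site.
SITES = {"rare": "GAATTC", "freq": "TTAA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def site_starts(seq: str, site: str) -> list:
    """Start positions of every occurrence of `site` (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def aflp_fragments(seq, sel_rare="A", sel_freq="C", min_len=50, max_len=500):
    """Return (length, left_site, right_site) for each rare/frequent-cutter
    fragment whose inner ends match the primers' selective 3' nucleotides.
    Simplifications: both sites are palindromic, so scanning the top strand
    finds all of them; adapters, overhangs and primer mismatches are ignored."""
    selective = {"rare": sel_rare, "freq": sel_freq}
    sites = sorted(
        (pos, kind) for kind, motif in SITES.items() for pos in site_starts(seq, motif)
    )
    amplified = []
    # Two consecutive sites delimit one restriction fragment.
    for (i, left), (j, right) in zip(sites, sites[1:]):
        if {left, right} != {"rare", "freq"}:
            continue  # only mixed rare/frequent-cutter fragments are considered
        k_left, k_right = len(selective[left]), len(selective[right])
        # Left primer extends along the top strand, just inside the left site.
        start = i + len(SITES[left])
        left_bases = seq[start:start + k_left]
        # Right primer extends along the bottom strand, just inside the right site.
        right_bases = revcomp(seq[j - k_right:j])
        length = j - i  # cut-to-cut span on the top strand
        if (left_bases == selective[left] and right_bases == selective[right]
                and min_len <= length <= max_len):
            amplified.append((length, i, j))
    return amplified
```

Changing a selective nucleotide (for example `sel_freq="G"`) switches the amplified subset, which is how different primer combinations sample different anonymous regions of the genome.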

Although AFLP markers are highly informative [13–15] and unbiased, there are few examples of their application in large-scale, multi-breed analyses of population differentiation in cattle. Negrini et al. [16] used 81 AFLP and 19 microsatellite markers to estimate genetic distances among 51 cattle breeds, including taurine and zebu cattle, and found that the AFLP panel differentiated zebu from taurine cattle better than the microsatellite panel did. Two studies in pigs [17, 18] showed the potential of AFLP to survey genetic diversity at the continental scale. Because AFLP polymorphisms are mainly (but not exclusively) based on point mutations, these markers are expected to reflect evolutionary divergence better than microsatellites, whose mutation rates vary. For instance, a microsatellite-based bovine phylogeny [19] was not in agreement with a phylogeny based on sequence data [20], whereas an AFLP-based phylogeny [21] was. Thus, AFLP appears to be a valuable complementary tool for studies of genetic diversity in cattle populations around the world.

To obtain an unbiased view of the genetic structure and differentiation of taurine and zebu cattle breeds from distinct continental areas, we compiled a worldwide multi-breed AFLP dataset. We do not intend to advocate sparse panels of molecular markers over the present portfolio of high-density SNP arrays, or to question the legitimacy of those arrays for diversity research in cattle. Instead, we aim to propose an unbiased model of cattle differentiation that complements genetic distance estimates obtained from molecular markers that are likely to suffer from ascertainment bias.

Methods

Sampling and molecular data

A total of 1,593 individuals from 13 zebu and 58 taurine breeds were genotyped for 135 AFLP markers. Treating AFLPs as dominant markers, we scored the presence of a band as genotype ‘1’ and its absence as genotype ‘0’; occasional faint bands were treated as missing data. The samples were obtained from 23 countries in 9 continental areas: Southern Asia (3 zebu breeds), Southwestern Asia (2 taurine breeds), Eastern Europe (3 taurine breeds), Central Europe (24 taurine breeds), Northern Europe (10 taurine breeds), Southern Europe (10 taurine breeds), Western Europe (8 taurine breeds), Western Africa (7 zebu breeds and 1 taurine breed), and South America (3 zebu breeds). This dataset builds on the data reported by Negrini et al. [16] by including samples from 20 additional breeds (Table 1). Individuals or markers with 5% or more missing data were excluded from the study. Further details on the AFLP protocol and the repeatability of the genotypes can be found in Additional file 1.
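The missing-data filter can be sketched as follows, assuming a 0/1 genotype matrix in which missing calls (such as faint bands) are coded as NaN. Filtering individuals first and then markers mirrors the order in which the exclusions are reported in the Results, but that ordering and the function name `filter_missing` are assumptions of this sketch, not the authors' pipeline.

```python
import numpy as np


def filter_missing(genotypes, max_missing=0.05):
    """Drop individuals (rows) and then markers (columns) whose fraction of
    missing calls is >= max_missing.

    genotypes: 2-D array with 1 = band present, 0 = band absent and
    np.nan = missing (e.g. an ambiguous faint band).
    Returns the filtered matrix and boolean masks of the kept rows/columns."""
    genotypes = np.asarray(genotypes, dtype=float)
    keep_ind = np.isnan(genotypes).mean(axis=1) < max_missing
    kept = genotypes[keep_ind]
    # Marker missingness is re-evaluated on the retained individuals only.
    keep_mrk = np.isnan(kept).mean(axis=0) < max_missing
    return kept[:, keep_mrk], keep_ind, keep_mrk
```

With 135 markers, an individual is removed once it has 7 or more missing calls (7/135 ≈ 5.2%), while 6 missing calls (≈ 4.4%) are tolerated.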

Table 1 Continental areas, countries and breeds of taurine and zebu cattle sampled

Genetic distances and distance-based clustering

We used AFLPsurv v1.0 [22] to calculate three measures of pairwise genetic distance between populations: FST [23], Nei’s D [24] and Reynolds’ distance [25]. Animals were grouped by breed or by continental area. The three Southern Asian breeds were excluded from the breed-level analyses because of their small sample size (n = 12). We used the base package of R v2.15.0 [26] to perform spectral decompositions of the matrices of pairwise genetic distances between groups in order to construct low-dimensional representations of the genetic relationships among the surveyed populations. Via classical multidimensional scaling (CMDS) [27], the pairwise dissimilarities among n groups were represented in an (n − 1)-dimensional space whose axes are eigenvectors. The proportion of genetic variance explained by each eigenvector was calculated by dividing its eigenvalue by the sum of all eigenvalues and expressed as a percentage. Additionally, we applied the Neighbor-Net method to the distance matrices using SPLITSTREE v4.13.1 [28].
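For readers who want to reproduce the ordination step outside R, a minimal NumPy sketch of classical (Torgerson) MDS follows. It is a generic implementation, not the authors' script. As described above, the percentage of variance for each axis divides its eigenvalue by the sum of all eigenvalues, which for non-Euclidean distance matrices also includes any negative eigenvalues.

```python
import numpy as np


def classical_mds(dist):
    """Classical (Torgerson) multidimensional scaling.

    dist: symmetric n x n matrix of pairwise genetic distances.
    Returns coordinates on the axes with positive eigenvalues and the
    percentage of variance explained by each of those axes."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * centring @ (d ** 2) @ centring   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    tol = 1e-10 * np.abs(eigvals).max()
    positive = eigvals > tol                    # axes that carry real variance
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    # Denominator is the sum of ALL eigenvalues, as in the text.
    explained = 100.0 * eigvals[positive] / eigvals.sum()
    return coords, explained
```

For three points lying on a line, the first axis recovers their centred positions (up to sign) and explains 100% of the variance; with real distance matrices the first two axes summarise the dominant structure.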

Expected heterozygosity and ancestry informative markers

To identify geographical patterns in the extent of genetic diversity among the cattle breeds analyzed, we used AFLPsurv v1.0 [22] to calculate expected heterozygosities for each continental area under the assumption of Hardy-Weinberg equilibrium. Essentially the same values were obtained by averaging the expected heterozygosities of the separate breeds within each area (data not shown). Additionally, we applied an ad hoc statistic to identify taurine and zebu ancestry informative markers (i.e., AFLP markers with large differences in band presence frequency between taurine and zebu breeds). For each AFLP marker, we computed the band presence frequency in each breed and then calculated the mean over the taurine breeds and over the zebu breeds. The difference in band presence frequency was then calculated as $\Delta f = \bar{f}_{\mathrm{taurine}} - \bar{f}_{\mathrm{zebu}}$. Positive and negative values indicate markers that are informative of taurine and zebu ancestry, respectively. We used thresholds of +0.55 and −0.55 to identify taurine and zebu ancestry informative AFLP markers, respectively. Finally, the average band presence frequency of the informative markers was computed for each breed to assess the relative level of taurine/zebu introgression across the investigated breeds.
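The Δf screen and a simple heterozygosity estimate can be sketched as below. The Δf function follows the description above; treating the ±0.55 thresholds as inclusive is an assumption. The heterozygosity function uses the square-root estimate of the null-allele frequency under Hardy-Weinberg equilibrium, $q = \sqrt{1 - f}$ and $H_e = 2q(1-q)$, as a swapped-in illustration; AFLPsurv uses more refined allele-frequency estimators, so its values will generally differ. Function names are hypothetical.

```python
import numpy as np


def ancestry_informative_markers(freq, is_taurine, threshold=0.55):
    """freq: breeds x markers matrix of band presence frequencies.
    is_taurine: boolean vector, True for taurine breeds, False for zebu.
    Returns delta_f per marker and the indices of taurine / zebu markers."""
    freq = np.asarray(freq, dtype=float)
    is_taurine = np.asarray(is_taurine, dtype=bool)
    # Mean band frequency over taurine breeds minus mean over zebu breeds.
    delta_f = freq[is_taurine].mean(axis=0) - freq[~is_taurine].mean(axis=0)
    taurine_aims = np.flatnonzero(delta_f >= threshold)   # assumed inclusive
    zebu_aims = np.flatnonzero(delta_f <= -threshold)
    return delta_f, taurine_aims, zebu_aims


def expected_heterozygosity(band_freq):
    """Mean expected heterozygosity over dominant loci under HWE, using the
    square-root estimate of the null (band-absent) allele frequency."""
    q = np.sqrt(1.0 - np.asarray(band_freq, dtype=float))
    return float(np.mean(2.0 * q * (1.0 - q)))
```

The per-breed summary plotted for the informative markers then corresponds to `freq[:, taurine_aims].mean(axis=1)` and `freq[:, zebu_aims].mean(axis=1)`.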

Model-based clustering

We estimated individual ancestry coefficients as parameters of a statistical model, following the Bayesian approach implemented in STRUCTURE v2.2 [29]. We used the admixture model adapted for AFLP markers, with independent allele frequencies (see [29, 30] for details). Briefly, the genomes of the sampled individuals are assumed to derive from one or more of K ancestral populations, and the proportion of each individual’s ancestry from each of these populations is estimated via a Markov chain Monte Carlo algorithm. The assumption that markers are independent (i.e., in linkage equilibrium) is reasonable in the present study, as the AFLP panel used is sparse and the markers are unlikely to be closely located in the genome. We applied this model from K = 1 to K = 60 and ran 5 replicates of 150,000 iterations for each analysis after a burn-in of 100,000 iterations [31].
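To clarify what the admixture model evaluates for dominant markers, the sketch below computes one individual's log-likelihood given its ancestry proportions and the null-allele frequencies of each ancestral population. It illustrates the likelihood only, under the linkage-equilibrium assumption stated above; it is not STRUCTURE's code, performs no MCMC sampling, and the function name is hypothetical.

```python
import numpy as np


def admixture_loglik(bands, q_ind, null_freq):
    """Log-likelihood of one individual's dominant AFLP phenotypes under an
    admixture model.

    bands: length-L vector, 1 = band present, 0 = absent, np.nan = missing.
    q_ind: length-K ancestry proportions (summing to 1).
    null_freq: K x L frequencies of the band-absent (null) allele in each
    ancestral population."""
    bands = np.asarray(bands, dtype=float)
    # Each of the two allele copies independently originates from population k
    # with probability q_ind[k], so P(one copy is null) = sum_k q_k * r_kl.
    p_null_copy = np.asarray(q_ind, dtype=float) @ np.asarray(null_freq, dtype=float)
    p_absent = p_null_copy ** 2          # no band only if both copies are null
    observed = ~np.isnan(bands)          # missing calls contribute nothing
    prob = np.where(bands[observed] == 1, 1.0 - p_absent[observed], p_absent[observed])
    # Loci are treated as independent (linkage equilibrium), so logs add up.
    return float(np.sum(np.log(prob)))
```

Because a band is absent only when both copies carry the null allele, each dominant locus carries less information than a codominant one, which is one reason the number of identifiable clusters is limited with 127 markers.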

We applied two methods to identify the most likely number of ancestral populations underlying the observed data. The first uses the ∆K statistic described by Evanno et al. [32], which is based on the second-order rate of change in the log-likelihood of the data between successive K values. The second was adapted from the approaches to model selection reviewed by Johnson & Omland [33] and is based on the concept of relative likelihood. First, the Akaike Information Criterion (AIC) is calculated for each model, from K = 1 to K = 60, as $\mathrm{AIC} = 2p - 2\ln(L)$, where $p$ and $\ln(L)$ are the number of parameters and the log-likelihood of the estimated model, respectively. Next, AIC differences are calculated for each model $i$ as $\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}$, where $\mathrm{AIC}_{\min}$ is the lowest AIC among all models, and relative likelihoods are computed as $L_i = e^{-\Delta_i/2}$. The relative likelihoods are then normalized across all competing models to produce Akaike weights $\omega_i = L_i / \sum_j L_j$, where the sum runs over all models from K = 1 to K = 60. These weights can be interpreted as the probability that a given model is the one with the minimum information loss among all competing models, and they were used as an alternative approach to estimate the optimal K.
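Both model-selection criteria can be sketched from replicate log-likelihoods. The ΔK function is a simplified reading of Evanno et al.: it takes the second difference of the replicate means of L(K) and divides it by the replicate standard deviation at K. The parameter counts passed to the AIC function must be supplied by the user, since the text does not state how p was counted. Function names are hypothetical.

```python
import numpy as np


def evanno_delta_k(lnl_by_k):
    """lnl_by_k: dict mapping consecutive K values to lists of replicate
    ln P(D|K) values. Returns {K: delta_K} for every K that has both a lower
    and a higher neighbour (delta_K is undefined at the smallest and largest K)."""
    ks = sorted(lnl_by_k)
    mean = {k: np.mean(lnl_by_k[k]) for k in ks}
    sd = {k: np.std(lnl_by_k[k], ddof=1) for k in ks}
    return {
        # |L''(K)| on replicate means, scaled by the spread of L(K).
        k: abs(mean[k + 1] - 2 * mean[k] + mean[k - 1]) / sd[k]
        for k in ks[1:-1]
    }


def akaike_weights(loglik, n_params):
    """Akaike weights from model log-likelihoods and parameter counts."""
    aic = 2 * np.asarray(n_params, float) - 2 * np.asarray(loglik, float)
    delta = aic - aic.min()
    rel_lik = np.exp(-0.5 * delta)       # relative likelihood of each model
    return rel_lik / rel_lik.sum()       # normalise over all competing models
```

The selected K is then the key of the largest ΔK value, or the model with the largest Akaike weight; the two need not agree in general, although both selected K = 2 here.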

Results

Quality control

After excluding individuals with 5% or more missing genotypes, 1,470 of the initial 1,593 animals remained (see Table 1 for details). Of the 135 genotyped AFLP loci, 8 were excluded because of missing data (5% or more), leaving a final set of 127 loci. As most of the analyses reported hereafter assume marker neutrality, we evaluated the impact of including putative markers under selection on all downstream analyses. In all cases, excluding candidate outlier markers resulted in no significant difference in the estimates of genetic distances and ancestry coefficients (Additional file 2). Therefore, all subsequent analyses were conducted using the entire set of 127 markers.

Genetic distance-based clustering

The three genetic distances were highly correlated (data not shown) and yielded consistent results (Additional file 3; Additional file 4: Figure S1; Additional file 5: Figure S2; Additional file 6: Figure S3; Additional file 7: Figure S4). We present the results obtained with Reynolds’ distance (Figure 1), which has been shown to be insensitive to variation in the number of markers [34].
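Because the text names Reynolds’ distance without giving its formula, the following sketch shows the large-sample form of Reynolds, Weir and Cockerham's coancestry distance for biallelic loci, $\theta = \sum_l \sum_u (x_{lu} - y_{lu})^2 / \big(2\sum_l (1 - \sum_u x_{lu} y_{lu})\big)$ and $D = -\ln(1-\theta)$. It assumes allele frequencies have already been estimated from band frequencies and omits the sample-size corrections that an estimator such as AFLPsurv's may apply; the function name is hypothetical.

```python
import numpy as np


def reynolds_distance(p_x, p_y):
    """Large-sample Reynolds' coancestry distance D = -ln(1 - theta) for
    biallelic loci. p_x, p_y: per-locus frequency of one allele in
    populations X and Y. No sample-size correction is applied."""
    p_x = np.asarray(p_x, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    # Sum over both alleles of squared frequency differences, per locus.
    numerator = np.sum(2.0 * (p_x - p_y) ** 2)
    # 1 - sum over alleles of x_u * y_u, per locus, summed over loci.
    denominator = 2.0 * np.sum(1.0 - (p_x * p_y + (1.0 - p_x) * (1.0 - p_y)))
    theta = numerator / denominator
    return float(-np.log(1.0 - theta))
```

For a single locus with frequencies 0.5 and 0.2, θ = 0.18 and D = −ln(0.82) ≈ 0.198; identical populations give D = 0.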

Figure 1

Reynolds’ distance-based clustering of cattle according to continental area. A) Continental areas sampled. Light brown = Southwestern Asia, purple = Eastern Europe, yellow = Central Europe, dark blue = Northern Europe, dark red = Southern Europe, orange = Western Europe, light green = Western Africa, dark green = Southern Asia and South America. Arrows indicate cattle migration routes. B) Classical multidimensional scaling plot. Circles: taurine cattle; triangles: zebu cattle. Percentages in brackets correspond to the variance explained by each eigenvector. C) Neighbor-Net clustering. Nodes represent continental areas, and edge lengths are proportional to genetic distances.

The Nigerian zebu breeds Sokoto Gudali and White Fulani were the most closely related populations (Reynolds’ distance = 0.005). In contrast, despite a possible Spanish contribution to the ancestry of Brazilian cattle, Brazilian and Spanish breeds were well separated, with the largest distance observed between Nellore and the inbred Betizu (Reynolds’ distance = 0.656).

The first two eigenvectors of the CMDS analysis of continental groups of cattle (Figure 1B) together explained 79.4% of the total genetic variance and were centered on Southwestern Asian taurine cattle. The first eigenvector separates taurine from zebu cattle, with Southern Asian and South American zebu clustering together and Western African zebu in an intermediate position. The second eigenvector adds a geographical component that correlates with the latitude of the region of origin of the cattle populations (Figure 1A-B). The Neighbor-Net method produced results similar to those of the CMDS analysis (Figure 1C).

Model-based clustering

The log-likelihoods obtained from the admixture model with independent allele frequencies for K = 1 to K = 60 were compared using ∆K and AIC weights to identify the most likely number of ancestral populations underlying the samples. Both criteria selected the model with K = 2 as the most likely among all competing models (Additional file 8: Figure S5). Assuming that the two inferred clusters approximate the founder B. taurus and B. indicus populations (Figure 2A), we found variable levels of zebu introgression in taurine breeds from all continental areas, especially marked in Southwestern Asian taurines. While South American zebu breeds showed no evident taurine introgression, taurine introgression was detected in all Western African zebu breeds.

Figure 2

Admixture analysis of taurine and zebu cattle. A) Model-based clustering of cattle breeds under the admixture model with independent allele frequencies and 2 assumed ancestral populations (K). Each individual is represented by a vertical bar partitioned into colored segments whose lengths are proportional to the contribution of each cluster. B) Bar plots of band presence frequencies for the set of taurine (above) and zebu (below) ancestry informative markers. Error bars represent standard errors. See Table 1 for breed codes.

Higher K values were supported by neither ∆K nor AIC weights and were not in agreement with genetic distances (data not shown). This indicates that models with K > 2 were susceptible to stochastic errors and poorly represented the underlying ancestry components of our samples, possibly because of model overfitting, i.e., estimating more parameters than the observed data can support. Hence, for our dataset, the model-based clustering analysis was limited to K = 2 owing to the low number of dominant markers and the need to estimate unobserved genotypes.

Ancestry informative markers and expected heterozygosities

We identified 6 taurine and 5 zebu ancestry informative markers via ∆f and calculated their average band presence frequency in each breed (Figure 2B). Taurine markers showed a higher band presence frequency in Western African zebus than in South American zebus, whereas the opposite was found for zebu markers.

We found a geographical decline in expected heterozygosity in taurine cattle, from Southwestern Asia towards Western Europe and Western Africa (Additional file 9: Figure S6). Despite their limited sample size, Southern Asian zebus were estimated to be more diverse than the pools of taurine breeds. The estimate for the closely related South American zebus was slightly lower than that for Southwestern Asian taurines, but still higher than those for European cattle. Furthermore, Southern Asian and Western African zebus exhibited the highest expected heterozygosities among all continental groups analyzed.

Discussion

The performance of AFLP technology in cattle was previously assessed and shown to produce genotyping data with an error rate of 2% or less across laboratories [16], which is consistent with the repeatability of the data reported in the present study (Additional file 1). Here, we revisited the use of AFLPs to investigate the relationships among 13 zebu and 53 taurine cattle breeds. Because AFLP markers are discovered in the very samples being genotyped, the assessment of genetic structure and differentiation reported in this article is free of ascertainment bias.

As expected, the largest genetic distances were found between zebu and taurine breeds (Additional file 3). The Bayesian clustering analysis also highlighted that these populations descend from distinct genetic pools (Figure 2). We found a decrease in genetic diversity with increasing geographical distance from Southwestern Asia (Additional file 9: Figure S6). This observation agrees with the mitochondrial DNA (mtDNA) findings of Troy et al. [35], which suggested a Southwestern Asian origin of European cattle, with Anatolia or the Fertile Crescent as the most likely centre of taurine cattle domestication. Hence, the loss of diversity with increasing distance from the most plausible domestication centre observed here is in line with the hypothesis that the ancestral taurine gene pool derived from wild aurochs captured in Southwestern Asia. Apparently, any introgression from European or African aurochs was not extensive enough to counteract the loss of diversity during migration from Southwestern Asia.

Using sequence data from 17 genes spanning 37 kb, Murray et al. [36] found nucleotide and haplotype diversity to be higher in B. indicus than in B. taurus. In the present study, we also found that the expected heterozygosity in the South American zebu breeds was higher than in the European taurine breeds. Considering that the South American zebu breeds analyzed here were introduced to the Americas in the early 20th century through imports of Indian animals, this finding is also consistent with a separate origin of B. indicus in South Asia.

The expected heterozygosity in Southern Asian cattle was estimated to be higher than that of the closely related South American breeds. Although this finding is consistent with a loss of diversity during the sampling and importation of animals to South America, Southern Asian cattle were represented by few samples in our dataset, which limits the assessment of genetic diversity in this continental group. Nevertheless, these results suggest that B. indicus is at least as diverse as B. taurus.

The CMDS and Neighbor-Net analyses showed that zebu cattle from South America are more closely related to Southern Asian cattle than Western African zebus are (Figure 1). Furthermore, apart from Southern Asian zebus, Western African zebu breeds presented the highest expected heterozygosity among all continental groups, most likely because of their relatively high level of admixture [5, 37, 38].

The closer proximity of Western African zebu to taurine cattle in the CMDS plot and in the Neighbor-Net of Reynolds’ distances also suggests that African zebus are more admixed with taurine cattle than South American zebus are (Figure 1). This observation is reinforced by the model-based clustering and ancestry informative marker analyses, in which these African breeds appeared to carry substantial levels of taurine introgression (Figure 2). This may reflect the origin of zebu cattle and taurine-zebu crossbreds in Africa from crosses between taurine dams and zebu sires, as indicated by their taurine mtDNA haplotypes; the import of zebu sires started in the 2nd millennium BC and was stimulated by the Arab invasions in the 7th century [4, 39]. However, it is also plausible that this taurine inheritance played a role in local adaptation. For instance, trypanosomiasis is endemic in the Western Sub-Saharan region, and whereas indigenous taurines are tolerant, zebus may show variable susceptibility.

Similar crossbreeding was carried out in South America. When large-scale imports of zebu cattle to Brazil began in the early 20th century, the local herds mainly consisted of descendants of the taurine cattle imported since the late 15th century, after the discovery of America. The model-based clustering analysis clearly showed a genetic composition of Brazilian zebu close to that of their Indian ancestors (Figure 2A-B), indicating intensive backcrossing to zebu bulls over several generations. Thus, while mtDNA is a fingerprint of the historical origin of the herd and probably segregates randomly [40, 41], the nuclear genome has been subject to directional selection against taurine haplotypes via backcrossing. Artificial selection may therefore have retained taurine haplotypes only if these were linked to favourable traits (e.g., weight or carcass traits). Whole-genome sequence data or a high-density SNP array may help identify taurine haplotypes favoured by selection in these populations.

Ancestry informative markers also detected zebu introgression in the taurine gene pool (Figure 2). The highest level of introgression was found in Southwestern Asia, as previously observed with microsatellites [37]. This introgression likely contributed to the highest taurine diversity, observed in this area, which therefore should not be attributed entirely to the proximity of Southwestern Asian breeds to the putative B. taurus centre of domestication. A low level of admixture was also detected in Southern and Central Italian breeds, in particular the Sicilian Cinisara and Modicana, confirming a previous report [42]. Zebu admixture appears to decrease across the Alps towards Central and Western Europe, with few exceptions (e.g., Aberdeen Angus). Interestingly, we confirmed the low level of B. indicus introgression in the Pinzgauer breed postulated by Caroli et al. [43] on the basis of casein haplotype structure in Austria, but did not detect the substantial zebu ancestry previously suggested for the Piedmontese breed [44]. Given the limited number of ancestry informative markers (5 zebu and 6 taurine), these results are only indicative and may be confounded by stochastic variation.

Conclusions

We used AFLP markers to set an unbiased baseline for the genetic structure and divergence of multiple taurine and zebu cattle breeds. These markers suggested that zebu breeds are at least as diverse as taurine cattle, but further investigation is needed to determine whether zebu cattle are more diverse than taurine cattle. We found a gradual loss of diversity in taurine breeds with increasing distance from the domestication centre, which is consistent with previous findings. Western African zebu breeds are more genetically distant from Indian zebus than South American zebu cattle are, owing to substantial taurine introgression. Although South American zebus also carry maternal taurine introgression, most of the taurine component of their nuclear genome seems to have disappeared through backcrossing. Furthermore, the AFLP data indicated limited zebu introgression in the Italian Podolian breeds.