A genome-wide association study using international breeding-evaluation data identifies major loci affecting production traits and stature in the Brown Swiss cattle breed
- Cite this article as:
- Guo, J., Jorjani, H. & Carlborg, Ö. BMC Genet (2012) 13: 82. doi:10.1186/1471-2156-13-82
The genome-wide association study (GWAS) is a useful approach to identify genes affecting economically important traits in dairy cattle. Here, we report the results from a GWAS based on high-density SNP genotype data and estimated breeding values for nine production, fertility, body conformation, udder health and workability traits in the Brown Swiss cattle population that is part of the international genomic evaluation program.
GWASs were performed using 50 k SNP chip data and deregressed estimated breeding values (DEBVs) for nine traits from between 2061 and 5043 bulls that were part of the international genomic evaluation program coordinated by Interbull Center. The nine traits were milk yield (MY), fat yield (FY), protein yield (PY), lactating cow’s ability to recycle after calving (CRC), angularity (ANG), body depth (BDE), stature (STA), milk somatic cell score (SCS) and milk speed (MSP). Analyses were performed using a linear mixed model correcting for population confounding. A total of 74 SNPs were detected to be genome-wide significantly associated with one or several of the nine analyzed traits. The strongest signal was identified on chromosome 25 for milk production traits, stature and body depth. Other signals were on chromosome 11 for angularity, chromosome 24 for somatic cell score, and chromosome 6 for milking speed. Some signals overlapped with earlier reported QTL for similar traits in other cattle populations and were located close to interesting candidate genes worthy of further investigations.
Our study shows that international genetic evaluation data is a useful resource for identifying genetic factors influencing complex traits in livestock. Several genome wide significant association signals could be identified in the Brown Swiss population, including a major signal on BTA25. Our findings report several associations and plausible candidate genes that deserve further exploration in other populations and molecular dissection to explore the potential economic impact and the genetic mechanisms underlying these production traits in cattle.
GWAS: Genome-wide association study
PCA: Principal component analysis
SNP: Single nucleotide polymorphism
kb: Kilo base pairs
Mb: Mega base pairs
MAF: Minor allele frequency
QTL: Quantitative trait locus
BTAN: Bos taurus chromosome N
EBV: Estimated breeding values
DEBV: Deregressed estimated breeding values
CRC: Lactating cow’s ability to recycle after calving
SCS: Milk somatic cell score
Genome-wide association studies (GWAS) have for a number of years been a useful tool for detecting genetic variants associated with complex traits in human genetics [1, 2]. With advances in genotyping and sequencing technology, high-density genotyping platforms have been developed for other species as well, which has made GWAS feasible in, for example, livestock [3, 4, 5, 6, 7], domesticated plants and model organisms.
Since the seminal QTL mapping work in cattle by Georges et al., a large number of studies have reported QTL for many different traits in various breeds. The limitations of QTL mapping with low- to moderate-density marker maps for identifying the causal variants underlying the studied traits have been discussed in depth elsewhere [11, 12]. In contrast to traditional QTL mapping strategies, GWAS opens new opportunities to make efficient use of outbred cattle populations for high-resolution mapping of loci, even those of modest effect, underlying important production traits. A main motivation for performing GWAS in domesticated animals such as dairy cattle is thus to discover genes, or potentially even causal mutations, contributing to economically important traits. Such findings could be important for improving the accuracy of breeding value estimation and for increasing our understanding of the mechanisms underlying long-term selection response in artificial breeding programs.
A distinct feature of dairy cattle populations is their small recent effective population size, which results from the widespread use of artificial insemination. Furthermore, dairy cattle have been subjected to long-term, intensive directional artificial selection and assortative mating schemes for milk production traits in general, and milk yield in particular. This population history has strongly influenced the pattern of linkage disequilibrium (LD) in dairy cattle breeds. For instance, the Bovine HapMap Consortium reported low, but nonzero, levels of LD at distances of up to 1 Mb in several dairy cattle breeds. Because GWAS exploits LD, powerful genome-wide analyses are possible in dairy cattle with a much lower marker density than in humans, i.e. using markers positioned every 100 kb or so. Recent GWAS in dairy cattle have primarily focused on the Holstein breed [3, 4, 5, 16, 17, 18], owing to its widespread use across the world. Some studies have also been published in other breeds, including one on direct gestation length in Brown Swiss cattle.
A major challenge in the statistical analysis of GWAS data is the sensitivity to systematic confounding factors that might lead to false positive associations. The primary causes of such confounding are population stratification and family structure. Several methods to correct for systematic confounding have been proposed [21, 22, 23]. However, because livestock GWAS datasets contain multiple families and multiple generations, the confounding is rather complex. Linear mixed models have been suggested as a way to effectively correct for subpopulation and/or family structure, reducing the rate of false positives without too much loss in power [24, 25, 26, 27], which makes them the approach of choice in such situations.
Brown Swiss cattle are best known for their high milk yield and their ability to produce well under challenging conditions of temperature and feed. The breed is important in dairy production in Western Europe and North America. Since 1996, the Interbull Centre has routinely received data on Brown Swiss bulls from many countries and carried out an international breeding evaluation resulting in estimated breeding values (EBVs) that are comparable across countries. In this study, we use the international EBVs of Brown Swiss bulls from several countries, together with genotypes from the Illumina Bovine SNP50 Beadchip (Illumina Inc., San Diego, USA) collected for national and international genomic prediction, to perform a GWAS identifying loci that contribute to several economically important traits in Brown Swiss cattle.
Description of evaluated phenotypes
Descriptive statistics of the phenotypes (deregressed EBVs) used in the study
Pairwise Pearson phenotypic correlations
Estimation of population stratification using principal component analysis
To examine the possible population structure in the analyzed population, principal component analysis (PCA) was performed for all 7038 individuals based on the kinship matrix estimated from SNPs. Two sub-groups were identified of 6784 and 254 bulls, respectively (see Additional file 1: Figure S1). The total amount of genetic variance explained by the first two principal components was, however, low (12.01% and 1.68%), indicating that this stratification is unlikely to have any strong influence on the outcome of the GWAS analysis. This was also confirmed by further exploratory analyses showing that analyzing the two sub-classes separately produced similar results to those when the whole population was analyzed jointly (results not shown).
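The share of variance attributed to a leading principal component can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the toy kinship matrix and the use of power iteration are all assumptions made for illustration, and only the largest component is extracted.

```python
from math import sqrt

def double_center(K):
    """Remove row and column means from a symmetric matrix (add back the grand mean)."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def leading_component(C, iterations=200):
    """Power iteration for the largest-magnitude eigenvalue of a symmetric matrix C."""
    n = len(C)
    v = [1.0 / (i + 1) for i in range(n)]      # arbitrary non-constant start vector
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:                        # C is (numerically) the zero matrix
            break
        v = [x / norm for x in w]
    Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Cv[i] for i in range(n))   # Rayleigh quotient with unit v
    return eigenvalue, v

def pc1_variance_share(K):
    """Fraction of total variance (trace of the centered matrix) carried by PC1."""
    C = double_center(K)
    trace = sum(C[i][i] for i in range(len(C)))
    if trace == 0.0:
        raise ValueError("centered kinship matrix has zero total variance")
    eigenvalue, vector = leading_component(C)
    return eigenvalue / trace, vector

# Hypothetical kinship matrix: two pairs of closely related individuals.
toy_kinship = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
share, pc1 = pc1_variance_share(toy_kinship)
```

In the toy matrix the two pairs receive PC1 scores of opposite sign, which is how a sub-group would show up in a plot of the first component. A real analysis of thousands of bulls would use an optimized eigen-decomposition routine instead of power iteration.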
Genome-wide association analysis accounting for confounding
Numbers and distribution of genome-wide significant SNPs detected by EMMAX with genomic control
A peak consisting of five SNPs with genome-wide significant effects on angularity was detected at the distal region of BTA11. Four of these five SNPs were located between 87.3 Mb and 88.0 Mb. The closest gene mapping to this peak is the c-abl oncogene 1 (ABL1) (Figure 2B), but other genes, including pyroglutamylated RFamide peptide (QRFP), fibrinogen C domain containing 1 (FIBCD1) and laminin gamma 3 (LAMC3), were also close to the signal. Other SNPs associated with angularity were detected on BTA1, 8, 12 and 29.
Several SNPs affecting somatic cell score were located in a peak on BTA24, although only one of these (located at 31.1 Mb) reached genome wide significance. An interesting candidate microRNA (bta-mir-2380) was located close to the peak at 31.1 Mb (see Additional file 5: Figure S4A).
A signal associated with milking speed was identified on BTA6 and two SNPs, at 90.3 and 90.5 Mb, reached the genome-wide significance level. Here, the afamin (AFM or ALB2), alpha-fetoprotein (AFP or FETA_Bovine), albumin (ALB) as well as the Interleukin-8 (IL8) genes were located in the close vicinity of the peak (see Additional file 5: Figure S4B).
Genome-wide association analysis for production traits conditional on stature
Population structure and GWAS
Here, we made use of the data available from the international genetic evaluation of dairy cattle at Interbull Centre. In this way, we could include most of the Brown Swiss progeny-tested, proven bulls from seven countries in a GWAS to obtain a large, powerful sample for detecting genetic variants associated with economically important dairy cattle traits. From a population genetics point of view, this population represents a major part of the paternal genetic material in the contemporary global Brown Swiss population. Given the large sample size, this study is expected to provide useful insights into important loci under selection for the main production traits within the Brown Swiss cattle population. To date, no GWAS on milk production traits and stature in the Brown Swiss population has been published, which makes these results of particular interest because they allow comparisons with studies of other dairy breeds.
Our GWAS is population-based and is therefore potentially sensitive to population admixture and to familial relatedness caused by recent selection and/or non-random mating. It is well known that these systematic factors can lead to spurious associations, and different approaches for correcting them have been proposed and discussed [21, 22, 23]. In this study, we first used PCA to examine whether population structure could be of major concern. The results indicate that the sample of Brown Swiss bulls could be divided into two groups, one large and one small. Compared to PCA results in other structured populations, however, the first and second principal components here explained a much smaller proportion of the total genetic variation. Even though the genomic kinship indicated that the population could be divided into two groups, the overlap with the pedigree structure was small. Most members of the smaller group of bulls originated from Switzerland. We hypothesized that these might be from the original Brown Swiss cattle. However, a PCA of all Brown Swiss cattle from Switzerland (Bapst, personal communication) showed no overlap between the old Brown Swiss cattle and the smaller group in our study. Our interpretation is that the apparent confounding is a false positive result that likely arose in the PCA. As these results suggested that population structure was not a major concern, we did not exclude the individuals from the smaller group in the final GWAS.
Unlike the usual pattern of confounding in human GWAS, familial relatedness under intensive directional artificial selection is the chief confounding factor in GWAS of domesticated animals. This can potentially result in genome-wide inflation of the P-values and produce false positives in the same way as population stratification. When familial relatedness is high but population stratification is only slight, linear mixed models are useful for performing powerful association analyses while at the same time reducing the false positive rate [24, 25, 26, 27]. In these analyses, a polygenic random effect term is included to represent the family structure. In our study, the analyses were performed using EMMAX, and the first set of results indicated that a small to intermediate inflation still occurred. There are several potential explanations for this. First, it might be caused by strong artificial selection acting on a small subset of loci, which differs from the general impact of subpopulation divergence on the whole genome. Such inflation will differ between types of traits. Our results from a simple regression analysis of the data are in line with this explanation, as the inflation varied greatly between types of traits in these analyses (data not shown). Secondly, when a number of loci have strong effects on the traits, the overall inflation of the P-values might be affected by some strong signals in the upper tail of the P-value distribution. This is observed here: for most traits, the strongest deviations from the P-values expected under the null hypothesis are in the upper tail of the distribution. Also, the inflation was larger for production traits than for the other traits.
As Brown Swiss cattle, in the same way as other dairy cattle, have been most strongly selected for milk production traits, the difference in inflation between the traits could reflect the genomic influence of selection on a limited number of major loci.
Comparison with reported results and pleiotropy for production and body size traits
As the number of GWAS in cattle is still limited, we compared our association signals with previously reported QTL. Although a complete and exact translation of bovine genetic distances into physical distances is not available, we used the physical map locations provided in the cattle QTLdb in the animal genome database for these comparisons.
In our study, the main association was found on BTA25. Most of the significant SNPs were located around 1.1-1.4 Mb and affected stature, milk yield, fat yield, protein yield, lactating cow’s ability to recycle after calving and body depth. A second peak was mapped for body depth at around 33 Mb. Several earlier studies have reported overlapping QTL for milk yield and protein yield on BTA25 at around 32.9-44.0 Mb (45-69 cM) in Holstein and Finnish Ayrshire [3, 29, 30, 31, 32]. This region thus overlapped with our association signal for body depth. No milk production QTL has previously been reported for our major region. Given the major effect of our locus on stature, it is interesting to note that a recently reported QTL influencing body weight in an Angus population was located in this region (79 kb-13.3 Mb or 0.6-14 cM). Because stature, body depth and body weight are all measures of body size, they may share underlying pathways and causal variants. Indeed, Cole et al. report that some associated SNPs were shared by the two traits in the US Holstein population. The most interesting candidate gene in this region in the NCBI and ENSEMBL databases [34, 35] is IGFAL. This gene encodes a serum protein that binds IGFs, which regulate growth, development and other physiological processes. It interacts with the growth hormones and increases their half-life and their vascular localization. Courtland et al. have earlier reported that sex-specific effects on body and bone size depended on IGFAL. Furthermore, IGFAL is also known to be associated with growth deficiencies in humans, which further suggests IGFAL as a major candidate gene for these growth-related traits. Based on the GWAS for production traits conditional on stature, it is reasonable to conclude that alleles at this locus have pleiotropic effects on growth and milk production traits.
Although part of the difference in production could be explained by the basic biological logic that bigger body size leads to higher milk yield, our results indicate that this locus also has direct effects on milk yield, fat yield and protein yield independent of stature.
To date, a total of 16 QTL affecting angularity have been identified on 11 autosomes. A QTL for angularity in a Dutch Holstein population was detected on BTA11 at 9.70-10.66 Mb (19.4 cM), but this is far from our signal at 87.3-88.0 Mb.
Schulman et al.  reported a QTL affecting somatic cell score at between 28.8 and 30.1 Mb (35.1 cM) on BTA24 in a Finnish Ayrshire cattle population. Recently, another QTL for the same trait was detected between 30.1 and 43.0 Mb (35.5-48.8 cM) on the same chromosome in Danish Holstein cattle . Our association peak at 31.1 Mb for somatic cell score was in the same region as these QTL and also contains a microRNA (bta-mir-2380; Additional file 5: Figure S4A), which is expressed upon viral infection .
Milking speed is a workability trait that is very important to dairy producers. Cows that milk out fast require less labour in the milking hall. However, fast-milking cows may be at increased risk of mastitis. The significantly associated SNPs in the peak on BTA6 for milking speed found in this study overlapped with a QTL previously identified in three French dairy cattle breeds. Interestingly, the association signal is located close to IL8 (Additional file 5: Figure S4B), a known member of the interleukin family. Further exploration of the relationship between polymorphisms in IL8, milking speed and immune response could provide new insights into the biological mechanisms underlying this trait.
Here we report the results from a GWAS of nine production, fertility, conformation, udder health and workability traits using data from the international breeding evaluation program for Brown Swiss cattle. In total, 74 genome-wide significant SNPs were found to be associated with one or multiple traits using an analysis based on a linear mixed model with genomic control. A strong, pleiotropic locus affecting stature, milk yield, fat yield, protein yield, lactating cow’s ability to recycle after calving and body depth was found on BTA25. Furthermore, particularly interesting signals were found on BTA11 for angularity, BTA24 for somatic cell score and BTA6 for milking speed. Most of these signals overlapped with earlier reported QTL for related traits in dairy and beef cattle. Several known functional candidate genes could also be identified in these regions. Our study shows the usefulness of data from international breeding evaluation for identifying genetic variants associated with complex traits, and that the overlap between association and QTL signals is apparently large in cattle. Further replication studies, as well as functional dissection of the molecular mechanisms underlying the reported signals, are needed to fully understand the complexity of trait regulation. This study provides a first important step along this path.
Genotypes and national estimated breeding values (EBVs) for a large number of Brown Swiss bulls are routinely delivered to Interbull Centre as part of the international breeding evaluation program for dairy cattle. The national EBVs are then used to calculate an international EBV for each bull that can be used for selecting sires in Brown Swiss cattle breeding programs across the world. At present, information on 7038 Brown Swiss bulls is available at Interbull Centre. We chose to use only the progeny-tested, proven bulls in our genome-wide association study. Because national genetic evaluations for different traits have different starting points, and different traits were measured at different times in the process of progeny testing, the sample sizes differed between traits (n = 2061-5043; Table 1).
Adjustment of EBV
Where DEBV is de-regressed EBV, PA represents parent average, EBV is estimated breeding value, and RELdau is reliability from daughters. This conversion analysis was done using the R package . The bulls used in this study are elite bulls whose EBV has high reliability. The distribution of reliability values has a strong kurtosis and it was therefore deemed that the use of reliability values to weight the DEBV would not improve the results.
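As an illustration of how these quantities combine, the sketch below applies one commonly used deregression form, DEBV = PA + (EBV - PA) / RELdau. That exact form is an assumption here, chosen because it uses only the symbols defined above; the function name and the example numbers are hypothetical.

```python
def deregress_ebv(ebv, parent_average, rel_daughters):
    """Deregress an EBV, assuming DEBV = PA + (EBV - PA) / REL_dau.

    rel_daughters is the reliability from daughter information, in (0, 1].
    """
    if not 0.0 < rel_daughters <= 1.0:
        raise ValueError("daughter reliability must be in (0, 1]")
    return parent_average + (ebv - parent_average) / rel_daughters

# Hypothetical bull: EBV of 110, parent average of 100, daughter reliability 0.8.
example_debv = deregress_ebv(110.0, 100.0, 0.8)
```

Under this form, a reliability close to 1 leaves the EBV almost unchanged, while a lower reliability moves the DEBV further from the parent average than the EBV.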
Genotype quality control
Genotype information was initially available for 44,826 SNPs on the Illumina Bovine SNP50K Beadchip. We applied the following quality control to these SNPs before conducting the statistical analysis: a SNP was discarded if i) its call rate was less than 90%, ii) its minor allele frequency (MAF) was less than 2%, or iii) it departed from Hardy-Weinberg equilibrium at a threshold of p < 0.01. Individuals were also excluded from the analysis if they had more than 10% missing genotypes. The outcome of the quality control differed slightly between traits because the sample of individuals differed between traits for the reason described above.
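The three SNP filters can be sketched as follows. This is a minimal illustration, not the software used in the study: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with None for a missing call, the Hardy-Weinberg test is assumed to be a one-degree-of-freedom chi-square test (the text does not say which test was used), and all function names are hypothetical.

```python
from math import erfc, sqrt

def call_rate(genotypes):
    """Fraction of non-missing genotype calls for one SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)

def hwe_pvalue(genotypes):
    """Chi-square (1 df) test of Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)
    if p in (0.0, 1.0):
        return 1.0                      # monomorphic SNP: nothing to test
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return erfc(sqrt(chi2 / 2.0))       # upper tail of chi-square with 1 df

def snp_passes_qc(genotypes, min_call_rate=0.90, min_maf=0.02, min_hwe_p=0.01):
    """Apply the thresholds stated in the text: call rate, MAF, HWE p-value."""
    if call_rate(genotypes) < min_call_rate:
        return False
    if minor_allele_frequency(genotypes) < min_maf:
        return False
    return hwe_pvalue(genotypes) >= min_hwe_p
```

The same call-rate idea, applied across SNPs for one animal with a 10% missing-genotype limit, would implement the individual-level filter.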
Principal component analysis
Linear mixed model GWAS analyses accounting for confounding
The association analyses were based on a linear mixed model of the form Y = Xβ + u + ε, where Y denotes the vector of deregressed EBVs (DEBVs), X is the vector of genotypes at the locus being tested, β is the additive fixed effect attributed to the locus, ε is the vector of residual errors with ε ~ N(0, Iσ²_e), and u is the vector of background polygenic effects with u ~ N(0, Gσ²_u).
The kinship matrix, G, describes the genome-wide relatedness between the individuals and is estimated once based on the identity-in-state (IBS) of the genotyped markers. The parameters of the model, σ²_u and σ²_e, are estimated using restricted maximum likelihood (REML) for each SNP. Generalized least squares (GLS) is employed to estimate the effect β, and an F-test is used to test the null hypothesis H0: β = 0.
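To make the IBS-based kinship concrete, the sketch below computes a pairwise IBS similarity matrix. It is an illustration only and not EMMAX's implementation: genotypes are assumed to be coded 0/1/2 with None for missing calls, IBS is taken as the mean proportion of alleles shared across markers, and the function name is hypothetical.

```python
def ibs_kinship(genotypes):
    """Pairwise IBS similarity for a list of individuals.

    genotypes[i][m] is the allele count (0, 1 or 2) of individual i at marker m,
    or None if missing. Two genotypes share 2 - |a - b| alleles out of 2.
    """
    n = len(genotypes)
    kinship = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = 0
            compared = 0
            for a, b in zip(genotypes[i], genotypes[j]):
                if a is None or b is None:
                    continue            # skip markers missing in either individual
                shared += 2 - abs(a - b)
                compared += 2
            value = shared / compared if compared else 0.0
            kinship[i][j] = kinship[j][i] = value
    return kinship

# Three hypothetical individuals typed at three markers.
toy_genotypes = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
]
toy_kinship = ibs_kinship(toy_genotypes)
```

Identical genotype vectors give a similarity of 1, and opposite homozygotes at a marker contribute 0, so the third individual here is much less similar to the first two than they are to each other.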
Significance testing was based on Bonferroni corrected significance thresholds correcting for the number of SNP loci tested. Genomic control  was also used in order to correct for the weak inflation that still existed.
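Both corrections can be sketched in a few lines. The example below is an assumption-laden illustration rather than the exact procedure of the study: the family-wise level of 0.05 is not stated in the text, the inflation factor is taken as the median one-degree-of-freedom chi-square statistic divided by its expected median, and all names are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-SNP significance threshold; alpha = 0.05 is an assumed level."""
    return alpha / n_tests

def p_to_chi2(p):
    """Convert a two-sided P-value in (0, 1] to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z

def chi2_to_p(x):
    """Upper-tail probability of a 1-df chi-square statistic."""
    return erfc(sqrt(x / 2.0))

def genomic_control(p_values):
    """Return the inflation factor lambda and the lambda-adjusted P-values."""
    stats = [p_to_chi2(p) for p in p_values]
    # Values below 1 are not used to make the tests more liberal.
    inflation = max(median(stats) / CHI2_1DF_MEDIAN, 1.0)
    adjusted = [chi2_to_p(x / inflation) for x in stats]
    return inflation, adjusted
```

With roughly 45,000 tested SNPs, the assumed level of 0.05 would give a per-SNP threshold near 1e-6. Dividing every test statistic by the inflation factor makes the adjusted P-values larger whenever the factor exceeds 1, which is why genomic control is conservative for the strongest signals.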
Candidate gene identification
This study was funded by a Future Research Leader Grant from the Swedish Foundation for Strategic Research and a EURYI Award from ESF to ÖC. Jiazhong Guo acknowledges the China Scholarship Council for the scholarship award for his visiting PhD studies at the Swedish University of Agricultural Sciences. We would like to thank the InterGenomics partners (Arbeitsgemeinschaft der Österreichischen BraunviehZüchter, Austria; Arbeitsgemeinschaft Deutsches Braunvieh, Germany; Associazione nazionale allevatori bovini della razza Bruna, Italy; Brown Swiss Association of the US, USA; Brune Genetique Services, France; Braunvieh Schweiz, Switzerland; and Zveza rejcev govedi rjave pasme Slovenije, Slovenia) for giving us permission to use their data in this study. We also thank Xia Shen and other members of the Computational Genetics section for fruitful discussions.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.