Assessing the degree of stratification between closely related Holstein-Friesian populations
Genomic information is an important part of the routine evaluation of dairy cattle, and as a result, large numbers of animals genotyped using single nucleotide polymorphism (SNP) microarrays are available. We analyzed 2243 Polish and 2294 German Holstein-Friesian bulls genotyped using the Illumina BovineSNP50 BeadChip. For each bull, estimated breeding values (EBVs) calculated from national routine genetic evaluation were available for production traits and for somatic cell score (SCS). Separately for each population, we estimated SNP haplotypes, pairwise linkage disequilibrium (LD), and SNP effects. The SNP genetic covariance between both populations was estimated using a bivariate mixed model. The average LD was lower in the Polish than in the German population and, with increasing genomic distance, LD decayed 1.7 times more rapidly in German than in Polish cattle. The comparison of SNP allele frequencies for base populations estimated separately using Polish and German data revealed a very good agreement. The comparison of genetic effects corresponding to various window lengths defined in bp revealed a systematic pattern: regardless of the length of the compared region, few significant differences were found for production traits, while many were observed for SCS. For each trait, the German population had much higher SNP variances than the Polish population and the genetic covariance estimates were all positive. Depending on the inheritance mode of a trait, the additive genetic variation can be stored in many genes following the infinitesimal model (as for SCS) or distributed between genes with high effects and the polygenic “background” (as for production traits). Accounting for those differences has implications for the prospective international genomic evaluation.
Keywords: German and Polish Holstein-Friesian cattle; Linkage disequilibrium; Production traits; Single nucleotide polymorphism; Somatic cell score
In dairy cattle, many countries have incorporated genomic information into their genetic evaluation systems (Hayes et al. 2009; VanRaden et al. 2009) and it has become evident that the genomic information is now an important part of the routine evaluation of genetic merit of dairy cattle (Liu et al. 2010). From 2006 onwards, such programs have also been implemented in Germany and Poland.
Apart from supporting genomic evaluation on an industrial basis, such data are also a valuable source of information for geneticists, since the broad availability of single nucleotide polymorphisms (SNPs) genotyped on well-defined populations with detailed information on phenotypes, environmental factors, and pedigree provides a basis for investigating the genetic background of complex traits. Moreover, a substantial number of genome-wide association studies (GWAS) have been performed on traits routinely measured and selected in dairy cattle, which is best illustrated by the 34,754 QTL and association results from 731 studies submitted to the cattleQTLdb (http://www.animalgenome.org/QTLdb/cattle, release 31). As a consequence, several genes with large effects on such traits have been identified as candidate genes (e.g., PPARGC1 and APBB2, Suchocki et al. 2013) or even in the form of causal mutations (e.g., DGAT1, Grisart et al. 2002). Currently, in genetic analyses of dairy populations, the emphasis is put on genes with intermediate additive effects and on loci whose impact on trait variation is manifested through non-additive effects, such as dominance or epistasis (e.g., Sun et al. 2014; Kemper et al. 2016). The former are of great interest in selection programs, especially in view of the possible decrease of genetic variation attributed to major genes, while the latter are important for a better understanding of the genetics of complex traits. Unfortunately, in order to guarantee reasonable type 1 and (especially) type 2 error rates in hypothesis testing involving such genes, a large sample size is required. On the one hand, very large sample sizes are currently relatively easy to obtain for dairy cattle thanks to the common use of the Illumina BovineSNP50 BeadChip in national selection programs and the exchange of data between countries.
On the other hand, national populations may differ in selection goals and, consequently, in biological adaptation as a response to selection, which in turn affects SNP allele frequencies and linkage disequilibrium (LD) patterns (Rosenberg et al. 2010). For instance, in the German Holstein-Friesian population, selection has been based on a balanced breeding goal comprising production, reproduction, and functional traits, while in the Polish population, for many decades, the breeding emphasis had been put solely on protein and fat yields, with other, non-production traits included in the selection index only from 2007, and functional longevity from 2014. Therefore, it is essential to know the differences in the genomic structure of populations before considering country-specific data in a joint analysis.
The aim of the study was to use estimated breeding values (EBVs) and SNP genotypes to compare the patterns of genetic diversity between German and Polish Holstein-Friesian populations. In particular, the investigated aspects cover: (1) the comparison of LD patterns, (2) the assessment of differences in the effects of genomic regions on the selected traits, (3) the evaluation of differences in SNP allele frequencies, and (4) the estimation of the polygenic (co)variance components for production and udder health traits between German and Polish Holstein-Friesian populations.
Materials and methods
The analyzed datasets comprised 2243 Polish and 2294 German Holstein-Friesian bulls. Both groups were defined as bulls for which EBVs on a national basis were available only for one of the countries. For each bull, EBVs calculated based on the national routine genetic evaluation models and corresponding effective daughter contributions were available for milk (MY), protein (PY), and fat (FY) yields, as well as for somatic cell score (SCS). Moreover, each individual was genotyped using the Illumina BovineSNP50 BeadChip version 1. Separately for each dataset, SNPs were filtered based on two criteria: the minor allele frequency (MAF) had to be ≥ 0.01 and the call rate ≥ 90%. Only SNPs that were present in both populations after filtering were kept, resulting in a final list comprising 39,557 SNPs representing an intersection of markers in the German and Polish datasets.
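The filtering step described above can be illustrated with a minimal sketch. It assumes genotypes coded as counts of one allele (0, 1, 2) with `None` for a missing call; the function names, the data layout, and the toy genotypes are hypothetical and are not taken from the pipeline used in the study.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: SNP name -> allele counts (0/1/2) per bull, None = missing call
Genotypes = Dict[str, List[Optional[int]]]


def passing_snps(genotypes: Genotypes,
                 min_maf: float = 0.01,
                 min_call_rate: float = 0.90) -> Set[str]:
    """Return the SNPs with MAF >= min_maf and call rate >= min_call_rate."""
    kept = set()
    for snp, calls in genotypes.items():
        observed = [g for g in calls if g is not None]
        if not observed:
            continue  # no usable genotype for this SNP
        call_rate = len(observed) / len(calls)
        freq = sum(observed) / (2 * len(observed))  # frequency of the counted allele
        maf = min(freq, 1.0 - freq)
        if maf >= min_maf and call_rate >= min_call_rate:
            kept.add(snp)
    return kept


def shared_snps(pop_a: Genotypes, pop_b: Genotypes, **thresholds: float) -> Set[str]:
    """Filter each population separately, then keep the intersection."""
    return passing_snps(pop_a, **thresholds) & passing_snps(pop_b, **thresholds)


# Toy data (invented for illustration only)
polish = {
    "snp1": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],
    "snp2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, fails MAF
    "snp3": [1, None, None, 2, 0, 1, 1, 0, 2, 1],  # call rate 0.8, fails
}
german = {
    "snp1": [1, 1, 0, 2, 1, 0, 0, 1, 2, 1],
    "snp2": [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    "snp3": [1, 0, 2, 1, 1, 0, 2, 1, 0, 1],
}
common = shared_snps(polish, german)
```

In this toy case only `snp1` survives in both populations, which mirrors how the final marker list is an intersection of the two separately filtered sets.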
Estimation of linkage disequilibrium
Estimation of SNP effects
Testing differences between populations
Estimation of SNP covariance between populations
Differences in linkage disequilibrium pattern
Differences in effects of genomic regions
Differences in SNP allele frequencies in base populations
The comparison of SNP allele frequencies for base populations estimated separately based on Polish and German data revealed a very good agreement, expressed by a high Pearson correlation coefficient of 0.97 and a correspondingly very good fit of a linear regression, with an estimated intercept of 0.01, a slope of 0.99, and an r² coefficient of 0.95.
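As an illustration of the statistics reported above, the following sketch computes the intercept, slope, and Pearson correlation for two vectors of allele frequencies using ordinary least squares. The function name and the toy frequencies are assumptions made for this example; they are not the estimates from the study.

```python
from math import sqrt
from typing import Sequence, Tuple


def regress(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y on x; returns (intercept, slope, Pearson r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


# Toy base-population allele frequencies (invented for illustration only)
freq_polish = [0.10, 0.25, 0.40, 0.55, 0.80]
freq_german = [0.12, 0.24, 0.43, 0.52, 0.81]

intercept, slope, r = regress(freq_polish, freq_german)
r_squared = r * r  # for a single-predictor regression, r² is the squared Pearson r
```

Because the coefficient of determination of a single-predictor regression equals the squared Pearson correlation, a correlation of 0.97 and an r² of 0.95 are mutually consistent up to rounding.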
Covariance between populations
The pattern of estimated (co)variance components for additive SNP effects was consistent across traits. For each trait, the German population had much higher SNP variances than the Polish population, with the ratio varying between 1.8 for SCS and 2.8 for PY. Genetic covariance estimates were all positive, resulting in genetic correlations of 0.22, 0.24, 0.30, and 0.39 for MY, PY, FY, and SCS, respectively.
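The relation between the (co)variance components and the genetic correlations quoted above can be sketched as follows, using the standard definition of a correlation as the covariance divided by the product of the two standard deviations. The function names are hypothetical, and the Polish variance is set to 1.0 in arbitrary units purely for illustration; only the variance ratio (2.8) and the correlation (0.24) for PY come from the text.

```python
from math import sqrt


def genetic_correlation(var_a: float, var_b: float, cov_ab: float) -> float:
    """Correlation implied by two variances and their covariance."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    return cov_ab / sqrt(var_a * var_b)


def covariance_from_correlation(var_a: float, var_b: float, r: float) -> float:
    """Inverse relation: covariance implied by a correlation."""
    return r * sqrt(var_a * var_b)


var_polish = 1.0                # arbitrary units (assumption, not an estimate)
var_german = 2.8 * var_polish   # German/Polish variance ratio reported for PY
cov_py = covariance_from_correlation(var_polish, var_german, 0.24)
r_py = genetic_correlation(var_polish, var_german, cov_py)
```

With these inputs, the covariance is about 0.40 in the same arbitrary units, which shows that a positive covariance can still correspond to a modest correlation when the two variances differ substantially.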
Our comparisons showed that, although both populations seem to have a very similar genetic background, revealed by the high similarity of allele frequencies in base populations, the differences in selection goals and pressure imposed on Polish and German Holstein-Friesian populations over generations caused a detectable degree of differentiation.
The emerging picture based on the panel of medium density SNPs exhibits LD extending over several hundreds of kilobases in both populations. This is generally consistent with the results of previous studies in cattle populations, which used a similar panel of ascertained SNPs (e.g., Banos and Coffey 2010; Qanbari et al. 2010, among others). A denser catalog of unascertained SNPs generated from whole genome re-sequencing, however, revealed LD decaying at a much faster rate in cattle (Qanbari et al. 2014). Given that the magnitude of LD as measured by r² depends on allele frequencies, the difference between the studies can be partially attributed to the biased SNP selection on the genotyping arrays, where SNPs are ascertained non-randomly, aiming at frequent alleles and a comprehensive coverage of the genome, resulting in a uniform allele frequency spectrum. Furthermore, the large LD blocks that appear in array-based analyses break into series of shorter tracts when LD is assessed from sequence data (e.g., Service et al. 2006). For further discussion on the comparison of array- vs. sequence-based LD, we refer to Qanbari et al. (2014). The faster LD decay observed in the German than in the Polish population is a consequence of the higher average LD in the German population, which is then diminished by recombination between SNPs located further apart from one another. The strength of LD is of key importance for the genome-based analysis of population history, as it is indicative of the genetic forces that a population has experienced during evolution, domestication, and selection.
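The dependence of r² on allele frequencies mentioned above can be made concrete with a small sketch based on the textbook definition, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. The function name, the 0/1 haplotype coding, and the toy haplotypes are assumptions for illustration and do not reproduce the estimation procedure of the study.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r² between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r² is undefined for a monomorphic SNP")
    return d * d / denom


# Identical allele patterns: complete LD
r2_complete = ld_r2([0, 1, 0, 1], [0, 1, 0, 1])

# Alleles combine independently: no LD
r2_none = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])

# The rare allele at SNP A only occurs with allele 1 at SNP B, yet r² stays
# low because the allele frequencies differ (0.1 vs. 0.5)
rare = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
common_snp = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
r2_unequal = ld_r2(rare, common_snp)
```

In the third case r² is only about 0.11 although the rare allele is never seen on the alternative background, which is one reason why arrays enriched for frequent alleles tend to show higher r² than unascertained sequence variants.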
The comparison of effects of genomic regions showed that, for production traits (MY, PY, and FY), the additive genetic effects are very similar in the German and Polish populations, especially for windows up to 1 Mbp in length, while the opposite picture emerges for SCS. This is supported by differences in selection programs between both countries. Production traits have long been the focus of selection in both Poland and Germany, indicating a similar selection pressure over generations. In contrast, SCS has been included in the German total merit index since 1997, while in Poland, SCS was added to the selection index only relatively recently, in 2008. This results in a strong variation pattern between the two analyzed populations.
Yet another aspect arises in the comparison of estimated genetic correlations between Poland and Germany. Polygenic-based genetic correlations published by Interbull (http://www.interbull.org/ib/maceev_archive, release December 2016) are lower for production traits (0.84–0.90) than for SCS (0.96), which agrees in pattern with the SNP-based correlations estimated in our study. Both models assume a normal distribution of genetic variation across the genome, but, in reality, for production traits, there is an accumulation of high effect SNPs (most remarkably the DGAT1 region on chromosome 14 with effects on MY and FY), which results in lower estimated country correlations despite no significant differences observed between particular genomic regions. In contrast, the inheritance mode of SCS appears to be of a purely polygenic nature, i.e., determined by many genes of moderate to small effects, and, thus, meets model assumptions, which is then reflected by a higher country correlation of 0.96. Note that the generally lower SNP-based (this study) than polygenic-based (Interbull) correlations result from the absence of common individuals between the German and Polish pedigrees used in this study.
The comparison of Polish and German Holstein-Friesian populations showed that observed differences in the estimated effects of genomic regions depend on differences in the linkage disequilibrium (LD) pattern between populations and on the inheritance mode of the trait. Accounting for such differences has direct implications for the prospective international genomic evaluation based on across-country single nucleotide polymorphism (SNP) effect estimation. Therefore, a proposed option would be the use of a cumulated/averaged effect of SNP groups binned by their genomic location (bp) or, preferentially, by LD, instead of single SNP estimates in the SNP MACE model.
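The idea of replacing single SNP estimates with a cumulated or averaged effect per genomic window can be sketched as below. The sketch bins by physical position only (fixed windows in bp); binning by LD, which the text prefers, is not shown. The function name, the tuple layout, and the toy effects are hypothetical and are not part of the SNP MACE model itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (chromosome, position in bp, estimated additive SNP effect)
SnpEffect = Tuple[int, int, float]


def window_effects(snps: Iterable[SnpEffect],
                   window_bp: int) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Group SNP effects into fixed-length windows.

    Returns {(chromosome, window index): (cumulated effect, averaged effect)}.
    """
    bins: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for chrom, pos, effect in snps:
        bins[(chrom, pos // window_bp)].append(effect)
    return {key: (sum(vals), sum(vals) / len(vals)) for key, vals in bins.items()}


# Toy SNP effects (invented for illustration only)
toy_effects = [
    (14, 1_800_000, 0.5),
    (14, 1_900_000, 0.3),
    (14, 2_100_000, -0.1),
    (6, 500_000, 0.2),
]
per_window = window_effects(toy_effects, window_bp=1_000_000)
```

Here the two SNPs in the second 1-Mbp window of chromosome 14 are summarized by a cumulated effect of 0.8 and an averaged effect of 0.4, so the between-population comparison would operate on one value per window rather than on each SNP separately.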
Compliance with ethical standards
This study was funded by the Polish National Science Centre grant no. 2015/19/B/NZ9/03725 and did not involve research in animals. SNP data for the Polish population were obtained within the frame of the MASinBULL project.
Conflict of interest
The authors declare that they have no financial or other conflicts of interest in relation to this research and its publication.
- Grisart B, Coppieters W, Farnir F, Karim L, Ford C, Berzi P, Cambisano N, Mni M, Reid S, Simon P, Spelman R, Georges M, Snell R (2002) Positional candidate cloning of a QTL in dairy cattle: Identification of a missense mutation in the bovine DGAT1 gene with major effect on milk yield and composition. Genome Res 12:222–231. https://doi.org/10.1101/gr.224202
- Henderson CR (1984) Applications of linear models in animal breeding. University of Guelph, Guelph
- Kemper KE, Littlejohn MD, Lopdell T, Hayes BJ, Bennett LE, Williams RP, Xu XQ, Visscher PM, Carrick MJ, Goddard ME (2016) Leveraging genetically simple traits to identify small-effect variants for complex phenotypes. BMC Genomics 17:858. https://doi.org/10.1186/s12864-016-3175-3
- Liu Z, Seefried F, Reinhardt F, Thaller G, Reents R (2010) Dairy cattle genetic evaluation enhanced with genomic information. In: Proceedings of the 9th World Congress on Genetics Applied to Livestock Production (WCGALP), Leipzig, Germany, August 2010
- Service S, DeYoung J, Karayiorgou M, Roos JL, Pretorious H, Bedoya G, Ospina J, Ruiz-Linares A, Macedo A, Palha JA, Heutink P, Aulchenko Y, Oostra B, van Duijn C, Jarvelin MR, Varilo T, Peddle L, Rahman P, Piras G, Monne M, Murray S, Galver L, Peltonen L, Sabatti C, Collins A, Freimer N (2006) Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies. Nat Genet 38:556–560. https://doi.org/10.1038/ng1770
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.