The impact of genomic relatedness between populations on the genomic estimated breeding values

Ma, Peipei; Huang, Ju; Gong, Weijia; Li, Xiujin; Gao, Hongding; Zhang, Qin; Ding, Xiangdong; Wang, Chonglong

doi:10.1186/s40104-018-0279-4

Short report
Open access
Published: 16 August 2018

Volume 9, article number 64, (2018)
Journal of Animal Science and Biotechnology


Peipei Ma¹,²,
Ju Huang¹,
Weijia Gong³,
Xiujin Li¹,
Hongding Gao⁴,
Qin Zhang¹,
Xiangdong Ding¹ &
…
Chonglong Wang⁵

Abstract

In genomic selection, prediction accuracy depends strongly on the number of animals in the reference population (RP). Combining related populations from different countries and regions, or using a large RP from a related population, has been considered a viable strategy in cattle breeding. The genetic relationship between the populations is important for improving genomic predictive ability. In this study, we used 122 French bulls as test individuals and compared their genomic estimated breeding values (GEBVs) obtained with the French RP, the American RP and the Chinese RP. The GEBVs obtained with the French and American RPs were in higher concordance with each other than with those obtained with the Chinese RP. To interpret the results, an analysis of the persistence of linkage disequilibrium phase, a kinship analysis and a principal component analysis (PCA) were performed on 270 French bulls, 270 American bulls and 270 Chinese bulls. All three analyses indicated that French and American bulls were genetically closer to each other than either group was to Chinese bulls. Another possible reason is that the Chinese RP was smaller than the other two RPs. In conclusion, using the RP of a related population to predict GEBVs of animals in a target population is feasible when the two populations have a close genetic relationship and the related population is large.

Short communication

Since genomic selection (GS) was first described by Meuwissen et al. [1], and with genotyping costs steadily decreasing, this technology has revolutionized the breeding of both livestock and crops in recent years. The size of the reference population (RP) and the relationship between the reference and candidate populations have been reported to be important factors affecting the accuracy of genomic prediction [2,3,4].

The benefit of GS has been constrained by the limited size of RPs. Firstly, only a small number of progeny-tested proven bulls is available in each country, especially in countries that rely mainly on imported bull semen, e.g. China [5]. Secondly, it is not economically feasible to genotype all animals for the RP, since the contribution of cows may be worth less than the cost of genotyping them [6]. Two strategies are used in practice to increase the accuracy of GEBVs. One is to combine reference data from several countries. The other is to use the RP of a commercial institute, e.g. CDCB (https://www.uscdcb.com/what-we-do/genomics). However, the relationship between the RP and the candidate individuals has been reported to be a crucial factor for prediction accuracy [7, 8]. Therefore, it is necessary to investigate the relationship between populations before applying these strategies.

The objectives of this study were 1) to investigate the correlations between genomic estimated breeding values (GEBVs) of French bulls predicted using the Chinese, French and American RPs separately; and 2) to explore the reasons for differences in GEBVs by analyzing the persistence of linkage disequilibrium (LD) phase, genetic relatedness and population structure among the French, American and Chinese populations.

Materials and methods

Data

A total of 122 French bulls were used as the test set in this study. The GEBVs for milk yield, fat percentage, protein percentage, conformation and feet_legs, evaluated using the American RP and the French RP separately, were provided by Gènes DIFFUSION. The GEBVs of these 122 French bulls using the Chinese RP were estimated in this study. The Chinese RP consisted of 1,568 Chinese cows with both genotypes and phenotypes. De-regressed proofs (DRP) were used as the response variable for genomic prediction. Genotypes of 270 French bulls, 270 American bulls and 270 Chinese bulls were used to compare the relationships among the three populations. The 270 French bulls were progeny of imported French bulls and cows, and the 270 American bulls were likewise progeny of imported American bulls and cows. The Chinese bulls were randomly selected from the native population. All animals were genotyped using the Illumina Bovine SNP50 BeadChip (Illumina, San Diego, CA, USA). After removing SNPs with a minor allele frequency below 0.01, 45,404 SNPs on 29 autosomes were retained.
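The minor allele frequency filter mentioned above can be sketched as follows. This is only an illustration: the function names and the toy genotype matrix are hypothetical, genotypes are assumed to be complete 0/1/2 allele counts, and the study's actual quality-control software is not specified in the text.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele for one SNP, given 0/1/2 allele counts."""
    return sum(genotypes) / (2 * len(genotypes))


def filter_by_maf(genotype_matrix, threshold=0.01):
    """Return the column indices of SNPs whose minor allele frequency is >= threshold.

    genotype_matrix: list of rows (one per animal), each a list of 0/1/2 counts.
    Missing genotypes are not handled here (an assumption of this sketch).
    """
    n_snps = len(genotype_matrix[0])
    kept = []
    for j in range(n_snps):
        column = [row[j] for row in genotype_matrix]
        p = allele_frequency(column)
        maf = min(p, 1 - p)  # the minor allele is the rarer of the two
        if maf >= threshold:
            kept.append(j)
    return kept


# Hypothetical toy data: 4 animals x 4 SNPs; the first two SNPs are monomorphic.
toy = [
    [0, 2, 1, 0],
    [0, 2, 2, 1],
    [0, 2, 0, 1],
    [0, 2, 1, 2],
]
kept_snps = filter_by_maf(toy)
print(kept_snps)  # prints [2, 3]
```

Monomorphic SNPs have a minor allele frequency of 0 and are therefore dropped at any positive threshold.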

Model

GBLUP [9] was used to predict GEBVs using the Chinese RP. The model is as follows:

$$ \boldsymbol{y}=\mathbf{1}\mu +\mathbf{Zg}+\mathbf{e} $$

where $\boldsymbol{y}$ is a vector of DRP from the Chinese population, $\mu$ is the overall mean, $\mathbf{g}$ is a vector of genomic breeding values, $\mathbf{1}$ is a vector of ones, $\mathbf{Z}$ is the design matrix linking $\mathbf{g}$ to $\boldsymbol{y}$, and $\mathbf{e}$ is a vector of random residuals. Random effects were assumed to be normally distributed as $\mathbf{g}\sim N(\mathbf{0},\mathbf{G}{\sigma}_g^2)$ and $\mathbf{e}\sim N(\mathbf{0},\mathbf{I}{\sigma}_e^2)$, where ${\sigma}_g^2$ is the additive genetic variance, ${\sigma}_e^2$ is the residual variance, and $\mathbf{G}$ is the genomic relationship matrix constructed from all markers using the formula $\mathbf{G}=\mathbf{M}{\mathbf{M}}^{\prime }/\sum_i 2{p}_i\left(1-{p}_i\right)$ [9]. The genotypes in $\mathbf{M}$ were coded as 0, 1 and 2 for A₁A₁, A₁A₂ and A₂A₂ and then centered by subtracting $2{p}_i$ [9], where ${p}_i$ is the frequency of allele A₂ at marker $i$, calculated from the genotypes of the individuals used in the model. The DMU package [10] was used to estimate variance components and obtain solutions of the mixed model equations.
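As a rough illustration of how the genomic relationship matrix described above is assembled, the following pure-Python sketch builds G from a small matrix of 0/1/2 genotype codes. The function name and the toy genotypes are hypothetical, the sketch assumes complete genotypes with at least one polymorphic marker, and it is not the DMU implementation used in the study.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = M M' / sum_i 2 p_i (1 - p_i) from 0/1/2 genotype codes.

    genotypes: list of rows (one per animal), each a list of counts of allele A2.
    Allele frequencies p_i are taken from the animals supplied, as in the text.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of allele A2 at each marker.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Scaling factor: sum over markers of 2 p_i (1 - p_i).
    scale = sum(2 * pj * (1 - pj) for pj in p)
    # Centre each genotype code by subtracting 2 p_i.
    centered = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    # G[a][b] is the scaled cross-product of the centred rows of animals a and b.
    return [
        [sum(centered[a][j] * centered[b][j] for j in range(m)) / scale
         for b in range(n)]
        for a in range(n)
    ]


# Hypothetical toy data: 3 animals x 3 markers.
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
for row in G:
    print([round(value, 3) for value in row])
```

Because the genotypes are centered with allele frequencies from the same animals, each row of G sums to zero (up to rounding), which is a quick sanity check on the construction.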

Validation of genomic predictive ability

Spearman’s rank correlation coefficient between GEBVs predicted using different RPs was used to measure the concordance of GEBVs. The correlation between GEBVs evaluated with the Chinese RP and with the French RP was denoted COR_CF. Accordingly, COR_CA denotes the correlation between GEBVs from the Chinese and American RPs, and COR_FA that between GEBVs from the French and American RPs.
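The concordance measure described above can be sketched in a few lines. The GEBV values below are invented for illustration only, and the function names are hypothetical; ties are handled with average ranks, which is the usual convention but is not stated in the text.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical GEBVs of the same five bulls from two reference populations.
gebv_french_rp = [1.2, 0.4, -0.3, 2.1, 0.9]
gebv_american_rp = [1.0, 0.5, -0.1, 1.8, 1.1]
cor_fa = spearman(gebv_french_rp, gebv_american_rp)
print(round(cor_fa, 3))  # prints 0.9
```

A rank-based coefficient is a natural choice here because GEBVs from different RPs may be on different scales, while selection decisions depend on the ranking of the bulls.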

The measurement of relatedness between different populations

To examine the genetic relatedness between the RPs, three measures were computed for the 270 French bulls, 270 American bulls and 270 Chinese bulls: 1) The persistence of LD phase between two populations, calculated as the correlation between the two populations of the linkage disequilibrium (r²) of adjacent marker pairs on each autosome [11, 12]. The persistence of LD phase for each pair of populations was denoted PER_CF, PER_CA and PER_FA. 2) The number of pairs of related individuals between populations. Each pair-wise relationship can be classified as monozygotic twins or 1st-, 2nd- or 3rd-degree relatives based on kinship coefficients estimated with the KING (Kinship-based INference for Genome-wide association studies) software package [13]. 3) The principal components (PCs) of the marker genotype data. Principal component analysis (PCA) was performed on the genotypes using KING [13], and the plot of PC2 against PC1 was used to describe the genetic similarity among the three populations.
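The first measure, persistence of LD phase, can be sketched as below. This is a sketch under stated assumptions rather than the authors' pipeline: r² for each adjacent marker pair is approximated by the squared correlation of 0/1/2 genotype codes (the text does not say how r² was computed), all markers are assumed polymorphic in both populations, and the function names and toy genotypes are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def adjacent_r2(genotypes):
    """Approximate r^2 between each pair of adjacent markers.

    Uses the squared correlation of genotype codes (an assumption of this
    sketch). Markers are assumed to be ordered along the chromosome.
    """
    m = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(m)]
    return [pearson(columns[j], columns[j + 1]) ** 2 for j in range(m - 1)]


def persistence_of_ld(genotypes_a, genotypes_b):
    """Correlation, across adjacent marker pairs, of r^2 in two populations."""
    return pearson(adjacent_r2(genotypes_a), adjacent_r2(genotypes_b))


# Hypothetical toy data: 4 animals x 4 markers per population, same marker order.
pop_a = [[0, 0, 2, 1], [1, 1, 0, 0], [2, 2, 1, 2], [1, 2, 1, 0]]
pop_b = [[0, 0, 2, 1], [1, 1, 1, 2], [2, 2, 1, 0], [1, 1, 0, 1]]

per_ab = persistence_of_ld(pop_a, pop_b)
per_aa = persistence_of_ld(pop_a, pop_a)  # a population compared with itself
print(round(per_ab, 3), round(per_aa, 3))
```

In practice the calculation would be run per autosome on the same set of marker pairs in both populations, giving one value per chromosome for each pair of populations.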

Results

The comparison of genomic prediction using different RPs

Spearman’s rank correlation coefficients between GEBVs obtained with the RPs of each pair of countries are shown in Table 1. For all traits, the correlation between GEBVs using the French RP and the American RP (COR_FA) was much larger than the correlation between GEBVs using the Chinese RP and either the American RP (COR_CA) or the French RP (COR_CF). COR_FA for fat percentage was the highest value (0.862), while COR_CA for milk yield was the lowest (0.060). COR_CF ranged from 0.133 (feet_legs) to 0.442 (conformation). COR_CA was similar to COR_CF and ranged from 0.060 (milk yield) to 0.420 (protein percentage). COR_FA ranged from 0.472 (feet_legs) to 0.862 (fat percentage).

Table 1 Spearman’s rank correlation coefficient between GEBVs evaluated using different RP

The GEBVs for milk yield obtained with the different RPs are plotted in Fig. 1. The trends of GEBVs using the American RP and the French RP are similar, whereas the trend of GEBVs using the Chinese RP is relatively different from those using the other two RPs.

LD and persistence of LD phase

The LD on each chromosome in each population and the persistence of LD phase (PER) between populations are shown in Table 2. The mean r² of adjacent SNP pairs within each chromosome ranged from 0.13 (chromosomes 27 and 28) to 0.19 (chromosomes 6 and 14) for the Chinese RP, and from 0.14 (chromosomes 27 and 28) to 0.20 (chromosomes 6 and 14) for both the French and American RPs. The mean r² across all chromosomes was 0.16 in China and 0.17 in France and the USA. The persistence of LD phase between the French and American RPs was higher than that between China and the other two countries. PER_CF ranged from 0.893 on chromosome 28 to 0.959 on chromosome 14. PER_CA ranged from 0.931 on chromosome 9 to 0.973 on chromosome 14. PER_FA ranged from 0.942 on chromosome 19 to 0.974 on chromosome 29.

Table 2 Linkage disequilibrium (LD) of adjacent markers for each Bos taurus autosome (BTA)

The kinship coefficients and classification of all pair-wise relationship

The numbers of pairs of related individuals in each relationship class, as determined by the KING software, are listed in Table 3. Between the Chinese and French populations there was 1 pair of 1st-degree relatives, 1 pair of 2nd-degree relatives and 596 pairs of 3rd-degree relatives. Between the Chinese and American populations there were 2 pairs of 1st-degree relatives, 0 pairs of 2nd-degree relatives and 1,174 pairs of 3rd-degree relatives. Many more pairs of 1st-, 2nd- and 3rd-degree relatives were found between the French and American populations than between the Chinese population and either of them, which means that these two populations shared more related individuals.
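The way kinship coefficients map to the relationship classes counted above can be sketched as follows. The cut-offs are the inference ranges given in the KING documentation (greater than 0.354 for duplicates or monozygotic twins, then 0.177, 0.0884 and 0.0442 as lower bounds for 1st-, 2nd- and 3rd-degree relatives); whether the study used exactly these defaults is an assumption, and the function names and kinship values are hypothetical.

```python
from collections import Counter


def classify_kinship(phi):
    """Map an estimated kinship coefficient to a KING-style relationship class."""
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi >= 0.177:
        return "1st-degree"
    if phi >= 0.0884:
        return "2nd-degree"
    if phi >= 0.0442:
        return "3rd-degree"
    return "unrelated"


def count_relationship_classes(kinship_values):
    """Count how many between-population pairs fall into each class."""
    return Counter(classify_kinship(phi) for phi in kinship_values)


# Hypothetical kinship coefficients for pairs of animals, one animal per population.
toy_kinships = [0.26, 0.12, 0.05, 0.06, 0.01, -0.02]
counts = count_relationship_classes(toy_kinships)
print(counts["1st-degree"], counts["2nd-degree"], counts["3rd-degree"])  # prints 1 1 2
```

Counting pairs above a threshold keeps the few close relationships visible, whereas an average over all pairs would be dominated by the many near-zero values.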

Table 3 The number of pairs of related individuals between different populations

The principal component analysis (PCA)

Figure 2 illustrates that the French and American populations were more closely related to each other than either was to the Chinese population.

Discussion

In this study, we investigated the differences in GEBVs of French Holstein bulls predicted using reference populations from different countries. The genomic relatedness between the populations was investigated to explain the results. The correlation between GEBVs estimated using the French RP and the American RP was higher than the correlation between GEBVs estimated using the Chinese RP and either the French or the American RP. The LD phase persistence analysis, the kinship coefficients and the PCA showed that the French and American populations were more closely related to each other than the Chinese population was to either of them.

For a combined RP, a close relationship between populations reflects similar LD structure among them, which makes joint prediction feasible. Lund et al. [14] used European Holsteins as a joint reference to predict Nordic, Dutch, French and German Holsteins and found that reliability improved by up to 10% compared with using separate RPs. A joint Nordic Red dairy cattle RP was likewise intended to improve the accuracy of genomic prediction in a previous study [15]. However, prediction for the Swedish and Finnish populations improved only slightly when Danish Red dairy cattle were added to the RP, because Finnish Red and Swedish Red were more closely related to each other than Danish Red was to either of them [15]. A similar pattern was observed when the G matrix was used to measure the relationship among the three countries, both in our study and in the report of Brøndum et al. [15]: more closely related individuals were found between Swedish and Finnish Red in their study and between the American and French populations in ours. This is consistent with the conclusion of previous studies that predictive ability improves when related individuals are included in the RP [16, 17]. We also calculated the average kinship among individuals from different countries, and the average relationship was similar for every pair of countries (data not shown). One possible reason is that the many small kinship values diluted the close relationships, which suggests that the average of the kinship matrix is not a suitable criterion for measuring the relationship between populations.

Another reason why Spearman’s rank correlations between GEBVs using the Chinese RP and the American or French RP were smaller than the correlation between GEBVs using the French and American RPs could be the difference in RP size. The Chinese RP in this study included only 1,568 individuals, which may not be as informative as the proven bulls from the other two countries. A combined RP of the Nordic Holstein population, which is a member of EuroGenomics, and the Chinese Holstein population has been used in previous studies to investigate the improvement in reliability of genomic prediction [5, 18]. The reliability of genomic prediction improved greatly for the Chinese population but only slightly for the Nordic population [5]. Therefore, besides the relationship between populations, the size of the RP should be considered when joint-population prediction is conducted. It may be possible to improve genomic predictive ability for populations with a small RP even if the relationship between the added population and the target population is distant.

Conclusions

Information from related populations has been used to improve predictive ability. However, our results showed that GEBVs ranked differently when a loosely related population was used as the RP. Integrating the results of previous studies, we conclude that it is feasible to predict the GEBVs of a target population using the RP of a related population, provided that the two populations have a close genetic relationship and the related population is large.

Abbreviations

DRP: De-regressed proof
GEBVs: Genomic estimated breeding values
LD: Linkage disequilibrium
PCA: Principal component analysis

References

1. Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.
2. Goddard ME, Hayes BJ. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat Rev Genet. 2009;10:381–91.
3. Gao H, Christensen OF, Madsen P, Nielsen US, Zhang Y, Lund MS, et al. Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population. Genet Sel Evol. 2012;44:8.
4. Pszczola M, Strabel T, Mulder HA, Calus MPL. Reliability of direct genomic values for animals with different relationships within and to the reference population. J Dairy Sci. 2012;95:389–400.
5. Zhou L, Ding X, Zhang Q, Wang Y, Lund MS, Su G. Consistency of linkage disequilibrium between Chinese and Nordic Holsteins and genomic prediction for Chinese Holsteins using a joint reference population. Genet Sel Evol. 2013;45:7.
6. Pryce J, Hayes B. A review of how dairy farmers can use and profit from genomic technologies. Anim Prod Sci. 2012;52:180–4.
7. Gao H, Su G, Janss L, Zhang Y, Lund MS. Model comparison on genomic predictions using high-density markers for different groups of bulls in the Nordic Holstein population. J Dairy Sci. 2013;96:4678–87.
8. Habier D, Tetens J, Seefried F-R, Lichtner P, Thaller G. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol. 2010;42:5.
9. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.
10. Madsen P, Sørensen P, Su G, Damgaard LH, Thomsen H, Labouriau R. DMU - a package for analyzing multivariate mixed models. In: Proceedings of the 8th World Congress on Genetics Applied to Livestock Production. Minas Gerais: Instituto Prociência. 2006;11–27.
11. Sargolzaei M, Schenkel FS, Jansen GB, Schaeffer LR. Extent of linkage disequilibrium in Holstein cattle in North America. 2008:2106–17.
12. de Roos APW, Hayes BJ, Spelman RJ, Goddard ME. Linkage disequilibrium and persistence of phase in Holstein–Friesian, Jersey and Angus cattle. Genetics. 2008;179:1503–12.
13. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–73.
14. Lund MS, de Roos APW, de Vries AG, Druet T, Ducrocq V, Guillaume F, et al. Improving genomic prediction by EuroGenomics collaboration. Proc WCGALP 2010, Leipzig. 2010;7–10.
15. Brøndum RF, Rius-Vilarrasa E, Strandén I, Su G, Guldbrandtsen B, Fikse WF, et al. Reliabilities of genomic prediction using combined reference data of the Nordic red dairy cattle populations. J Dairy Sci. 2011;94:4700–7.
16. Habier D, Fernando RL, Dekkers JCM. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177:2389–97.
17. Wu X, Lund MS, Sun D, Zhang Q, Su G. Impact of relationships between test and training animals and among training animals on reliability of genomic prediction. J Anim Breed Genet. 2015;132:366–75.
18. Ma P, Lund MS, Ding X, Zhang Q, Su G. Increasing imputation and prediction accuracy for Chinese Holsteins using joint Chinese-Nordic reference population. J Anim Breed Genet. 2014;131:462–72.

Acknowledgements

The authors are grateful to the National Natural Science Foundation of China, China Agriculture Research System, Changjiang Scholar and Innovation Research Team in University and Anhui Science and Technology for their support. We also greatly appreciate the diligent work of the two anonymous reviewers and the associate editor, whose comments and suggestions contributed greatly to the improvement of this manuscript.

Funding

This research was supported by the earmarked fund for China Agriculture Research System (CARS-36), the National Natural Science Foundation of China (31671327, 31701077, 31371258), the Program for Changjiang Scholar and Innovation Research Team in University (Grant No. IRT1191), Anhui Science and Technology Key Project (17030701008), Anhui Academy of Agricultural Sciences Key Laboratory Project (18S0404).

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available because the data are subject to the Dairy Association of China, but they are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
Peipei Ma, Ju Huang, Xiujin Li, Qin Zhang & Xiangdong Ding
Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, 200240, China
Peipei Ma
Gènes DIFFUSION, 3595 Route de Tourna, 59500, Douai, France
Weijia Gong
Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
Hongding Gao
Key Laboratory of Pig Molecular Quantitative Genetics, Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei, 230031, China
Chonglong Wang

Contributions

PM, XL, HG, QZ, XD and CW conceived and designed this study. PM, JH, XL and WG did the analysis. PM, JH and HG contributed to the writing of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xiangdong Ding or Chonglong Wang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Ma, P., Huang, J., Gong, W. et al. The impact of genomic relatedness between populations on the genomic estimated breeding values. J Animal Sci Biotechnol 9, 64 (2018). https://doi.org/10.1186/s40104-018-0279-4

Received: 06 April 2018
Accepted: 19 July 2018
Published: 16 August 2018
DOI: https://doi.org/10.1186/s40104-018-0279-4
