Theoretical and Applied Genetics, Volume 128, Issue 4, pp 693–703

Shrinkage estimation of the genomic relationship matrix can improve genomic estimated breeding values in the training set

  • Dominik Müller
  • Frank Technow
  • Albrecht E. Melchinger
Original Paper

Key message

We evaluated several methods for computing shrinkage estimates of the genomic relationship matrix and demonstrated their potential to enhance the reliability of genomic estimated breeding values of training set individuals.

Abstract

In genomic prediction in plant breeding, the training set constitutes a large fraction of the total number of genotypes assayed and is itself subject to selection. The objective of our study was to investigate whether genomic estimated breeding values (GEBVs) of individuals in the training set can be enhanced by shrinkage estimation of the genomic relationship matrix. We simulated two different population types: a diversity panel of unrelated individuals and a biparental family of doubled haploid lines. For different training set sizes (50, 100, 200), numbers of markers (50, 100, 200, 500, 2,500) and heritabilities (0.25, 0.5, 0.75), shrinkage coefficients were computed by four different methods. Two of these methods are novel and based on measures of linkage disequilibrium (LD); the other two were previously described in the literature, and we extended one of them. Our results showed that shrinkage estimation of the genomic relationship matrix can significantly improve the reliability of the GEBVs of training set individuals, especially for a low number of markers. We demonstrate that the number of markers is the primary determinant of the optimum shrinkage coefficient that maximizes reliability, and we recommend methods suitable for routine use in practical applications.
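
The general idea of shrinking a genomic relationship matrix can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names are hypothetical, markers are assumed to be coded 0/1/2, the shrinkage target is assumed to be a scaled identity matrix, and the shrinkage coefficient `delta` is supplied by the user instead of being estimated by any of the four methods evaluated here. The GEBV step uses a simplified GBLUP in which the phenotypic mean stands in for the fixed effect and the heritability is assumed to be known.

```python
import numpy as np


def genomic_relationship(markers):
    """Marker-based relationship matrix from an (n x m) array coded 0/1/2.

    Assumption: markers are centered by twice the allele frequency and the
    cross-product is scaled by 2 * sum(p * (1 - p)).
    """
    markers = np.asarray(markers, dtype=float)
    p = markers.mean(axis=0) / 2.0           # allele frequency per marker
    centered = markers - 2.0 * p             # center each marker column
    scale = 2.0 * np.sum(p * (1.0 - p))      # common scaling factor
    return centered @ centered.T / scale


def shrink(G, delta):
    """Shrink G toward a scaled identity target; delta must lie in [0, 1].

    delta = 0 returns G unchanged, delta = 1 returns the target.
    """
    n = G.shape[0]
    target = np.mean(np.diag(G)) * np.eye(n)  # assumed target matrix
    return delta * target + (1.0 - delta) * G


def gblup(G, y, h2):
    """Simplified GEBVs: u_hat = G (G + lambda I)^-1 (y - mean(y)).

    Assumption: h2 is known and lambda = (1 - h2) / h2.
    """
    y = np.asarray(y, dtype=float)
    lam = (1.0 - h2) / h2
    rhs = y - y.mean()
    return G @ np.linalg.solve(G + lam * np.eye(len(y)), rhs)
```

A call such as `gblup(shrink(genomic_relationship(M), 0.2), y, 0.5)` would then return GEBVs for the training set individuals under a shrinkage coefficient of 0.2. Because the target shares the mean diagonal of `G`, this form of shrinkage leaves the trace of the matrix unchanged and only scales the off-diagonal elements by `1 - delta`.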

Notes

Conflict of interest

The authors declare no conflict of interest associated with this study.

Ethical standards

The authors declare that ethical standards are met, and all the experiments comply with the current laws of the country in which they were performed.

Supplementary material

Supplementary material 1 (PDF, 2787 KB): 122_2015_2464_MOESM1_ESM.pdf

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. University of Hohenheim, Stuttgart, Germany
  2. DuPont Pioneer, Johnston, USA
