Theoretical and Applied Genetics, Volume 127, Issue 3, pp 749–762

The impact of population structure on genomic prediction in stratified populations

  • Zhigang Guo
  • Dominic M. Tucker
  • Christopher J. Basten
  • Harish Gandhi
  • Elhan Ersoz
  • Baohong Guo
  • Zhanyou Xu
  • Daolong Wang
  • Gilles Gay
Original Paper


Key message

Impacts of population structure on the evaluation of genomic heritability and prediction were investigated and quantified using high-density markers in diverse panels in rice and maize.


Abstract

Population structure is an important factor affecting the estimation of genomic heritability and the assessment of genomic prediction in stratified populations. In this study, our first objective was to assess the effects of population structure on estimates of genomic heritability using diversity panels in rice and maize. Results indicate that population structure explained 33 and 7.5 % of genomic heritability for rice and maize, respectively, depending on the trait, with the remaining heritability explained by within-subpopulation variation. Estimates of within-subpopulation heritability were higher than those derived from quantitative trait loci identified in genome-wide association studies, suggesting a 65 % improvement in genetic gain. The second objective was to evaluate the effects of population structure on genomic prediction using cross-validation experiments. When population structure was present in both training and validation sets, correcting for population structure led to a significant decrease in genomic prediction accuracy. In contrast, when prediction was limited to a specific subpopulation, population structure had little effect on accuracy, and within-subpopulation genetic variance dominated predictions. Finally, the effects of genomic heritability on genomic prediction were investigated. Genomic prediction accuracy increased with genomic heritability in both training and validation sets, with the former showing a slightly greater impact. In summary, our results suggest that the contribution of population structure to genomic prediction varies with the prediction strategy and is also affected by the genetic architecture of traits and populations. In practical breeding, these conclusions may help breeders better understand and use different genetic resources in genomic prediction.
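The cross-validation comparison described above can be illustrated with a minimal sketch on simulated data. This is not the authors' pipeline: the two-subpopulation simulation, the ridge regression model standing in for a genomic prediction model, the use of known subpopulation labels for the correction, and every function name and parameter value (`simulate`, `ridge_predict`, `cv_accuracy`, `shift`, `lam`) are assumptions made for illustration only.

```python
import numpy as np

def simulate(n_per=100, n_markers=300, shift=4.0, seed=1):
    """Hypothetical two-subpopulation panel with a between-subpopulation trait shift."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(2, n_markers))  # allele frequencies per subpopulation
    X = np.vstack([rng.binomial(2, freqs[s], size=(n_per, n_markers))
                   for s in (0, 1)]).astype(float)
    subpop = np.repeat([0, 1], n_per)
    g = X @ rng.normal(0.0, 1.0, n_markers)
    for s in (0, 1):  # scale within-subpopulation genetic values to mean 0, variance 1
        idx = subpop == s
        g[idx] = (g[idx] - g[idx].mean()) / g[idx].std()
    y = shift * subpop + g + rng.normal(0.0, 1.0, len(g))
    return X, y, subpop

def ridge_predict(X_tr, y_tr, X_te, lam):
    """Ridge regression on standardized markers, fitted on the training set only."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic markers
    Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd
    y_mean = y_tr.mean()
    beta = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(Z_tr.shape[1]),
                           Z_tr.T @ (y_tr - y_mean))
    return y_mean + Z_te @ beta

def cv_accuracy(X, y, subpop, correct_structure, k=5, seed=0):
    """Pooled k-fold accuracy: correlation of predicted and observed values."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pred, obs = np.empty(n), np.empty(n)
    for test in np.array_split(rng.permutation(n), k):
        train = np.setdiff1d(np.arange(n), test)
        y_tr, y_te = y[train].copy(), y[test].copy()
        if correct_structure:
            # Remove subpopulation means estimated from the training fold only
            for s in np.unique(subpop):
                mean_s = y[train][subpop[train] == s].mean()
                y_tr[subpop[train] == s] -= mean_s
                y_te[subpop[test] == s] -= mean_s
        pred[test] = ridge_predict(X[train], y_tr, X[test], lam=X.shape[1])
        obs[test] = y_te
    return float(np.corrcoef(pred, obs)[0, 1])

X, y, subpop = simulate()
acc_raw = cv_accuracy(X, y, subpop, correct_structure=False)
acc_corrected = cv_accuracy(X, y, subpop, correct_structure=True)
print(f"accuracy without correction: {acc_raw:.2f}")
print(f"accuracy with correction:    {acc_corrected:.2f}")
```

In this toy setting, much of the uncorrected accuracy comes from predicting the subpopulation mean difference, so removing that difference lowers the measured accuracy. The size of the drop depends entirely on the simulated `shift` and says nothing about the magnitudes reported for rice and maize.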


Keywords: Quantitative trait locus; Population structure; Single nucleotide polymorphism; Genetic gain; Genomic prediction



Acknowledgments

The authors thank the researchers and institutions who contributed to the development of the rice and maize diversity panels. The authors also thank the editor and three anonymous reviewers for their detailed input in assessing and improving the manuscript.

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

122_2013_2255_MOESM1_ESM.pdf (372 kb)
Supplementary material 1 (PDF 371 kb)



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Zhigang Guo (1)
  • Dominic M. Tucker (2)
  • Christopher J. Basten (1)
  • Harish Gandhi (3)
  • Elhan Ersoz (4)
  • Baohong Guo (4)
  • Zhanyou Xu (4)
  • Daolong Wang (1)
  • Gilles Gay (1)

  1. Syngenta Biotechnology, Inc., Durham, USA
  2. Syngenta, Inc., Clinton, USA
  3. Syngenta India Ltd., R.R. District, India
  4. Syngenta, Inc., Slater, USA
