Theoretical and Applied Genetics, Volume 127, Issue 6, pp 1375–1386

Genome-based prediction of maize hybrid performance across genetic groups, testers, locations, and years

  • Theresa Albrecht
  • Hans-Jürgen Auinger
  • Valentin Wimmer
  • Joseph O. Ogutu
  • Carsten Knaak
  • Milena Ouzunova
  • Hans-Peter Piepho
  • Chris-Carolin Schön
Original Paper


Key message

The calibration data for genomic prediction should represent the full genetic spectrum of a breeding program. Data heterogeneity is minimized by connecting data sources through highly related test units.


One of the major challenges of genome-enabled prediction in plant breeding lies in the optimum design of the population employed in model training. With highly interconnected breeding cycles staggered in time the choice of data for model training is not straightforward. We used cross-validation and independent validation to assess the performance of genome-based prediction within and across genetic groups, testers, locations, and years. The study comprised data for 1,073 and 857 doubled haploid lines evaluated as testcrosses in 2 years. Testcrosses were phenotyped for grain dry matter yield and content and genotyped with 56,110 single nucleotide polymorphism markers. Predictive abilities strongly depended on the relatedness of the doubled haploid lines from the estimation set with those on which prediction accuracy was assessed. For scenarios with strong population heterogeneity it was advantageous to perform predictions within a priori defined genetic groups until higher connectivity through related test units was achieved. Differences between group means had a strong effect on predictive abilities obtained with both cross-validation and independent validation. Predictive abilities across subsequent cycles of selection and years were only slightly reduced compared to predictive abilities obtained with cross-validation within the same year. We conclude that the optimum data set for model training in genome-enabled prediction should represent the full genetic and environmental spectrum of the respective breeding program. Data heterogeneity can be reduced by experimental designs that maximize the connectivity between data sources by common or highly related test units.
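The effect of group mean differences on predictive ability can be illustrated with a small sketch. It assumes that predictive ability is measured as the Pearson correlation between observed and predicted values. The function names, the group labels "A" and "B", and all numbers are invented for illustration and are not taken from the study. In this toy example the predictions carry no information within either group, yet the pooled correlation is high because the two groups differ in their means.

```python
import numpy as np


def predictive_ability(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])


def within_group_abilities(observed, predicted, groups):
    """Predictive ability computed separately for each genetic group."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    groups = np.asarray(groups)
    return {
        str(g): predictive_ability(observed[groups == g], predicted[groups == g])
        for g in np.unique(groups)
    }


# Hypothetical toy data: two groups whose means differ by 10 units.
# Within each group, the predictions are uncorrelated with the observations.
observed = [1, 2, 3, 4, 11, 12, 13, 14]
predicted = [2, 4, 1, 3, 12, 14, 11, 13]
groups = ["A"] * 4 + ["B"] * 4

pooled = predictive_ability(observed, predicted)
within = within_group_abilities(observed, predicted, groups)

print(f"pooled across groups: {pooled:.3f}")
for name, value in within.items():
    print(f"within group {name}: {value:.3f}")
```

Comparing the pooled value with the within-group values is one simple way to check whether an apparently high predictive ability mainly reflects differences between group means rather than accurate ranking of lines within a group.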


Predictive Ability; Breeding Population; Genetic Group; Single Nucleotide Polymorphism Marker; Double Haploid Line



We thank Daniel Gianola, Sofia daSilva, and Torben Schulz-Streeck for helpful comments on the manuscript. We also thank Ruedi Fries and Hubert Pausch for processing of SNP arrays. This research was funded by the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed—Synergistic plant and animal breeding” (FKZ: 0315528A).

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical standards

The authors declare that the experiments comply with the current laws of Germany.

Supplementary material

Supplementary material 1: 122_2014_2305_MOESM1_ESM.pdf (PDF, 1088 kb)



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Theresa Albrecht (1, 2)
  • Hans-Jürgen Auinger (1)
  • Valentin Wimmer (1, 4)
  • Joseph O. Ogutu (3)
  • Carsten Knaak (4)
  • Milena Ouzunova (4)
  • Hans-Peter Piepho (3)
  • Chris-Carolin Schön (1)

  1. Plant Breeding, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Germany
  2. Institute for Crop Science and Plant Breeding, Bavarian State Research Center for Agriculture, Freising, Germany
  3. Bioinformatics Unit, Institute of Crop Science, Universität Hohenheim, Stuttgart, Germany
  4. KWS SAAT AG, Einbeck, Germany
