Theoretical and Applied Genetics, Volume 127, Issue 6, pp 1375–1386

Genome-based prediction of maize hybrid performance across genetic groups, testers, locations, and years

  • Theresa Albrecht
  • Hans-Jürgen Auinger
  • Valentin Wimmer
  • Joseph O. Ogutu
  • Carsten Knaak
  • Milena Ouzunova
  • Hans-Peter Piepho
  • Chris-Carolin Schön
Original Paper

DOI: 10.1007/s00122-014-2305-z

Cite this article as:
Albrecht, T., Auinger, HJ., Wimmer, V. et al. Theor Appl Genet (2014) 127: 1375. doi:10.1007/s00122-014-2305-z

Abstract

Key message

The calibration data for genomic prediction should represent the full genetic spectrum of a breeding program. Data heterogeneity is minimized by connecting data sources through highly related test units.

Abstract

One of the major challenges of genome-enabled prediction in plant breeding lies in the optimum design of the population employed in model training. With highly interconnected breeding cycles staggered in time, the choice of data for model training is not straightforward. We used cross-validation and independent validation to assess the performance of genome-based prediction within and across genetic groups, testers, locations, and years. The study comprised data for 1,073 and 857 doubled haploid lines evaluated as testcrosses in 2 years. Testcrosses were phenotyped for grain dry matter yield and content and genotyped with 56,110 single nucleotide polymorphism markers. Predictive abilities strongly depended on the relatedness of the doubled haploid lines from the estimation set with those on which prediction accuracy was assessed. For scenarios with strong population heterogeneity, it was advantageous to perform predictions within a priori defined genetic groups until higher connectivity through related test units was achieved. Differences between group means had a strong effect on predictive abilities obtained with both cross-validation and independent validation. Predictive abilities across subsequent cycles of selection and years were only slightly reduced compared to predictive abilities obtained with cross-validation within the same year. We conclude that the optimum data set for model training in genome-enabled prediction should represent the full genetic and environmental spectrum of the respective breeding program. Data heterogeneity can be reduced by experimental designs that maximize the connectivity between data sources by common or highly related test units.

Supplementary material

Supplementary material 1: 122_2014_2305_MOESM1_ESM.pdf (PDF 1088 kb)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Theresa Albrecht (1, 2)
  • Hans-Jürgen Auinger (1)
  • Valentin Wimmer (1, 4)
  • Joseph O. Ogutu (3)
  • Carsten Knaak (4)
  • Milena Ouzunova (4)
  • Hans-Peter Piepho (3)
  • Chris-Carolin Schön (1)

  1. Plant Breeding, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Germany
  2. Institute for Crop Science and Plant Breeding, Bavarian State Research Center for Agriculture, Freising, Germany
  3. Bioinformatics Unit, Institute of Crop Science, Universität Hohenheim, Stuttgart, Germany
  4. KWS SAAT AG, Einbeck, Germany
