Theoretical and Applied Genetics, Volume 126, Issue 1, pp 69–82

Comparisons of single-stage and two-stage approaches to genomic selection

  • Torben Schulz-Streeck
  • Joseph O. Ogutu
  • Hans-Peter Piepho (Email author)
Original Paper


Abstract

Genomic selection (GS) is a method for predicting breeding values of plants or animals using many molecular markers, and it is commonly implemented in two stages. In plant breeding, the first stage usually involves computing adjusted means for genotypes, which are then used to predict genomic breeding values in the second stage. We compared two classical stage-wise approaches, which either ignore correlations among the means or approximate them by a diagonal matrix, and a new stage-wise method with a single-stage analysis for GS using ridge regression best linear unbiased prediction (RR-BLUP). The new stage-wise method rotates (orthogonalizes) the adjusted means from the first stage before submitting them to the second stage. This makes the errors approximately independently and identically normally distributed, which is a prerequisite for many procedures that are potentially useful for GS, such as machine learning methods (e.g. boosting) and regularized regression methods (e.g. lasso). We illustrate this using componentwise boosting, which minimizes squared error loss using least squares and iteratively and automatically selects the markers that are most predictive of genomic breeding values. Results are compared with those of RR-BLUP using fivefold cross-validation. For two unbalanced datasets, the new stage-wise approach with rotated means was slightly more similar to the single-stage analysis than the classical two-stage approaches based on non-rotated means. This suggests that rotation is a worthwhile pre-processing step for two-stage GS analyses of unbalanced datasets. Moreover, the predictive accuracy of stage-wise RR-BLUP was higher (5.0–6.1 %) than that of componentwise boosting.
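The rotation and componentwise boosting steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rotation is done by whitening with a Cholesky factor of an assumed known covariance matrix `V` of the adjusted means (the article's own rotation may use a different factorization), the intercept and other fixed effects are omitted, and the function names `rotate` and `componentwise_l2_boost`, the step length `nu`, and the simulated data are all hypothetical.

```python
import numpy as np


def rotate(y, Z, V):
    """Whiten adjusted means y and marker matrix Z, given Cov(errors of y) = V."""
    L = np.linalg.cholesky(V)        # V = L @ L.T, with L lower triangular
    y_rot = np.linalg.solve(L, y)    # L^{-1} y
    Z_rot = np.linalg.solve(L, Z)    # L^{-1} Z, same transform as applied to y
    return y_rot, Z_rot


def componentwise_l2_boost(X, y, n_steps=500, nu=0.1):
    """Componentwise L2 boosting: update a single column's coefficient per step."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(y, dtype=float).copy()
    coef = np.zeros(X.shape[1])
    col_ss = np.sum(X**2, axis=0)            # sum of squares of each column
    if np.any(col_ss == 0):
        raise ValueError("X contains an all-zero column")
    for _ in range(n_steps):
        slopes = X.T @ resid / col_ss        # least-squares slope of residuals on each column
        gain = slopes**2 * col_ss            # drop in residual sum of squares per column
        j = int(np.argmax(gain))             # column that fits the residuals best
        coef[j] += nu * slopes[j]            # shrunken coefficient update
        resid -= nu * slopes[j] * X[:, j]    # refresh residuals
    return coef


# Simulated toy data, purely illustrative (not from the article).
rng = np.random.default_rng(1)
n, p = 40, 8
Z = rng.choice([-1.0, 0.0, 1.0], size=(n, p))    # hypothetical marker codes
A = rng.normal(size=(n, n))
V = A @ A.T / n + np.eye(n)                      # hypothetical covariance of adjusted means
beta = np.zeros(p)
beta[[0, 3]] = [1.5, -1.0]                       # two markers with true effects
y = Z @ beta + np.linalg.cholesky(V) @ rng.normal(size=n)

y_rot, Z_rot = rotate(y, Z, V)
coef = componentwise_l2_boost(Z_rot, y_rot, n_steps=200, nu=0.1)
print(np.round(coef, 2))
```

Applying the same transform to both the means and the marker matrix keeps the regression coefficients unchanged while giving the errors an identity covariance, so a method that assumes independent, equal-variance errors can then be run on the rotated data. In practice the number of boosting steps would be tuned, for example by cross-validation.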


Keywords

Genomic Selection, Standard Variety, Boost Regression Tree, Adjusted Means, Mixed Model Framework



Abbreviations

  • BLUP: Best linear unbiased prediction
  • GEBV: Genomic estimated breeding value
  • GS: Genomic selection
  • RCBD: Randomized complete block design
  • REML: Restricted maximum likelihood
  • RR-BLUP: Ridge regression BLUP
  • SNP: Single nucleotide polymorphism



We thank AgReliant Genetics for providing the datasets. This research was funded by AgReliant Genetics and the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed—Synergistic plant and animal breeding” (Grant ID: 0315526). Three anonymous referees are thanked for very useful and constructive comments.

Conflict of interest

The authors declare that they have no competing interests.



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Torben Schulz-Streeck (1)
  • Joseph O. Ogutu (1)
  • Hans-Peter Piepho (1, Email author)

  1. Bioinformatics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
