Theoretical and Applied Genetics

, Volume 130, Issue 2, pp 363–376 | Cite as

Genomic assisted selection for enhancing line breeding: merging genomic and phenotypic selection in winter wheat breeding programs with preliminary yield trials

  • Sebastian Michel
  • Christian Ametz
  • Huseyin Gungor
  • Batuhan Akgöl
  • Doru Epure
  • Heinrich Grausgruber
  • Franziska Löschenberger
  • Hermann BuerstmayrEmail author
Open Access
Original Article


Key message

Early generation genomic selection is superior to conventional phenotypic selection in line breeding and can be strongly improved by including additional information from preliminary yield trials.


The selection of lines that enter resource-demanding multi-environment trials is a crucial decision in every line breeding program as a large amount of resources are allocated for thoroughly testing these potential varietal candidates. We compared conventional phenotypic selection with various genomic selection approaches across multiple years as well as the merit of integrating phenotypic information from preliminary yield trials into the genomic selection framework. The prediction accuracy using only phenotypic data was rather low (r = 0.21) for grain yield but could be improved by modeling genetic relationships in unreplicated preliminary yield trials (r = 0.33). Genomic selection models were nevertheless found to be superior to conventional phenotypic selection for predicting grain yield performance of lines across years (r = 0.39). We subsequently simplified the problem of predicting untested lines in untested years to predicting tested lines in untested years by combining breeding values from preliminary yield trials and predictions from genomic selection models by a heritability index. This genomic assisted selection led to a 20% increase in prediction accuracy, which could be further enhanced by an appropriate marker selection for both grain yield (r = 0.48) and protein content (r = 0.63). The easy to implement and robust genomic assisted selection gave thus a higher prediction accuracy than either conventional phenotypic or genomic selection alone. The proposed method took the complex inheritance of both low and high heritable traits into account and appears capable to support breeders in their selection decisions to develop enhanced varieties more efficiently.


Genomic Selection High Prediction Accuracy Genomic Estimate Breeding Value Training Population Well Linear Unbiased Prediction 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Selection and development of new varieties of autogamous crops relies on a number of different breeding schemes including the pedigree and bulk methods as well as breeding acceleration using doubled haploids or single seed descent with off-season generations. Notwithstanding, they all share a step of conventional phenotypic selection based on preliminary yield trials in their methodology. These preliminary yield trials are for the larger part unreplicated as merely a limited amount of seed is available from each selection candidate at this stage. Although the phenotypic data obtained in this way allow only preliminary predictions of their final values they strongly influence the selection of lines that enter the following more resource-demanding multi-environment trials, a crucial decision in every line breeding program as a large amount of resources are allocated for thoroughly testing these potential varietal candidates.

Genomic selection using genome-wide dense marker maps has been suggested as a more efficient alternative to conventional selection methods (Meuwissen et al. 2001) and several studies have shown its great potential in line breeding to enhance the selection for major agronomic traits like yield both in legumes (Jarquín et al. 2014; Burstin et al. 2015; Tayeh et al. 2015) and small grain cereals (Asoro et al. 2011; Sallam et al. 2015; Spindel et al. 2015; He et al. 2016; Michel et al. 2016). Additionally, genomic selection could support the accumulation of many small effect alleles to provide higher and more durable quantitative disease resistance (Lorenz et al. 2012; Ornella et al. 2012; Daetwyler et al. 2014; Arruda et al. 2015; Rutkoski et al. 2015b), which could be subsequently combined with labor-intensive and costly to assess quality traits (Heffner et al. 2011b; Schmidt et al. 2015).

The broad range of possible applications has led to different strategies concerning the implementation of genomic selection into line breeding schemes (Heffner et al. 2010; Longin et al. 2015; Spindel et al. 2015; Marulanda et al. 2016), though it is generally suggested that a genomic selection step is integrated before multi-environment trials are being conducted. Breeders might thus consider the replacement of traditional preliminary yield trials by genomic selection to spare phenotyping costs or even integrating them into the genomic selection framework as they deliver a first insight into the future performance of the putative varietal candidates (Endelman et al. 2014). An additional concern of genomic selection is the choice of lines that shall constitute the training population (Rincent et al. 2012; Isidro et al. 2015; Marulanda et al. 2015) especially if breeders conduct selection, which is not always optimal for genomic selection models (Zhao et al. 2012). Nevertheless, high quality phenotypic data for multiple traits is usually available for many advanced lines that were already tested in multi-environment trials and could possibly be used to build more suitable training populations. Hence, a comparison between conventional phenotypic selection based on preliminary yield trials and genomic selection together with an appropriate training population design is needed to shed more light on this issue for the optimization and enhancement of line breeding schemes. The objectives of this study were thus to investigate (i) the possibilities and merit of a posteriori training population designs, (ii) integrating phenotypic information from preliminary yield trials into the genomic selection framework and (iii) compare conventional phenotypic selection with various genomic selection approaches in line breeding schemes on the example of bread wheat.

Materials and methods

Plant material and phenotypic data

We analyzed a population of 861 genotyped lines from a commercial winter wheat (Triticum aestivum L.) breeding program that descend from multiple families and were either in the F4:6 generation or directly derived by the double haploid method. Different subpopulations containing 64–192 lines were tested orthogonally in multi-environment trials from 2010 to 2015. Phenotypic data of these lines was thus of high quality, as they were thoroughly tested in all trial locations that spanned from Austria over Serbia, Croatia, Hungary, and Romania to the Central Anatolian High Plateau in Turkey. We also analyzed F4:5 generation preliminary yield trials where all lines in the population were pretested in one location and year in Austria from 2011 to 2014 before multi-environment trials were conducted.

Unreplicated earlier generation lines were tested along with replicated check varieties in all trials. The replicated check varieties allowed correcting for spatial field trends according to standard procedure in plant breeding. The entire population of genotyped earlier generation lines from 2011 to 2014 comprised 1203 lines, with 731 lines being unique to the preliminary yield trials. The number of genotyped lines in these preliminary yield trials varied accordingly between 151 and 539 lines as this study also included historical data before genomic selection was routinely implemented into the winter wheat breeding program at hand. Phenotypic records included grain yield (dt ha−1) and protein content (%), which was determined by near infrared spectroscopy (NIRS) directly at harvest.

Statistical analysis of phenotypic data

We followed a two stage analysis strategy of the phenotypic data, where each individual yield trial was analyzed separately in the first stage. Various models correcting for row and/or column effects as well as autoregressive variance–covariance structure of the residuals were introduced (Burgueño et al. 2000) and the best model was chosen by Akaike’s information criterion (AIC) to calculate best linear unbiased estimates (BLUE) for each trial. The heritability was estimated by \(h^{2} = \sigma_{G}^{2}/( {\sigma_{G}^{2} + \frac{1}{2}{\text{MVD}}})\), where \(\sigma_{G}^{2}\) designates the genetic variance and \({\text{MVD}}\) the mean variance of a difference of the BLUEs (Piepho and Möhring 2007) and trials with a heritability larger than 0.3 were forwarded for further analysis.

Across trial analysis of the multi-environment trials were conducted separately for each year using a linear mixed model of the form:
$$y_{ij} = \mu + g_{i} + t_{j} + gt_{ij} + e$$
was fitted for all traits, where \(y_{ij}\) are the BLUEs from the first stage, \(\mu\) is the grand mean, and \(g_{i}\) is the effect of the ith line. The effect of the jth trial \(t_{j}\) was fixed, while the line by trial interaction effect \(gt_{ij} \frac{1}{2}\) was random. The residual variance was fixed and the inverse of the squared standard errors of the means derived from the first stage of analysis were used as weights in this stage to take the varying accuracy of phenotypic records into account (Möhring and Piepho 2009). Additionally, best linear unbiased predictions (BLUP) were derived for preliminary yield trials by modeling a random effect for the inbred lines in which the heritability was estimated by \(h^{2} = 1 - \left( {{\text{VD}}_{\text{BLUP}} /2\sigma_{G}^{2} } \right)\) with \({\text{VD}}_{\text{BLUP}}\) being the mean variance of a difference of the BLUPs (Cullis et al. 2006). The replicated check varieties were thereby used to estimate row and column effects as well as the error variance. The individual records of the unreplicated lines could in this way be adjusted accordingly, taking spatial trends in the preliminary yield trials into account. All phenotypic analyses were conducted using the statistical package ASReml 3 (VSN International, 2015) for the R programming environment (R development core team 2016).

Genotypic data

DNA was extracted following the protocol by Saghai-Maroof et al. (1984) using leaf samples that were collected from F4:5 or doubled haploid lines by sampling minimum ten plants per line during early summer. All 861 lines tested in multi-environment trials as well as the 731 lines unique to preliminary yield trials were genotyped using the DarT genotyping-by-sequencing (GBS) approach (Diversity Array Technologies 2015). Quality control was applied by filtering out markers with a call rate lower than 90%, a minor allele frequency smaller than 0.05, and more than 10% of missing data. Missing data of the remaining 6.6 K SNP markers was imputed by an MVN-EM algorithm (Poland et al. 2012). The same marker data was again used for training genomic selection models with F4:6 lines. The minor change in average heterozygosity was expected to introduce a small error which was nevertheless seen to be acceptable considering the cost-benefit ratio of re-genotyping all lines in the F4:6 generation.

Genomic selection and estimation of breeding values in preliminary yield trials

Marker effects were estimated using a ridge regression best linear unbiased prediction (RR-BLUP):
$${\mathbf{y}} = {\mathbf{Xb}} + {\mathbf{Zu}} + {\mathbf{e}}$$
where \({\mathbf{y}}\) is an Nx1 vector of BLUEs obtained in the phenotypic analysis, \({\mathbf{b}}\) is a vector of F fixed effects and \({\mathbf{X}}\) its corresponding NxF design matrix. \({\mathbf{Z}}\) is a NxM matrix, which coded the M markers as either +1 or −1 for homozygous loci and 0 for heterozygous loci. Random marker effects were assumed to follow a normal distribution \({\mathbf{u}} \sim N\left( {0, {\mathbf{I}}\sigma_{u}^{2} } \right)\) with variance \(\sigma_{\text{u}}^{2}\) and \({\mathbf{e}} \sim N\left( {0, {\mathbf{I}}\sigma_{e}^{2} } \right)\). The kinship between lines was estimated by the genomic relationship matrix, which was computed according to Endelman and Jannink (2012):
$${\mathbf{K}} = {\mathbf{WW}}^{\text{T}} /2\varSigma \left( {p_{k} - 1} \right)p_{k}$$
where \({\mathbf{W}}\) is a centered NxM marker matrix of the i lines with \(W_{ik} = Z_{ik} - 2p_{k}\) and \(p_{k}\) being the allele frequency at the kth locus. The derived variance–covariance matrix was used to fit mixed linear models of the form:
$${\mathbf{y}} = {\mathbf{Xb}} + {\mathbf{Zg}} + {\mathbf{e}}$$
where \({\mathbf{y}}\) is an Nx1 vector of BLUEs obtained in the phenotypic analysis, \({\mathbf{g}}\) is an Nx1 vector of genotypic effects with \({\mathbf{g}} \sim N\left( {0, {\mathbf{K}}\sigma_{G}^{2} } \right)\) and the genetic variance \(\sigma_{G}^{2}\) as well as its corresponding random effect design matrix \({\mathbf{Z}}\). The shrinkage parameter was given by \(\lambda^{2} = \sigma_{e}^{2} /\sigma_{g}^{2}\) where \(\sigma_{e}^{2}\) is the variance of the residuals that followed \({\mathbf{e}} \sim N\left( {0, {\mathbf{I}}\sigma_{e}^{2} } \right)\). The mixed linear models were completed by F fixed effects, which were contained in the vector \({\mathbf{b}}\) and its corresponding NxF design matrix \({\mathbf{X}}\). Fixed effects included years in the case of prediction with multiple years and the grand mean for preliminary yield trials.

Breeding values for all the lines tested in preliminary yield trials were estimated by explicitly entering their phenotypic records i.e., BLUES for grain yield and protein content into model (4). In this way, genetic relationship between the lines were exploited to strengthen the predictiveness of preliminary yield trials although most selection candidates were tested unreplicated in just one plot (Endelman et al. 2014). We like to refer to this method as kinship enhanced best linear unbiased prediction of phenotypic breeding values (KBLUP) in this study to differentiate it from the genomic best linear unbiased prediction (GBLUP) model, where selection candidates are predicted purely on their relationship with a training population without any phenotypic records. Models for estimating marker effects by RR-BLUP were implemented using the R package rrBLUP (Endelman 2011), whereas the GBLUP and KBLUP models for predicting future line performance were fitted with the implementation of ASReml 3 (VSN International 2015) for R (R development core team 2016).

Cross-validation accuracy and training population design

We first investigated the merit of a posteriori designing a training population by picking a specific set from the entire available population of lines. The phenotypic variance of the training population is a major factor correlated with the prediction accuracy (Isidro et al. 2015; Marulanda et al. 2015), thus we aimed to maximize the phenotypic variance by sampling the highest and lowest performing lines from each respective year for entering into the training population.

The impact of this sampling method on the prediction accuracy was tested by 6-fold cross-validation, where the training and selection populations were built by randomly sampling 20–60 lines from each year and every year constituted a fold. GBLUP models were fitted with randomly sampled training populations and the benefit of maximizing the phenotypic variance was studied by equally sampling lines from the tails of the distribution e.g., the 30 highest and 30 lowest performing lines from a given year. The selection population was always equivalent in both cases and the training population size varied accordingly between 100 and 300 lines. This entire approach corresponds essentially to sampling both genotypes and environments for estimating a less upward biased prediction accuracy of genomic selection than obtained by sampling genotypes alone (Albrecht et al. 2014; Michel et al. 2016). Furthermore, the prediction accuracy of the full data set was estimated by leaving all lines from one year out as validation population and training a GBLUP model with all lines from the remaining 5 years at a time, which resulted in training population sizes of approximately 700 lines and validation populations that were on average composed of 140 lines. The benefit of a posteriori training population design was assessed by sampling 20–90% of the lines from each year in the training population, either randomly or with half of the lines coming again from either tail of the distribution.

Comparison between conventional phenotypic and genomic selection

The accuracy of conventional phenotypic selection was estimated by correlating the line performance in preliminary yield trials in 2011–2014 and BLUEs from multi-environment trials the following year. This estimate was based on 96–145 retested lines that formed the selection populations and despite a certain selection pressure still covered a broad range of both protein content and grain yield (Fig S1). Line performance per se was thereby predicted by classical BLUP as well as the above described KBLUP that took genetic relationships among lines within preliminary yield trials into account.

Pure genomic selection is on the other hand undertaken without prior knowledge of line performance from preliminary yield trials. We compared this approach with conventional phenotypic selection by predicting the performance of the same 96–145 retested lines but excluded all their phenotypic data from both the year of the preliminary yield trial and the multi-environment trials to fit GBLUP models. The influence of the training population constitution was studied by setting up a cross-validation scheme, using alternatively all possible three-way combinations of the remaining four years in which lines from the selection population did not occur (Fig S2). Hence, every one of the four selection populations was predicted by four different training populations. The training population size was fixed at 180 lines and constructed by sampling an equal number of 60 lines from each one of the training population years. The prediction accuracy of the different selection populations was finally obtained by correlating the genomic estimated breeding values (GEBV) with the BLUEs from the across trial analysis of the multi-environment trials.

Genomic assisted selection and marker selection

Although genomic selection is a relatively new approach the implementation of preliminary yield trials has been part of most line breeding schemes for a long time. We like to simplify the problem of predicting untested lines in untested years to predict tested lines in untested years in this study by integrating phenotypic information from preliminary yield trials into the genomic selection framework. Therefore, we first estimated the line breeding values by the KBLUP model for every preliminary yield trial and GEBVs from the GBLUP model for every one of the previously described training by selection population combinations. The heritability for the GBLUP model was estimated via the shrinkage parameter \(\lambda^{2} = \sigma_{e}^{2} /\sigma_{g}^{2}\) which could be written as:
$$\sigma_{e}^{2} /\sigma_{g}^{2} = \left( {1/h^{2} } \right) - 1$$
This approximation by Hofheinz et al. (2012) also allowed us to estimate the heritability h 2 for the unreplicated preliminary yield trials via both the genetic variance \(\sigma_{g}^{2}\) and the residual variance \(\sigma_{e}^{2}\) as computed by the KBLUP model. The estimated heritabilities were subsequently used as weights in a heritability index, which was built with predictions from both the GBLUP and KBLUP models:
$${\text{GEBV}}_{\text{Index}} = {\text{GBLUP}}_{\text{Scaled}} * {\text{w}}_{\text{GBLUP}} + {\text{KBLUP}}_{\text{Scaled}} * {\text{w}}_{\text{KBLUP}}$$
where \({\text{GEBV}}_{\text{Index}}\) are the GEBVs obtained for genomic assisted selection, \({\text{GBLUP}}_{\text{Scaled}}\) and \({\text{KBLUP}}_{\text{Scaled}}\) are the scaled predictions from the GBLUP and KBLUP models, and the weights \({\text{w}}_{\text{GBLUP}}\) and \({\text{w}}_{\text{KBLUP}}\) are equivalent to the heritabilities computed by (5). The scaling of the prediction was done as appropriate for index selection by subtracting the mean of the predictions and subsequent division by the variance for each GEBV. It should be note that only the selection candidates were involved in the scaling process.
Prior knowledge of line performance from preliminary yield trials enabled furthermore a knowledge-based and more sophisticated selection of markers actually associated with the trait of interest. For this purpose, marker effects were first estimated by fitting RR-BLUP models separately for the preliminary yield trial and the training population of lines in each fold i.e., training by validation population combination of the employed cross-validation scheme. Markers whose effect showed a change of sign between these two models were considered to rather introduce errors into the prediction model and were removed from the marker and genomic relationship matrix before GEBVs were estimated by GBLUP. All phenotypic data involved in the validation of the models was explicitly excluded from this process. RR-BLUP models were also refitted with the selected markers to investigate the proportional change of markers with the same and different sign. We like to highlight at this point that this marker selection approach was only undertaken on the side of the training population from multi-environment trials as no beneficial effect of marker selection was observed when estimating breeding values in preliminary yield trials by KBLUP (data not shown). Assuming larger information content of the GBLUP model in this case the index weight was accordingly adjusted:
$$w_{\text{GBLUP}} = h_{\text{GBLUP}}^{2} /\left( {1 - \left| {r_{{{\text{GBLUP}};{\text{KBLUP}}}} } \right|} \right)$$
where \(w_{\text{GBLUP}}\) is the index weight, \(h_{\text{GBLUP}}^{2}\) the heritability estimated from the GBLUP model following (5) and \(\left| {r_{{{\text{GBLUP}};{\text{KBLUP}}}} } \right|\) the absolute value of the correlation between predicted breeding values of lines in selection population based on multi-environment (GBLUP) and preliminary yield trial (KBLUP) data. The adjustment was undertaken as after the marker selection the heritability estimated in the GBLUP model by (5) was reduced, yet a dynamic index with a larger weight on the GBLUP that is based on phenotypic data obtained from several years and locations was seen to be beneficial.

Selection decision inferences and a one-year selection experiment

After this comparison between selection methods in terms of prediction accuracy we continued by studying their influence on actual selection decisions. An appropriate selection decision by either conventional phenotypic, genomic or genomic assisted selection could be made if lines from preliminary yield trials that are predicted to be among the highest performing lines would also show a superior performance in multi-environment trials. We recorded thus the 5–50% of lines from each training population combination (Fig S2) that were predicted to be among the highest and lowest performing ones by the different selection methods. A comparison was then made whether the conventional phenotypic, genomic or genomic assisted selection approach correctly identified the actual highest and lowest performing lines with a higher frequency averaged over all training by selection population combinations.

Finally, a selection experiment was conducted to test the efficiency of genomic selection compared to conventional phenotypic selection. A set of 60 lines was purely genomically selected in 2013, while the involved wheat breeder selected 70 lines using all available phenotypic information from preliminary yield trials and beyond without genomic information. Among the 60 genomically selected lines 10 lines were chosen for their excellent predicted grain yield, whereas the other 50 were advanced due to superior predicted performance based on a genomic selection index that took grain yield, protein yield as well as fusarium head blight and stripe rust resistance into account (Ametz 2015). The tested set was completed by the five worst performing lines according to the genomic selection index and 31 randomly sampled lines, which were all retested in the multi-environment trials of 2014.


Maximizing the phenotypic variance of the training population

We found a classical relationship of higher prediction accuracy with increasing training population size using the 6-years as folds for cross-validation, while this effect was more pronounced for protein content than grain yield (Fig. 1a). The benefit of maximizing the phenotypic variance by sampling the highest and lowest performing lines as training population from each year was minimal in comparison to the full training population when leaving one year out as a validation population at a time (Fig. 1b), while for the 6-fold cross-validation an average increase in prediction accuracy of 7% was observed for both traits. A prediction accuracy of r = 0.37 could be reached for example using a randomly sampled training population of 300 lines but was already surpassed when we fitted prediction models with 150 lines from the two tails of the distribution (r = 0.38).
Fig. 1

Effect of the training population design on the prediction accuracy for grain yield and protein content. The lines in the training population were either randomly sampled or taken from the tails of the distribution, while the selection population was the same set of randomly sampled lines in both designs using a 6-fold cross-validation in which the years constituted the folds (a). Leaving all lines from 1 year out as validation population sampling 20–90% of the lines from each year in the training population either randomly or with half of the lines coming again from the tails of the distribution, where the dotted horizontal line designates the average accuracy when training with the entire set of lines of the remaining 5 years (b)

The impact of the training population design was also preserved at maximal training population sizes of 300 lines where the accuracy was r = 0.55 in comparison to r = 0.53 with a random sample for predicting the protein content. Likewise, grain yield was slightly (5%) better predicted using the highest and lowest performing lines for training (r = 0.39). The mean accuracies for both sampling methods were furthermore significant different according to a Wilcoxon rank sum test (p < 0.01), thus we chose to design training populations consisting of 60 lines from each year with 30 coming from either tail of the distribution to provide a high prediction accuracy with equally sized training populations for all folds in the comparison between conventional phenotypic and genomic selection.

Predicting the performance of tested and untested lines across years

It is of foremost importance in applied plant breeding programs to select the most promising lines which should enter resource demanding multi-environment trials with a high accuracy to develop successful varieties. We accordingly assessed the correlation between the predicted performance in the year of this selection decision and the actual performance in the following year, utilizing lines that were retested in multi-environment trials 2012–2015.

Classically, lines that will enter more thoroughly testing are selected purely on the basis of phenotypic information from preliminary yield trials. A rather low average prediction accuracy of r = 0.21 was found for grain yield using this method, while the highly heritable protein content could be predicted with a reasonable accuracy of r = 0.45 (Table 1). The predictive ability of preliminary yield trials could be further enhanced by introducing a genomic relationship to estimate breeding values employing the KBLUP model. Grain yield strongly profited from this method as the accuracy increased by 50% taking the genomic relationships among lines in the unreplicated preliminary yield trials into account.
Table 1

Comparison between different selection methods by the prediction accuracy for grain yield and protein content across years, using multi-environment trials (MET), preliminary yield trials (PYT) and the genomic relationship matrix (GRM) as complementing information sources

Selection method


Information source

Prediction accuracy




Grain yield

Protein content






0.21 ± 0.09

0.45 ± 0.08






0.33 ± 0.27

0.52 ± 0.14






0.39 ± 0.07

0.50 ± 0.06

Genomic assisted





0.46 ± 0.07

0.61 ± 0.04

Genomic assisted§





0.48 ± 0.05

0.63 ± 0.04

Breeding values based on genetic relationships among lines in unreplicated preliminary yield trials

Genomic and phenotypic predictions were merged by a heritability index

§Markers were pre-selected before fitting the prediction models

Genomic selection on the other hand predicted the performance by the genetic relationship between thoroughly tested lines from multi-environment trials and the younger lines i.e., selection candidates without using any of their phenotypic records. Genomic selection was clearly superior to conventional phenotypic selection and nearly twice the accuracy (r = 0.39) could be achieved when predicting grain yield across years with the GBLUP model, whereas approximately the same accuracy was estimated using either GBLUP or KBLUP for protein content.

Both selection methods tackle though different problems: Genomic selection by the GBLUP model is predicting untested lines in untested years with high quality information, while the enhanced phenotypic selection by KBLUP is predicting preliminary tested lines in untested years. Merging the information sources by a heritability index gave a strong advantage over both methods alone, which was 18 and 40% over the GBLUP and KBLUP, respectively, for the low heritable trait grain yield. Even the highly heritable and well predicted protein content benefitted from using this genomic assisted selection approach, resulting in an average prediction accuracy of r = 0.61 which was 18–22% better than either the best phenotypic or genomic selection model.

Most astonishing though was the advantage over the conventional phenotypic selection (BLUP). With a prediction accuracy of r = 0.46 genomic assisted selection was 119% higher than conventional phenotypic selection for grain yield and gave with r = 0.61 also 36% more accurate predictions for the future performance of lines with respect to their protein content. Additionally, this approach gave a higher stability of the prediction accuracy than pure genomic selection by GBLUP as reflected by the lower standard error, and thus narrower confidence interval (Table 1).

Prior knowledge from preliminary yield trials gave furthermore the opportunity for a pre-selection of markers associated with the trait of interest in the selection population. Estimation of marker effects by RR-BLUP for both multi-environment and preliminary yield trials separately revealed that around 50% of the marker effects changed their sign between both models, and thus putatively introduced noise when predicting GEBVs (Fig. 2). Removing these markers from the computation of the genomic relationship matrix gave an additional slight increase in prediction accuracy when employing a genomic assisted selection (Table 1).
Fig. 2

Marker effect estimates before (grey) and after (red) pre-selection of markers. Marker effects were scaled and centered to allow a comparison between different training by selection population combinations

Interestingly, we found though merely an advantage for pre-selecting markers when it was conducted before fitting GBLUP models but not for the KBLUP which utilized phenotypic records from preliminary yield trials. A noteworthy observation was that after refitting RR-BLUP models with pre-selected markers, some marker effects still showed a change of sign (Fig. 2). Nevertheless, this percentage of putatively noisy markers decreased to 10% resulting in a majority of markers to estimate effects in the same direction.

Genomic assisted selection with additional marker selection also turned out to be a robust approach, which gave constantly higher prediction accuracy than pure genomic selection for all validation by training population combinations (Fig. 3). According to a Wilcoxon rank sum test, the average prediction accuracy of this approach was also significantly higher both for grain yield (p < 0.05) and protein content (p < 0.01) than what could be achieved by predicting with standard GBLUP alone.
Fig. 3

Comparison between the prediction accuracy of genomic and genomic assisted selection for every training by selection population combination to predict grain yield and protein content across years

The influence of genomic assisted selection on selection decisions

The observed high and robust prediction accuracy of the genomic selection approaches promised a reasonably good identification of the highest performing lines in preliminary yield trials for further testing in multi-environment trials. We tested this prospect by examining whether or not the best

10–50% lines according to their prediction were indeed among the best in multi-environment trials. Genomic selection did especially well in this scenario at high selection intensities as applied in typical line breeding schemes and could be improved using a genomic assisted selection with marker selection (Fig. 4).
Fig. 4

Proportion of correctly selected best and worst performing lines with respect to grain yield by conventional phenotypic selection (BLUP), genomic selection (GBLUP) and genomic assisted selection with pre-selected markers (FULL) at varying selection intensity

Assuming a breeder would select the best 200 from a total population of 1000 lines (20%), approximately 60 (30%) of these are correctly identified by conventional phenotypic selection but 90 (45%) by genomic assisted selection following the estimates in this study. It is moreover of interest to be informed about the worst lines to discard them by negative selection. This scenario gave nearly orthogonal results to the characterization of the highest performing lines, and the ability to identify the lines from the lower tail of the distribution was verified by the selection experiment (Fig. 5).
Fig. 5

Performance of lines chosen by different selection methods in the selection experiment during the vegetation period 2014

Conventional phenotypic selection by the breeder and the genomic selection index performed equally well and surpassed the grain yield of randomly selected lines by 3 dt ha−1, which corresponded to a 3% gain by selection. This could be achieved even though the selection index gave a large weight to protein yield i.e., a trait with low prediction accuracy (Michel et al. 2016). Aside from grain yield, the breeder took also a multitude of morphological, quality as well as disease resistance traits into account that are associated with high and stable performance of the selected lines.


This study focused on the prospect of enhancing the efficiency of selection decisions by implementing genomic selection into line breeding schemes. Integrating phenotypic information from preliminary yield trials into the genomic selection framework was combined with a posteriori training population design and resulted in a superior genomic assisted selection. The practical application in commercial bread wheat served as a representative example of this new selection approach.

A two-tailed training population design

A main driving force of prediction accuracy in genomic selection is the relationship between training and selection population (Clark et al. 2012; Habier et al. 2013; Wientjes et al. 2013). Accordingly, genomic selection is expected to give more accurate predictions if lines included in the training population are closely related to (Asoro et al. 2011; Lehermeier et al. 2014; Lorenz and Smith 2015) or even come from the same population as the selection candidates (Windhausen et al. 2012; Charmet et al. 2014). The underlying population structure can be readily deciphered when multiple large bi-parental populations (Heffner et al. 2011a; Schulz-Streeck et al. 2012; Riedelsheimer et al. 2013; Lehermeier et al. 2014) or larger heterotic groups (Technow et al. 2013; Lehermeier et al. 2014; Spindel et al. 2015) are directly involved in the development of varietal candidates. Training and selection populations in line breeding schemes on the other hand, are usually pre-selected by usage of the pedigree method resulting in small families with varying degree of relatedness. Furthermore, breeders frequently introgress foreign material in their breeding pools and lines are often derived by crosses between introduced and their own germplasm, resulting in an unclear population structure in such mixed line breeding populations (Sallam et al. 2015; He et al. 2016; Michel et al. 2016). Simulation (Habier et al. 2013) and empirical (Lorenz and Smith 2015) studies clearly showed that adding distant relatives to prediction models can have detrimental effects on the accuracy, thus there is serious need for an appropriate training population design to achieve high prediction accuracies with genomic selection in line breeding.

A straightforward approach is the maximization of genetic diversity in the training population on the basis of marker data, which additionally enables to choose a subset of lines before phenotyping and saving costs for field trials (Huang et al. 2013). While this method is applicable to various genomic studies, the choice by the average expected reliability of contrast of lines (CDmean) was especially recommended for genomic selection (Rincent et al. 2012). It was further fine-tuned by Isidro et al. (2015) who integrated breeders’ knowledge about the population structure into their choice of training populations. These approaches as well as the usage of a genetic algorithm based on reliability measures (Akdemir et al. 2015) have shown superior performance for a multitude of traits and crops in comparison to randomly choosing a training population (Rincent et al. 2012; Akdemir et al. 2015; Isidro et al. 2015; Rutkoski et al. 2015a; Tayeh et al. 2015). Marulanda et al. (2015) finally compared more than 21 indices corresponding to eight factors putatively correlated with prediction accuracy in a vast simulation study and found the phenotypic variance to be a major criterion for training population design. Hence, picking individuals from a two-tailed distribution to maximize the phenotypic variance as suggested by Isidro et al. (2015) seems to be a very suitable training population design strategy which was empirically verified in this study.

Notwithstanding, designing training populations a priori based on phenotypic variance might be difficult if the breeding material was not thoroughly tested yet. Moreover, in applied line breeding programs the major goal is to develop new and better performing varieties irrespective of any prediction accuracies. Selecting a posteriori training populations from the numerous potential line varieties in advanced generations might for this reason be a more convenient strategy. Such training populations should preferably include well phenotyped lines that are related to the current selection population and come from both tails of the distribution to ensure a large phenotypic variance. We also recommend to specifically tailoring them for each trait of interest separately, a procedure which is readily realized as the necessary phenotypic data is most cases already available. Even though the beneficial effect of a higher prediction accuracy due to a large phenotypic variance might diminish with increasing training population sizes (Marulanda et al. 2015), models will be computational less burdening but at the same time keeping a high prediction accuracy. Likewise, a two-tailed training population design could guide the choice which lines with historical phenotypic data should be sent to genotyping and might be very useful if few phenotypic records are available for labor-intensive and costly traits such as brewing quality in barley (Schmidt et al. 2015).

Attention should nevertheless be taken if selection is conducted before training populations are built, a common situation in all plant breeding programs that can lead to a strong bias in prediction accuracy of genomic selection approaches (Zhao et al. 2012). The accompanied loss in prediction accuracy could be substantial when carrying out unidirectional selection (Zhao et al. 2012) but usually a broad range of products is developed in line breeding; so even though the population mean is shifted upwards when going into the phase of testing experimental varieties in multi-environment trials a lot of variance from preliminary yield trials is still kept (Fig S1).

Merging conventional phenotypic and genomic selection

One of the most critical decisions in variety development is the selection of lines that should enter multi-environment trials. The limited phenotypic data that are available for this purpose in early generations led to the suggestion of supporting conventional phenotypic selection by marker assisted selection (Knapp 1998; Lande and Thompson 1990). The implementation of classical marker assisted selection was, however, of limited success for quantitatively inherited traits that are controlled by many loci, while with the advent of genomic selection handling these complex genetic architectures became a much more feasible task in recent years (Jannink et al. 2010; Crossa et al. 2014; Heslot et al. 2015). Although genomic selection has been found to be superior to conventional phenotypic selection and gave outstanding results in several selection experiments (Combs and Bernardo 2013; Beyene et al. 2015; Rutkoski et al. 2015b), genomic predictions rely strongly on genetic relationships and not on physical measurements on the selection candidates.

Hence, preliminary yield trials have the clear advantage of generating solid phenotypic data of which quality can be strongly improved by modeling genetic relationships among the tested lines (Endelman et al. 2014). Integrating pedigree or marker data into the estimation of breeding values has been shown to achieve much higher accuracies when selecting already phenotyped lines in several scenarios (Bauer et al. 2006; Oakey et al. 2007a; Viana et al. 2010; Endelman et al. 2014; Cowling et al. 2015), and was accordingly a very valuable option for enhancing the prediction of line performance across years in this study. The usage of this enhanced phenotypic data from preliminary yield trials for estimating breeding values tackled the problem of predicting tested lines in untested years, while genomic selection usually addresses the more challenging problem of predicting untested lines in untested years.

Merging the before-mentioned merits of genomic selection based on high quality phenotypic data from multi-environment trials with phenotypic selection in preliminary yield trials resulted in a genomic assisted selection that performed much better than either phenotypic or genomic selection alone. The benefits of this approach have also been indicated in bi-parental maize populations for predicting phenotyped doubled haploid lines across years (Lorenz 2013; Riedelsheimer and Melchinger 2013). Krchov et al. (2015) could empirically verify these prospects by combining genomic predictions and phenotypic records with the index weights suggested by Lande and Thompson (1990) for a more accurate prediction of grain yield and moisture in maize hybrids across years. A simple heritability index gave a 12% higher prediction accuracy than the former suggested method in our study, most likely as the additional modeling of a genomic relationship matrix significantly improved the phenotypic data from the preliminary yield trials. The attained genomic assisted selection method resulted furthermore in a higher prediction accuracy for both grain yield and protein content than the other selection approaches, highlighting its superior ability to address the complex inheritance of both low and high heritable traits.

Various marker selection approaches have been proposed for taking the genetic architecture of such traits into account (Heslot et al. 2012; Ogutu et al. 2012; Resende et al. 2012). These efforts are often obstructed by different genetic backgrounds (Schulz-Streeck et al. 2011, 2012) and linkage phase change between the training and selection population (Riedelsheimer et al. 2013; Lorenz and Smith 2015). The incorporation of preliminary yield trials into the genomic selection framework could promote a more targeted pre-selection of marker sets due to prior knowledge of the genetic variation in different selection populations. Hence, we tried to tailor the set of markers fitting the population of selection candidates to account for these altering genetic backgrounds by dropping markers whose effect changed in sign between the training and selection population. Although, we suggest here a rather rough approach that dropped half the markers from the corresponding matrix, the direct pulling of information from preliminary yield trials gave a high and stable average prediction accuracy in combination with genomic assisted selection.

High and stable prediction accuracies are obviously desirable but often very difficult to acquire due to the presence of huge genotype by environment interactions in plant breeding. The prediction of individual trials or locations across years is an especially difficult task (Dawson et al. 2013) and we observed a large variation in prediction accuracy for this undertaking in our study (Fig S3), fitting the results of other studies with autogamous crops (Heslot et al. 2014; Lado et al. 2016). Once multi-environment trials are being conducted, more options open up for enhancing the selection of variety parents like imputing untested lines in tested locations (Burgueño et al. 2011; Jarquín et al. 2014; Crossa et al. 2016; Lopez-Cruz et al. 2015) or enhancing the reliability of breeding values by a relationship matrix (Bauer et al. 2006; Oakey et al. 2007b; Bauer et al. 2009; Müller et al. 2015). Hence, predicting lines for the entire target population of environments might be a better strategy to select candidates that should enter multi-environment trials. These multi-environment trials could afterwards guide selection decisions in breeding for local adaptation to specific regions and variety registration.

Genomic assisted selection for more sophisticated breeder´s decisions

The chance of selecting the highest performing lines for multi-environment trials was much higher by genomic selection than conventional phenotypic selection in our study, and could be further increased by implementing genomic assisted selection. Depending on the breeding scheme it has been suggested to conduct positive genomic selection for the best lines (Bassi et al. 2016) or discarding the worst lines by negative selection (Longin et al. 2015), while we observed no difference of any genomic selection approach to correctly identify lines from either tail of the distribution. Nevertheless, these considerations are valid for single traits only and it is generally not recommended to sequentially select for one trait after another as a lower gain in selection is expected by such tandem selection (Hazel and Lush 1942). Different multivariate models have been developed to take this problem of simultaneous selection for several traits at the same time into account (Bauer and Léon 2008; Viana et al. 2010; Jia and Jannink 2012).

A computational less demanding alternative could be the usage of genomic selection indices (Ceron-Rojas et al. 2015; Schulthess et al. 2015), and even a simple index based on grain yield, protein content and disease resistance gave a similar gain as conventional phenotypic selection by the breeder in our selection experiment. Genomic selection approaches are thus enabling more sophisticated selection decisions but the knowledge and experience of breeders is still the best guarantee for success, while genomic selection indices can be an additional tool to ease their decisions given the multitude of traits to consider.


This study showed the strong advantage of genomic selection over conventional phenotypic selection in line breeding schemes on the example of bread wheat. The advantage was further enhanced by a posteriori selecting a training population that maximized the phenotypic variance and the integration phenotypic information from preliminary yield trials into the genomic selection framework. Conducting preliminary yield trials is a common procedure in most line breeding programs, thus we suggested exploiting their information by merging phenotypic and genomic selection for genomic assisted selection. The easy to implement and robust genomic assisted selection gave a higher prediction accuracy than either one of the other methods alone and allowed a more sophisticated selection decision with regard to lines entering multi-environment trials. The proposed method took the complex inheritance of both low and high heritable traits into account and could support breeders in developing varieties that preferably combine high yield, quality, disease resistance and tolerance against abiotic stresses.

Author contribution statement

SM wrote the manuscript, SM and CA analyzed the data. HGR supported in the statistical analysis. FL, DE, BA and HGU designed the field trials and collected the phenotypic data. FL and HB initiated and guided through the study. All authors read and approved the final manuscript.



Open access funding provided by University of Vienna. We like to thank Maria Bürstmayr and her team for the tremendous work when extracting the DNA of several hundred wheat lines each year as well as Herbert Hetzendorfer for managing the collection of the phenotypic data. This research was funded by the EU Eurostars projects “E! 6399 Genomic selection of wheat varieties for robustness, yield and quality” and “E! 8959 Genomic selection for nitrogen use efficiency in wheat”. We thank the anonymous reviewers for their valuable comments and suggestions for improving the manuscript.

Compliance with ethical standards

Conflict of interest

The authors declare no conflict of interest.

Ethical standard

The authors declare that the experiments comply with the current laws of Austria.

Supplementary material

122_2016_2818_MOESM1_ESM.pdf (136 kb)
Fig. S1 Variation of grain yield and protein content in preliminary yield trials 2010–2014. (PDF 135 kb)
122_2016_2818_MOESM2_ESM.pdf (65 kb)
Fig. S2 Cross-validation scheme used for comparing the different selection methods. Genomic selection models were fitted with training populations of 180 lines, where 60 lines of this training population came from 3 different years (green). Phenotypic and genomic assisted selection included additional data from the year of a preliminary yield trial (orange). All models were validated with a validation population of lines retested in multi-environment trials following the year of a preliminary yield trial (red). (PDF 64 kb)
122_2016_2818_MOESM3_ESM.pdf (67 kb)
Fig. S3 Comparison between the prediction accuracy of genomic and genomic assisted selection for every training by selection population combination to predict grain yield and protein content of individual trials across years. (PDF 67 kb)


  1. Akdemir D, Sanchez JI, Jannink J-L (2015) Optimization of genomic selection training populations with a genetic algorithm. Genet Sel Evol 47:38. doi: 10.1186/s12711-015-0116-6 CrossRefPubMedPubMedCentralGoogle Scholar
  2. Albrecht T, Auinger HJ, Wimmer V et al (2014) Genome-based prediction of maize hybrid performance across genetic groups, testers, locations, and years. Theor Appl Genet 127:1375–1386. doi: 10.1007/s00122-014-2305-z CrossRefPubMedGoogle Scholar
  3. Ametz C (2015) Genomic selection in bread wheat. Dissertation, University of Natural Resources and Life Sciences, Vienna, AustriaGoogle Scholar
  4. Arruda MP, Brown PJ, Lipka AE et al (2015) Genomic selection for predicting fusarium head blight resistance in a wheat breeding program. Plant Genome 8:1–12. doi: 10.3835/plantgenome2015.01.0003 CrossRefGoogle Scholar
  5. Asoro FG, Newell M, Beavis WD et al (2011) Accuracy and training population design for genomic selection on quantitative traits in elite North American oats. Plant Genome 4:132–144. doi: 10.3835/plantgenome2011.02.0007 CrossRefGoogle Scholar
  6. Bassi FM, Bentley AR, Charmet G et al (2016) Breeding schemes for the implementation of genomic selection in wheat (Triticum spp.). Plant Sci 242:23–36. doi: 10.1016/j.plantsci.2015.08.021 CrossRefPubMedGoogle Scholar
  7. Bauer AM, Léon J (2008) Multiple-trait breeding values for parental selection in self-pollinating crops. Theor Appl Genet 116:235–242. doi: 10.1007/s00122-007-0662-6 CrossRefPubMedGoogle Scholar
  8. Bauer AM, Reetz TC, Léon J (2006) Estimation of breeding values of inbred lines using best linear unbiased prediction (BLUP) and genetic similarities. Crop Sci 46:2685–2691. doi: 10.2135/cropsci2006.01.0019 CrossRefGoogle Scholar
  9. Bauer AM, Hoti F, Reetz TC et al (2009) Bayesian prediction of breeding values by accounting for genotype-by-environment interaction in self-pollinating crops. Genet Res 91:193–207. doi: 10.1017/S0016672309000160 CrossRefGoogle Scholar
  10. Beyene Y, Semagn K, Mugo S et al (2015) Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci 55:154–163. doi: 10.2135/cropsci2014.07.0460 CrossRefGoogle Scholar
  11. Burgueño J, Cadena A, Crossa J (2000) User’s guide for spatial analysis of field variety trials using Asreml. CIMMYT, MexicoGoogle Scholar
  12. Burgueño J, Crossa J, Cotes JM et al (2011) Prediction assessment of linear mixed models for multienvironment trials. Crop Sci 51:944–954. doi: 10.2135/cropsci2010.07.0403 CrossRefGoogle Scholar
  13. Burstin J, Salloignon P, Martinello M et al (2015) Genetic diversity and trait genomic prediction in a pea diversity panel. BMC Genome 16:1–17. doi: 10.1186/s12864-015-1266-1 CrossRefGoogle Scholar
  14. Ceron-Rojas JJ, Crossa J, Arief VN et al (2015) A genomic selection index applied to simulated and real data G3(5):2155–2164. doi: 10.1534/g3.115.019869 Google Scholar
  15. Charmet G, Storlie E, Oury FX et al (2014) Genome-wide prediction of three important traits in bread wheat. Mol Breed 34:1843–1852. doi: 10.1007/s11032-014-0143-y CrossRefPubMedPubMedCentralGoogle Scholar
  16. Clark SA, Hickey JM, Daetwyler HD, van der Werf JH (2012) The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet Sel Evol 44:4. doi: 10.1186/1297-9686-44-4 CrossRefPubMedPubMedCentralGoogle Scholar
  17. Combs E, Bernardo R (2013) Genomewide selection to introgress semidwarf maize germplasm into U.S. Corn Belt inbreds. Crop Sci 53:1427–1436. doi: 10.2135/cropsci2012.11.0666 CrossRefGoogle Scholar
  18. Cowling WA, Stefanova KT, Beeck CP et al (2015) Using the animal model to accelerate response to selection in a self-pollinating crop G3(5):1–43. doi: 10.1534/g3.115.018838 Google Scholar
  19. Crossa J, Pérez P, Hickey J et al (2014) Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity 112:48–60. doi: 10.1038/hdy.2013.16 CrossRefPubMedGoogle Scholar
  20. Crossa J, de los Campos G, Maccaferri M et al (2016) Extending the Marker × environment interaction model for genomic-enabled prediction and genome-wide association analysis in durum wheat. Crop Sci 56:1–17. doi: 10.2135/cropsci2015.04.0260 CrossRefGoogle Scholar
  21. Cullis BR, Smith AB, Coombes NE (2006) On the design of early generation variety trials with correlated data. J Agri Biol Envir Stat 11:381–393. doi: 10.1198/108571106X154443 CrossRefGoogle Scholar
  22. Daetwyler HD, Bansal UK, Bariana HS et al (2014) Genomic prediction for rust resistance in diverse wheat landraces. Theor Appl Genet 127:1795–1803. doi: 10.1007/s00122-014-2341-8 CrossRefPubMedGoogle Scholar
  23. Dawson JC, Endelman JB, Heslot N et al (2013) The use of unbalanced historical data for genomic selection in an international wheat breeding program. F Crop Res 154:12–22. doi: 10.1016/j.fcr.2013.07.020 CrossRefGoogle Scholar
  24. Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255. doi: 10.3835/plantgenome2011.08.0024 CrossRefGoogle Scholar
  25. Endelman JB, Jannink JL (2012) Shrinkage estimation of the realized relationship matrix G3(2):1405–1413. doi: 10.1534/g3.112.004259 Google Scholar
  26. Endelman JB, Atlin GN, Beyene Y et al (2014) Optimal design of preliminary yield trials with genome-wide markers. Crop Sci 54:48–59. doi: 10.2135/cropsci2013.03.0154 CrossRefGoogle Scholar
  27. Habier D, Fernando RL, Garrick DJ (2013) Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics 194:597–607. doi: 10.1534/genetics.113.152207 CrossRefPubMedPubMedCentralGoogle Scholar
  28. Hazel LN, Lush JL (1942) The efficiency of three methods of selection. J Hered 33:393–399Google Scholar
  29. He S, Schulthess AW, Mirdita V et al (2016) Genomic selection in a commercial winter wheat population. Theor Appl Genet 129:641–651. doi: 10.1007/s00122-015-2655-1 CrossRefPubMedGoogle Scholar
  30. Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME (2010) Plant breeding with genomic selection: gain per unit time and cost. Crop Sci 50:1681–1690. doi: 10.2135/cropsci2009.11.0662 CrossRefGoogle Scholar
  31. Heffner EL, Jannink J, Sorrells ME (2011a) Genomic selection accuracy using multifamily prediction models in a wheat breeding program. Plant Genome 4:65–75. doi: 10.3835/plantgenome2010.12.0029 CrossRefGoogle Scholar
  32. Heffner EL, Jannink JL, Iwata H et al (2011b) Genomic selection accuracy for grain quality traits in biparental wheat populations. Crop Sci 51:2597–2606. doi: 10.2135/cropsci2011.05.0253 CrossRefGoogle Scholar
  33. Heslot N, Yang H-P, Sorrells ME, Jannink JL (2012) Genomic selection in plant breeding: a comparison of models. Crop Sci 52:146–160. doi: 10.2135/cropsci2011.09.0297 CrossRefGoogle Scholar
  34. Heslot N, Akdemir D, Sorrells ME, Jannink JL (2014) Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127:463–480. doi: 10.1007/s00122-013-2231-5 CrossRefPubMedGoogle Scholar
  35. Heslot N, Janinnk JL, Sorrells ME (2015) Perspectives for genomic selection applications and research in plants. Crop Sci 55:1–30. doi: 10.2135/cropsci2014.03.0249 CrossRefGoogle Scholar
  36. Hofheinz N, Borchardt D, Weissleder K, Frisch M (2012) Genome-based prediction of test cross performance in two subsequent breeding cycles. Theor Appl Genet 125:1639–1645. doi: 10.1007/s00122-012-1940-5 CrossRefPubMedGoogle Scholar
  37. Huang BE, Clifford D, Cavanagh C (2013) Selecting subsets of genotyped experimental populations for phenotyping to maximize genetic diversity. Theor Appl Genet 126:379–388. doi: 10.1007/s00122-012-1986-4 CrossRefGoogle Scholar
  38. Isidro J, Jannink J-L, Akdemir D et al (2015) Training set optimization under population structure in genomic selection. Theor Appl Genet 128:145–158. doi: 10.1007/s00122-014-2418-4 CrossRefPubMedGoogle Scholar
  39. Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genom 9:166–177. doi: 10.1093/bfgp/elq001 CrossRefGoogle Scholar
  40. Jarquín D, Crossa J, Lacaze X et al (2014) A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127:595–607. doi: 10.1007/s00122-013-2243-1 CrossRefPubMedGoogle Scholar
  41. Jia Y, Jannink JL (2012) Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192:1513–1522. doi: 10.1534/genetics.112.144246 CrossRefPubMedPubMedCentralGoogle Scholar
  42. Knapp SJ (1998) Marker-assisted selection as a strategy for increasing the probability of selecting superior genotypes. Crop Sci 38:1164–1174. doi:0.2135/cropsci1998.0011183X003800060055xCrossRefGoogle Scholar
  43. Krchov L-M, Gordillo GA, Bernardo R (2015) Multienvironment validation of the effectiveness of phenotypic and genomewide selection within biparental maize populations. Crop Sci 55:1068–1075. doi: 10.2135/cropsci2014.09.0608 CrossRefGoogle Scholar
  44. Lado B, Barrios PG, Quincke M et al (2016) Modeling genotype × environment interaction for genomic selection with unbalanced data from a wheat breeding program. Crop Sci 56:1–15. doi: 10.2135/cropsci2015.04.0207 CrossRefGoogle Scholar
  45. Lande R, Thompson R (1990) Efficiency of multistage marker-assisted selection in the improvement of multiple quantitative traits. Genetics 124:743–756PubMedPubMedCentralGoogle Scholar
  46. Lehermeier C, Krämer N, Bauer E et al (2014) Usefulness of multi-parental populations of maize (Zea mays L.) for genome-based prediction. Genetics 198:3–16. doi: 10.1534/genetics.114.161943 CrossRefPubMedPubMedCentralGoogle Scholar
  47. Longin CFH, Mi X, Würschum T (2015) Genomic selection in wheat: optimum allocation of test resources and comparison of breeding strategies for line and hybrid breeding. Theor Appl Genet. doi: 10.1007/s00122-015-2505-1 Google Scholar
  48. Lopez-Cruz M, Crossa J, Bonnett D et al (2015) Increased prediction accuracy in wheat breeding trials using a marker x environment interaction genomic selection model G3(5):569–582. doi: 10.1534/g3.114.016097 Google Scholar
  49. Lorenz AJ (2013) Resource allocation for maximizing prediction accuracy and genetic gain of genomic selection in plant breeding: a simulation experiment G3(3):481–491. doi: 10.1534/g3.112.004911 Google Scholar
  50. Lorenz AJ, Smith KP (2015) Adding genetically distant individuals to training populations reduces genomic prediction accuracy in Barley. Crop Sci 55:2657–2667. doi: 10.2135/cropsci2014.12.0827 CrossRefGoogle Scholar
  51. Lorenz AJ, Smith KP, Jannink JL (2012) Potential and optimization of genomic selection for Fusarium head blight resistance in six-row barley. Crop Sci 52:1609–1621. doi: 10.2135/cropsci2011.09.0503 CrossRefGoogle Scholar
  52. Marulanda JJ, Melchinger AE, Würschum T (2015) Genomic selection in biparental populations: assessment of parameters for optimum estimation set design. Plant Breed 134:623–630. doi: 10.1111/pbr.12317 CrossRefGoogle Scholar
  53. Marulanda JJ, Mi X, Melchinger AE et al (2016) Optimum breeding strategies using genomic selection for hybrid breeding in wheat, maize, rye, barley, rice and triticale. Theor Appl Genet. doi: 10.1007/s00122-016-2748-5 PubMedGoogle Scholar
  54. Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829. doi:11290733PubMedPubMedCentralGoogle Scholar
  55. Michel S, Ametz C, Gungor H et al (2016) Genomic selection across multiple breeding cycles in applied bread wheat breeding. Theor Appl Genet 129:1179–1189. doi: 10.1007/s00122-016-2694-2 CrossRefPubMedPubMedCentralGoogle Scholar
  56. Möhring J, Piepho HP (2009) Comparison of weighting in two-stage analysis of plant breeding trials. Crop Sci 49:1977–1988. doi: 10.2135/cropsci2009.02.0083 CrossRefGoogle Scholar
  57. Müller D, Technow F, Melchinger AE (2015) Shrinkage estimation of the genomic relationship matrix can improve genomic estimated breeding values in the training set. Theor Appl Genet 128:693–703. doi: 10.1007/s00122-015-2464-6 CrossRefPubMedGoogle Scholar
  58. Oakey H, Verbyla AP, Cullis BR et al (2007a) Joint modeling of additive and non-additive (genetic line) effects in single field trials. Theor Appl Genet 114:1319–1332. doi: 10.1007/s00122-007-0515-3 CrossRefPubMedGoogle Scholar
  59. Oakey H, Verbyla AP, Cullis BR et al (2007b) Joint modeling of additive and non-additive (genetic line) effects in multi-environment trials. Theor Appl Genet 114:1319–1332. doi: 10.1007/s00122-007-0515-3 CrossRefPubMedGoogle Scholar
  60. Ogutu JO, Schulz-Streeck T, Piepho H-P (2012) Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions. BMC Proc 6:S10. doi: 10.1186/1753-6561-6-S2-S10 CrossRefPubMedPubMedCentralGoogle Scholar
  61. Ornella L, Singh S, Perez P et al (2012) Genomic prediction of genetic values for resistance to wheat rusts. The Plant Genome 5:136–148. doi: 10.3835/plantgenome2012.07.0017 CrossRefGoogle Scholar
  62. Piepho H, Möhring J (2007) Computing heritability and selection response from unbalanced plant breeding trials. Genetics 177:1881–1888. doi: 10.1534/genetics.107.074229 CrossRefPubMedPubMedCentralGoogle Scholar
  63. Poland J, Endelman J, Dawson J et al (2012) Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5:103–113. doi: 10.3835/Plantgenome2012.06.0006 CrossRefGoogle Scholar
  64. R development core team (2016) R: a language and environment for statistical computing. Accessed 7 Nov 2016
  65. Resende MFR, Munoz P, Resende MDV et al (2012) Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics 190:1503–1510. doi: 10.1534/genetics.111.137026 CrossRefPubMedPubMedCentralGoogle Scholar
  66. Riedelsheimer C, Melchinger AE (2013) Optimizing the allocation of resources for genomic selection in one breeding cycle. Theor Appl Genet 126:2835–2848. doi: 10.1007/s00122-013-2175-9 CrossRefPubMedGoogle Scholar
  67. Riedelsheimer C, Endelman JB, Stange M et al (2013) Genomic predictability of interconnected biparental maize populations. Genetics 194:493–503. doi: 10.1534/genetics.113.150227 CrossRefPubMedPubMedCentralGoogle Scholar
  68. Rincent R, Laloë D, Nicolas S et al (2012) Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics 192:715–728. doi: 10.1534/genetics.112.141473 CrossRefPubMedPubMedCentralGoogle Scholar
  69. Rutkoski J, Singh RP, Huerta-Espino J et al (2015a) Efficient use of historical data for genomic selection: a Case study of stem rust resistance in wheat. Plant Genome 8:1. doi: 10.3835/plantgenome2014.09.0046 Google Scholar
  70. Rutkoski J, Singh RP, Huerta-Espino J et al (2015b) Genetic gain from phenotypic and genomic selection for quantitative resistance to stem rust of wheat. Plant Genome 8:2. doi: 10.3835/plantgenome2014.10.0074 Google Scholar
  71. Saghai-Maroof MA, Soliman KM, Jorgensen RA, Allard RW (1984) Ribosomal DNAsepacer-length polymorphism in barley: mendelian inheritance, chromosomal location, and population dynamics. PNAS 81:8014–8019. doi: 10.1073/pnas.81.24.8014 CrossRefPubMedPubMedCentralGoogle Scholar
  72. Sallam AH, Endelman JB, Jannink JL, Smith KP (2015) Assessing genomic selection prediction accuracy in a dynamic barley breeding population. Plant Genome 8:1. doi: 10.3835/plantgenome2014.05.0020 CrossRefGoogle Scholar
  73. Schmidt M, Kollers S, Maasberg-Prelle A et al (2015) Prediction of malting quality traits in barley based on genome-wide marker data to assess the potential of genomic selection. Theor Appl Genet 129:1–11. doi: 10.1007/s00122-015-2639-1 Google Scholar
  74. Schulthess AW, Wang Y, Miedaner T et al (2015) Multiple-trait- and selection indices-genomic predictions for grain yield and protein content in rye for feeding purposes. Theor Appl Genet 129:273–287. doi: 10.1007/s00122-015-2626-6 CrossRefPubMedGoogle Scholar
  75. Schulz-Streeck T, Ogutu JO, Piepho H-P (2011) Pre-selection of markers for genomic selection. BMC Proc 5(Suppl 3):S12. doi: 10.1186/1753-6561-5-S3-S12 CrossRefPubMedPubMedCentralGoogle Scholar
  76. Schulz-Streeck T, Ogutu JO, Karaman Z et al (2012) Genomic selection using multiple populations. Crop Sci 52:2453–2461. doi: 10.2135/cropsci2012.03.0160 CrossRefGoogle Scholar
  77. Spindel J, Begum H, Akdemir D et al (2015) Genomic selection and association mapping in rice (Oryza sativa): effect of trait genetic architecture, training population composition, marker number and statistical model on accuracy of rice genomic selection in elite, tropical rice breeding lines. PLoS Genet 11:1–25. doi: 10.1371/journal.pgen.1004982 Google Scholar
  78. Tayeh N, Klein A, Le Paslier M-C et al (2015) Genomic prediction in pea: effect of marker density and training population size and composition on prediction accuracy. Front Plant Sci 6:1–11. doi: 10.3389/fpls.2015.00941 Google Scholar
  79. Technow F, Bürger A, Melchinger AE (2013) Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups G3(3):197–203. doi: 10.1534/g3.112.004630 Google Scholar
  80. Viana JMS, Sobreira FM, De Resende MDV, Faria VR (2010) Multi-trait BLUP in half-sib selection of annual crops. Plant Breed 129:599–604. doi: 10.1111/j.1439-0523.2009.01745.x CrossRefGoogle Scholar
  81. Wientjes YCJ, Veerkamp RF, Calus MPL (2013) The effect of linkage disequilibrium and family relationships on the reliability of genomic prediction. Genetics 193:621–631. doi: 10.1534/genetics.112.146290 CrossRefPubMedPubMedCentralGoogle Scholar
  82. Windhausen VS, Atlin GN, Hickey JM et al (2012) Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3(2):1427–1436. doi: 10.1534/g3.112.003699 Google Scholar
  83. Zhao Y, Gowda M, Longin FH et al (2012) Impact of selective genotyping in the training population on accuracy and bias of genomic selection. Theor Appl Genet 125:707–713. doi: 10.1007/s00122-012-1862-2 CrossRefPubMedGoogle Scholar

Copyright information

© The Author(s) 2016

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  • Sebastian Michel
    • 1
  • Christian Ametz
    • 2
  • Huseyin Gungor
    • 3
    • 4
  • Batuhan Akgöl
    • 3
  • Doru Epure
    • 5
  • Heinrich Grausgruber
    • 6
  • Franziska Löschenberger
    • 2
  • Hermann Buerstmayr
    • 1
    Email author
  1. 1.Department for Agrobiotechnology (IFA-Tulln), Institute for Biotechnology in Plant ProductionUniversity of Natural Resources and Life Sciences, Vienna (BOKU)TullnAustria
  2. 2.Saatzucht Donau GesmbH and CoKGProbstdorfAustria
  3. 3.ProGen Seed A.ŞAntakyaTurkey
  4. 4.Faculty of Agriculture and Natural Sciences, Department of Field CropsUniversity of DüzceDüzceTurkey
  5. 5.Probstdorfer Saatzucht Romania SRLBucharestRomania
  6. 6.Plant Breeding Division, Department of Crop ScienceUniversity of Natural Resources and Life Sciences, Vienna (BOKU)TullnAustria

Personalised recommendations