Advertisement

Robust estimation of heritability and predictive accuracy in plant breeding: evaluation using simulation and empirical data

Abstract

Background

Genomic prediction (GP) is used in animal and plant breeding to help identify the best genotypes for selection. One of the most important measures of the effectiveness and reliability of GP in plant breeding is predictive accuracy. An accurate estimate of this measure is thus central to GP. Moreover, regression models are the models of choice for analyzing field trial data in plant breeding. However, models that use the classical likelihood typically perform poorly, often resulting in biased parameter estimates, when their underlying assumptions are violated. This typically happens when data are contaminated with outliers. These biases often translate into inaccurate estimates of heritability and predictive accuracy, compromising the performance of GP. Since phenotypic data are susceptible to contamination, improving the methods for estimating heritability and predictive accuracy can enhance the performance of GP. Robust statistical methods provide an intuitively appealing and a theoretically well justified framework for overcoming some of the drawbacks of classical regression, most notably the departure from the normality assumption. We compare the performance of robust and classical approaches to two recently published methods for estimating heritability and predictive accuracy of GP using simulation of several plausible scenarios of random and block data contamination with outliers and commercial maize and rye breeding datasets.

Results

The robust approach generally performed as good as or better than the classical approach in phenotypic data analysis and in estimating the predictive accuracy of heritability and genomic prediction under both the random and block contamination scenarios. Notably, it consistently outperformed the classical approach under the random contamination scenario. Analyses of the empirical maize and rye datasets further reinforce the stability and reliability of the robust approach in the presence of outliers or missing data.

Conclusions

The proposed robust approach enhances the predictive accuracy of heritability and genomic prediction by minimizing the deleterious effects of outliers for a broad range of simulation scenarios and empirical breeding datasets. Accordingly, plant breeders should seriously consider regularly using the robust alongside the classical approach and increasing the number of replicates to three or more, to further enhance the accuracy of the robust approach.

Background

Genomic studies, whether from an association, prediction or selection perspective, constitute a field of research with increasing statistical methodological challenges given the growing complexity (population structure, coancestry, etc), dimension of datasets, measurement errors and atypical observations (outliers). Outliers often arise from atypical environments, years, field pests or other phenomena. Here, regression models are the tool of choice whether in studies involving human, animal or plant applications. However, it is well known that the performance of these models is poor when their underlying assumptions are violated and their unknown parameters are estimated by the classical likelihood [49]. For example, violation of the normality assumption – depending on its severity – may lead to both biased parameter estimates and coefficients of determination [7] and strongly interfere with variable selection [5]. In the case of the linear mixed model, such violation can tamper with the estimation of variance components [24], which itself can be very challenging even when data are normally distributed but the sample size is small. Violation of model assumptions due to contamination of data with outliers can have several other deleterious effects on regression models. In genomic association studies, for example, departure from normality can induce power loss in the detection of true associations and inflate the number of detected spurious associations [22]. In plant genomics such violations of model assumptions and the associated biases often translate into inaccurate estimates of heritability and predictive accuracy [10]. This can have significant practical consenquences because predictive accuracy is the single most important measure of the performance of genomic prediction (GP). The reduction of these adverse effects through the use of more robust methods is thus of considerable practical importance [48].

Recently, [9] proposed a method for estimating heritability and predictive accuracy simultaneously (Method 5) and compared its performance with several contending methods from the literature including a popular method in animal breeding (Method 7). More details on Methods 5 and 7 can be found in the “Genomic prediction” section. The authors concluded from these comparisons that Methods 5 and 7 consistently gave the least biased, most precise and stable estimates of predictive accuracy across all the scenarios they considered. Additionally, Method 5 gave the most accurate estimates of heritability [9]. Both methods are founded on the linear mixed effects model as well as on ridge regression best linear unbiased prediction (RR-BLUP) through a two-stage approach [3436]. The first stage of this two-stage approach involves phenotypic analysis and thus is likely to be adversely affected by contaminated phenotypic plot data. In particular, contamination can undermine the accuracy with which the adjusted means are estimated in the first stage and thus negatively impact estimation of both heritability (only Method 5) and predictive accuracy in the subsequent second stage where RR-BLUP is used [15]. Estaghvirou et al. [10] later examined the performance of the same seven methods in the presence of one outlying observation under 10 simulated contamination scenarios. These simulations reaffirmed that Methods 5 and 7 performed the best overall and produced the best estimates of both heritability (only Method 5) and predictive accuracy across all the contamination scenarios they considered. However, one outlying observation for their dataset with a sample size of 698 genotypes corresponds to a level of contamination of merely 0.1%. As stated by [10], outliers may arise in plant breeding studies from measurement errors, inherent characteristics of the studied genotypes, enviroments or even years. As the process generating the outliers may vary across locations and/or trials, it is conceivable that a non-neglegible percentage of phenotypic observations may be typically contaminated when large field trial datasets are considered. As a result, the composite effects of such substantial levels of contamination on the accuracy of methods for estimating heritability and accuracy of GP can be potentially considerable. Such outliers may not always be easy to detect and eliminate prior to phenotypic data analysis. Therefore, using robust statistical procedures for phenotypic data analysis of field trial datasets can help ameliorate the adverse effects of outliers.

Robust statistical methods have been around for a long time and are designed to be resistant to influential factors such as outlying observations, non-normality and other problems associated with model misspecification [17]. Therefore, the use of robust methods has been advocated for inference in the linear and linear mixed model setups [6, 25], as well as in ridge regression [1, 15, 26, 27, 45, 52]. As a result of such considerations and the recent advances in computing power, it is not surprising that there has been a strong, renewed interest in exploring these techniques to robustify existing methods or develop new procedures robust to moderate deviations from model specifications [24, 41].

Consequently, to tackle the problem of biased estimation of heritability and predictive accuracy due to contamination of phenotypic data with outliers, we aim to robustify the first phase of the two-stage analysis used in GP. We use a Monte-Carlo simulation study encompassing several contamination scenarios to assess the performance of the proposed robust approach relative to: (i) the approach used by [35], and (ii) simulated underlying true breeding values taken as the gold standard. These assessments are carried out at each of the two stages involved in predicting breeding values by comparing the accuracy with which the two approaches estimate true genotypic values in phenotypic analysis. In a third stage, we compare the heritabilities (H2) and predictive accuracies (PA) estimated by the two competing approaches using Method 5 (H2 and PA) and Method 7 (PA only). In addition, we compare the heritability estimated by Method 5 with the generalized heritability estimated by Oakey’s method [29]. The latter method was not evaluated by [9].

Also, an application of the methodology to real commercial maize (Zea mays) and rye (Secale sereale) datasets is presented and used to empirically assess the usefulness of the proposed robust approach. Lastly, we discuss how to effectively apply the proposed robust approach to phenotypic data analysis and the estimation of heritability and predictive accuracy of GP in plant breeding.

The robust and the classical approaches are implemented in the R software using the code in the supplementary materials (Additional file 5). The ASREML-R package is used to fit the models at the second stage.

Materials and methods

Datasets

Rye dataset: The Rye data were obtained from the KWS-LOCHOW project and is described in more detail elsewhere [2, 3]. These data consist of 150 genotypes tested between 2009 and 2011 at several locations in Germany and Poland, using α designs with two replicates and four checks (replicated two times in the two replicates). Each trial was randomized independently of the others. The field layout of some trials was not perfectly rectangular. Trials at some locations and for some years had fewer blocks but larger size, i.e., two different sizes were used for a few trials. Blocks were nested within rows in the field layout. The dataset has 16 anomalous observations pertaining to distinct genotypes, that the breeders identified as outliers. Moreover, yield was not observed for one genotype. For this example we consider two complete datasets (320 observations): the first is the original dataset without any corrections, which we call the ’raw’ dataset, and the second is the original dataset with the 16 yield observations replaced with missing values, which we refer to as the ’processed’ dataset. In addition, we consider a cleaned version of the raw dataset (288 observations; called cleaned dataset) obtained by removing from the raw data the 16 outlying genotypes (32 observations) identified by both the breeders and the criterion used for outlier detection described in the “Example application” section. We note that because the empirical rye dataset has only two replicates, a single outlier will automatically generate an outlier with the same absolute value of opposite sign for the other replicate of the same genotype. Consequently, we removed a testcross genotype entirely from the cleaned dataset even if only one of its two replicate observations was outlying. The raw, processed and cleaned datasets comprise only 148, 148 and 132 genotypes with genomic information, respectively.

Maize dataset: The maize dataset was produced by KWS in 2010 for the Synbreed Project. The data set has 1800 yield observations on 900 doubled haploid maize lines and 11,646 SNP markers. Out of the 900 test crosses 698 were genotyped whereas 202 were not. The test crosses were planted in a single location (labelled RET) on nine 10 by 10 lattices each with two replicates. Six hybrid and five line checks connected the lattices (398 observations in total). The lines were crossed with four testers. After performing quality control, the breeder recommended replacement of 38 yield observations with missing values. A more elaborate description of this maize dataset is provided in [9, 11].

For this example we consider two datasets each with 1800 yield observations: the first is the original dataset without any corrections, which we call the ’raw’ dataset, and the second one is the original dataset with the 38 yield observations replaced with missing values, which we refer to as the ’processed’ dataset. Furthermore, we consider a third dataset (called cleaned raw dataset) obtained by removing 46 outliers from the raw dataset. The fourth dataset (called the cleaned and processed dataset) is obtained by removing seven outliers from the processed dataset. All the outliers satisfied the criterion for outliers described in the “Example application” section. As with the rye dataset, we removed a testcross genotype entirely from the raw dataset if at least one of the two replicate observations was outlying. Thus, the raw, processed, cleaned raw and cleaned and processed datasets have 1800, 1754, 1800 and 1793 yield observations and 698, 687, 698 and 697 genotypes with genomic information, respectively.

Genomic prediction

True correlation

The correlation between the true (g) and the predicted (\(\mathbf {\widehat {g}}\)) breeding values (true correlation or true predictive accuracy) can be calculated from simulated data as

$$ r_{g,\widehat{g}}=\frac{s_{g,\widehat{g}}}{\sqrt{s^{2}_{g}s^{2}_{\widehat{g}}}} $$
(1)

where \(s_{g,\widehat {g}}\) is the sample covariance between the true and predicted breeding values, \(s^{2}_{g}\) and \(s^{2}_{\widehat {g}}\) are the sample variances of the true and predicted genetic breeding values, respectively. This correlation is often the quantity of primary interest in breeding studies. The simulation study therefore assesses the accuracy with which \(r_{g,\widehat {g}}\) is estimated by Methods 5 and 7, whose details are described below.

Two-stage approach for predicting breeding values

Estaghvirou et al. [9] use the two-stage approach of [35] to predict true breeding values (g) that are then used to estimate heritability and predictive accuracy. This approach is quite appealing because it greatly alleviates the computational burden of the single-stage approach [47], without compromising the accuracy of the results.

The single-stage model can be written as

$$ \mathbf{y}=\phi\mathbf{1}+\mathbf{f} $$
(2)

where y is the vector of the observed phenotypic plot values, ϕ is the general mean, f is a vector that combines all the fixed, random design and error effects (replicates, blocks, etc.). For the simulated data f has four random effects only, namely, f=Zgg+Zrur+Zbub+e where (i) Zg is the design matrix for the genotypes with \(\mathbf {g}\sim N\left (0,\mathbf {Z}_{s}\mathbf {Z}_{s}^{T}\sigma ^{2}_{s}=\tilde {\mathbf {G}}\right)\), Zs is the matrix of biallelic markers of the single nucleotide polymorphisms (SNPs), coded as −1 for genotypes AA, 1 for BB and 0 for AB or missing values and \(\sigma ^{2}_{s}\) is the variance of the marker effects; (ii) Zr is the design matrix for the replicate effects with \(\mathbf {u}_{r}\sim N\left (0,\sigma ^{2}_{r}\mathbf {I}\right)\) and \(\sigma ^{2}_{r}\) is the variance of the replicate effects; (iii) Zb is the design matrix for the block effects with \(\mathbf {u}_{b}\sim N\left (0,\sigma ^{2}_{r:b}\mathbf {I}\right)\) and \(\sigma ^{2}_{r:b}\) is the variance of the block effects; and (iv) eN(0,R) are the residual errors and R is the variance-covariance matrix of the residuals. In our model \(\mathbf {R}=\sigma ^{2}_{e}\mathbf {I}\) where \(\sigma ^{2}_{e}\) is the residual plot error variance.

The two-stage approach basically breaks this model into two models. In the first stage, which we seek to robustify, we use the model

$$ \mathbf{y}=\mathbf{X}\boldsymbol{\mu}+\tilde{\mathbf{f}} $$
(3)

where y is defined as before, X=Zg is the design matrix for the genotype means, μ=ϕ1+g is the vector of unknown genotypic means with g denoting the genetic effects or breeding values, and \(\tilde {\mathbf {f}} = \mathbf {Z}_{r}\mathbf {u}_{r} + \mathbf {Z}_{b}\mathbf {u}_{b} + \mathbf {e}\). Note that in this first stage the genomic information regarding the SNP markers \(\left (\boldsymbol {\Gamma }=\mathbf {Z}_{s}\mathbf {Z}_{s}^{T}\right)\) is excluded from this analysis because genotype means μ are modelled as fixed. This is usually the case when stage-wise approaches are considered, in which case the genomic information is included only in the last stage [35].

In the second stage, the genotype means \(\widehat {\boldsymbol {\mu }}\) estimated at the first stage are used as a response variable in a model for estimating the true breeding values g specified as

$$ {\widehat{\boldsymbol{\mu}}}=\phi\mathbf{1}+\mathbf{g}+\tilde{\mathbf{e}} $$
(4)

where ϕ is the general mean and \(\tilde {\mathbf {e}}\sim N(0,\tilde {\mathbf {R}})\) with \(\tilde {\mathbf {R}}=var(\hat {\boldsymbol {\mu }}\mid \phi, \mathbf {g})\).

Note that any standard varieties or checks are dropped from the dataset before the adjusted means (\(\hat {\boldsymbol {\mu }}\)) from the first stage are submitted to the second stage. The mixed model equations for (4) can be solved to obtain the best linear unbiased prediction for g, \(\text {BLUP}(\mathbf {g})=\widehat {\mathbf {g}}\), using a ridge-regression formulation of BLUP, i.e., RR-BLUP.

In case weights are used when fitting the second-stage model, then \(\tilde {\mathbf {R}}\) should be replaced by W−1, with W being a weight matrix computed from the estimated first-stage variance-covariance matrix \(\tilde {\mathbf {R}}\). In our case we used Smith’s [46] and standard (ordinary) [35] weights. Specifically, \(\mathbf {W}_{sm}=diag(\tilde {\mathbf {R}}^{-1})\) for Smith’s and \(\mathbf {W}_{st}=(diag(\tilde {\mathbf {R}}))^{-1}\) for standard weights, respectively.

More details on the two-stage approach can be found in [9, 35, 36].

Method 5

This method (M5) calculates predictive accuracy as

$$\small E(r_{g,\widehat{g}})\approx\frac{trace(\mathbf{P}_{u}\mathbf{C}\tilde{\mathbf{G}})} {\sqrt{trace(\mathbf{P}_{u}\tilde{\mathbf{G}})trace\left(\mathbf{C}^{T}\mathbf{P}_{u}\mathbf{CV}\right)}} $$
(5)

where \(\mathbf {V}=\tilde {\mathbf {G}}+\tilde {\mathbf {R}}\) with V, \(\tilde {\mathbf {G}}\) and \(\tilde {\mathbf {R}}\) being the variance-covariance matrices for the phenotypes, genotypes and residual errors of the adjusted genotypes, respectively; \(\mathbf {P}_{u}=\frac {1}{n-1}\left (\mathbf {I}-\frac {1}{n}\mathbf {J}_{n}\right)\), with Jn a n×n matrix of ones; \(\mathbf {C}=\tilde {\mathbf {G}}\mathbf {V}^{-1}\mathbf {Q}\), with Q=I1(1TV−11)−11TV−1, and 1 denoting a vector of ones. Under this formulation, which provides a direct estimate of the correlation between the true (g) and the predicted (\({\mathbf {\widehat {g}}}\)) breeding values, the RR-BLUP of g is now given by \(\widehat {\mathbf {g}}=\tilde {\mathbf {G}}\mathbf {V}^{-1}\mathbf {Q}\widehat {\boldsymbol {\mu }}\) [34].

Heritability can then be computed from (5) as

$$H_{m_{5}}^{2}=[E(r_{g,\widehat{g}})]^{2}.$$

Method 7

This method (M7) is commonly used by animal breeders to directly compute predictive accuracy (ρ) from the mixed model equations (MME, [12, 28, 51]) by firstly computing the squared correlation between the true (g) and predicted breeding values (\(\widehat {\mathbf {g}}\)), i.e., reliability (ρ2).

Since the MME for the second-stage model (4) are given by

$$ \left[\begin{array}{c} \widehat{\phi} \\ \widehat{\mathbf{g}} \end{array}\right] = \left[\begin{array}{cc} {\mathbf{1}^{\intercal}}{\tilde{\mathbf{R}}^{-1}}\mathbf{1} & {\mathbf{1}^{\intercal}}{\tilde{\mathbf{R}}^{-1}} \\ {\tilde{\mathbf{R}}^{-1}}\mathbf{1} & {\tilde{\mathbf{R}}^{-1}}+{\tilde{\mathbf{G}}^{-1}} \end{array}\right]^{-} \left[\begin{array}{cc} {\mathbf{1}^{\intercal}}{\tilde{\mathbf{R}}^{-1}}{\widehat{\boldsymbol{\mu}}} \\ {\tilde{\mathbf{R}}^{-1}}{\widehat{\boldsymbol{\mu}}} \end{array}\right], $$
(6)

with the variance-covariance matrix of \((\hat {\phi }-\phi,\hat {\mathbf {g}}-\mathbf {g})\) given by

$$ \left[\begin{array}{cc} \mathbf{C_{11}} & \mathbf{C_{12}} \\ \mathbf{C_{21}} & \mathbf{C_{22}} \end{array}\right] = \left[\begin{array}{cc} {\mathbf{1}^{\intercal}}{\tilde{\mathbf{R}}^{-1}}\mathbf{1} & {\mathbf{1}^{\intercal}}{\tilde{\mathbf{R}}^{-1}} \\ {\tilde{\mathbf{R}}^{-1}}\mathbf{1} & {\tilde{\mathbf{R}}^{-1}}+{\tilde{\mathbf{G}}^{-1}} \end{array}\right]^{-}, $$
(7)

and the variance-covariance matrix of g and \(\widehat {\mathbf {g}}\) given by

$$ \left[\begin{array}{cc} \tilde{\mathbf{G}} & \tilde{\mathbf{G}}-\mathbf{C_{22}} \\ \tilde{\mathbf{G}}-\mathbf{C_{22}} & \tilde{\mathbf{G}}-\mathbf{C_{22}} \end{array}\right], $$
(8)

the reliability for each genotype is computed as

$$ \widehat{\rho}_{i}^{2}=\frac{(cov(g_{i},\widehat{g}_{i}))^{2}}{var(g_{i})var(\widehat{g}_{i})}=\frac{var(\widehat{g}_{i})}{var(g_{i})} $$
(9)

where only the diagonal elements of the matrices \(var(\mathbf {g})=\tilde {\mathbf {G}}\), \(var(\widehat {\mathbf {g}})=\tilde {\mathbf {G}}-\mathbf {C_{22}}=cov(\mathbf {g},\widehat {\mathbf {g}})\) are extracted. The average reliability across the genotypes in each dataset is then estimated by

$$ \widehat{\rho \ }^{2}_{m_{7}}=\frac{1}{n}\sum^{n}_{i=1}{\widehat{\rho \ }^{2}_{i}} $$
(10)

where n is the total number of genotypes in the dataset. Predictive accuracy (\(\widehat {\rho \ }_{m_{7}}\)) is then computed as the square root of \(\widehat {\rho \ }^{2}_{m_{7}}\). Alternatively, predictive accuracy can be computed as

$$ \widehat{\rho \ }_{m_{7}}=\frac{1}{n}\sum^{n}_{i=1}\sqrt{\widehat{\rho \ }^{2}_{i}}. $$
(11)

Further details on this derivation can be found in [36].

Oakey’s method

Oakey et al. [29] propose a generalized heritability measure that was recently re-expressed by [40] as

$$ H^{2}=\frac{\text{trace}(\mathbf{D})}{n-s} $$
(12)

where \(\mathbf {D}=\mathbf {I}_{n}-{\tilde {G}}^{-1}\mathbf {C_{22}}\), s is the number of zero eigenvalues and ns is the effective dimension of D. We also use this method to estimate heritability and compare this estimate with the estimate obtained by method M5.

Robust estimation

Robust estimation of the linear mixed model for phenotypic data analysis

In this section we briefly review the robust approach of [19] to linear mixed effects models that we use in an attempt to robustify the first stage of the two-stage approach to genomic prediction in plant breeding. This approach is implemented in the R software package robustlmm via the function rlmer() [20, 21].

We consider the general linear mixed model

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\mu} + \mathbf{Hu} + \mathbf{e} $$
(13)

where y is a vector of observations, X is the design matrix for the fixed effects (intercept included), μ is the vector of unknown fixed effects, H is the design matrix for the random effects, uN(0,U) is the vector of unknown random effects and eN(0,R) is the vector of random plot errors. Note that for our first-stage model Hu=Zrur+Zbub and μ=ϕ1+g.

Model (13) also assumes that cov(u,e)=0 and as such we have that

$$\mathbf{y}\sim N(\mathbf{X}\boldsymbol{\mu},\mathbf{HUH'+R}).$$

We henceforward assume for simplicity that \(\mathbf {e}\sim N \left (\mathbf {0},\sigma ^{2}_{e}\mathbf {I}\right)\) and \(\mathbf {u}\sim N\left (\mathbf {0},\sigma ^{2}_{e} \mathbf {A}(\boldsymbol {\theta })\right)\) where the variance matrix A of the random effects depends on the vector of unknown variance parameters θ (this assumption can be relaxed to obtain more general formulations, see e.g., [19]). The variance of y now simplifies to

$$\text{var}(\mathbf{y})=\sigma^{2}_{e}\mathbf{HA}(\theta)\mathbf{H}'+\sigma^{2}_{e}\mathbf{I}=\sigma^{2}_{e}\mathbf{\Phi} $$

with Φ=HA(θ)H+I.

Because A(θ) is a positive-definite symmetric matrix and assuming that θ is known, one can obtain its Cholesky decomposition as chol(A(θ))=B(θ), set u=B(θ)b and rewrite model (13) as

$$ \mathbf{y}=\mathbf{X}{\boldsymbol{\mu}}+\mathbf{H}\mathbf{B}(\boldsymbol{\theta})\mathbf{b}+\boldsymbol{\mathbf{e}}, $$
(14)

where \(\mathbf {b}\sim N(\mathbf {0},\sigma _{e}^{2}\mathbf {I})\) so that we again have \(\mathbf {y}\sim N\left (\mathbf {X}\boldsymbol {\mu },\sigma ^{2}_{e}\mathbf {\Phi }\right)\).

The classical log-likelihood for (14) can be written as

$$ \begin{aligned} -2l(\boldsymbol \theta,\boldsymbol\mu,\sigma_{e}&\mid\mathbf{y})= \text{\textit{n}log}(2\pi)+ \text{log}\mid\sigma^{2}_{e}\boldsymbol\Phi\mid+\\ &+ \frac{1}{\sigma^{2}_{e}}(\mathbf{y}-\mathbf{X}\boldsymbol\mu)'\boldsymbol\Phi^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol\mu). \end{aligned} $$
(15)

Furthermore, for a given set of θ,μ and σe ([44], Chapter 7)

$$ \mathbf{b}^{*}=\mathbf{b}_{\text{BLUP}}=\sigma_{e}^{2}\mathbf{B}(\boldsymbol \theta)'\mathbf{H}'\boldsymbol\Phi^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol\mu). $$
(16)

From (15) and (16), an objective function that incorporates the observation-level residuals and the random effects as separate additive terms can be derived and expressed as

$$ \begin{aligned} \tilde{d}(\boldsymbol \theta,\boldsymbol\mu,\sigma_{e},\mathbf{b}^{*}\mid\mathbf{y})=\text{\textit{n}log}(2&\pi)+ \text{log}\mid\sigma^{2}_{e}\boldsymbol\Phi\mid+\\ &\frac{1}{\sigma^{2}_{e}}(\mathbf{e}^{*'}\mathbf{e}^{*}+\mathbf{b}^{*'}\mathbf{b}^{*}) \end{aligned} $$
(17)

where

$$\mathbf{e}^{*}=\mathbf{e}^{*}({\boldsymbol\mu},\mathbf{b}^{*})=(\mathbf{y}-\mathbf{X} {\boldsymbol\mu}-\mathbf{H}\mathbf{B}(\boldsymbol \theta)\mathbf{b}^{*}).$$

This particular trick is crucial in order to independently control contamination at the levels of the residual and random effects.

Assuming θ and σe are known and taking the partial derivatives of (17) with respect to μ and b, we get the following estimating equations for these effects,

$$ \left\{\begin{aligned} \mathbf{X}'\widehat{\mathbf{e}}^{*}/ \sigma_{e} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\boldsymbol{0}\\ \\ \left(\mathbf{B}(\boldsymbol{\theta})'\mathbf{H}'\widehat{\mathbf{e}}^{*}-\widehat{\mathbf{b}}^{*} \right) / \sigma_{e}=\boldsymbol{0} \end{aligned}\right. $$
(18)

where

$$ \widehat{\mathbf{e}}^{*}=\widehat{\mathbf{e}}^{*}\left({\widehat{\boldsymbol{\mu}}},\widehat{\mathbf{b}}^{*}\right) =\left(\mathbf{y}-\mathbf{X}{\widehat{\boldsymbol{\mu}}}-\mathbf{H}\mathbf{B}(\boldsymbol{\theta})\widehat{\mathbf{b}}^{*}\right). $$
(19)

If B(θ) is diagonal, as in our case, these equations are robustified by replacing \(\widehat {\mathbf {e}}^{*}\) and \(\widehat {\mathbf {b}}^{*}\) by bounded functions \(\psi _{e}(\widehat {\mathbf {e}}^{*})\) and \(\psi _{b}\left (\widehat {\mathbf {b}}^{*}\right)\), where the ψe and ψb functions need not be the same:

$$ \left\{\begin{aligned} &\mathbf{X}'\psi_{e}(\widehat{\mathbf{e}}^{*}/ \sigma_{e})/ \lambda_{e} \ \ =0\\ \\ &\mathbf{B}(\boldsymbol{\theta})'\mathbf{H}'\psi_{e}(\widehat{\mathbf{e}}^{*}/ \sigma_{e})/ \lambda_{e}-\psi_{b}(\widehat{\mathbf{b}}^{*}/ \sigma_{e})/ \lambda_{b}=0 \end{aligned}\right. $$
(20)

where \(\lambda _{\bullet }=\mathbb {E}_{0}[\psi '_{\bullet }]\) is required to balance the \(\widehat {\mathbf {e}}^{*}\) and \(\widehat {\mathbf {b}}^{*}\) terms in case different ψ functions are used; 1/λe and 1/λb are scaling factors (as in M-regression [17]) and cancel out in the special case where ψeψb.

If we let

$$w_{e}(e^{*})= \left\{\begin{array}{ll} \psi_{e}(e^{*})/e^{*} & \text{if} \ \ e^{*}\neq 0\\ \psi_{e}'(0) & \text{if} \ \ \varepsilon^{*}= 0 \end{array}\right., $$
$$w_{b}(b^{*})= \left\{\begin{array}{ll} \psi_{b}(b^{*})/{b^{*}} & \text{if} \ \ b^{*}\neq 0\\ \psi_{b}'(0) & \text{if} \ \ b^{*}= 0 \end{array}\right., $$

Λb=λe/λb, \(\mathbf {W}_{e}=\mathbf {Diag}(w_{e}(e^{*}_{i}/\sigma _{e}))\) and \(\mathbf {W}_{b}=\mathbf {Diag}(w_{b}(b^{*}_{i}/\sigma _{e}))\), and after some simplification, Eq. (20) can be written as

$$\left\{\begin{aligned} &\mathbf{X}'\mathbf{W}_{e} \widehat{\mathbf{e}}^{*}\ \ =\boldsymbol{0}\\ \\ &{\mathbf{B}(\boldsymbol{\theta})'\mathbf{H}'\mathbf{W}_{e}}{\widehat{\mathbf{e}}^{*}}- {\Lambda_{b}}{\mathbf{W}_{b}}{\widehat{\mathbf{b}}^{*}} =\boldsymbol{0} \end{aligned}\right. $$

which, after expanding \(\widehat {\mathbf {e}}^{*}\) with (19), yields the following system of linear equations:

$$\left[\begin{array}{cc} {\mathbf{X}^{\intercal}}{\mathbf{W}_{e}}{\mathbf{X}} & {\mathbf{X}^{\intercal}}{\mathbf{W}_{e}}{\mathbf{HB}(\boldsymbol{\theta})} \\ {\mathbf{B}(\boldsymbol{\theta})^{\intercal}}{\mathbf{H}^{\intercal}\mathbf{W}_{e}\mathbf{X}} & {\mathbf{B}(\boldsymbol{\theta})^{\intercal}}{\mathbf{H}^{\intercal}\mathbf{W}_{e}}{\mathbf{HB}(\boldsymbol{\theta})}+ {\Lambda_{b}}{\mathbf{W}_{b}} \end{array}\right] \left[\begin{array}{c} \widehat{\boldsymbol{\mu}} \\ \widehat{\mathbf{b}}^{*} \end{array}\right] = $$
$$ =\left[\begin{array}{cc} \mathbf{X}^{\intercal}\mathbf{W}_{e}\mathbf{y} \\ \mathbf{B}(\boldsymbol{\theta})^{\intercal}\mathbf{H}^{\intercal}\mathbf{W}_{e}\mathbf{y} \end{array}\right]. $$
(21)

The algorithm for estimating parameters of (21) begins with a predefined set of weights. It then alternates between computing \(\widehat {\boldsymbol {\mu }}\) and \(\widehat {\mathbf {b}}^{*}\) for a given set of weights and updating the weights for a given set of estimates. Koller and Stahel [18] and Koller [19] provide more details on the estimation of the scale and covariance parameters and the estimation procedure for the non-diagonal case.

If replicate and block (nested within replicates) are the only random effects apart from the residual error in the first-stage model (this is the case for the simulation study for our first-stage model and for the first-stage model for the rye dataset) then \(\boldsymbol {\theta }=\left (\frac {\sigma ^{2}_{r}}{\sigma ^{2}_{e}},\frac {\sigma ^{2}_{r:b}}{\sigma ^{2}_{e}}\right)\), where \(\sigma ^{2}_{r}\) and \(\sigma ^{2}_{r:b}\) are the variances for the replicate and block random effects, respectively. Also here, A(θ) is a two-block diagonal matrix (k=2 blocks). Furthermore, because we assume \(\mathbf {u}_{r}\sim N(0,\sigma ^{2}_{r}\mathbf {I})\) and \(\mathbf {u}_{b}\sim N(0,\sigma ^{2}_{r:b}\mathbf {I})\) for the first-stage model, B(θ)=[A(θ)]1/2 is a diagonal matrix.

In particular, for the simulated data consisting of 698 observations of maize yield from 2 replicates each having 39 blocks (more details in the “Simulation” section), we compute 2+39=41 weights (Wb) for the observations at the level of the random effects and 2×698=1396 weights (We) for the observations at the level of the fixed effects (i.e., for the residuals).

Robust approach to phenotypic analysis

Phenotypic data derived from field trials are prone to several types of contamination that may range from measurement errors, inherent characteristics of the genotypes and the environments to the years in which the trials were conducted. As such, if contaminated observations are present in the vector of phenotypes y in the first stage of phenotypic data analysis, then they can unduly influence the estimation of the means for the testcross genotypes (μ) in model (3), resulting in inaccurate estimates of adjusted phenotypic means \(\widehat {\boldsymbol \mu }\). In turn, these possibly inaccurate estimates of μ are passed on to the second stage of the procedure (model (4); adjusted RR-BLUP) from which the breeding values g are estimated. The possibly biased estimates of (g) may undermine the accuracy of the estimated heritability and predictive accuracy.

To minimize bias in the estimation of heritability and predictive accuracy, we propose using the preceding robust model for the first stage of phenotypic data analysis. The second stage then proceeds in the same way as the classical method except that, now, the robust estimates \(\widehat {\boldsymbol \mu }_{R}\) from the first stage are used in (4).

Simulation

Simulated datasets

We consider a real maize dataset from the Synbreed Project (2009−2014). This dataset was extracted for one location from a larger dataset and consists of 900 doubled haploid maize lines, of which only 698 testcrosses were genotyped, and 11,646 SNP markers. Six hybrid checks and five line checks were considered and genotypes were crossed with four testers as explained in more detail in [9]. Variance components estimated from this dataset (\(\sigma ^{2}_{r}=0\), \(\sigma ^{2}_{r:b}=6.27\), \(\sigma ^{2}_{e}=53.8715\) and \(\sigma _{s}^{2}=0.005892\)) were used to simulate the block and plot effects based on an α-design [31] with two replicates and the model

$$ y_{ijk}=\phi+r_{k}+b_{jk}+g_{i}+e_{ijk} $$
(22)

where yijk is the yield of the i-th genotype in the j-th block nested within the k-th complete replicate, ϕ is the general mean, rk is the fixed effect of the k-th complete replicate, bjk is the random effect of the j-th block nested within the k-th complete replicate, gi is the random effect of the i-th genotype, and eijk is the residual plot error associated with yijk. More details on (22) can be found in Table S3 in the supplementary materials of [10].

Our simulations consider 1000 simulated Maize datasets described as follows: each dataset consists of 698 observations of yield in 2 replicates, with the 698 genotypes distributed over 39 blocks as in Table 1. Four out of the 39 blocks have 17 observations, whereas the remaining 35 have 18 observations.

Table 1 A sample simulated Maize dataset

Simulation of outliers

The type of outliers we consider, commonly known in the literature as shift-outliers (or location outliers), are typically the hardest type to detect in multivariate settings because they have the same shape (the same covariance structure but shifted mean) as the overall data [39]. The shift-outliers can arise from various contamination sources, including the following: errors, inherent characteristics of the genotype(s) in a particular spatial location or replicate, or, occurrence of a specific phenomenon that negatively or positively impacts the genotype(s). Although our simulations focus on these particular cases, other types of outliers that we do not consider here are certainly conceivable (see [39] for more details).

In order to simulate outliers, a percentage of phenotypic observations in the dataset is chosen and contaminated by replacing the observed value of each selected observation by that value plus 5-, 8- or 10- times the standard deviation of the residual error (σ) used to simulate the phenotypic datasets. Additionally, we also consider two distinct scenarios of data contamination:

  1. (i)

    Random contamination: 1, 3, 5, 7 and 10% of the phenotypic data in only one of the two replicates are randomly contaminated, amounting to an overall data contamination rate of 0.5, 1.5, 2.5, 3.5 and 5%, respectively.

  2. (ii)

    Block contamination: phenotypic data in 1, 2, 3, 4 and 5 whole blocks in only one of the two replicates are contaminated, amounting approximately to 1.3, 2.6, 3.9, 5.2 and 6.5% overall rate of data contamination, respectively.

We use the notation “% cont" to denote a particular percentage (%) of data contamination with outliers, “sd” to denote the size of the outliers and “No.blocks" to refer to the number of contaminated blocks.

First- and second-stage models

In the first stage (Eq. 3), we consider yield as the response variable, the genotypes as the fixed effects and the replicates and blocks nested within replicates as the random effects. In the second stage (Eq. 4), we consider the adjusted genotypic means estimated in the first stage as the response variable, the intercept as the fixed effect and the genotypes as the random effects with a variance-covariance structure given by the genomic relationship matrix.

Comparing performance of the classical and robust approaches

The performance of the classical and robust approaches is evaluated in three steps, labelled L1, L2 and L3. L1 involves a comparison of results from the first stage; L2 entails a comparison of results from the second stage and L3 focuses on a comparison of the estimated heritability and predictive accuracy, which can be viewed as constituting the third stage. For each of the three levels, we consider the null scenario (uncontaminated datasets), random and block contamination scenarios.

Additionally, the influence of the Smith’s and standard weighting schemes used in the second stage of the two-stage approach are considered in L2.

The following quantities are computed and used to compare the performance of the classical and robust approaches at levels L1–L3.

L1: The mean squared deviation (MSD) of the estimated from the true genotypic means is computed for both the classical and robust approaches as

$$ {\text{MSD}_{\mu}} = {\sum_{l=1}^{1000}}{\sum_{i=1}^{698}}{\frac{(\widehat{\mu}_{il}-\mu_{il})^{2}}{698\times 1000}} $$
(23)

where μil is the true mean of the i-th genotype in the l-th simulation run and \(\widehat {\mu }_{il}\) is its estimate.

The estimates of MSD\(_{\widehat {\mu }}\) for the classical (C) and robust (R) approaches are compared for each scenario using

$$ {\text{MSD}_{\widehat{\mu}}} = \sum_{l=1}^{1000}\sum_{i=1}^{698}\frac{\left(\widehat{\mu}_{il}^{R}-\widehat{\mu}_{il}^{C}\right)^{2}}{698\times 1000} $$
(24)

and are expected a priori to agree for the null scenario.

It is also instructive to compute and plot

$$ {\text{MSD}_{\mu}^{i}} = \sum_{l=1}^{1000}\frac{(\widehat{\mu}_{il}-\mu_{il})^{2}}{1000} $$
(25)

for each genotype i=1,...,698 for both approaches. Furthermore, the overall estimated genotypic mean (across genotypes and simulations) is also computed and compared to the corresponding true genotypic mean. Moreover, since the rank order of genotypes is also of great importance in plant breeding studies, the Pearson correlation coefficient (rp) between the true and estimated genotypic means (predictive accuracy) is also computed and compared between the two approaches. This yields an estimate of the predictive accuracy for the genomic means.

L2: At this level, we compute the MSDs for the genomic breeding values g analogously to Eqs. (23)–(25). The rp between the true and estimated breeding values is again computed and used to compare the two methods and assess any improvement in the estimation of g when genomic information is included in the analysis. This provides an estimate of the accuracy of genomic prediction.

L3: Here, the methods are compared by computing the following MSDs,

$$ \text{MSD}_{\texttt{H}} = \sum_{l=1}^{1000}\frac{\left(\widehat{H}^{2}_{l}-(r_{g,\widehat{g}})^{2}\right)^{2}}{1000} $$
(26)
$$ \text{MSD}_{\texttt{PA}} = \sum_{l=1}^{1000}\frac{\left(\widehat{r}_{g, \widehat{g},l}-r_{g, \widehat{g}}\right)^{2}}{1000} $$
(27)

where \(r_{g, \widehat {g}}\) is the Pearson correlation computed between the true and the estimated breeding values and averaged across the 1000 simulations, \(\widehat {H}^{2}_{l}\) and \(\widehat {r}_{g, \widehat {g},l}\) are, respectively, the heritability and predictive accuracy estimated in the s-th simulation via the methods described earlier. These MSDs quantify the deviation of the estimated from the true heritability (H2) or predictive accuracy (PA). In addition, we provide boxplots of the estimated heritablity and predictive accuracy for the 1000 simulation runs for each scenario.

Simulation results

Null scenario

L1: The following computed MSDs

$$\text{MSD}_{\widehat{\mu}} = \sum\limits_{l=1}^{1000}\sum\limits_{i=1}^{698}\frac{\left(\widehat{\mu}_{il}^{R}-\widehat{\mu}_{il}^{C}\right)^{2}}{698\times 1000}\simeq 0.06, $$
$$\text{MSD}_{\mu}^{C} = \sum\limits_{l=1}^{1000}\sum\limits_{i=1}^{698}\frac{\left(\widehat{\mu}_{il}^{C}-{\mu}_{il}\right)^{2}}{698\times 1000}\simeq 28.97 \text{ and} $$
$$\text{MSD}_{\mu}^{R} = \sum\limits_{l=1}^{1000}\sum\limits_{i=1}^{698}\frac{\left(\widehat{\mu}_{il}^{R}-{\mu}_{il}\right)^{2}}{698\times 1000}\simeq 29.08 $$

show, as expected, that both methods perform similarly when the data are not contaminated (\(\text {MSD}_{\widehat {\mu }}\simeq 0\)). However, the classical method performs slightly better than the robust one (\({\text {MSD}_{\mu }^{C}}\lesssim {\text {MSD}_{\mu }^{R}}\)). Even so, both MSD values are not particularly close to zero. Still, as these MSD values are squared deviations averaged across all the 1000 simulation runs and 698 genotypes, they seem reasonable.

The slightly better performance of the classical relative to the robust method is also apparent in the per-genotype MSDs (Additional file 2: Figure S1). The two approaches produce virtually identical estimates for the overall mean of \(\widehat {\boldsymbol \mu }\) (i.e., mean{μil}, i=1,...,698, l=1,...,1000) and rp (Table 2).

Table 2 Estimated overall mean of \(\widehat {\boldsymbol {\mu }}\) and predictive accuracy expressed as the Pearson correlation coefficient rp obtained using the classical (CLS) and the robust (ROB) methods (averaged across the 1000 simulations)

The two methods estimate the variances of both the random effects and the residual errors equally well (Additional file 2: Figure S2).

The Smith’s and standard weights obtained in the first stage for both the classical and robust approaches are very small (Additional file 2: Figure S3). Precisely, the MSD between the two different types of weights is approximately 0.6×10−6 and the MSD between the values of each type of weight computed by the two approaches is about 0.6×10−5.

L2: There were no major differences between the estimated breeding values obtained using either the standard or Smith’s weighting schemes at the second stage (MSD g≃25 for both cases). For this reason we only present results produced using Smith’s weights.

The MSDs for the second stage

$$\text{MSD}_{\widehat{g}} = \sum\limits_{l=1}^{1000}\sum\limits_{i=1}^{698}\frac{\left(\widehat{g}_{il}^{R}-\widehat{g}_{il}^{C}\right)^{2}}{698\times 1000}\simeq 0.03 $$
$$\text{MSD}_{g}^{R} = \sum\limits_{l=1}^{1000}\sum\limits_{i=1}^{698}\frac{\left(\widehat{g}_{il}^{R}-{g}_{il}\right)^{2}}{698\times 1000}\simeq 25.55; $$
$$\text{MSD}_{g}^{C} = \sum\limits_{l=1}^{1000}\sum\limits_{i=1}^{698}\frac{\left(\widehat{g}_{il}^{C}-{g}_{il}\right)^{2}}{698\times 1000}\simeq 25.18 $$

show a modest improvement over the corresponding estimated genotypic means at the first stage and that the methods continue to perform similarly as in the first stage. Relative to the estimates for the first stage, the per-genotype MSDs (Additional file 2: Figure S13) increase for about 22% but decreases for about 47% of the genotypes. This trend is similar for both the classical and robust approaches. Additionally, for the second stage, the mean rp=0.903 for both approaches. This increase in rp relative to the first stage (≃18.2%) shows that using genomic information at the second stage improves genomic prediction and hence the ranking of genotypes. For the overall mean of the EBVs (\(\widehat {\mathbf {g}}\)), it drops to ≃5 from ≃9 for both approaches (first row, Additional file 1: Tables S2 & S4).

Quite interestingly, in terms of the estimation of the genetic variance, the robust approach performs slightly better than the classical (Additional file 2: Figure S14).

L3: Both the classical and robust approaches produce the following MSDs for heritability (Method M5 only) and predictive accuracy (Methods M5 and M7):

$$\text{MSD}_{\texttt{H}}^{\mathrm{M}5}\simeq 0.00 $$
$$\text{MSD}_{\texttt{PA}}\simeq \left\{\begin{array}{lll} 0.00 & \text{for}\ {\mathbf{M}}5\\ 0.01 & \text{for} \ {\mathbf{M}}7 \end{array}\right. $$

showing the estimates of heritability and predictive accuracy to be quite accurate. We note that estimates of heritability and predictive accuracy were computed by fixing the residual variance from the first stage to one as described in the “Genomic prediction” section. In general this produced more accurate estimates than the alternative for which the residual variance estimated in the first stage is used. Therefore all the results displayed here for the third stage use the former implementation.

Boxplots for the estimated PA (methods M5 and M7) and H2 (method M5 only) across the 1000 simulations for the null scenario are shown together with the ones for the random and block contamination scenarios (Additional file 2: Figures S19-S20). These suggest that method M5 produces more accurate estimates of PA than method M7.

Relative to method M5, Oakey’s heritability estimates are unacceptably lower than the simulated true values (Fig. 1). To further explore why these estimates were remarkably smaller than those produced by M5 or the simulated true values, we took 100 out of the 1000 simulation replicates and refitted the entire two-stage model by setting \(\tilde {\mathbf {G}}=\sigma ^{2}_{g}\mathbf {I}\) at the second stage. The heritability estimates for M5 and Oakey’s were virtually identical (MSD ≃0.1×10−27). This strongly suggests that Oakey’s method works fine with independent genotypes but performs poorly when the model used to estimate heritability has a kinship matrix. Consequently, we do not consider Oakey’s heritability estimates further except in a few comparisons in the “Example application” section.

Fig. 1
figure1

Estimates of heritability obtained from methods M5 and Oakey’s for the classical (CLS) and robust (ROB) approaches

Random contamination scenarios

L1: The MSDμ for these scenarios are similar between approaches for each level of contamination and size of outlier (Tables 3 and Additional file 1: Table S1). Hence for the random contamination scenarios, the robust and classical approaches produce comparable estimates for the genotypic means. The per-genotype MSDs also reaffirm the similar performance of the two approaches (Additional file 2: Figure S4). Nevertheless, it is noteworthy that even for the least extreme scenario, 1% contamination with an outlier of size 5 sd, the increase in the per-genotype \(\text {MSD}_{\mu }^{i}\) is non-negligible relative to the corresponding values computed by the classical method under the null scenario and used as a benchmark (Fig. 2). The per-genotype MSDs increase greatly with increase in the percentage contamination and size of outliers (Additional file 2: Figure S4).

Fig. 2
figure2

Boxplot of the 698 per-genotype mean squared deviations (\(\text {MSD}_{\mu }^{i}\)) under the null scenario (classical as reference) against the ones obtained in the random scenarios 1_5 for the classical (CLS) and robust (ROB) approaches

Table 3 MSDμ between the estimated genotypic means and the true breeding values considering the classical (CLS) and robust (ROB) approaches and the random contamination scenarios

The estimated overall mean of the estimated genotypic means (\(\widehat {\boldsymbol {\mu }}\)) for both approaches departs increasingly from the true overall mean of μ as both the level of contamination and size of outliers increase (Additional file 1: Table S2). The same trend is evident for the rp, implying deterioration of the ranking of genotypes (Additional file 1: Table S2).

The two methods also differ with respect to how well they estimate particular variance components. More precisely, the classical method estimates the variance for blocks nested within replicates (\(\sigma ^{2}_{r:b}\)) somewhat better than the robust method does from 5% contamination upwards. However, the robust method estimates the variances for replicates (\(\sigma ^{2}_{r}\)) and residual errors\(\left (\sigma ^{2}_{e}\right)\) far better than the classical method does (Additional file 2: Figures S6–S8).

The Standard and Smith’s weights computed for both the classical and robust approaches across the random contamination scenarios are shown in Additional file 2: Figures S9 and S10.

As the percentage of contamination and size of the outliers increase, the degree of overlap of the empirical frequency distributions of the classical and robust weights evidently reduces. In particular, the distributions do not overlap at all from the 3% contamination level upwards for the 8− and 10−sd shift-outliers. Also, the weights show an overall decreasing trend, which is more evident for the classical approach and the Standard weights.

L2: The mean squared deviations (MSDg) between the EBVs and the TBVs for the classical approach are displayed in Additional file 1: Table S5. The MSDg’s for the Standard and Smith’s weights show some minor differences in favour of the latter from 7% contamination and 8-sd shift-outliers onwards. Thus, only the second-stage results obtained using the Smith’s weights are presented in the remainder of this section.

The robust approach tends to produce smaller MSDs between the EBVs and the TBVs as the percentage of random contamination and the size of the shift-outliers increase (Tables 4 & Additional file 1: Table S1). The second-stage per-genotype MSDs do not show the increasing trend observed for the per-genotype MSDs from the first stage with values ranging between 0 and 100 (Additional file 2: Figure S16). In addition, the robust method always produces higher estimated predictive accuracy, expressed as the averaged Pearson correlation coefficient rp, than the classical method, implying better ranking of the genotypes (Additional file 1: Table S2). The overall mean EBVs (\(\widehat {\mathbf {g}}\)) is similar for both approaches but drops steadily as the percentage contamination and size of outliers increase, implying underestimation.

Table 4 Mean squared deviation of the estimated from the true genomic breeding values (MSDg) for the classical (CLS) and robust (ROB) approaches under the random contamination scenarios

The robust approach also produces more accurate estimates of the marker-effect variance (\(\sigma ^{2}_{s}\)) up to 10% contamination and 5-sd shift-outliers (Additional file 2: Figure S15). For the 10% contamination scenarios with 8- and 10-sd shift-outliers the robust estimates of \(\sigma ^{2}_{s}\) no longer overlap with the true marker-effect variance but their boxplots show a smaller inter-quartile range (IQR) and lower dispersion than those for the classical approach.

L3: The robust approach produced generally more accurate estimates for both H2 and PA for the random contamination scenarios (Additional file 2: Figures S18-S20). However, both approaches tend to underestimate both parameters as the percentage contamination and size of outliers increase. The \(\text {MSD}_{\texttt {H}}^{\texttt {M}5}\) ranged between approximately 0.00−0.09 and 0.00−0.07 for the classical and robust methods, respectively. The corresponding, \(\text {MSD}_{\texttt {PA}}^{\texttt {M}5}\) ranged between approximately 0.00−0.02 and 0.00−0.01 whereas \(\text {MSD}_{\texttt {PA}}^{\texttt {M}7}\) ranged between approximately 0.01−0.04 and 0.01−0.07. Overall, method M5 performs somewhat better than method M7 in estimating predictive accuracy (Additional file 1: Table S6 and Additional file 2: Figures S18-S20).

Block contamination scenarios

L1: Although the MSDμ for the block contamination scenarios is relatively stable for the robust approach (between 29.08 and 30.11), it increases with increasing level of contamination and size of the outliers for the classical approach (Tables 5 & Additional file 1: Table S3). In the worst block contamination scenario (5_10) the MSD for the classical approach is about 1.7 times larger than that for the robust approach. The per-genotype MSDs show even poorer performance for the classical method in estimating each of the 698 genotypic means (Additional file 2: Figure S5). By contrast, the robust approach maintains the errors at roughly the same level across the contamination scenarios; a level that is close to the one estimated for the null scenario. This is an attractive property of this method.

Table 5 Mean squared deviation of the estimated genotypic means from the true breeding values (MSDμ) for the classical (CLS) and robust (ROB) approaches under the block contamination scenarios

Block contamination had generally less debilitating effect on the accuracy of the estimated genotypic means than random contamination (Additional file 1: Tables S1 and S3). For example, the block contamination scenario 1_5, which corresponds to an overall 1.3% data contamination, produces smaller MSDs than the random contamination scenario 1_5, which corresponds to only 0.5% overall contamination (Figs. 2 and 3; Table 5).

Fig. 3
figure3

Boxplot of the 698 per-genotype mean squared deviations (\(\text {MSD}_{\mu }^{i}\)) under the null scenario (classical as reference) against the ones obtained in the block scenarios 1_5 for the classical (CLS) and robust (ROB) approaches

The performance of the two methods also differed noticeably with respect to the estimation of the overall mean of \(\widehat {\boldsymbol {\mu }}\). For example, for the worst case scenario (block 5_10) the overall mean of \(\widehat {\boldsymbol {\mu }}\) deviated from the true mean by merely 5.5% for the robust approach but by 50.2% for the classical approach (Additional file 1: Table S4), indicating superior performance of the robust approach. Nevertheless, the poor predictive performance of the classical approach at the first stage does not necessarily translate to a reduced predictive accuracy rp because it does not alter the relative ranking of the genotypes (Additional file 1: Table S4). Accordingly, the ranking of the genotypes does not differ much between the two approaches (estimated rp≃0.76 for both approaches across all scenarios).

An overall superior performance of the robust compared to the classical approach is also evident for the accuracy of the estimated variance components (Additional file 2: Figures S6-S8).

L2: In this case, the MSD g obtained in the second stage differ depending on whether the Smith’s or the standard weights are used. In particular, using the Smith’s weights produces more stable MSD g estimates across all the block contamination scenarios than using the standard weights, which tend to increase with increasing number of contaminated blocks and size of outliers (Additional file 1: Table S5). For this reason, only results obtained using the Smith’s weights are presented in the remainder of this section.

For all levels of contamination and size of outliers, the robust overall MSDs between the EBVs and the TBVs did not differ much and fluctuated around ≃25 (Additional file 1: Table S5), a value that is similar to the corresponding value for the null scenario (Table 6).

Table 6 Mean squared deviation of the estimated from the true breeding values (MSDg) for the classical (CLS) and robust (ROB) approaches under the block contamination scenarios

The per-genotype MSD g values vary little with increasing size of outliers but suggest that the classical method performs slightly better than the robust method (Additional file 2: Figure S17).

The average estimated predictive accuracy (rp) across all scenarios was approximately 0.90 for both approaches (Additional file 1: Table S4). Predictive accuracy thus increased from the first to the second stage for the classical (by 17%) and robust (by 18%) approaches, an increase comparable to that observed under the null scenario.

Finally, the robust method estimates the marker-effect variance \(\sigma ^{2}_{s}\) more accurately than the classical method throughout all the block contamination scenarios (Additional file 1: Table S15).

L3: MSD\(_{\texttt {H}}^{\texttt {M5}}\) and MSD\(_{\texttt {PA}}^{\texttt {M5}}\) were both ≃0.00 for both the classical and robust approaches across all the block contamination scenarios, with the classical producing marginally better results than the robust approach (Additional file 2: Figures S18 and S19). MSD\(_{\texttt {PA}}^{\texttt {M7}}\simeq 0.01\) for both approaches with the robust estimates of PA obtained via M7 showing slightly greater dispersion (Additional file 2: Figure S20). It is noteworthy that estimates of H2 and PA are rather stable across block contamination scenarios (Additional file 2: Figures S18-S20), consistent with the estimated marker-effect variances (Additional file 2: Figure S15).

Example application

In this section, we comparatively evaluate differences in the performances of the classical and robust approaches on raw empirical rye and maize datasets prior to quality control. Substantial differences in results between the two approaches would imply problems with the data that require closer inspection by the breeder or data analyst. Such inspection can be followed by data cleaning, which can be a very challenging and time-consuming task. For the two example datasets in this section, we perform data cleaning based on a simple rule of thumb that relies on the weight given to each observation by the robust method. Specifically, observations assigned weights smaller than 0.5 are flagged as outliers. More sophisticated outlier detection techniques are outside the scope of this paper [3, 23]. We apply the classical and robust approaches to the cleaned dataset and compare the results with each other and with the results for the raw dataset. We note that cleaning the data does not necessarily make it conform to model assumptions such as the normality of the errors. We note further that because empirical datasets for both examples each have only two replicates, the robust method usually assigns the same, or very similar, weights to both replicates. This is the reason that a testcross genotype is removed entirely from the cleaned dataset even if only one of its two replicate observations is outlying. This problem would be eliminated by replicating each genotype three or more times.

We similarly analyze the empirical datasets after taking into account the recommendations of the breeder based on quality control to demonstrate that quality control alone will not always detect and eliminate all sources of data contamination and hence does not preclude the use of robust statistical methods.

Rye dataset

In this example we consider only one trial from the Rye dataset described in the “Materials and methods” section, which otherwise has the same structure as the simulated maize data set shown in Table 1. The first- and second-stage models fitted to the Rye data set are the same as those described in the “Simulation” section.

The classical and robust approaches produced strikingly different estimates for the residual and blocks variances at the first stage as well as for heritability and predictive accuracy at the third stage (Table 7; CLS r and ROB r results). The robust weights assigned to each of the 320 observations in the first stage identified 32 observations for the exact same 16 genotypes identified as outliers by the breeders. When the 32 observations are removed from the data, which amounts to around a 10% reduction in the size of the dataset, then the classical and robust approaches produce very similar estimates, as is expected when data conform to the model’s assumptions (Table 7; CLS r* and ROB r* results). In particular, the distributions of the residuals from the classical first- and second-stage models fit to the cleaned dataset satisfy the normality assumption (Shapiro-Wilk normality test: p=0.9771 and p=0.6974, respectively), but the distribution of the residuals from the raw dataset do not (Shapiro-Wilk normality test: p<10−9). Inspection of the QQ plots of the residuals (not shown) further reinforced the results of the normality tests. This example clearly demonstrates how the robust approach ameliorates most of the devastating influences of outliers on the classical method. Thus, contamination with outliers inflates the estimated residual variance ∼20 times for the classical method but only ∼3.6 times for the robust method. By contrast, contamination reduces the estimated block variance from ∼11.5 to zero for the classical method but from ∼11.9 to ∼8.2 for the robust method. Lastly, contamination reduces the estimated heritability and predictive accuracy far more strikingly for the classical than for the robust approach. However, contamination inflates the marker-effect variance equally by a factor of two for both approaches. Although far lower than the simulated true values, Oakey’s heritability estimates are also shown and compared between the full and the cleaned datasets for completeness (Table 7).

Table 7 The estimated residual (\(\sigma _{e}^{2}\)), replicate (\(\sigma _{r}^{2}\)) and blocks within replicates (\(\sigma _{r:b}^{2}\)) variance components, genetic variance (\(\sigma _{s}^{2}\)), heritability computed by Method M5 (H2.M5) and predictive accuracy computed by Methods M5 (PA.M5) and M7 (PA.M7), using the classical (CLS) and robust (ROB) approaches, for the rye dataset

Results for the processed dataset are similar for the two approaches (Table 7 ; CLS qc and ROB qc). They are also quite similar to the results for the cleaned dataset, except for the estimated marker-effect variances, which are smaller for the cleaned dataset perhaps due to the decrease in the sample size at the second stage. Interestingly, for this particular case, the residuals from the first stage of the classical model fit satisfy the normality assumption (Shapiro-Wilk normality test: p=0.6974) but those from the second stage only marginally pass the normality test (Shapiro-Wilk normality test: p=0.0437). A quick look at the QQ plot of these residuals reveals two residuals that deviate substantially from the equality line (plot not shown). The robust method did not, however, assign any observation a weight smaller than 0.5 and and hence we did not analyse the cleaned and processed datasets. Nevertheless, if a less conservative threshold of 0.7, say, were used instead of 0.5, then, the robust method would have flagged one observation for a check genotype and two for one testcross genotype.

Maize dataset

In the first stage (Eq. 3), we consider yield as the response variable, the genotypes as the fixed effects and the trials, the replicates nested within trials and the blocks nested within replicates nested within trials as the random effects. In the second stage (Eq. 4), we consider the adjusted genotypic means estimated in the first stage as the response variable, the intercept as the fixed effect and the genotypes as the random effects with a variance-covariance structure given by the genomic relationship matrix. Note that only the 698 genotypes with available genomic information are submitted to the second stage. In addition, 46 observations of yield (amounting to around 2.6% overall contamination) were identified as outliers by the robust weights computed from the robust first-stage model using the raw dataset. Here, all the observations assigned weights of 0.5 or less by the robust model were classified as outliers. Among these 46 outliers, 22 corresponded to 11 genotypes with genomic information, meaning that the second stage for the cleaned raw dataset comprised only 687 (698−11) genotypes. Of the remaining 24 outliers, 18 correspond to 9 genotypes with no genomic information and 6 to 3 hybrid checks and 3 line checks. Overall, 11+9=20 test crosses and 4 of the 6 checks are a subset of the 38 yield observations recommended for removal (deletion) by the breeder during quality control. The robust method identified only 7 observations as outliers from the processed maize dataset (i.e., with 38 missing yield observations). Two of the 7 observations came from one genotyped test cross, 2 hybrid and 3 line checks. Furthermore, two out of these 7 outliers were also identified when the raw dataset was analysed with the robust method. A detailed treatment of outlier detection strategies is beyond the scope of this paper and can be found elsewhere [4, 23, 39, 48].

As with the Rye dataset, the classical and robust approaches produced different results for the full dataset (Table 8; CLS r and ROB r results). The differences between the two methods at the first and second stages translate into major differences in the estimated heritability and predictive accuracy. For the cleaned dataset (Table 8; CLS r* and ROB r* results) the methods produce more similar estimates of variance components, heritability and predictive accuracy, although estimates are not as close as the ones observed in the case of the rye dataset. In addition, the robust results for the full dataset are close to those obtained via the classical method applied to the cleaned dataset. Note that removal of the outliers was sufficient for the residuals from the first stage but not the second stage of the classical model fit to conform to the normality assumption, in contrast to the results for the rye dataset.

Table 8 The estimated residual (\(\sigma _{e}^{2}\)), trial (\(\sigma _{t}^{2}\)), replicates within trials (\(\sigma _{r}^{2}\)) and blocks within replicates within trials (\(\sigma _{b}^{2}\)) variance components, genetic variance (\(\sigma _{s}^{2}\)), heritability computed by Method M5 (H2.M5) and predictive accuracy computed by Methods M5 (PA.M5) and M7 (PA.M7), using the classical (CLS) and robust (ROB) approaches, for the maize dataset

The results for the full dataset after quality control (Table 8; CLS qc and ROB qc) are similar to those from the robust method (ROB r; Table 8) However, the residuals from the classical first-stage model fit violate the normality assumption. After removing the 7 outliers from the processed dataset (CLS qc* and ROB qc*) the classical and robust approaches produced even more similar results (Table 8) but the residuals still depart from the normality assumption.

As before, Oakey’s heritability estimates are also provided and compared between the full and the cleaned datasets (Table 8).

Discussion

The simulation results showed that the classical and robust approaches perform similarly when datasets are not contaminated and thus conform to the linear model assumptions. This is a desirable property for any method that seeks to be an alternative to the classical method. Since datasets do not usually conform to all model assumptions, we assessed the relative performance of both methods in estimating genetic breeding values, heritability and predictive accuracy, across a range of contamination scenarios with outliers, which tamper mostly with the assumptions of normality and variance homogeneity of the residuals. All the scenarios involved either random or block contamination (mimicking plausible field conditions), and for each contamination type, differed only in the percentage of the observations that were contaminated and the size of the outliers. Also, two weighting schemes were used with each dataset in the second stage of the two-stage approach.

The simulations revealed that block contamination has a lesser impact on the estimated parameters than random contamination. Also, the estimated true breeding values improve from the first to the second stage, based on the Pearson correlation coefficient, reaffirming the value of using genomic information in the analyses. In addition, the use of the Smith’s weights produces more consistent parameter estimates from the second stage onwards and is therefore recommended for the two-stage approach.

A comparison of the performance of the classical and robust two-stage approaches is summarized in Additional file 1: Table S7. In general, the proposed robust method shows a superior performance to the classical approach. In terms of the accuracy of heritability and genomic prediction, the robust approach clearly outperforms the classical for the random contamination scenarios but performs similarly to the classical approach for the block contamination scenarios. Also, method M5 produces more accurate estimates of predictive accuracy of genomic prediction than method M7. Quite surprisingly, the simulations suggest that Oakey’s method is unsuitable for estimating heritability when using a model with a kinship matrix.

Interestingly for the block contamination scenarios, the robust method generally outperformed the classical in both the first and second stages, but this did not translate into higher predictive accuracies. This is likely because the block effect (i.e., effect of blocks within replicates) is completely confounded with the effect of contamination within blocks. As a result, if the block effect is included in the model at the first stage it captures the effect of contamination within the block, yielding an inflated block variance for the classical but not the robust approach. This explains why the performance of the classical approach improves from the first to the second stage. It also emphasizes the need to include a random block effect in the first stage to account for intrablock variance especially when using the classical approach.

A noteworthy observation from the simulations is that if a study design has only two replicates, then the robust or the classical methods cannot identify only one of the two replicate observations as an outlier. Hence, when using an automated cleaning process, one has to discard twice as many observations as the actual number of outliers. This is because given only two replicates, a single outlier results in two large residuals with the same absolute value but opposite signs. This makes it hard to determine which of the two replicates is actually the outlier.

The robust method can also be useful to breeders doing variety testing for which only the first-stage model is required. Here, the robust approach had clearly superior performance for the block contamination scenarios. For the random contamination scenarios, except for the blocks within replicates variance, the robust method produced more accurate estimates of the variance components than the classical method did. Moreover, because late-generation breeding trials typically use only two replicates as breeders aim to maximize the number of different environments, the robust method will merely downweight but not require deleting both replicate observations if it identifies either one or both of them as outliers. This property of the robust method is highly desirable because it enables the plant breeder to obtain genomic predictions for all the target genotypes. By contrast, using the classical method only would result in discarding all the genotypes for which at least one of the two replicate observations is an outlier. This is because it is impossible to determine which of the two replicate observations is the true outlier.

The analysis of the real datasets also furnished some insights into the performance of the methods. For the rye dataset, for example, the 16 outliers identified by the breeders, were also detected by the robust model. Each of the 16 outliers belonged to a distinct block, thus mimicking a random contamination scenario. Also, 8 outliers were observed in the first replicate and the remaining 8 in the second replicate hence resulting in a balanced distribution of outliers at the replicate level. The differences between the results obtained by the classical and robust approaches for the complete dataset are consistent with the ones observed for the simulated random contamination scenarios. Removing the outliers from the data produced a closer agreement between the classical and the robust results but some slight difference from the results produced by the robust approach for the complete dataset still remained. A plausible explanation for this is that removing outliers from the data may: (i) substantially reduce sample size; (ii) alter the distribution of the data and (iii) potentially lead to the underestimation of variances for the cleaned data. This last point precisely matches what is observed for the estimated residuals and marker-effect variances in the two empirical data analyses.

The first-stage results from the analysis of the full empirical raw maize dataset showed a huge discrepancy in the estimated residual variance component, moderate disagreement in the estimated block variances and similar estimates of replicate and trial variances between the classical and robust approaches. This result is surprising and deviates from expectation based on the results of the analyses of the simulated data sets. A possible explanation for this unexpected result may relate to the difference in the models fitted to the simulated data and the empirical maize data set and the nature of the outliers. In this case, the 46 observations removed were unevenly spread across 17 out of the 20 blocks (of size 90) and amount to 3% of all the data. Two of these 17 blocks had approximately 8−9% contamination. The criterion used to identify the outliers in the maize data set was the robust weights computed from the robust model fit and was somewhat conservative as only the observations assigned weights equal to 0.5 or less were flagged. This criterion, when applied to the rye dataset, correctly identified the 16 outliers that had already been identified by the breeders. However, for the maize dataset this approach to outlier identification is probably too restrictive because the distribution of the residuals from the classical first-stage model fit to the cleaned dataset satisfied the normality assumption but the residuals from the classical second-stage model fit did not. This observation reinforces the view that successfully cleaning the data to eliminate outliers prior to analysis, plus satisfactorily addressing the drawbacks listed above can be exceedingly challenging. Of the 38 yield observations replaced with missing values as recommended by the breeder based on quality control, 24 were identified by the robust method as outliers based on the analysis of the raw dataset and consisted of either negative or zero yield values, which are evidently anomalous. The other 14 of the 38 deleted observations were plausible and were not identified as outliers by the robust method. Results of the analysis of the processed maize dataset with 38 missing yield observations set to missing were very similar between the two approaches. In particular, the results are also quite similar to those from the analysis of the raw dataset using the robust method. This finding emphasizes the stability and reliability of the robust approach both in the presence of outliers and missing observations.

Conclusion

We have proposed a robust framework for enhancing the accuracy of genomic prediction in plant breeding applicable to fully replicated designs or replicated genotypes in partially replicated designs. The approach does not apply to fully unreplicated designs or to unreplicated genotypes in partially replicated designs for which it is impossible to detect outliers using linear models. Hence, fully or partially unreplicated designs tradeoff the ability to detect outliers for cost-effectiveness of field trials. We emphasize that the ideas underlying the robustness of the proposed approach can be extended to multi-stage and single-stage approaches. But, because the single-stage approach is characteristically computationally expensive for large or complex data sets, robustifying it would make it even more computationally demanding.

In conclusion, we show not only the advantages of a robust approach to phenotypic data analysis and genomic prediction but also provide new insights into the potential problems associated with using the classical approach to phenotypic data analysis and genotypic prediction in plant breeding. Accordingly, in addition to performing standard data quality controls, plant breeders would do well to seriously consider using these robust methods regularly alongside the classical approach to better minimize the influence of outliers on genomic prediction.

Availability of data and materials

The datasets generated and analyzed during the current study are not publicly available due to being proprietary materials of KWS. These can only be shared upon reasonable request and after KWS express consent. Notwithstanding, an R code to simulate a synthetic dataset, which we call toydata, that includes the simulation of phenotypic data as well as SNP marker data and kinship is provided in the Additional file 4. After generating the toydata the user can perform both the classical and robust two-stage approaches by running the R code provided in the Additional file 5.

Abbreviations

CLS:

Classical

EBVs:

Estimated breeding values

GP:

Genomic prediction

MME:

Mixed model equations

MSD:

Mean squared deviation

PA:

Predictive accuracy

ROB:

Robust

RR-BLUP:

Ridge regression best linear unbiased prediction

SNP:

Single-nucleotide polymorphism

TBVs:

True breeding values

References

  1. 1

    Arslan O, Billor N. Robust Liu estimator for regression based on an M-estimator. J Appl Stat. 2000; 27(1):39–47.

  2. 2

    Bernal-Vasquez AM, Utz F, Piepho HP. Outlier detection methods for generalized lattices: a case study on the transition from ANOVA to REML. Theor Appl Genet. 2014; 129(4):787–804.

  3. 3

    Bernal-Vasquez A-M, Utz H-F, et al.Outlier detection methods for generalized lattices: a case study on the transition from ANOVA to REML. Theor Appl Genet. 2016; 129(4):787–804.

  4. 4

    Cerioli A, Farcomeni A, Riani M. Robust distances for outlier-free goodness-of-fit testing. Comput Statist Data Anal. 2013; 65:29–45.

  5. 5

    Chi EC, Scott DW. Robust Parametric Classification and Variable Selection by a Minimum Distance Criterion. J Comput Graph Stat. 2014; 23(1):111–128.

  6. 6

    Copt S, Feser V. High-breakdown inference for mixed linear models. J Am Stat Assoc. 2006; 101:292–300.

  7. 7

    Croux C, Dehon C. Estimators of the multiple correlation coefficient: local robustness and confidence intervals. Stat Pap. 2003; 44(3):315–334.

  8. 8

    Demidenko E. Mixed Models: Theory and Applications. Hoboken: John Wiley & Sons; 2004.

  9. 9

    Estaghvirou SBO, Ogutu JO, Schulz-Streeck T, Knaak C, Ouzunova M, Gordillo A, Piepho HP. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics. 2013; 14:860.

  10. 10

    Estaghvirou SBO, Ogutu JO, Piepho HP. Influence of outliers on accuracy and robustness of methods for genomic prediction in plant breeding. G3. 2014; 4:2317–28.

  11. 11

    Estaghvirou SBO, Ogutu JO, Piepho HP. How genetic variance, number of genotypes and markers influence estimates of genomic prediction accuracy in plant breeding. Crop Sci. 2015; 55(5):1911–24.

  12. 12

    Henderson CR. Comparison of alternative sire evaluation methods. J Anim Sci. 1975; 41:760–70.

  13. 13

    Hoerl AE, Kennard RW. Ridge Regression: Biased estimation for nonorthogonal problems. Technometrics. 1970; 8:27–51.

  14. 14

    Hoerl AE, Kennard RW, Baldwin KF. Ridge Regression: Some Simulations. Commun Stat Theory Methods. 1975; 4:105–23.

  15. 15

    Holland PW. Weighted Ridge Regression: Combining Ridge and Robust Regression Methods. NBER Working Paper Series. 1973. Working Paper No.11.

  16. 16

    Huber PJ. Robust estimation of a location parameter. Ann Math Stat. 1964; 35:73–101.

  17. 17

    Huber PJ. Robust statistics: a review. Ann Math Stat. 1972; 43:1041–67.

  18. 18

    Koller M, Stahel WA. Sharpening Wald-Type Inference in Robust Regression for Small Samples. Comput Stat Data Anal. 2011; 55(8):2504–15.

  19. 19

    Koller M. Robust estimation of Linear Mixed Models. PhD Thesis. 2013. http://e-collection.library.ethz.ch/eserv/eth:6670/eth-6670-02.pdf.

  20. 20

    Koller M. robustlmm: Robust Linear Mixed Effects Models. R package version 2.1. 2015. http://CRAN.R-project.org/package=robustlmm.

  21. 21

    Koller M. robustlmm: An R Package for Robust Estimation of Linear Mixed-Effects Models. J Stat Softw. 2016; 75(6):1–24.

  22. 22

    Lourenço VM, Pires AM, Kirst M. Robust linear regression methods in association studies. Bioinformatics. 2011; 27(6):815–21.

  23. 23

    Lourenço VM, Pires AM. M-regression, false discovery rates and outlier detection with application to genetic association studies. Comput Stat Data Anal. 2014; 78:33–42.

  24. 24

    Lourenço VM, Rodrigues PC, Pires AM, Piepho H-P. A robust DF-REML framework for variance components estimation in genetic studies. Bioinformatics. 2017; 33(22):3584–94.

  25. 25

    Maronna RA, Martin DR, Yohai VJ. Robust Statistics. Chichester: Wiley; 2006.

  26. 26

    Maronna RA. Robust Ridge Regression for High-Dimensional Data. Technometrics. 2011; 53(1):44–53.

  27. 27

    Midi H, Zahari M. Estimators in the Presence of Outliers and Multicollinearity. Jurnal Teknologi. 2007; 47(C):59–74.

  28. 28

    Mrode RA, Thompson R. Linear Models for the Prediction of Animal Breeding Values, 2nd Edition. Wallingford; 2005.

  29. 29

    Oakey H, Verbyla A, Pitchford W, et al.Joint modelling of additive and non-additive genetic line effects in single field trials. Theor Appl Genet. 2006; 113:809–19.

  30. 30

    Pen̋a D, Yohai VJ. A fast procedure for outlier diagnostics in large regression problems. J Am Stat Assoc. 1999; 94:434–45.

  31. 31

    Petersen RG. Agricultural field experiments/design and analysis. New York: Marcel Dekker; 1994.

  32. 32

    Piepho HP, Möhring J. Computing heritability and selection response from unbalanced plant trials. Genetics. 2007; 177:1881–8.

  33. 33

    Piepho HP, Möhring J, Melchinger AE, Büchse A. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica. 2008; 161:209–28.

  34. 34

    Piepho HP. Ridge regression and extensions for genomewide selection in maize. 2009; 49:1165–76.

  35. 35

    Piepho HP, Möhring J, Schultz-Streeck T, Ogutu JO. A stage-wise approach for the analysis of multi-environment trials. Biom J. 2012a; 54:844–60.

  36. 36

    Piepho HP, Ogutu JO, Schultz-Streeck T, Estaghvirou B, Gordillo A, Tchenow F. Efficient computation of Ridge-Regression Best Linear Unbiased Prediction in genomic selection in plant breeding. Crop Sci. 2012b; 52:1093–104.

  37. 37

    Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. 2000.

  38. 38

    Pinheiro JC, Bates DM, DebRoy S, Sarkar D, R Core Team. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1–117. 2014. http://CRAN.R-project.org/package=nlme.

  39. 39

    Rocke DM, Woodruff DL. Identification of Outliers in Multivariate Data. J Am Stat Assoc. 1996; 91:1047–61.

  40. 40

    Rodríguez-Álvarez MX, Boer MP, Van Eeuwijk FA, Eilers PH. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat Stat. 2018; 23:52–71.

  41. 41

    Rodrigues PC, Monteiro AF, Lourenço VM. A Robust AMMI model for the analysis of genotype-by-environment data. Bioinformatics. 2016; 32(1):58–66.

  42. 42

    Schulz-Streeck T, Ogutu JO, Piepho HP. Comparisons of single-stage and two-stage approaches to genomic selection. J Theor Appl Genet. 1996; 126:69–82.

  43. 43

    Searle SR. Linear models. New York: Wiley; 1971.

  44. 44

    Searle SR, Casella G, McCulloch CE. Variance Components. 1992.

  45. 45

    Silvapulle MJ. Robust ridge regression based on an M-estimator. Aust J Stat. 1991; 33(3):319–33.

  46. 46

    Smith A, Cullis BR, Gilmour A. The analysis of crop variety evaluation data in Australia. Aust N Z J Stat. 2001; 43:129–45.

  47. 47

    Smith AB, Cullis BR, Thompson R. The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches. J Agric Sci. 2005; 143(6):449–62.

  48. 48

    Tanaka E. Simple robust genomic prediction and outlier detection for a multi-environmental field trial. arXiv preprint arXiv:1807.07268. 2018.

  49. 49

    Tukey JW. A survey of sampling from contaminated distributions In: Olkin I, et al., editors. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford: Stanford University Press: 1960. p. 448–85.

  50. 50

    Utz HF. PlabStat: A computer program for statistical analysis of plant breeding experiments. Institute of Plant Breeding, Seed Science and Population Genetics. 2011. University of Hohenheim, D-70593 Stuttgart, Germany.

  51. 51

    VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008; 91:4414–23.

  52. 52

    Zahari SM, Zainol MS, Al-Banna MI, Ismail B. Weighted Ridge MM-Estimator in Robust Ridge Regression with Multicollinearity. In: Proceedings of Mathematical Models and Methods in Modern Science. World Scientific and Engineering Academy and Society: 2012.

Download references

Acknowledgements

We thank KWS LOCHOW GMBH for providing the rye dataset and KWS for providing the maize dataset. We also thank Paul Schmidt for the helpful discussion regarding Oakey’s heritability results.

Funding

This work received partial financial support from Portuguese National Funds by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through Projects PTDC/MAT-STA/0568/2012, UID/MAT/00297/2019 and UIDB/00297/2020 (CMA: Centro de Matemática e Aplicações) and Sabbatical Grants SFRH/BSAB/105935/2014 and SFRH/BSAB/142919/2018. VML, HPP & JOO acknowledge partial support by the Conselho de Reitores das Universidades Portuguesas (CRUP) and the Deutscher Akademischer Austauschdienst (DAAD) through the Luso-German Integrated Actions Project PT/A13/17-DE/57339863. HPP acknowledges support by the German Research Foundation (DFG, Grant PI 377/18-1). JOO was supported by the German Research Foundation (DFG, Grant OG 83/1-1 & OG 83/1-2).

Author information

VML and HPP conceived the project. VML and JOO wrote the R code, performed the simulations and the analyses. VML wrote the manuscript with input from JOO and HPP. HPP supervised the project. All authors read and approved the final version of the manuscript.

Correspondence to Vanda Milheiro Lourenço.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Lourenço, V.M., Ogutu, J.O. & Piepho, H. Robust estimation of heritability and predictive accuracy in plant breeding: evaluation using simulation and empirical data. BMC Genomics 21, 43 (2020) doi:10.1186/s12864-019-6429-z

Download citation

Keywords

  • Genomic prediction
  • Predictive accuracy
  • Heritability
  • SNPs
  • Robust estimation