Projecting results of zoned multi-environment trials to new locations using environmental covariates with random coefficient models: accuracy and precision

Key message We propose the utilisation of environmental covariates in random coefficient models to predict the genotype performances in new locations. Abstract Multi-environment trials (MET) are conducted to assess the performance of a set of genotypes in a target population of environments. From a grower’s perspective, MET results must provide high accuracy and precision for predictions of genotype performance in new locations, i.e. the grower’s locations, which hardly ever coincide with the locations at which the trials were conducted. Linear mixed modelling can provide predictions for new locations. Moreover, the precision of the predictions is of primary concern and should be assessed. Besides, the precision can be improved when auxiliary information is available to characterize the targeted locations. Thus, in this study, we demonstrate the benefit of using environmental information (covariates) for predicting genotype performance in some new locations for Swedish winter wheat official trials. Swedish MET locations can be stratified into zones, allowing borrowing information between zones when best linear unbiased prediction (BLUP) is used. To account for correlations between zones, as well as for intercepts and slopes for the regression on covariates, we fitted random coefficient (RC) models. The results showed that the RC model with appropriate covariate scaling and model for covariate terms improved the precision of predictions of genotypic performance for new locations. The prediction accuracy of the RC model was competitive compared to the model without covariates. The RC model reduced the standard errors of predictions for individual genotypes and standard errors of predictions of genotype differences in new locations by 30–38% and 12–40%, respectively. Supplementary information The online version contains supplementary material available at (10.1007/s00122-021-03786-2).


Introduction
The main goal of a plant breeding programme is to develop well-adapted genotypes in a target population of environments (TPE). Multi-environment trials (MET) are conducted to evaluate candidate genotypes in the TPE, and to understand and exploit the pattern of genotype × environment interactions (GEI) in the TPE. GEI is the differential response of genotypes across different environments (Kang and Gorman 1989). GEI in a TPE can be exploited to make more targeted predictions and recommendations on cultivars. This is of particular interest when there is crossover interaction, which poses a challenge when selecting genotypes for broad adaptation.
Identification of environmental covariates that are responsible for GEI is useful to enhance the predictive capability of MET analyses (Heslot et al. 2014) and evaluate the adaptability of the genotypes to the new target environment. The most commonly used types of environmental covariates are soil and meteorological covariates (van Eeuwijk et al. 2016). Incorporating environmental covariates in the GEI analysis has been done by factorial regression (Denis 1988;Piepho et al. 1998;van Eeuwijk and Elgersma 1993). Furthermore, environmental covariates have been used in a linear mixed model framework, such as in quantitative trait loci (QTL) Communicated by Martin Boer. 1 3 biparental mapping to dissect the response of marker effects to the environmental covariates known as QTL × environment interactions (Boer et al. 2007;Crossa et al. 1999;Malosetti et al. 2004) or as eco-physiological QTL (van Eeuwijk et al. 2005). Environmental covariates have also been introduced in genomic selection (Gillberg et al. 2019;Heslot et al. 2014;van Eeuwijk et al. 2019).
The regression on environmental covariates is usually modelled by fixed effects. This type of modelling is appropriate when only studying the pattern of GEI at the tested locations. Such models are also appropriate for making predictions in an unstructured TPE. However, when the TPE is sub-divided into zones, it is necessary to model genotypic effects as random in order to borrow strength between zones (Buntaran et al. 2019(Buntaran et al. , 2020. If such modelling is coupled with factorial regression approaches, genotype-specific regression coefficients must be modelled as random effects as well, giving rise to what are known as random coefficient (RC) models (Longford 1993;Milliken and Johnson 2002).
Although MET are usually designed to cover the whole TPE, none of the trials in an MET coincides exactly with a grower's field or location. Thus, grower's fields, the real target of breeding, must be seen as new locations in the TPE. In the same vein, it may be said that the MET analysis is usually used to produce predictions of tested genotypes for a new location, making use of the information from tested locations. Predicting genotype performance in a new location is akin to predicting values that have no records at all. Henderson (1977) showed that best linear unbiased prediction (BLUP) can be used to predict breeding values for the animals that had no records. BLUP, therefore, can also be used to obtain genotype predictions in a new location that has no records.
Since growers' fields hardly ever coincide with the trials' location and, in practice, cultivar yield will never reach the exact same value as the predicted mean values from the MET, reporting the precision or precision measures as quantified by standard errors and prediction intervals is highly desirable. Without this, growers are left with having no information regarding the precision in the predictions that are reported. The key challenge is that the standard errors of predictions of variety means obtained from the routine analysis of MET are only valid for the locations where the trials were carried out, but not for the untested locations or growers' field. However, the precision of the predictions for the untested locations is crucial in order to assist growers in selecting a cultivar for their farm or field.
Accuracy refers to how close the value of the statistic is to a supposed ''true value''. In other words, accuracy measures how close an estimate ̂ of a parameter is to the ''true value'' of (Kotz et al. 2006). Accuracy can be evaluated via a crossvalidation (CV) study by estimating prediction error (Hastie et al. 2009). From a CV study, accuracy can be measured in terms of mean squared error (MSE), which consists of variance and squared bias of ̂ . Conversely, the precision of an estimator ̂ , measures how narrow the distribution of ̂ clusters about its expected value (Kotz et al. 2006). The precision of ̂ is the reciprocal of the variance of ̂ .
In this study, we explore the use of the RC models for improving the precision of yield predictions in some new locations, which represent growers' fields, and evaluate the prediction accuracy. The precision of predictions is assessed with standard errors of predictions of genotypic values (SEPV) and standard errors of the predictions of pairwise differences of genotypic values (SEPD). The prediction accuracy was evaluated via a CV study.

Dataset
In this study, a dataset from Swedish official cultivar testing in 2016 was used. The dataset comprises 25 genotypes of winter wheat tested in 18 locations. In addition, the dataset includes information about four new locations. The 22 locations are stratified into three zones: South, Middle, and North (Buntaran et al. 2019). Two of the four new locations are located in the North zone, and two in the South zone. The new locations have no observations of yield for any of the genotypes but do have data on environmental covariates. The layout of the trials at each location was an α-design with two replicates. The available covariates were soil properties, i.e. pH, clay content, and humus content. The covariates are location-specific.

Statistical models
A two-stage fully-efficient stage-wise approach was used, which forwards the full variance-covariance matrix of adjusted means from stage I (Damesa et al. 2017;Piepho et al. 2012) to the final analysis in stage II of a stage-wise analysis. In stage I, each location was analysed individually using a linear mixed model (LMM). The general equation of an LMM is written as where is the vector of yield is the design matrix for fixedeffects of vector , is the incidence matrix for randomeffects of vector , and is the vector residuals. The distributions of the random effects and the residuals are ∼ N( , ) and ∼ N( , ). Then, the distribution of the response is ∼ N( , ) , where = � + .
In our case, in stage I, the fixed effects for the jm-th trial are where the jm is the intercept of the j-th location nested in the m-th zone, and ijm is the effects of i-th genotype in the j-th location nested in the m-th zone, for i = 1, 2, … , I , j = 1, 2, … , J m , and m = 1, 2, … , M . The letter I denotes the number of genotypes, M is the number of zones, J m is the number of locations within the m-th zone, and J = ∑ M m=1 J m is total number of locations. The random effects and e are where r jkm ∼ N(0, 2 r ) is the random effect of the k-th replication in the j-th location nested in the m-th zone, b jklm ∼ N(0, 2 b ) is the random effect of the s-th incomplete block of the k-th replication in the j-th location nested in the m-th zone, and e ijklm is the residual plot error associated with the observation y ijklm in vector . In a single-stage analysis, the replication effect is random with necessity because it is nested within location, which is a random factor. Thus, to mimic this approach in the fully efficient two-stage analysis, the replication effect is assigned to be random.
From stage I analysis, we obtained the adjusted means at the j-th location in the m-th zone, ̂ ijm , which are estimated by best linear unbiased estimation (BLUE), and e ijm , the error associated with the adjusted means. The ijm = jm + ijm r jkm + b jklm + e ijklm adjusted means and the associated error were used for stage II analysis.
In stage II, the analysis was conducted using a model that comprises zone effects since the locations are stratified into three zones. For the stage II model, the vector consists of the means ̂ ijm from stage I and the error vector has sub-vectors j with elements e ijm , with var j = j . The variance-covariance matrix j of adjusted means in the j-th location is obtained in stage I by using residual maximum likelihood (REML). The overall variance-covariance matrix of is block diagonal, i.e.
In stage II, seven fixed-genotype-effect (FG) and seven random-genotype-effect (RG) statistical models were fitted. The entries in vectors and of the FG and RG models are presented in Tables 1, 2, respectively. In stage II, for all 14 models, the location effect nested in the m-th zone is random, whereas the effect of the zone, m , is fixed. In the RG models, the genotype × zone interaction effect, (g ) im , is random because genotype is a random factor. The random genotype main effect and genotype × zone interaction effect allow exploiting information across zones when computing zone-based prediction of genotypes (Buntaran et al. 2019;Kleinknecht et al. 2013;Piepho and Möhring 2005;Piepho et al. 2016).
In the seven FG models, two models are LMM without covariates but with different variance-covariance structure for location and genotype × location effects, two models are LMM with covariate modelled by fixed effects, and three models are LMM with covariate interactions terms. Below, we describe each model in Table 1: 1. FG1 is the baseline LMM without covariate. The random-effect terms in this model are the effect of location nested in the m-th zone, l jm ∼ N(0, 2 l m ) , and the interaction effect of genotype and location nested in the m-th ) . The variance-covariance structure for the location effect, l jm , is a heterogeneous zone-specific covariance structure, l = ⊕ M m=1 l m , where l m = 2 l m with I a J m -dimensional identity matrix. The structure for the genotype × location effect, ( l) ijm , is also a heterogeneous zone-specific covariance structure is , assuming that all genotypes were tested in all locations. 2. FG2 is the FG1 model but with homogeneous variancecovariance structure for the location and genotype × location effects. Thus, the location effect, l jm has the J-dimensional structure l = σ 2 l , and for the genotype × location effect ( l) ijm , the JI-dimensional structure is σ 2 l . 3. FGC is a model with a covariate. The covariate values, x jm , are location-specific. In this model, the covariate is modelled by a linear trend. The notation for fixed regression terms involving the covariate is x jm + m x jm , where is the fixed effect for the slope of the covariate and m is the fixed effect for the zone-specific slope of the covariate for the m-th zone. The variance-covariance structures for genotype, location, genotype × zone, and genotype × location effects are the same as in the FG1 model. 4. FGCQ is the FGC model with an additional quadratic term for the covariate, x 2 jm . The slope for this quadratic term is denoted as . Note that in this model, the zonespecific interaction term of the quadratic covariate was not included because it was not significant. Thus, we decided to settle for the FGCQ model without the zonespecific quadratic term. 5. FGI1 is a model with covariate interactions terms for genotype, i , and genotype × zone, ( ) im . Hence, this model has genotype and genotype × zone specific coefficients for the intercepts and slopes. FGI1 is the most complex model because it has linear and quadratic terms for the covariate interacting with genotype and genotype × zone. The model for covariate interaction in the genotype term is i + 2i x jm + 3 i x 2 jm where i is the genotype-specific intercept for the i-th genotype, 2i is the linear genotype-specific slope for the i-th genotype, and 3i is the quadratic genotype-specific slope for the i -th genotype. The model for covariate interaction in the genotype × zone term is ( ) im + 4 im x jm + 5 im x 2 jm , where ( ) im is the genotype × zone-specific intercept for the i-th genotype in the m-th zone, 4im is the linear genotype × zone-specific slope for the i-th genotype in the m-th zone, and 5im is the quadratic genotype × zonespecific slope for the i-th genotype in the m-th zone. 6. FGI2 is a reduced version of the FGI1 model. The regression coefficients for the linear and quadratic term in the genotype main effect are removed. 7. FGI3 is a reduced version of the FGI1 model. The regression coefficients for the linear and quadratic term in the genotype × zone term are removed.
For the seven RG models, the differences compared to the seven FG models is that the genotype effect was random and there were three RC models due to the interactions between the covariate and genotype and genotype × zone effects. In the seven RG models, two models are LMM without covariates but with different variance-covariance structures for location and genotype × location effects, two models are LMM with covariate modelled by fixed effects, and three models are LMM with random coefficients for the regression on covariates. Below, we describe each model in Table 2: 1. RG1 is the baseline LMM without covariate and random coefficients. The random-effect terms in this model are the effect of genotype, g i ∼ N(0, 2 g ) , the effect of location nested in the m-th zone, l jm ∼ N(0, 2 l m ) , the interaction effect of genotype and zone, (g ) im ∼ N(0, 2 g ) , and the interaction effect of genotype and location nested in the m-th zone, (gl) ijm ∼ N(0, 2 (gl) m ) . The variance-covariance structure for the genotype effect, g i , is g = σ 2 g . The structure for the genotype × zone effect, (g ) im , is g . For the location effect, l jm , the variancecovariance structure is a heterogeneous zone-specific covariance structure, heterogeneous zone-specific covar- with I a J m -dimensional identity matrix. The structure for the genotype × location effect, (gl) ijm , is also a heterogeneous zone-specific covariance structure is , assuming that all genotypes were tested in all locations. 2. RG2 is the RG1 model but with homogeneous variance-covariance structure for the location and genotype × location effects. Thus, the location effect, l jm has the J-dimensional structure l = σ 2 l , and for the genotype × location effect (gl) ijm , the JI-dimensional structure is σ 2 gl . 3. RGC is a model with a covariate. The covariate values, x jm , are location-specific. In this model, the covariate is modelled by a linear trend. The notation for the fixed regression terms involving the covariate is x jm + m x jm , where is the fixed effect for the slope of the covariate and m is the fixed effect for the zone-specific slope of the covariate for the m-th zone. The variance-covariance structures for the genotype, location, genotype × zone, and genotype × location effects are the same as in the RG1 model. Thus, the random effects comprise no regression terms. 4. RGCQ is the RGC model with an additional quadratic term for the covariate, x 2 jm . The slope for this quadratic term is denoted as . As in the FGCQ model, the zonespecific interaction term of the quadratic covariate was not included because it was not significant. Thus, we decided to settle for the RGCQ model without the zonespecific quadratic term. 5. RC1 is a model with random coefficients. Since the covariate is location-specific, it was not possible to fit random coefficients for location. Thus, a RC model was fitted for the genotype term, g i , and the genotype × zone term, (g ) im . Hence, this model has genotype and genotype × zone-specific coefficients for the intercepts and slopes. RC1 is the most complex model because it has linear and quadratic terms for the covariate, and allows a covariance between intercept and slope effects, meaning that g and g are unstructured variance-covariance matrices. It is essential to allow the covariance between intercept and slope to be a free parameter in order to ensure invariance with respect to translation and scale transformation of the covariates (Longford 1993; Piepho and Ogutu 2002). Note that when the covariance structure is diagonal, then the structure is only invariant to scale transformations but not to translations (Wolfinger 1996). The model for random coefficients in the genotype term was where a i is the random genotype-specific intercept for the i-th genotype, b i is the random linear genotype-specific slope for the i-th genotype, and c i is the random quadratic genotype-specific slope for the i-th genotype. The model for random coefficients in the genotype × zone term was where d im is the random genotype × zone specific intercept for the i-th genotype in the m-th zone, h im is the random linear genotype × zone-specific slope for the i-th genotype in the m -th zone, and c i is the random quadratic genotype × zonespecific slope for the i-th genotype in the m-th zone.
6. RC2 is a reduced version of model of RC1. The random regression coefficients for the linear and quadratic terms in the genotype main effect are dropped completely, so its covariance structure is g = σ 2 g . 7. In the RC3 model, only the random coefficients for the genotype × zone term is removed completely, so its covariance structure is g = σ 2 g .

Covariate selection and scaling
The covariate selection was done by extending the fixedeffects part of the FG1 model with the covariates as follows: Parameters 1 and 2 are the coefficients of linear and quadratic terms for the scaled clay covariate, respectively, 3 and 4 are the coefficients of linear and quadratic terms for the scaled pH covariate, respectively, and 5 and 6 are the coefficients of linear and quadratic terms for the scaled humus covariate, respectively. The pH and humus covariates were standardised to mean 0 and standard deviation 1. The clay covariate was scaled as (clay − 40)∕10 since this scaling resulted in non-negative variance estimates for the RC1 model. The fixed effect tests were adjusted with Kenward-Roger denominator degrees of freedom (Kenward and Roger 1997). If the F-test of a covariate effect was not significant at α = 5%, the covariate was dropped. The quadratic terms were tested first. The linear terms were tested linear only if the quadratic had to be dropped.

Predictions of genotypes in new locations
All models in Tables 1 and 2, which are special cases of Eq. 1, were fitted using the mixed model equations (MME) in Eq. 2 (Henderson 1950). The information used to estimate the G and R matrices in this MME are the records from the tested locations. Since the G and R matrices are unknown, they are estimated from the data via REML, producing estimates ̂ and ̂ : The solution for ̂ , which is the empirical BLUE (EBLUE), is obtained via generalised least-squares and ̂ , the empirical best linear unbiased predictor (EBLUP) of , is McLean et al. (1991) discussed three types of inference space, i.e. the broad, narrow, and intermediate spaces.
Estimates for these inference spaces are computed based on estimable functions ′ , which are linear combinations of fixed effects only, and predictable functions � + � , which consist of linear combinations of fixed effects and random effects ′ . The matrix ′ consists of coefficients 1 and 0, where 1 indicates fixed effects needed in the estimable function, and covariate values for models that include covariates. The matrix ′ contains the coefficients 1 and 0 according to the relevant random effects in the predictable function, and covariates for the RC models. The estimable function results in the BLUE, while the predictable function results in the BLUP. The differences between three types of inference space are explained in the Appendix.
In this study, the intermediate inference space (McLean et al 1991) is useful because it determines which genotype is the best in the particular environment. Hence, we focus on this inference space. The predictable functions for the EBLUPs in the new location can be expressed as where ′ is the estimable function of fixed effects for zone. This term also includes fixed covariate terms for the models that use covariates. The matrix ′ selects the fixed effects for the targeted zone where the new location is located, and the covariates for models with covariate. The term ′ is the predictable function involving the genotype main effect and the genotype × zone interaction effects of the zone where the new location located, as well as any random covariate terms. The matrix ′ selects the relevant random effects that can be estimated from the tested location; thus, it comprises the main effects of genotype and the genotype × zone interactions. For RC models, the matrix ′ includes intercept and slope for the random-effects terms that have interactions with covariates. The last term, ′ 0 0 , is the predictable function of the location main effect and genotype × location interaction for the new location, with var( 0 ) = 0 . The matrix ′ includes any covariate terms for the RC models. In the FG models, there is no ′ term because the genotype and zone effects are fixed, and their interactions effects are also fixed.
The prediction of ′ 0 0 is always zero because there is no information on these effects for the new location. Thus, the random effect 0 is not estimated as such, but its distribution is needed in order to obtain the standard errors of predictions of genotypic values (SEPV) and standard errors of the predictions of pairwise differences of genotypic values (SEPD) for the genotypes in the new location. An important point to be observed here is that the random effects 0 for a new location are stochastically independent of the BLUP of � + � , which can be regarded as the conditional expectation of w , given fixed and random effects of the model for the observed data. Thus, the estimable effects can be expressed as a conditional mean as follows: Furthermore, on account of the independence assumption, var( ) = , var 0 = 0 , and var , 0 = diag( , 0 ) , the conditional variance of w is:

SEPV and SEPD in new locations
In the further derivation, it is key to observe that the estimate of in (6) and the random effects u 0 for the new locations are stochastically independent. Thus, the total variance of the prediction of w in (5) is simply the sum of the variance of the estimate of in (6) and the variance in (7). Hence, the SEPV in the new locations are computed as follows: where var ̂ is the square of standard errors of EBLUPs for zone-based genotype average, and for our models var(w| , ) is the sum of the variance components of effects for location and genotype × location. For the models with covariates, var(w| , ) includes the variances and covariances for intercepts and slopes of random coefficients. In the FG models, var ̂ is the square of the standard errors of the EBLUE of for the zone-based genotype average. For plant breeders and growers, the differences between genotypes are more informative than point estimates for individual genotypes. Thus, the pairwise prediction of differences and the SEPD needs to be computed. The predictions of pairwise differences can be computed using coefficients 1 and -1 in the and matrices according to the difference of interest genotypes.
The SEPD for the new locations are computed as: where VDIFF ̂ is the square of the standard error of a difference of EBLUPs or EBLUEs for zone-based genotype averages, and 2 gl is the variance component of genotype × location effects. In the FG models the 2 gl refers to 2 l . In (9), the latter is assumed homogeneous between zones, as in models FG2 and RG2. Alternatively, the variance may be assumed zone-specific, as in the other models listed in Tables 1 and 2. Meeker et al. (2017) define a prediction interval for a single future observation is an interval that will, with a specified degree of confidence, cover a future randomly selected observation from a distribution with pre-specified coverage probability (1 − ) . Following this definition, the prediction intervals for genotype performances are regarded as a prediction interval to contain a single future observation. The important assumption of this interval is that the previously sampled locations and the future one can be regarded as random samples from the same distribution. The genotype × location effects for the new location can be regarded as a random sample from the same distribution as that of the genotype × location effects for locations in the dataset. The prediction interval for w is centred at and the approximate (1 − ) × 100% prediction interval is given by:

Prediction intervals for the genotypes in the new locations
where z 1− ∕2 is the (1 − ∕2) × 100% quantile of the standard normal distribution. The prediction interval for pairwise differences between the i-th and i ′ -th genotype is given by: where DIFF(̂ ) is the difference in predictions between two genotypes.

Cross-validation for model selection
A CV study can measure the prediction errors of the model using the mean squared error of prediction (MSEP) of difference. We conducted a leave-one-out CV for model comparison and selection. To mimic the prediction for new locations, we left one location out at a time and assigned its data as the validation set, and the data from the remaining locations as the training set. For the models with covariate, the covariate in the validation set was used for predictions. The MSEP was computed similarly to the MSEP proposed by Piepho (1998) for measuring the prediction accuracy of the models. The MSEP is a standard statistic for evaluating predictive accuracy. Let y (10) ± z 1− ∕2 SEPV (11) DIFF(̂) ± z 1− ∕2 SEPD and z denote the observed and predicted values, respectively, and let I denote the total number of genotypes, and J the total number of locations. The assessment was measured based on the discrepancies between observed ( y ij − y i � j ) and predicted ( z ij − z i � j ) pairwise differences, since the main interest in cultivar trials is in prediction of differences between genotypes rather than performance of individual genotype (Piepho, 1998). The discrepancies between the observed and predicted pairwise differences from the 18 folds were accumulated and the MSEP was computed from this accumulation. The MSEP is computed as follows: The model producing the smallest MSEP is preferable because it predicts the yield differences in the validation set most accurately.

Implementation of the models
All models were implemented in SAS 9.4 (SAS Institute 2013) and ASReml-R 4.1.0.130 (Butler et al. 2017). We briefly describe the implementation strategy and visualisations for each package in the Supplementary Material.

Visualisation
In this section, some visualisation methods are proposed. In SAS, a prediction intervals plot to present predictions with prediction intervals can be generated using PROC SGPLOT. In this plot, the prediction intervals generated based on Eq. 10. A heatmap to present genotype pairwise differences can be generated using PROC TEMPLATE and  (Hochberg and Tamhane 1987). In this case, the α was divided with 300 since there were 300 pairs. The z score was obtained by division of each predicted pairwise difference by its corresponding SEPD. Another way to present genotype pairwise differences is by generating a table comprising a letter representation of all-pairwise differences can be constructed with the insert-and-absorb algorithm (Piepho 2004). This algorithm is implemented in a SAS macro (Piepho 2012) and can be obtained from https:// biost atist ik. uni-hohen heim. de/ filea dmin/ einri chtun gen/ biost atist ik/ Tools_ und_ Macros/ SAS-Macros/ mult. sas. The basis of significance for this table is the same as the significance in the heatmap. In R, the prediction intervals plot and a heatmap can be generated using the ggplot2 (Wickham 2016) and corrplot (Wei and Simko 2017) packages, respectively.

Results
The fixed-effects test results of the extended FG1 model for covariate selection are given in Table 3. Based on the F-test, the only significant covariate was the squared scaled clay content. Thus, in this study, we decided to use only the scaled linear and quadratic clay content as covariates for all 14 models.
In this section, we present the results of all models fitted by SAS. The results from ASReml-R are available as the Supplementary Materials. The fit statistics report including the deviance, information criterion, ∆Deviance and ∆AIC of the seven FG models and the seven RG models are given in Tables S8 and S9, respectively, in the Supplementary Materials. The fit statistics from ASReml-R are given in Tables S3 and S4 for the seven FG models and seven RG models, respectively.
The covariance parameter estimates of the seven FG models are presented in Table 4. The corresponding ASReml-R outputs are presented in Table S1. The variance estimates for location and genotype × location effects in each zone were highly heterogeneous, as shown for the FG1 model. Compared to the FG2 model, the homogeneous variance structure for the location effect seems not appropriate because the homogeneous variance estimate for location was far larger than the heterogeneous variance estimate for location in the North and South zones, and far smaller for location in the Middle zone. For the genotype × location effect, the homogeneous variance estimate was larger than the heterogeneous genotype × location variance estimate for the North zone, but smaller than for the Middle and South zones. Thus, again, the FG2 model might be less appropriate than the FG1 model. The use of both a linear and a quadratic covariate term (the FGCQ model) decreased the estimate of the inter-location variance compared to the use of only a linear term (the FGC model). However, the estimate of the genotype × location variance did not change much compared to the estimate of the inter-location variance. When the covariates interacted with genotype and genotype × zone, as in the FGI1 model, the estimates of the inter-location variance were larger than the FGCQ model, but the estimate of the variance for the South zone in the genotype × location dropped considerably. There was no difference in any variance component estimates in the FGI1 and FGI2 models. Thus, unexpectedly, when the model comprised fixed effects of genotype × zone and covariates interactions, dropping the covariate interactions with the genotype main effect, variance component estimates did not change. In the FGI3 model, only the genotype main effect interacted with the covariates. Here, the heterogeneous variance component     estimates for location and genotype × location effects were similar to the estimates in the FGCQ model. Table 5 presents the covariance parameter estimates of the seven RG models. The ASReml-R results for these models are given in Table S2. Unlike the FG models, the RG models included genotype and genotype × zone variance components. The pattern of estimates for inter-location and genotype × location variances were similar to the FG1, FG2, FGC, and FGCQ models. Interestingly, for the RC1, RC2, and RC3 models, the estimate of the genotype × location variance for the South zone dropped substantially compared to the other RG models. However, the estimate of the interlocation variance for the North zone was higher for the RC3 model than for the other RC models.
The average SEPV and SEPD, over 25 genotypes by location and by all 14 models, are presented in Tables 6 and 7, respectively. The results from ASReml-R are given in Table S5 and S6, respectively. The highest SEPV were observed using the RG2 and FG2 models. These two models had no covariate and used a homogeneous variance structure for the location and genotype × location terms. The RC2 and RC3 models were the two models with the smallest SEPV. The RC2 model had the smallest SEPV in the locations N01, N02, and S01, while the RC3 model had the smallest SEPV only in the location S02. However, the difference between the RC2 and the RC3 models for location S02 was subtle. The FGI3 model, with covariate interaction only with the genotype main effect, also had a small SEPV compared with other FG models and even compared with the RG models without any covariate terms. It is also apparent that modelling a heterogeneous variance structure for location and genotype × location terms improved the SEPV compared to the homogeneous variance structure because the RG and FG models had smaller SEPV than the RG2 and FG2. Compared to the FG and RG model, the SEPV in the RC2 model was improved by 37-38% for the untested locations in the South zone, and by 30-35% for the untested locations in the North zone.
The SEPD values were smaller than the SEPV values because the computation in Eq. 9 was only based on the terms involving genotype effects. In general, the RC2 model had the smallest SEPD except in location N02. However, again, the difference in SEPD between models RC1 and RC2 was small at this location. The FGI1 and FGI2 model had the same and the smallest SEPD among the FG models. In the FGI1 model, both genotype and genotype × zone terms had interactions with the covariates, while in the FGI2 model, only the genotype × zone term interacted with the covariate.
The heterogeneous variance structure for location and genotype × location was beneficial since the SEPD in the FG1 and RG1 models were smaller than in the FG2 and RG2 models. The SEPD depended on the covariate as well. For the untested locations in the South zone, the SEPD were smaller in the model where the covariate interacted with genotype and genotype × zone terms. Concomitantly, when the covariate had no interactions with genotype and genotype × zone terms, the SEPD of the untested locations in the South zone were higher than in the North zone. The SEPD in the RC2 model decreased by 36-40% for the untested locations in the South zone and by 12-20% for the untested locations in the North zone, which implies that the genotype × location variance estimate in the North zone is much higher than in the South zone. This is also shown in Table 5. Table 8 presents variance and correlations estimates of the RC2 model, since this was a promising model based on G × Z −0.14 Corr(3,2) G × Z 0.77 1 3 the averages of SEPV and SEPD. The correlation between the linear slope term and the intercept was 0.36, which is low. However, the correlation between the quadratic term and the intercept was -0.14, which is very low as well. The correlation between quadratic slope and the linear slope was quite high, i.e. 0.76. Thus, there was no problem with collinearity between these two terms. The results of ASReml-R are given in Table S7. The MSEP of the 14 models are listed in Table 9. The model with the smallest MSEP predicted yield differences in the validation set most accurately. The RG models outperformed the FG models because their MSEP were smaller than the FG models. Hence, although both the FGI1 and FGI2 models had the exact same SEPD, and their SEPD were fairly competitive to the SEPD in the RC models, these two models were the least performant since their MSEP were the highest. The RG1, RGC, and RGCQ models had the smallest MSEP, and their values were the same when they were rounded. The RC model with the random coefficients for the genotype × zone term (RC2) ranked fifth but considering that the three smallest MSEP were equal, the MSEP of the RC2 model can be considered as the third smallest. Clearly, the pattern shows that the models with covariate interaction in the genotype or genotype × zone terms were less performant than the models without such interactions. Based on the MSEP, the best models were RGCQ, RGC, RG1, RG2, and RC2.
We demonstrate how the proposed visualisations of the results can be implemented in SAS and R. Predicted values of the genotypes and the 95% prediction intervals  Figure S4. Figure 1 presents the upper and lower prediction limits, which were obtained from Eq. 10. For the prediction of pairwise differences of genotypes in location S01 by the RC2 model, a heatmap containing p-values is given in Fig. 2. The heatmap produced by R is presented in Figure S5. The heatmap can be useful for data with a large number of genotypes and can provide an easy way to detect, by colour coding, which pairwise comparisons of genotypes are significant. The red shading indicates significant differences of pairwise genotypes predictions at α = 5% with Bonferroni adjustment. The SAS and R codes for this purpose are available in the electronic Supplementary Material. The pairwise differences displayed by a letter-based representation using the insert-and-absorb algorithm (Piepho 2004) are given in Table 10. The drawback of presenting letter-based representation of multiple comparisons is the limitation of the alphabet numbers and the clutter when many letters are required. When the needed number of letters exceeds 26, then this method will not work with the Latin alphabet, which can occur for a very large number of genotypes. The SAS code to generate Table 10 in the electronic Supplementary Materials.

Discussion
In this study, it is shown that using clay in the quadratic term as a covariate and employing RC models reduced the SEPV averages for all new locations by 30-38%, and the SEPD averages by 12-40%. This shows that the RC models can improve the precision of predictions of genotypes performance and the precision of genotypes comparisons. By using the covariates in the random coefficients term, the SEPV is evaluated at specific values of the covariates (Milliken and Johnson 2002), which can substantially decrease the SEPV of the RC models compared to the models without any random coefficients. Between the 14 models, RC2 was the model with the smallest average SEPV for three new locations, except in location S02, the SEPV of the RC2 model was 0.18 higher than the RC3 model. Thus, it was minor. For the average SEPD, the RC2 model had only a marginally larger average SEPD in location N02 than the RC1 model. Fig. 2 Heatmap of p-value of the genotype pairwise differences in location S01 by the RC2 model by PROC TEMPLATE and PROC SGREN-DER. The significance level was adjusted using Bonferroni adjustment Nevertheless, the difference in average SEPD between the RC1 and RC2 models was small. The RC2 model utilised the linear and quadratic term in the genotype × zone interaction effects, but not in the genotype main effects. Figure S1 presents the quadratic regression per genotype and shows that the variation between genotypes is not that large, as the lines of genotypes are close to each other.
For this reason, the inclusion of random coefficients in the genotype term may not be worthwhile. The random coefficients in the genotype × zone term were more beneficial than in the genotype main-effect term, since the SEPD was higher when using the RC3 model than when using the RC2 model. Also, Figure S2 shows that the variation between genotypes × zone effects is large, as the lines of genotypes × zone are more widely spread out compared to Figure S1.
The model selection can be made by jointly considering SEPV and SEPD with MSEP from the CV study. In terms of MSEP, the FG models are clearly not favourable since their MSEP were higher than those of the RG models. This poor performance is expected because in difference to the RG models, the FG models cannot borrow strength across zones. Among the RG models, there was no clear winner when jointly considering the SEPV, SEPD, and MSEP. The precision was certainly improved via the RC2 model, but its MSEP was not the smallest. The models with the smallest MSEP were RGCQ, RGC, and RG. Nevertheless, the difference of the MSEP between the RC2 and the RG, RGC, RGCQ model was 116 g 2 × m −4 . Thus, the RC2 model predictions were less accurate � √ MSEP = 10.73 g × m −2 � compared to those three models. With the RG1 model, the prediction accuracy was slightly better, but the intervals and the uncertainty were larger compared to the RC2 model. Thus, the RC2 model is preferred based on the SEPV, SEPD, and the MSEP. The RC2 model minimised the uncertainty as indicated by lower SEPV and SEPD, although its MSEP was not the smallest.
Our study is similar to Jarquin et al. (2014) in that both use RC models to improve prediction accuracy. The major difference is that Jarquin et al. (2014) used an extensive number of environmental covariates with marker data and utilised this information by computing an environmental kinship matrix, for which a single variance component was fitted. Thus, implicitly, the model used by Jarquin et al. (2014) assumes that the slopes for the different covariates have the same variance and that there is no correlation between them. By contrast, in our study, a vast number of covariates and marker data were not available, but we allowed for heterogeneity in variance between slopes, and for covariance between slopes and intercepts to maintain the invariance feature of RC models (Longford 1993;Piepho and Ogutu 2002;Wolfinger 1996). It is acknowledged that we can afford to do so because we do not have a vast number of covariates. With an increasing number of covariates, fitting such RC models becomes more challenging, and it is not obvious how this can best be done. One option to circumvent numerical problems is to fit a low-rank approximation to the unstructured variance-covariance matrix for intercepts and slopes, i.e. a factor-analytic model (Jennrich and Schuchter 1986). Fitting a factor-analytic model guarantees that the variance-covariance matrix is positive definite. If the order of the factor-analytic model equals the number of slope terms plus the intercept, the model is equivalent to the unstructured model, whereas lower-rank approximations are obtained by reducing the order. Moreover, with a large number of covariates, covariate selection will be beneficial. R-square (R 2 ) for mixed models (Piepho 2019) is an option for covariate selection, as demonstrated in Hadasch et al. (2020). The best approach to accommodating a larger number of covariates certainly deserves further research.
The BLUPs for the untested locations are the predictions obtained from the observed locations, and these values are reported to growers. However, these BLUPs do not equal Table 10 The letter representation of all-pairwise differences by the insert-and-absorb algorithm (Piepho 2004) *Means not sharing any letter are significantly different at the 5% level of significance with Bonferroni adjustment (Piepho, 2018) Genotype Yield ( the yield that the growers will get in practice. Furthermore, the standard errors of the predictions based on the observed locations cannot be applied to the grower's fields since the trials hardly ever coincide with these. Hence, the prediction intervals for the untested locations need to be computed and presented so that the growers have a view of how precise each cultivar's prediction is for their fields. To compute valid standard errors for the untested locations, the location effect needs to be modelled as random to account for the uncertainty of effects for the untested locations. Note that the random location main effect, as well as the random cultivarlocation interaction effects, are part of the deviations from the regression curves. Prediction intervals are passed on the random variation around the curves. Hence, only if these effects are indeed present and modelled as random, can we compute valid prediction intervals for new locations. If the location effect is fixed, which is known as factorial regression, then the untested locations' uncertainty cannot be obtained since there is no variance component estimate for a fixed effect and the effect itself is unobservable. Besides, when the location effect is fixed, the standard errors are only valid for the locations where the trials are carried out. The next step, after having developed the uncertainty measure of the untested locations, could be to apply the RC model for a whole country to present maps of yield predictions using environmental covariates based on the available GIS data in the Swedish official cultivar trials. When the covariate was modelled using the quadratic term, the averages of SEPV dropped considerably. Furthermore, using RC models, the reduction of the averages SEPV and SEPD depends on the covariate value in the new locations; Table 8 shows that the reduction of SEPV and SEPD varied between the four new locations. Gozdowski et al. (2017) reported that the relationships between the fine soil fractions for the 0-60 cm layers and yield are curvilinear, which also was the case in this study. The clay content used in this study was measured for the 0-25 cm layer. The SEPV depends on the variance estimates of location and genotype × location. The SEPV will decrease when the variance of location or genotype × location decrease. For the SEPD, however, using a covariate did not decrease SEPD because the SEPD only depends on the terms associated with genotype. Moreover, all variance estimates of effects for genotype, genotype × zone, and genotype × location, are similar for the RG1, RG2, RGC, and RGCQ models, while for the RC models these three terms have different variance estimates compared to the RG1, RG2, RGC, and RGCQ models (Table 5). For the FG models, the same pattern occurred in that the genotype × location variance estimates of FG1, FG2, FGC, and FGCQ models were similar.
The covariate scale is crucial when implementing RC models. We found this covariate scale issue when we fitted the RC1 model. This scaling issue occurred when the random coefficients were fitted to both genotype and genotype × zone effects: 1. When the value subtracted from the covariate was lower than the covariate's mean, and the result was divided by 10, the variance component of the linear term of the random coefficients of the genotype main effect was 0. 2. The same results were obtained when the covariate was standardised to mean 0 and standard deviation 1 and when the value subtracted from the covariate was the covariate's mean, and the result was divided by 10.
The genotype main effect had a lower variance than the genotype × zone interaction effect, as shown in Figures S1 and S2. Thus, there was likely competition between genotype main effects and genotype × zone interaction effects in absorbing the variance because in the RC2 model, in which the random coefficients were only fitted for the genotype × zone interaction effect, the scaling of (1) and (2) caused no issue in the variance component estimates and the convergence. Besides, using the scaling of (1) and (2) in the RC2 model did not change the SEPV and SEPD for all new locations due to the invariance property ensured by allowing a covariance among random coefficients (unstructured variance-covariance). Other scalings such as subtraction-ofthe-minimum and covariate-centring were also attempted. When these scalings were used, the RC models did not converge. Thus, the covariate scaling is essential for model convergence and to obtain the appropriate variance parameter estimates. The clay covariate was scaled for all fitted models by (clay − 40)∕10 . We used this scaling because it yielded a positive definite variance-covariance matrix. The choice of covariates is important in the fixed-effects and randomcoefficients parts since they contribute to the improvement or worsening of the prediction precision. Furthermore, the scaling and linear transformations of the covariate are also essential since it can affect convergence and the estimates of covariance parameters.
Zone stratification improved precision because information can be borrowed between zones. However, it should be emphasized that a clear agro-ecological division is needed in order to improve precision for each zone. In this study, the North zone had slightly higher uncertainty than the South zone, which shows that in the North zone, geographical and environmental conditions might be more varied and predictions more uncertain. The SEPV of a zone can be improved by conducting more trials within a zone.
The RC models are useful for MET analysis because the covariate information, which is changing from one to the other trials, can be exploited in the random effects. This feature is useful to measure the adaptability of the genotypes in new environments, given the covariates of the new environment are available. In fact, the benefit of using a covariate in the RC models depends on choosing the appropriate covariate. Covariates can initially be selected based on the biological considerations. Still, it is necessary to check whether these covariate candidates improve the model fit.
The computational approach for predictions in the new locations is different from Henderson (1977), who considered predictions for animals that were not in the records. The main difference is that in our models the location-specific random effects are independent between the observed dataset and the new location, whereas with the Henderson (1977) approach the random effects for animals with records are correlated by pedigree with the animals having no records. Moreover, if there is no zone stratification, the predictions of genotypes in the new locations are merely the summation of genotype BLUPs and the grand mean.
In our study, the precision of these predictions was additionally improved by incorporating covariates using RC models in terms of decreasing standard errors of predictions and standard error of pairwise differences of genotype predictions. Using RC models requires that covariates are available at each location. When the dataset is augmented with data of the new location, the computing strategy needs to be adjusted due to fact that there are no observations on the response in the new locations. For SAS, the augmented dataset could be used to conduct the analysis. However, for ASReml-R, the augmented dataset could not be used due to a different functionality of the package. Thus, the computing strategy needed to be adjusted by excluding the new locations from the dataset to conduct the analysis. The SEPV and SEPD of the new locations had to be computed after the analysis was completed in ASReml-R (see Supplementary  Materials).
We demonstrated that predicting genotype yield for some new locations can be enhanced by inclusion of environmental covariates by borrowing information from other zones. A recent paper by Neyhart et al. (2021) followed the same spirit of predicting genotype performance in unobserved environments. Neyhart et al. (2021) assessed the genome-wide predictions in the unobserved environments for both between and within breeding generations. Resende et al. (2020) recently proposed the geospatial (geographic information system) genotype-environment interaction (GIS-GEI) method within an enviromics framework. This framework involves the joint analysis of MET data accounting for phenotypic, genotypic and envirotypic sources of information. It is anchored into a geoprocessing environment that employs enviromic markers, e.g. time-trend climate data, landscape or and management treatment information, obtained by means of modern envirotyping techniques. Our proposed RC modelling approach is ideally suited for integration in an enviromics-driven GIS-GEI framework. Moreover, the RC modelling can be used in the Bayesian framework as proposed by Theobald et al. (2002), who used a Bayesian method for making predictions with incorporating environmental covariates.

Conclusion
This study showed that the RC models can be used to improve the precision of the yield predictions of winter wheat genotypes in some new locations. The RC model RC2 was competitive, with regards to MSEP, compared to the RG, RGC, and RGCQ models. The RC2 model, with random coefficients of linear and quadratic terms in the genotype × zone effect, can be recommended based on joint consideration of precision in predictions and accuracy. The RC models improved the precision of the predictions for a new location by utilising covariate information in the new location in the random effects part, and by borrowing information from other zones via genetic correlation between zones. The crucial keys for improving the precision of the predictions in the RC models are the selection of suitable covariates, suitably scaling the covariate, modelling the appropriate trend for the covariate in the fixed-effects part, and using an unstructured variance-covariance for the random coefficients. The scale of the covariate is essential to obtain reliable variance component estimates and avoid convergence issues. Failing to select suitable covariates and to model the trend for the covariate also in the fixed-effects part may lead to larger SEPV and SEPD for new locations. Breeders can use RC models to determine the adaptability of tested genotypes in new environments. Agronomists and growers can use RC models to identify the best locally adapted genotypes. effects, which corresponds to the narrow inference space. The predictable function then is + 1 +ḡ + g 1 , where ḡ is the average of the main random genotype effect, I −1 ∑ g i , and g 1 is the genotype average in the random genotype × zone interaction effect in the Middle zone, I −1 ∑ ∑ ( g) i1 . 3. The intermediate inference space also involves random effects, but in difference to the narrow inference space, the intermediate inference space is subjectspecific. For example, the performance of i-th genotype in the j-th location of the Middle zone is given by + 1 + s j1 + g i + (g ) i1 + (gs) ij1 . The estimable function ′ is + 1 and the predictable function ′ is s j1 + g i + (g ) i1 + (gs) ij1 .