Models to estimate genetic gain of soybean seed yield from annual multi-environment field trials

Key message Simulations demonstrated that estimates of realized genetic gain from linear mixed models using regional trials are biased to some degree. Thus, we recommend multiple selected models to obtain a range of reasonable estimates. Abstract Genetic improvements of discrete characteristics are obvious and easy to demonstrate, while quantitative traits require reliable and accurate methods to disentangle the confounding genetic and non-genetic components. Stochastic simulations of soybean [Glycine max (L.) Merr.] breeding programs were performed to evaluate linear mixed models to estimate the realized genetic gain (RGG) from annual multi-environment trials (MET). True breeding values were simulated under an infinitesimal model to represent the genetic contributions to soybean seed yield under various MET conditions. Estimators were evaluated using objective criteria of bias and linearity. Covariance modeling and direct versus indirect estimation-based models resulted in a substantial range of estimated values, all of which were biased to some degree. Although no models produced unbiased estimates, the three best-performing models resulted in an average bias of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm\, 7.41$$\end{document}±7.41 kg/ha\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1/yr\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1 (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm\, 0.11$$\end{document}±0.11 bu/ac\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1/yr\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1). Rather than relying on a single model to estimate RGG, we recommend the application of several models with minimal and directional bias. Further, based on the parameters used in the simulations, we do not think it is appropriate to use any single model to compare breeding programs or quantify the efficiency of proposed new breeding strategies. Lastly, for public soybean programs breeding for maturity groups II and III in North America, the estimated RGG values ranged from 18.16 to 39.68 kg/ha\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1/yr\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1 (0.27–0.59 bu/ac\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1/yr\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document}-1) from 1989 to 2019. These results provide strong evidence that public breeders have significantly improved soybean germplasm for seed yield in the primary production areas of North America. Supplementary Information The online version contains supplementary material available at 10.1007/s00122-023-04470-3.

).  2).Data were analyzed with Model 1 and included BT3, PYT, and URT.  1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.  1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.  1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.  1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.  1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.  1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.6).
Figure A20: Root mean squared error (RMSE) considering 10 years of MET.Values were bounded at 0.72 units to enhance visualization.Model E9 (Table 6) was not included.
Table A2: Goodness-of-fit for models considered to estimate the RGG from the soybean empirical data.Fixed effects are underlined, and the RGG was estimated with model terms highlighted in gray.The numbers in parentheses (1, 2) in the model's description represent steps to fit the models.Models in step one (1) were adjusted only with check cultivars.Models E2, E2 1 , . . ., E2 6 , were evaluated according to the Akaike information criterion (Akaike 1974, AIC), and the RGG reported with the selected model.For further details on model terms and covariance modeling, check Table 6.6.  et al. 2006) for benchmark Model EB and E7 (Table 6).

B -Trait simulation in AlphaSimR
Additive (or total) QTL effects for the simulated traits in AlphaSimR (Gaynor et al. 2021) are simulated in two steps.The first one consists of sampling the effects either from a Standard Normal or Gamma distribution (the user must define its shape parameter).The second step applies a scaling factor user-defined variance initially sampled variance to match the user-defined additive (or total) variance.Then, an intercept (user-defined) is added to the simulated (and now scaled) QTL effects.This step mimics the introduction of a trait mean reference to which trait the user is simulating.In the case of a purely additive trait, the variance of the simulated breeding values in the founder population will match the user-defined additive variance.For further details, check the vignette "Traits in AlphaSimR" (https://cran.r-project.org/web/packages/AlphaSimR/vignettes/traits.pdf,accessed on 09/07/2022).
C -Phenotypic models used in the simulator for parental selection, yearly selections, and for single-trial analyses Parental genotypes were selected in BT3 after three years of trials.Data from BT1, BT2, and BT3 was combined, and the following linear mixed model was considered: where ŷijk is the vector of eBLUE values of genotypic means for the ijk th genotype × location × year combination estimated from single-trial analyses, and G i , L j , and Y k , are the main effects of the i th genotype, j th location, and k th years, respectively, followed by all two and three-way interaction effects.All model terms but Y k (underlined) were considered random effects.The error variance σ 2 ϵ ijk of ϵ ijk is assumed to be known from the analyses of the individuals' trials (Model C3), where the residual variance matrix is a diagonal matrix with elements equal to 1 SE 2 , where SE is the estimated standard error (Smith et al. 2001;Frensham et al. 1997).Yearly selections and the analyses of single trials were performed with the following models, respectively: where y ilb is the observed yield in a plot level at any given trial.The eBLUE values of genotypic ( ŷij.. , ŷijk ) means were estimated with weighted least squares, so that Var(ϵ ilb ) = ϵ 2 ilb n i , where n i is the number of plots of the i th genotype in the b th trial.Thus, genotypes have heterogeneous SE within trials due to the insertion of missing data, i.e., a random proportion from zero to 12% of the phenotypic data was considered missing data within trials.A basic explanation of stage-wise modeling is given in Appendix F.

D -Simulation of correlated additive QTL effects for GL and GY, and GLY
In terms of variance components, the genotype by environment interactions (GEI) is composed of genotype × location (GL, σ 2 GL ), genotype × year (GY, σ 2 GY ), and genotype × location × year (σ 2 GLY ) interaction variances.Both σ 2 GY and σ 2 GLY are non-static (unrepeatable) sources of variation, while the static portion of the σ 2 GL is repeatable across years (Yan 2016).Krause et al. (2023) investigated the GEI patterns in a historical soybean dataset composed of 39,006 phenotypic records from 1989-2019.This data comprises trials conducted by public plant breeders in the Northern Region of the U.S., and it embraces 4,257 experimental genotypes, 63 locations, and 31 years.The modeling scenarios B1 and B2 (Section "Variations of simulated MET models") were designed to capture the structure of this historical data as best as possible.
The following algorithm is a heuristic approach built to simulate correlated additive QTL effects in multi-environment trials (MET) according to GL and GY effects.For example, if 1,000 QTLs were simulated and the goal is phenotyping in L = 10 locations across Y = 2 years, the algorithm will initially sample 10 unique variance components for GL and two unique variance components for GY.The variances for individual locations and years are sampled from a mixture of three and six Log-Logistic distributions, respectively (Tables 1 and E1).For more details on how these empirical probability distributions of variance components were obtained, please refer to Appendix E. The initial diagonal variance-covariance (VCOV) matrices for GL (Σ GL ) and GY (Σ GY ) are as follows: These variance components (i.e., diagonal matrices) are then set relative to the predefined additive genetic variance (σ 2 G ) of the simulated trait.For example, if the predefined σ 2 G = 5 and σ 2 GL 1 = 10, then the variance of GL effects represents two times the magnitude of the additive genetic variance, or σ 2 GL 1 /σ 2 G = 10/5 = 2.As a reminder, scenarios A define a unique σ 2 G from a point estimate of variance, and scenario B sample from empirical distributions a unique σ 2 G in each simulation run r (σ 2 G r ).Next, the relative variances are multiplied by the variance of the simulated additive effects (γ), as follows: Which are equivalent to: Where the "relG"stands for relative to the genetic variance.The VCOV matrices Σ relG GL and Σ relG GY are now modified to include covariances between locations and years, receptively.Genotypic correlation between 63 locations and 31 years was obtained by Krause et al. (2023).Briefly, the authors fitted 20 linear mixed models to the soybean historical dataset (see Table 1 in their publication) and selected the best-fit one according to some evaluation criteria.Correlations were then calculated from the best-fit model which included factor-analytic (FA) covariance structures for both GL (ρ F A GL , Figure D2) and GY (ρ F A GY , Figure D1) model terms.With the pre-sampled variances (i.e., the diagonal values from Σ relG GL and Σ relG GY ) and given the correlations (ρ l,l ′ , ρ y,y ′ ) are known, the covariances among any pair of years and/or locations were computed as COV (X i , X Correlated additive QTL effects can now be drawn from a Multivariate Normal Distribution according to the computed covariance matrices above.So, the distributions of the QTL effects for GL and GY are as follows: And the final matrices of QTL effects are of the following form: The original correlation matrices for GL and GY obtained by Krause et al. (2023) from FA models contained 63 locations and 31 years, respectively.For this work, a unique simulation run never explored more than 63 locations across the 46 years of the simulated breeding program.Hence, there was enough empirical information in ρ F A GL (Figure D2) to simulate correlated GL effects.On the other hand, for the GY effects, the correlation matrix is of dimension 31 × 31 (Figure D1).To overcome this issue, our algorithm symmetrically augments ρ F A GY according to the desired number of years.Briefly, the algorithm explores the empirical correlations to simulate the first 31 years of MET.New correlations are then simulated by adding noise, in a highly controlled manner (Hardin et al. 2013, function noisecor), to the empirical values.If the resulting covariance matrix is not positive-definite (PD), it computes the nearest PD matrix while keeping the pre-sampled variances (diagonal values) constant (Bates and Maechler 2021, function nearPD).Examples of the sampled QTL effects are shown below in Figures D3 and D4.

E -Empirical density functions of variance components
This section describes the rationale and objective of estimating probability density functions for the underlying trend of genotypic (σ 2 G ), genotype by location (σ 2 GL ), genotype by year (σ 2 GY ), genotype by location by year (σ 2 GLY ), and residual (σ 2 ϵ ) variance components derived from linear mixed models.We envision fitting parametric probability distributions to variance components to capture the GEI trend to be used in future simulation studies, needed for predicting plant breeding outcomes in changing climates.Currently, simulation studies rely on point estimates of variance components (Kleinknecht et al. 2016), or set heritability values such as low or high (Rutkoski 2019).By capturing the GEI trend using historical data, we can conduct more realistic simulation studies.

E.1 -Methods
A modified jackknife resampling approach was used to obtain empirical probability distributions for the variance components σ 2 G , σ 2 GL , σ 2 GY , and σ 2 GLY .Utilizing results from Aguate et al. (2019) and Hartung and Piepho (2021), the empirical data set from Krause et al. (2023) was divided into four groups representing consecutive eras of soybean cultivar development : From 1989to 1995, from 1996to 2003, from 2004to 2011, and from 2012to 2019 (see Discussion below).For the first group (1989)(1990)(1991)(1992)(1993)(1994)(1995), there were 181 environments (location-year combination); for 1996-2003, 194 environments; for 2004-2011, 100 environments; and for 2012-2019, 116 environments.The modified jackknife approach consisted of leaving-one-environment out (instead of one observation), and then obtaining REML estimates of variance components with a modified version of Model C1 that considered all terms as random effects.Estimates of variance components were combined and evaluated for a best-fit probability distribution with the package "ForestFit" (Teimouri 2021).Given the lack of data from individual plots, trial-based estimates of σ 2 ϵ from the first-stage of analyses were used (i.e., no resampling).Check Krause et al. (2023) for further details on the statistical modeling of the phenotypic data.Distributional parameters were estimated via the expectation maximization (EM) algorithm (Dempster et al. 1977) using the log-likelihood functions of the Gamma, Log-Logistic, Log-Normal, Burr, and F univariate and multivariate distributions.In addition to a visual comparison of the modeled distributions relative to the empirical distributions, Akaike (AIC) and Bayesian (BIC) information criteria (Akaike 1974;Schwarz 1978) as well as the Kolmogorov-Smirnov (KS), Cramer-von Mises (CM), and Anderson-Darling (AD) goodness-of-fit statistics (Stephens 1986) were considered to select the best-fit distribution for each variance component.A classical penalized criteria based on the loglikelihood (AIC, BIC) provided protection from overfitting.

E.2 -Results
Estimated variance components revealed distinct multi-modal distributions over time (Figure E1).For example, the estimated σ 2 G more than doubled from ∼ 3.54 (bu/ac) 2 for the period 1989-1995 to ∼ 7.56 (bu/ac) 2 for the period 1996-2003.While the smallest magnitudes of estimated GEI variance components were usually associated with the period from 1989 to 1995, subsequent changes across years were unique to each estimated variance component.The best-fit models for the trend of variance components consisted of a mixture of five Log-Logistic distributions for the empirical distribution of σ 2 G , three Log-Logistic distributions for the empirical distribution of σ 2 GL , six Log-Logistic distributions for the empirical distribution of σ 2 GY , and five Gamma distributions for the empirical distribution of σ 2 GLY (Table E1, Figure E2).For the empirical distribution of estimated residual variances (σ 2 ϵ ), obtained directly rather than through jackknife resampling, the best-fit model was a univariate Log-Logistic distribution (Figure E3).Maximum Likelihood estimates for the distributional model parameters are reported for the selected models (Table E1), as well as plots with empirical and fitted cumulative distribution functions (Figures E4 and E5).All evaluated models are given in Table E2.

E.3 -Discussion
Parametric probability distributions were fitted to variance components by dividing the data set into groups of seven to eight years.Aguate et al. (2019) simulated different MET with variable numbers of genotypes, locations, and years to mimic wheat trials.They found that adding years is more beneficial than adding genotypes or locations for obtaining unbiased estimates of genotypic-related variance components.Furthermore, even in highly imbalanced datasets, estimates from at least eight years of trials produced less than 5% bias in the estimates, compared to biases of ∼18% for σ 2 G , >40% for σ 2 GL , and >15% for σ 2 GY , when only two years of MET were considered.Simulation results from Hartung and Piepho (2021) further demonstrated that non-significant bias could be obtained for estimates of σ 2 G , σ 2 GL , and σ 2 GY , with decreasing dropout rate and increasing the number of years of testing.Both references showed that variance estimates from REML are biased if at least one variance component estimate is bounded at zero, which rarely occurs when using seven or eight years of data.The reader should keep in mind that the goal of minimizing bias in estimators is distinct from the magnitude and proportion of estimates of variance components.The former is a concern for algorithmic estimators while the latter is of concern for breeding decisions about whether to use more locations or more years in their crop species and for traits under selection.
In terms of the fitted probability distributions, especially due to the well-known properties of the analysis of variance (ANOVA) among the breeding community, there is a misconception regarding the distribution of estimates of variance components.If the underlying population is normally distributed, the mean squares are distributed as a chisquare (χ 2 ), whereas normality and independence are requirements to compute valid F tests from ANOVA (Rencher and Schaalje 2007, Chapter 5).The χ 2 distribution is further used for inferences about variance uncertainty, but as mentioned, it does assume that the random variable is normally distributed.The computed empirical distributions in this work can be of any form (distribution), likewise, other approaches such as hierarchical Bayesian models can be employed to obtain posterior distributions of variance estimates.It should be emphasized, however, that the obtained empirical density functions might not necessarily be the data-generating functions.When testing the fit of distributions, the systematic dropping of the data for curation and cleaning (6.8% of the data points were removed, see Krause et al. (2023) for details) was not considered.In other words, variance components are simulated from the conditional distribution that holds after trimming.

E.4 -Tables and Figures
Genotype by location  1989-1995, 1996-2003, 2004-2011, and 2012-2019.Empirical estimates were obtained using a jackknife leave-one-location-out method.Vertical bars on the x-axis represent point estimates across all years.
Figure E2: Empirical distributions and probability density function (PDF) of variance components for genotypic (A), genotype by location (B), genotype by year (C), and genotype by location by year (D).Empirical estimates were obtained using a jackknife leaveone-environment-out method.The best-fit models are presented with different colors and include the distribution's name and its mixture number.

F -Stage-wise analysis
All models were obtained via stage-wise analysis with weights equal to 1/SE 2 , where SE is the standard error of genotypic means (Frensham et al. 1997;Smith et al. 2001).
Stage-wise models break down the full analysis (i.e., single-stage analysis) into multiple steps, by carrying to the next step the estimated means and some measure of precision (Frensham et al. 1997;Smith et al. 2001;Piepho et al. 2012).

G -Model E9
Model E9 (Table 6) is based on the methodology proposed by Vencosvsky et al. (1986) and later modified/applied by Brisson et al. (2010), Oury et al. (2012), andBornhofen et al. (2018).This model takes advantage of both experimental genotypes and checks replicated between consecutive pairs of years: (i) an initial model is fitted within years to predict eBLUP values of genotypic means (g ik ); and (ii) the g ik values are then corrected for the year effect according to the "reference year (R)".For example, in a MET dataset composed of 10 years, if R = year 5, the corrections for the year effects will be performed like 1 ← 2 ← 3 ← 4 ← 5 → 6 → 7 → 8 → 9 → 10, a forward-backward process along years.If R = 1, it is only a forward process (1 → • • • → 10), and if R = 10, a backward process (1 ← • • • ← 10).The algorithm is described with the following equations: where gc ik is the corrected value for genotype i in year k, and d R,R±1 is the correction term obtained by Equation G2: where n R,R±1 is the number of common genotypes in years R and R ± 1.The ± sign states the correction can go forward or backward across years.In the following year (R ± 2), the procedure is the same: where d R±1,R±2 is: and so on for the following years.The estimated "year effects" ( dR,R±1 , dR±1,R±2 , . . ., dR±k−1,R±K ) are calculated relatively to the reference year, R. In our simulations, the reference year was randomly chosen among the five years with the highest generalized measure of heritability (Cullis et al. 2006, H 2 ), calculated as: where vBLUP k is the average pairwise genotypic prediction error variance and σ 2 G k is the genotypic variance estimated from residual maximum likelihood (Patterson and Thompson 1971, REML).In step (i), before correction, the intercept (µ) was added to the eBLUP values.µ was parameterized to represent the average least-square value of the locations observed that year.

Figure A1 :
Figure A1: Principal component analysis of the additive genomic relationship matrix (G M ) estimated from the population of experimental genotypes used as founders in the simulation (A), and a heatmap of the G M matrix (B).

Figure A5 :Figure A6 :
Figure A5: Estimated marginal values of location from empirical data used in the simulator.

Figure A7 :Figure A8 :
Figure A7: Estimates of broad sense heritabilities on an entry mean basis (Cullis et al. 2006) for each simulation model and trial.

Figure A9 :
Figure A9: True (black) and estimated (red) RGG for evaluated models in scenario A1 (Tables1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.

Figure A10 :
Figure A10: True (black) and estimated (red) RGG for evaluated models in scenario A2 (Tables1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.

Figure A11 :
Figure A11: True (black) and estimated (red) RGG for evaluated models in scenario B1 (Tables1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.

Figure A12 :
Figure A12: True (black) and estimated (red) RGG for evaluated models in scenario B2 (Tables1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.

Figure A13 :
Figure A13: True (black) and estimated (red) RGG for evaluated models in scenario B2-M (Tables1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.

Figure A14 :
Figure A14: True (black) and estimated (red) RGG for evaluated models in scenario B2-R (Tables1 and 2).Check Section "Bias and linearity" for details on the "Expected" facet.

Figure A21 :
FigureA21: Predicted eBLUP values from the benchmark Model E0 (A) and from Model E7 (B) for the soybean empirical data.The blue line is from the fitted regression of the predicted values as a function of the first year of testing.For details on the models check Table6.

Figure A22 :Figure A23 :
Figure A22: Estimated eBLUP values of locations (A), years (B), location by year interaction effects (C), and of environmental (location-year combination) effects (D) from check cultivars for the empirical data.These predicted values in D were used in Model E7 to calculate RGG for the empirical data.
e s fi e ld _ n e c r a w f o r d s v il le _ ia d a v id c it y _ n e d e k a

Figure D3 :
Figure D3: An example of sampling of QTL effects according to correlated or independent genotype × location interaction effects.Pearson correlation coefficients followed by their significant tests are plotted in the upper diagonal.

Figure D4 :
Figure D4: An example of sampling of QTL effects according to correlated or independent genotype × year interaction effects.Pearson correlation coefficients followed by their significant tests are plotted in the upper diagonal.

Figure E1 :
Figure E1: Empirical distributions of estimated variance components consisting of genotypic, genotype by location, genotype by year, and genotype by location by year variances for groups of years1989-1995, 1996-2003, 2004-2011, and 2012-2019.Empirical estimates were obtained using a jackknife leave-one-location-out method.Vertical bars on the x-axis represent point estimates across all years.

Figure E3 :
Figure E3: Empirical distribution and probability density function (PDF) for the unimodal Log-Logistic model of residual estimates from individual trials.

DFigure E4 :Figure E5 :
FigureE4: Cumulative distribution function (CDF) for the best-fit models according to the genotypic (A), genotype by location (B), genotype by year (C), and genotype by location by year (D) variances from jackknife.In each plot, the legends states for name of the distribution followed by its mixture number.
Figure A4: The structure of the simulated MET.Colors represent the source of data used to estimate RGG (with or without information from breeding lines).
B Figure A2: Frequency of empirical (A) and simulated (B) check cultivars.

Table A1 :
Number of positive (P) and negative (N) estimates of RGG.

Table E1 :
Maximum Likelihood estimates of parameters for the best-fit univariate and multivariate probability distributions for empirical distributions obtained using jackknife resampling.Estimates of residual variance (σ 2 ϵ ) were obtained from trials conducted from 1989 to 2019.Estimates of weight parameters (w i ) sums to one, and both Gamma and Log-Logistic distributions include a shape (α i ) and scale (β i ) parameter. b

Table E2 :
Goodness-of-fit (GOF) statistics and selection criteria for the fit of univariate and multivariate probability distributions for genotypic, genotype by location, genotype by year, and genotype by location by year variance components estimated from the jackknife analysis, and residuals variances from trial-level.The best-fit model is highlighted in bold.KS, CM, AD, AIC, and BIC stand for Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling, Akaike's and Bayesian Information Criterion, respectively.