Abstract
A one-stage analysis of a series of variety trials involves a combined analysis of the individual plot data across trials. Together with prudent modelling of the genetic effects across trials, this is considered to be the gold standard analysis of multi-environment field trial data. An alternative is a two-stage approach in which the variety means from an analysis of the individual trials in stage one are combined into a weighted mixed model analysis in stage two to give the full set of predicted variety by environment effects and an estimate of their associated variance structure. The two-stage analysis will exactly reproduce the one-stage analysis if the full variance-covariance matrix of the means from stage one is known and is utilised in stage two. Typically, the full matrix is not stored and a diagonal approximation is used. This compromises the full analysis, and the impact of the diagonal approximation is greater in the presence of sophisticated models for the genetic effects. A second compromise is a loss of information in estimating the non-genetic variance parameters under the two-stage approach. In this paper we draw a direct link between the one- and two-stage analysis approaches for crop variety evaluation data in Australia. We now have the computing power to analyse large and complex multi-environment variety trial data sets using the one-stage approach without the need for a two-stage approximation. This should motivate a move away from the two-stage approach in a range of contexts.
Change history
13 February 2018
This article has been published with an erroneous version of Eq. 15. Please find the correct Eq. 15 in this document.
References
Butler DG, Cullis BR, Gilmour AR, Gogel BJ, Thompson R (2017) ASReml-R reference manual, version 4. University of Wollongong, Wollongong
Cullis BR, Smith AB, Coombes NE (2006) On the design of early generation variety trials with correlated data. J Agric Biol Environ Stat 11(4):381–393
Gilmour AR, Cullis BR, Verbyla AP (1997) Accounting for natural and extraneous variation in the analysis of field experiments. J Agric Biol Environ Stat 2:269–273
Gogel BJ (1997) Spatial analysis of multi-environment variety trials. PhD thesis, Department of Statistics, University of Adelaide
Kelly AM, Smith AB, Eccleston JA, Cullis BR (2007) The accuracy of varietal selection using factor analytic models for multi-environment plant breeding trials. Crop Sci 47:1063–1070
Piepho HP, Möhring J, Schulz-Streeck T, Ogutu JO (2012) A stage-wise approach for the analysis of multi-environment trials. Biom J 54:844–860
R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/. ISBN 3-900051-07-0
Searle SR, Casella G, McCulloch CE (1992) Variance components. Wiley, New York
Smith A, Cullis B (2018) Plant breeding selection tools built on factor analytic mixed models for multi-environment trials. In preparation
Smith A, Cullis B, Gilmour A (2001a) The analysis of crop variety evaluation data in Australia. Aust N Z J Stat 43(2):129–145
Smith A, Cullis B, Thompson R (2001b) Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57:1138–1147
Smith AB, Ganesalingam A, Kuchel H, Cullis BR (2015) Factor analytic mixed models for the provision of grower information from national crop variety programs. Genome 128:55–72
Stefanova K, Smith AB, Cullis BR (2009) Enhanced diagnostics for spatial analysis of field trials. J Agric Biol Environ Stat 14:1–19
Verbyla AP (1990) A conditional derivation of residual maximum likelihood. The University of Adelaide, Adelaide, pp 1–3
Welham SJ, Gogel BJ, Smith AB, Thompson R, Cullis BR (2010) A comparison of analysis methods for late-stage variety evaluation trials. Aust N Z J Stat 52:125–149
Wickham H (2009) ggplot2: elegant graphics for data analysis. Springer, New York
Acknowledgements
The authors gratefully acknowledge the financial support of the Grains Research and Development Corporation (GRDC) of Australia. We thank the GRDC and the Australian Crop Accreditation System (ACAS) Limited for use of the data. We also thank the referees, whose review of the manuscript has resulted in improvements to the text.
Additional information
A correction to this article is available online at https://doi.org/10.1007/s10681-018-2129-z.
Appendix
Derivation of the loss of information for variance parameter estimation
Under the assumptions for model (1) for the one-stage analysis,
where \(\varvec{\sigma }_g\), \(\varvec{\sigma }_p\) and \(\varvec{\phi }\) are vectors of variance parameters for the genetic (VE), non-genetic and residual terms, respectively, and
The REML log-likelihood function for estimation of \(\varvec{\sigma }_g\), \(\varvec{\sigma }_p\) and \(\varvec{\phi }\) is
where
Now consider stage 1 of a two-stage analysis and note that model (7) can be written equivalently as
where \(\varvec{X}_1 = [\varvec{X}_g\;\varvec{X}_p]\) and \(\varvec{\tau }_1 = [\varvec{\eta }_{d}^{{\mathsf {T}}}\;\varvec{\tau }_{p}^{{\mathsf {T}}}]^{{\mathsf {T}}}.\) Under the assumptions for this model,
where \(\varvec{\phi }= [\varvec{\phi }_1^{{\mathsf {T}}},\ldots ,\varvec{\phi }_{t}^{{\mathsf {T}}}]^{{\mathsf {T}}}\) is the full set of variance parameters associated with the residual term, \(\varvec{\sigma }_p = [\varvec{\sigma }_{p_1}^{{\mathsf {T}}},\ldots ,\varvec{\sigma }_{p_t}^{{\mathsf {T}}}]^{{\mathsf {T}}}\) contains the remaining non-genetic variance parameters, and \(\varvec{V}(\varvec{\sigma }_{p},\varvec{\phi }) = \varvec{Z}_{p}\varvec{G}_{p}(\varvec{\sigma }_{p})\varvec{Z}_{p}^{{\mathsf {T}}} + \varvec{R}(\varvec{\phi })\). The REML log-likelihood function for estimation of \(\varvec{\sigma }_{p}\) and \(\varvec{\phi }\) is
where \(\varvec{P}_{1} = \varvec{V}^{-1} - \varvec{V}^{-1}\varvec{X}_{1} ( \varvec{X}_{1}^{{\mathsf {T}}} \varvec{V}^{-1} \varvec{X}_{1})^{-1} \varvec{X}_{1}^{{\mathsf {T}}} \varvec{V}^{-1}.\) Finally, under the assumptions for model (9) for stage 2 of the two-stage analysis
where \(\varvec{\sigma }_{g}\) is the vector of variance parameters for the VE effects and
where \(\varvec{\pi }_j\) contains the individual weights \(\pi _{jk}\) for trial j and \(\varvec{\pi }= [\varvec{\pi }_1^{{\mathsf {T}}},\ldots ,\varvec{\pi }_{t}^{{\mathsf {T}}}]^{{\mathsf {T}}}.\) The REML log-likelihood function for estimation of \(\varvec{\sigma }_{g}\) is
where
Derivation of (11)
Let \(\varvec{T}= \left[ \varvec{T}_{_1}\;\varvec{T}_{_2}\right]\) be an \((n \times n)\) non-singular transformation matrix such that \({ \varvec{T}}_{_1}\) and \({ \varvec{T}}_{_2},\) of dimension \((n \times t)\) and \((n \times (n-t)),\) satisfy
Likewise, let \(\varvec{Q}= \left[ \varvec{Q}_{_1}\;\varvec{Q}_{_2} \right]\) be an \(((n-t) \times (n-t))\) non-singular transformation matrix such that \({ \varvec{Q}}_{_1}\) and \({ \varvec{Q}}_{_2},\) of dimension \(((n-t) \times d)\) and \(((n-t) \times (n-t-d)),\) satisfy
Using the results of Verbyla (1990), \(\ell _R(\varvec{\sigma }_g, \varvec{\sigma }_p, \varvec{\phi })\) is the REML log-likelihood function for the marginal distribution of \(\varvec{y}_{_2} = \varvec{T}_{_2} ^{{\mathsf {T}}}\varvec{y}\), where
Now consider the transformation \(\varvec{Q}^{{\mathsf {T}}}\varvec{y}_{_2}.\) Since \(\varvec{Q}\) is a non-singular transformation matrix, then for estimation
where \(\ell _Q(\varvec{\sigma }_g, \varvec{\sigma }_p, \varvec{\phi })\) is the log-likelihood function for
If \(\varvec{Q}^{{\mathsf {T}}}\varvec{y}_{_2} = \varvec{Q}^{{\mathsf {T}}}\varvec{T}_{_2}^{{\mathsf {T}}}\varvec{y}= \left[ \begin{array}{c} \varvec{Q}_{_1}^{{\mathsf {T}}}\varvec{T}_{_2}^{{\mathsf {T}}}\varvec{y}\\ \varvec{Q}_{_2}^{{\mathsf {T}}}\varvec{T}_{_2}^{{\mathsf {T}}}\varvec{y}\end{array}\right] = \left[ \begin{array}{c}\varvec{q}_{_1}\\ \varvec{q}_{_2}\end{array}\right] ,\) then
where \(\ell _{q_{_1} | q_{_2}}\) and \(\ell _{q_{_2}}\) are the conditional and marginal log-likelihood functions for \(\varvec{q}_{_1} | \varvec{q}_{_2}\) and \(\varvec{q}_{_2},\) respectively. We show that
so that
This shows that in a one-stage analysis estimation of the variance parameters associated with the genetic (VE) effects is restricted to \(\ell _{q_{_1}|q_{_2}}\) while the full likelihood \(\ell _R\) is used for estimation of the non-genetic variance parameters.
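The claim that the marginal part \(\ell _{q_{_2}}\) carries no information on \(\varvec{\sigma }_g\) can be illustrated numerically. The following is a minimal sketch under assumed toy dimensions, not the authors' data or code: the layout (`t`, `v`, `r`), the variance values, the choice \(\varvec{D}= \varvec{I}\) and the helper names `indicator` and `random_spd` are all hypothetical. The matrix `K2` plays the role of \(\varvec{T}_{_2}\varvec{Q}_{_2}\), that is, a set of contrasts orthogonal to both \(\varvec{X}_p\) and \(\varvec{X}_g\).

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)

# Toy layout (assumed): t trials, v varieties, r replicate blocks per trial.
t, v, r = 3, 4, 2
n = t * v * r
trial = np.repeat(np.arange(t), v * r)
variety = np.tile(np.repeat(np.arange(v), r), t)
block = np.tile(np.arange(r), t * v)

def indicator(levels, n_levels):
    """Return an n x n_levels 0/1 design matrix for integer factor levels."""
    M = np.zeros((levels.size, n_levels))
    M[np.arange(levels.size), levels] = 1.0
    return M

Xg = indicator(trial * v + variety, t * v)  # variety-by-trial design
Xp = indicator(trial, t)                    # trial means (non-genetic fixed effects)
Zp = indicator(trial * r + block, t * r)    # replicate blocks within trials
D = np.eye(t * v)                           # assumed link, so that Zg = Xg D
Zg = Xg @ D

# Non-genetic part V = Zp Gp Zp' + R, with assumed variance values.
V = 0.5 * Zp @ Zp.T + 1.0 * np.eye(n)

def random_spd(k):
    """Random symmetric positive definite k x k matrix (stand-in for G_g)."""
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

# Two full variance matrices that differ only in the genetic variance structure.
H1 = Zg @ random_spd(t * v) @ Zg.T + V
H2 = Zg @ random_spd(t * v) @ Zg.T + V

# Contrasts orthogonal to both Xg and Xp (orthonormal basis of the null space).
K2 = null_space(np.hstack([Xg, Xp]).T)

# The variance of the contrasts K2'y is the same whatever G_g is.
same_as_V_1 = np.allclose(K2.T @ H1 @ K2, K2.T @ V @ K2)
same_as_V_2 = np.allclose(K2.T @ H2 @ K2, K2.T @ V @ K2)
print(K2.shape, same_as_V_1, same_as_V_2)
```

Because `K2.T @ Zg` is zero, the genetic term drops out of the variance of the contrasts for any choice of the genetic variance matrix, which is why only \(\varvec{\sigma }_p\) and \(\varvec{\phi }\) can be estimated from this part of the likelihood.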
Consider \(\ell _{q_{_2}}.\) Using (16)
However, using the form of \(\varvec{H}\) in (13) and the definition of \(\varvec{V},\)
Since \(\varvec{Q}_{_2}^{\mathsf {T}}\varvec{T}_{_2}^{\mathsf {T}}\varvec{X}_g = \mathbf{0}\) it follows that \(\varvec{Q}_{_2}^{\mathsf {T}}\varvec{T}_{_2}^{\mathsf {T}}\varvec{X}_g\varvec{D}= \varvec{Q}_{_2}^{\mathsf {T}}\varvec{T}_{_2}^{\mathsf {T}}\varvec{Z}_g = \mathbf{0},\) in which case
and the corresponding REML log-likelihood function for estimation of the variance parameters in \(\varvec{\sigma }_p\) and \(\varvec{\phi }\) is
Now, \(\varvec{T}_{_2} ^{{\mathsf {T}}} \varvec{X}= \mathbf{0}\) \(\Longrightarrow \; \varvec{T}_{_2} ^{{\mathsf {T}}} \varvec{X}_{p} = \mathbf{0}\) and \(\varvec{Q}_{_2}^{{\mathsf {T}}}\varvec{T}_{_2} ^{{\mathsf {T}}} \varvec{X}_{p} = \mathbf{0}.\) Also, by the definition of \(\varvec{Q}_{_2}\), \(\varvec{Q}_{_2} ^{{\mathsf {T}}}\varvec{T}_{_2} ^{{\mathsf {T}}}\varvec{X}_{g} = \mathbf{0}.\) We then have \(\varvec{Q}_{_2} ^{{\mathsf {T}}}\varvec{T}_{_2} ^{{\mathsf {T}}}\;[\varvec{X}_{g}\;\varvec{X}_{p}] = \varvec{Q}_{_2} ^{{\mathsf {T}}}\varvec{T}_{_2} ^{{\mathsf {T}}}\;\varvec{X}_{1} = \mathbf{0}.\) If \(\varvec{X}_1\) is of full column rank it follows that
see Searle et al. (1992). Now consider \(\log |\varvec{Q}_{_2} ^{{\mathsf {T}}}\varvec{T}_{_2} ^{{\mathsf {T}}}\varvec{V}\varvec{T}_{_2}\varvec{Q}_{_2} |.\) Except for a constant
We then have
so that
This implies
where \(\ell _g(\varvec{\sigma }_g, \varvec{\sigma }_p, \varvec{\phi })\) is a new notation for \(\ell _{q_{_1}|q_{_2}}\) to indicate that estimation of the variance parameters in \(\varvec{\sigma }_g\) is restricted to this part of the likelihood. Finally then,
as required.
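The two standard error-contrast results that the derivation relies on can be checked numerically. The sketch below is illustrative only and uses assumed random inputs: for a full column rank design `X` and a positive definite `V`, it takes `K` to be an orthonormal basis for the contrasts with \(\varvec{K}^{{\mathsf {T}}}\varvec{X}= \mathbf{0}\) and verifies that \(\varvec{K}(\varvec{K}^{{\mathsf {T}}}\varvec{V}\varvec{K})^{-1}\varvec{K}^{{\mathsf {T}}}\) equals the usual projection-type matrix \(\varvec{P}\), and that \(\log |\varvec{K}^{{\mathsf {T}}}\varvec{V}\varvec{K}|\) differs from \(\log |\varvec{V}| + \log |\varvec{X}^{{\mathsf {T}}}\varvec{V}^{-1}\varvec{X}|\) only by a term free of variance parameters. The dimensions `n` and `p` and the helper `logdet` are hypothetical.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)

n, p = 12, 3
X = rng.normal(size=(n, p))        # full column rank with probability one
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)        # symmetric positive definite

# Orthonormal error contrasts: K is n x (n - p) with K'X = 0 and K'K = I.
K = null_space(X.T)

# P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-1} X' V^{-1}
Vinv = np.linalg.inv(V)
P = Vinv - Vinv @ X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)

# Contrast form of the same matrix: K (K' V K)^{-1} K'
P_contrast = K @ np.linalg.solve(K.T @ V @ K, K.T)

def logdet(M):
    """Log of the absolute determinant, computed stably."""
    return np.linalg.slogdet(M)[1]

# With orthonormal K the constant is -log|X'X|, which does not involve V.
lhs = logdet(K.T @ V @ K)
rhs = logdet(V) + logdet(X.T @ Vinv @ X) - logdet(X.T @ X)

print(np.allclose(P_contrast, P), np.isclose(lhs, rhs))
```

Together these identities mean that a log-likelihood written in terms of contrasts, as for \(\varvec{q}_{_2}\) above, has the same quadratic form and, up to a constant, the same log-determinant terms as a REML log-likelihood written in terms of \(\varvec{V}\) and the corresponding design matrix.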
Cite this article
Gogel, B., Smith, A. & Cullis, B. Comparison of a one- and two-stage mixed model analysis of Australia’s National Variety Trial Southern Region wheat data. Euphytica 214, 44 (2018). https://doi.org/10.1007/s10681-018-2116-4
DOI: https://doi.org/10.1007/s10681-018-2116-4