Abstract
The use of appropriate statistical methods has a key role in improving the accuracy of selection decisions in a plant breeding program. This is particularly important in the early stages of testing in which selections are based on data from a limited number of field trials that include large numbers of breeding lines with minimal replication. The method of analysis currently recommended for earlystage trials in Australia involves a linear mixed model that includes genetic relatedness via ancestral information: nongenetic effects that reflect the experimental design and a residual model that accommodates spatial dependence. Such analyses have been widely accepted as they have been found to produce accurate predictions of both additive and total genetic effects, the latter providing the basis for selection decisions. In this paper, we present the results of a case study of 34 earlystage trials to demonstrate this type of analysis and to reinforce the importance of including information on genetic relatedness. In addition to the application of a superior method of analysis, it is also critical to ensure the use of sound experimental designs. Recently, modelbased designs have become popular in Australian plant breeding programs. Within this paradigm, the design search would ideally be based on a linear mixed model that matches, as closely as possible, the model used for analysis. Therefore, in this paper, we propose the use of models for design generation that include information on genetic relatedness and also include nongenetic and residual models based on the analysis of historic data for individual breeding programs. At present, the most commonly used design generation model omits genetic relatedness information and uses nongenetic and residual models that are supplied as default models in the associated software packages. The major reasons for this are that preexisting software is unacceptably slow for designs incorporating genetic relatedness and the accuracy gains resulting from the use of genetic relatedness have not been quantified. Both of these issues are addressed in the current paper. An updating scheme for calculating the optimality criterion in the design search is presented and is shown to afford prodigious computational savings. An in silico study that compares three types of design function across a range of ancillary treatments shows the gains in accuracy for the prediction of total genetic effects (and thence selection) achieved from modelbased designs using genetic relatedness and program specific nongenetic and residual models.
Supplementary materials accompanying this paper appear online.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Plant breeding is focused on the objective of genetic improvement, producing new varieties with increased productivity and quality. Most plant breeding programs follow a method of breeding referred to as a pedigree selection method. This method implies that programs are structured around the grouping of breeding lines which have been derived as progeny of a fixed number of crosses between elite parents. Different crosses are made each year, and the cohort of breeding lines then undergoes selection through preliminary and advanced stages of testing.
Traits of interest for selection in the preliminary stage include disease and herbicide tolerance, phenology type and functional grain quality. Selection intensity is high, reflecting the relatively high heritability of these traits or the ability to use markerassisted selection techniques for simply inherited traits.
Typically, selection in the advanced stages occurs in a sequential manner. These stages are referred to as the S1, S2, S3 and S4 stages. The key selection trait in the advanced stages is grain yield. Yield data for each stage are generated from a series of field trials sown at several locations. Trials typically include the set of breeding lines of interest and also some check varieties, which will collectively be termed entries. This paper focuses on S1 and S2 stage trials, and we refer to these as the earlystage trials. The number of entries which are evaluated at the S1 and S2 stages often exceeds 1,000, while at the S3 and S4 stages, the number of entries tested is generally less than 100. At the S1 and S2 stages, resource and seed limitations reduce the numbers of locations and plots that can be used for each breeding line. Replication within a location is often less than two, and the number of locations is usually less than four. For the S3 and S4 stages, entries are evaluated at up to ten locations, with between two and three replicates per location. Given the relatively low heritability for the trait of grain yield, and the presence of variety (entry) by environment interaction, it is critical to adopt efficient experimental designs and appropriate methods of analysis.
In Australia, the preferred approach for the analysis of multienvironment trial (MET) data sets is the factor analytic linear mixed model of Smith et al. (2001). Their original approach considered modelling of the nongenetic effects within each environment using the spatial approach advocated by Gilmour et al. (1997) and modelling the variety by environment effects using a factor analytic model. Their approach did not include genetic relatedness for the variety effects for each environment. Oakey et al. (2006) addressed this for the analysis of a single trial, and later Oakey et al. (2007) incorporated genetic relatedness through the use of ancestral information for the models advocated by Smith et al. (2001). More recently, Smith and Cullis (2018) developed factor analytic selection tools to assist with selection decisions from a factor analytic linear mixed model analysis of MET data sets. These methods are in widespread use for the analysis of MET data sets in the advanced stages of selection in Australia.
There is an extensive literature on the design of field trials for plant breeding programs. Designs used for these trials fall into two broad categories: classical or optimal modelbased designs. The former include complete and incomplete block designs, row–column designs or \(\alpha \)designs (see Bailey 2008; John and Williams 1995, 1998; Patterson and Williams 1976, for example). The principle of modelbased design is to search the design space for a design function (Bailey 2008), which is near optimal under a prespecified model. The model used for the design of field trials for plant breeding programs is usually a linear mixed model, which is consistent with the linear mixed model used for the analysis. Early work on modelbased design for plant breeding selection trials focussed on methods to find optimal designs for spatially dependent data (Martin 1986; Martin et al. 2006; Martin and Eccleston 1992; Chan 1999). More recently, modelbased designs have been considered for more general linear mixed models which include spatial dependence and blocking factors (Butler et al. 2008; Coombes 2002; Williams and John 2006).
Cullis et al. (2006) introduced prep designs to address the problems associated with the design of S1 and S2 stage trials where the number of plots for each breeding line in a trial is less than two. They showed that prep designs improved the accuracy of selection of breeding lines compared to socalled gridplot designs (Kempton 1982). The prep designs are in widespread use in most plant breeding programs in Australia. These designs can be produced efficiently using statistical software such as the DiGGeR package (Coombes 2009). Williams et al. (2011) developed augmented prep designs by combining prep and augmented check plot designs.
In order to align modelbased design with current methods for the analysis of S1, S2 and S3 stage field trials, recent work has focused on the inclusion of genetic relatedness through the use of ancestral information. Bueno Filho and Gilmour (2003) considered the construction of nonresolvable incomplete block designs when the treatments effects are correlated. Their study considered three choices of genetic relatedness in a simplistic setting of six treatments in blocks of size four. Within the limited scope of their study, they concluded that for most situations, it would be reasonable to use a design which is optimal for unrelated treatment effects.
Piepho and Williams (2006) investigated three types of design for a simple genetic structure for the set of treatment effects. On the basis of a simulation experiment, they concluded that the assumption of correlated treatment effects was superior to assuming fixed treatment effects. They also recommended that an unrestricted \(\alpha \)design with a simple familybased treatment structure could be used for the design of field trials in plant breeding programs.
Butler et al. (2014) presented a more general approach for the design of field trials in a plant breeding program. Their approach was based on finding designs which were (near) optimal under a linear mixed model which partitioned the total genetic variance into additive and nonadditive effects for inbred crops or simpler models for noninbred crops which only included additive effects. Models for plot effects (Bailey 2008) were either classical (using fixed or random block effects) or based on spatial dependence or a mixture of both. Butler et al. (2014) illustrated their methods using two examples from S1 and S2 stage trials from a canola and sorghum breeding program, respectively. Their designs were generated using the ODV1 statistical software package (Butler 2013).
There remain at least two issues which hinder the adoption of the Butler et al. (2014) designs for selection experiments in plant breeding programs. Firstly, there is a lack of empirical evidence regarding the improvement in accuracy relative to other classical or modelbased designs which are currently in use. Secondly, the computational load for the design search is prohibitive for trials with a larger number of entries. The aim of this paper is to therefore address these issues. The issue of computational burden is addressed by development of an updating algorithm for evaluation of the optimality criterion, which is an extension of the algorithm described by Martin and Eccleston (1992) and Chan (1999), to allow for correlated treatment effects. Secondly, the empirical advantage of modelbased designs with correlated effects is compared to two other commonly used designs using an in silico study, based on a large set of field trials from a plant breeding program based in Australia.
The paper is arranged as follows. In Sect. 2, we present a detailed analysis of a case study based on a set of early stage trials grown in 2018 by the four publicly funded pulse breeding programs in Australia. In Sect. 3, we present an approach to modelbased design, including derivation of the updating formula for correlated treatment effects which addresses the computational burden associated with calculating the optimality criterion for each interchange in the design search. In this section, we also demonstrate how to obtain a nearoptimal design using the R package ODV2 (Butler and Cullis 2018) and provide some results on the reduction in computing time using the updating formula. We conclude in Sect. 4 with an in silico experiment designed to assess the performance of modelbased designs using genetic relatedness and specific nongenetic models.
2 Case Study
Given that the model used for modelbased design is “usually chosen to be as close as possible to that expected for the analysis” (Butler et al. 2014), a case study involving the analysis of 34 early stage trials was conducted. Further, the case study is fundamentally important to providing unequivocable evidence of the need to incorporate genetic relatedness in the design of field trials for plant breeding programs. Cullis et al. (2006) developed prep designs with the use of genetic relatedness in mind. However, they stated that pedigree information was not used for the analysis of early generation variety trials at that time. Further extensions of prep designs do not consider the use of genetic relatedness (Williams et al. 2011), while many other recent approaches to the design of field trials for plant breeding programmes have also ignored the use of pedigree information (see, for example, Piepho et al. 2018, 2016; Williams and Piepho 2018). This is despite the widespread adoption of the approaches of Oakey et al. (2007) for the analysis of both singlesite and multienvironment trial plant breeding data sets.
The trials comprised S1 and S2 stage trials from four Australian public pulse breeding programs in 2018. The case study was used to highlight the importance of including information on genetic relatedness in the analysis and to summarize models used for nongenetic effects.
2.1 Genetic Material, Experimental Designs and Phenotyping
Comprehensive ancestral information was provided for all four programs, and this was used to form numerator relationship matrices (Meuwissen and Luo 1992) that are summarized in Table 1. The median generation number was 8, 5, 10 and 6 for chickpeas, fababeans, field peas and lentils, respectively. The median inbreeding coefficient for entries in all programs is consistent with the high level of inbreeding achieved prior to inclusion in yield evaluation trials. The average genetic relatedness (Dunner et al. 1998) is also high, reflecting the strong familial structure of the breeding lines in each of the programs. It is important to note, however, that there is substantial heterogeneity of the genetic relatedness within each program. Those entries with low genetic relatedness were predominantly older commercial varieties which have not been used as parents in recent crosses, while those entries with high genetic relatedness were either recently released commercial varieties which have been used as parents in many crosses or breeding lines from (full sib) families with a large representation in trials.
Table 2 presents a summary of key features of the 34 trials. All trials were laid out in rectangular arrays of plots, indexed by field columns and field rows. Plot dimensions were long and thin, with columns longer than rows. The trait of interest here is grain yield (t/ha). The mean yield varied substantially between trials, with some trials being severely affected by drought. Generally, S1 stage trials had more entries than S2 stage trials but fewer plots per entry. Partially replicated designs (Cullis et al. 2006) were used for ten out of the 12 S1 stage trials and four out of the 22 S2 stage trials. These socalled prep designs were resolvable (or near resolvable) with respect to the entries with more than one plot. The remaining trials were designed as resolvable two or three replicate block designs. In most cases, there were also additional plots of commercially important varieties. The resolvable blocks were aligned either with columns (referred to as column replicate blocks: CRep), aligned with rows (referred to as row replicate blocks: RRep) or in both directions (trials 12 and 13). In all cases, the resolvable column or row replicate blocks spanned multiple columns or rows, respectively. Most designs were constructed by staff within each breeding program using either DiGGeR (Coombes 2009), with default parameter settings, or Agrobase (Mulitze 1990). Pedigree information was not used in the design process for these trials. Designs for trials 1, 11, 14, 15, 22, 27 and 29 were constructed using the methods to be described in the current paper using ODV2 (Butler and Cullis 2018).
2.2 Statistical Models for Analysis
The approach used for the analysis is similar to that proposed by Oakey et al. (2006). They considered two linear mixed models with differing assumptions regarding the distribution of the (random) genetic effects. They referred to these models as the standard and pedigree models. In both cases, the nongenetic effects are modelled according to the approach devised by Gilmour et al. (1997), who allowed for three possible sources of variation, namely global, extraneous and local. The linear mixed models considered in this paper also include, by default, random terms which respect the design construction process. These included terms for resolvable or nearresolvable column or row replicate blocks for all designs (either one or both as required) and terms for columns and rows when the designs were constructed using modelbased approaches. Loworder polynomials in columns and/or rows were not considered. The residual vector was modelled as a separable firstorder autoregressive process (Gilmour et al. 1997).
Following Oakey et al. (2006), the vector of genetic effects is partitioned into additive and nonadditive effects given by
where it is assumed that
where \({\textit{\textbf{A}}}\) is the \(v\times v\) numerator relationship matrix and v is the number of entries in the pedigree. The pedigree model estimates \(\sigma ^2_a\) and \(\sigma ^2_e\) by fitting both terms, while the standard model excludes the additive effects, ignoring genetic relatedness. All analyses were conducted using Version 4 of the ASRemlR package (Butler et al. 2018).
2.3 Results of Analysis
To allow for an unambiguous comparison of the fit of the two genetic models, all terms and variance models not associated with the genetic effects were chosen while fitting the pedigree model and thence retained for the fit of the standard model.
Table 3 presents a summary of the residual maximum likelihood (REML) estimates of the variance parameters associated with the nongenetic terms in the random model and the variance model for the residual. Random terms associated with resolvable blocks were always included in the model, while random terms which were nonresolvable, namely columns and rows, were only included as required. The major source of nongenetic variation in the random model terms is due to the (long) columns, being fitted on 33/34 occasions. This result supports the findings in Oakey et al. (2006), who also fitted terms associated with columns on most occasions (11/14 trials). This has important implications for choosing terms to be included in the random model for construction of nearoptimal modelbased designs (see Sect. 3.5.1).
The REML estimates of the row and column autoregressive parameters are moderate to high for all programs with the estimates for the row dimension being greater than that for the column dimension (which is consistent with the shape of the plots). The REML estimates of these parameters vary between crops but are somewhat lower than those reported by Oakey et al. (2006), since they fitted a measurement error component.
Table 4 presents a summary of the REML estimates of the genetic variance parameters and the modelbased reliability (Mrode 1995) of the prediction of the total genetic effects. Additive genetic variance is the dominant source of genetic variance in all programs, though the percent additive genetic variance does vary considerably between programs. These findings are consistent with those of Oakey et al. (2006) who reported a median percentage additive genetic variance of 84 for the 14 S3 stage wheat trials. The reliability of the predicted total genetic effects is typical of these trials.
Figure 1 presents a scatter plot of the log base 10 of the REML likelihood ratio test to test the hypothesis of zero additive genetic variance against the log base 10 of the number of entries with data for 33 trials (one trial was removed due to a nearzero estimate of genetic variance). Critical values for the REML likelihood ratio test were obtained from the lrt.asreml method within ASRemlR (Butler et al. 2018), which implements the approach described in Self and Liang (1987). The hypothesis of zero additive genetic variance was strongly rejected for the majority of the trials, with the exceptions associated with trials with low values of genetic variance or small numbers of entries. These results are again consistent with those of Oakey et al. (2006).
3 ModelBased Design
Bailey (2008) defines a comparative experiment as an experiment in which we are interested only in contrasts between treatments. Early stage trials are comparative experiments which typically have a simple treatment structure. The aim is selection of the subset of breeding lines which are superior and hence will progress to the next stage of testing. An experimental design has three key elements: the plot structure, the treatment structure and the socalled design function. The design function is a function, T which allocates treatments to plots. Bailey (2008) defines the treatment structure as meaningful ways of dividing up the set of treatments (\(\mathcal {T}\)), the plot structure as meaningful ways of dividing up the set of plots (\(\Omega \)), ignoring treatments, and the design function as a function from \(\Omega \) to \(\mathcal {T}\). In classical design, the function T is chosen to satisfy certain combinatorial properties, whereas in modelbased design the function T is chosen to result in a design which is optimal or near optimal for a prespecified model. The model considered in this paper is a linear mixed model.
3.1 Linear Mixed Model
In the following, we present a linear mixed model for \({\textit{\textbf{y}}}\), the nvector of data, which is suitable for the modelbased design (and analysis) of a comparative experiment. Nelder (1977) introduced the concept of two aspects of a random effect. He remarked that one kind of random term in a linear model is a component of error, while the other kind of random term represents those effects of interest. The latter type of random term will therefore be in the set of treatment factors, while the former type of random term will be in the set of plot factors. Applying this broad principle, we consider a linear mixed model with four components given by
where \({\varvec{\tau _o}}\) and \({\varvec{\tau _p}}\) are vectors of fixed effects with associated design matrices \({\varvec{X_o}}\) and \({\varvec{X_p}}\) with \(c_{x_o}\) and \(c_{x_p}\) columns, respectively; \({\varvec{u_o}}\) and \({\varvec{u_p}}\) are vectors of random effects with associated design matrices \({\varvec{Z_o}}\) and \({\varvec{Z_p}}\) with \(c_{z_o}\) and \(c_{z_p}\) columns, respectively, and \({\textit{\textbf{e}}}\) is the vector of residuals. The subscripts o and p identify socalled objective and peripheral fixed and random effects, respectively. Equation (2) can be written succinctly as
where \({\varvec{W_o}} = [{\varvec{X_o}}\; {\varvec{Z_o}}], \; {\varvec{W_p}} = [{\varvec{X_p}}\; {\varvec{Z_p}}], \; {\varvec{\beta _o}} = ({\varvec{\tau }}_{\textit{\textbf{o}}}^{\top }, {\textit{\textbf{u}}}_{\textit{\textbf{o}}}^{\top })^{\top }, \; {\varvec{\beta _p}} = ({\varvec{\tau }}_{\textit{\textbf{p}}}^{\top }, {\textit{\textbf{u}}}_{\textit{\textbf{p}}}^{\top })^{\top }, \; {\textit{\textbf{W}}} = [{\varvec{W_o}} \; {\varvec{W_p}}]\) and \({\varvec{\beta }} = ({\varvec{\beta }}_{\textit{\textbf{o}}}^{\top }, {\varvec{\beta }}_{\textit{\textbf{p}}}^{\top })^{\top }\).
The random effects and residuals in (2) are assumed to follow a normal distribution such that:
where \({\varvec{G_o}}, {\varvec{G_p}}\) and \({\textit{\textbf{R}}}\) are positive definite matrices assumed to be functions of vectors of variance parameters \({\varvec{\sigma _{g_o}}}, {\varvec{\sigma _{g_p}}}\) and \({\varvec{\sigma _r}}\), respectively. Modelbased design requires values for these parameters so in the following they are regarded as known.
3.2 Predictions of Interest
The aim is to find an optimal or nearoptimal design with respect to a dvector of estimable functions \({\varvec{\pi }} = {\textit{\textbf{D}}}{\varvec{\beta _o}}\) where \({\textit{\textbf{D}}}\) is a known matrix with \(c_{w_o}\) columns and \(c_{w_o}=c_{x_o} + c_{z_o}\). The vector of estimable functions, \({\varvec{\pi }}\), involves only objective effects, but may involve fixed, random or both fixed and random effects. Gilmour et al. (2004) provide a computationally efficient algorithm for forming predictions from the linear mixed model specified in (3). For brevity in the following, we use the terminology of Gilmour et al. (2004) and refer to \({\varvec{\pi }}\) as the vector of predictions, which we assume are estimable. Gilmour et al. (2004) provide a simple test of estimability of predictions which fit naturally into their prediction algorithm. Briefly, given \({\textit{\textbf{D}}}\), the vector of predictions and associated prediction error variance/covariance matrix are formed by recursive absorption from an extended set of mixed model equations (see Robinson 1991, for example). Full details can be found in Gilmour et al. (2004). The mixed model equations (MMEs) for (3) are given by
where the coefficient matrix in (4) is
with
It follows that the reduced set of MMEs for \(\tilde{{\varvec{\beta }}}_{{\textit{\textbf{o}}}}\) are given by
where \({\varvec{C_{oo}}} = {\textit{\textbf{W}}}_{\textit{\textbf{o}}}^{\top }{\varvec{P_p}}{\varvec{W_o}} + {\textit{\textbf{G}}}_{\textit{\textbf{o}}}^{*}, {\varvec{P_p}} = {\textit{\textbf{R}}}^{1}  {\textit{\textbf{R}}}^{1}{\varvec{W_p}}({\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{1}{\varvec{W_p}}+ {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*})^{}{\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{1}\) and \(({\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{1}{\varvec{W_p}}+ {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*})^{}\) is any particular generalized inverse of \({\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{1}{\varvec{W_p}}+ {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*}\). It can be shown that
where \({\varvec{V_p}} = {\varvec{Z_p}}{\varvec{G_p}}{\textit{\textbf{Z}}}_{\textit{\textbf{p}}}^{\top }+ {\textit{\textbf{R}}}\) and \(({\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{1}{\textit{\textbf{X}}}_{\textit{\textbf{p}}})^{}\) is any particular generalized inverse of \({\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{1}{\varvec{X_p}}\). The matrix \({\varvec{P_p}}\) has rank \(n  \mathrm{rank}\left( {\varvec{X_p}}\right) \) and is unique, and it is the Moore–Penrose inverse of \({\textit{\textbf{T}}} = {\varvec{M_p}}{\varvec{V_p}}{\varvec{M_p}}\), where \({\varvec{M_p}} = {\textit{\textbf{I}}}_n  {\varvec{X_p}}({\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{X}}}_{\textit{\textbf{p}}})^{}{\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }\). That is \({\textit{\textbf{T}}} = {\varvec{P_p}}^{+}\).
For known \({\varvec{\sigma _{g_o}}}, {\varvec{\sigma _{g_p}}}\) and \({\varvec{\sigma _r}}\)
where \({\varvec{\Lambda }} = {\textit{\textbf{D}}}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{}{\textit{\textbf{D}}}^{\top }\) and \({\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{}\) is a particular generalized inverse of the coefficient matrix of (6).
3.3 Optimal Design Criteria
In the context of earlystage trials, or more generally for comparative experiments, the most widely used and useful optimality criterion is the \(\mathcal {A}\)optimality criterion (Martin 1986). Bueno Filho and Gilmour (2007) developed a Bayesian design criterion for selection experiments in plant breeding based on a utility function that minimizes the risk of an incorrect selection. They show that this is in fact the \(\mathcal {A}\)optimality criterion based on the PEV matrix for the vector of random entry effects. Bueno Filho and Gilmour (2003) and Cullis et al. (2006) use this criterion for generating optimal or nearoptimal designs for plant breeding designs when the treatments are correlated and for socalled prep designs used in earlystage trials. In this case, it can be shown that
where \(d\le v\) is the number of varieties for which the \(\mathcal {A}\)value is computed and \(\mathrm{pev}\left( \right) \) refers to the prediction error variance of its scalar (or matrix) argument. A convenient form for computing \(\mathcal {A}\) is given by
We seek a design which minimizes \(\mathcal {A}\) over all valid design functions of the design.
3.4 Design Search and Updating Formulae
The design search process presented in Butler et al. (2014) and implemented in ODV2 (Butler and Cullis 2018) can be summarized as follows: given an initial design, the \(\mathcal {A}\)value is optimized under the supervision of a search strategy (for example, a tabu search Glover 1989) for (pairwise) interchanges of the rows of \({\varvec{W_o}}\).
Each interchange requires evaluation of the \(\mathcal {A}\)value, which is based on the prediction error variance matrix \({\varvec{\Lambda }}\). This requires expensive matrix calculations, and hence, an exhaustive search of the design space for moderate to large problems is implausible. Updating formulae for \({\textit{\textbf{C}}}_{oo}^{}\) can be used to significantly reduce the computational burden.
Martin and Eccleston (1992) developed an updating method for finding optimal designs under a linear model with correlated errors. Their algorithm is extended here, but allowing the more general setting of correlated objective effects, within the framework of a linear mixed model. The interchange of two rows of \({\varvec{W_o}}\) is equivalent to first removing two rows from \(\Omega \), followed by adding the two units back, but in reverse order. Martin and Eccleston (1997) suggested using a fourstep approach which involves adding or removing one unit to obtain the new \({\textit{\textbf{C}}}_{oo}^{}\). Chan (1999) presented a twostep approach which she claims to be simpler and easier to implement. We begin by developing a twostep approach, similar to that proposed by Chan (1999), but a fourstep approach is also presented as this has proven to be competitive to the twostep approach in terms of computational load. The fourstep approach is also more suitable for use in the context of earlystage trials, where the presence of socalled singleton treatments (that is, those treatments which occur on only one plot) can cause computational problems as discussed by Coombes (2002).
Let the current design contain \(n=r+s\) units, which are referred to as plots in the following. Consider the retention of r plots by removing s plots, and both \({\varvec{W_o}}\) and \({\varvec{P_p}}\) are partitioned conformably with the removal of s plots as follows
where, for example, \({\varvec{W_{or}}}\) is the design matrix for the subset of the design corresponding to the r retained plots so has r rows and \(c_{w_o}\) columns. The matrix \({\textit{\textbf{T}}}\) is also partitioned conformably with \({\varvec{P_p}}\). We note that in many cases \({\varvec{X_p}}\) is null in which case \({\varvec{P_p}} = {\varvec{V_p}}^{1}\) and hence \({\textit{\textbf{T}}} = {\varvec{V_p}}\). It follows that the coefficient matrix of the reduced MMEs for the subset of the design corresponding to the r retained plots is
where \({\textit{\textbf{T}}}_{{\textit{\textbf{rr}}}}^{\sim }\) is any particular generalized inverse of \({\varvec{T_{rr}}}\).
Using a similar argument to Martin and Eccleston (1992) and results on the inverse of partitioned matrices (see, for example, Searle 1982), it can be shown that
where \({\varvec{F_{rs}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}}^{\top }{\textit{\textbf{P}}}_{{\varvec{p;rs}}} + {\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}^{\top }{\textit{\textbf{P}}}_{{\varvec{p;ss}}}\). Hence, an updating formula for \({\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}}\) can now be derived using (8) as follows. Using Corollary 18.2.15 in Harville (1997) (p.432)
where \({\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}} = {\varvec{F_{rs}}}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}^{}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}\). If \({\textit{\textbf{P}}}_{{\varvec{p;ss}}}\) is nonsingular, then \({\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}} = {\varvec{F_{rs}}}\). Furthermore, if \(({\textit{\textbf{P}}}_{{\varvec{p;ss}}}  {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}})\) is nonsingular, then (9) becomes
Next, we consider adding s units back to the design, but with the appropriate permutation applied to the rows of \({\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}\). In the simple case of \(s=2\), then provided that the interchange is legal, the two rows of \({\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}\) are interchanged. The new design matrix for the full set of \(n=r+s\) units is therefore given by
where \({\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}}\) is the permuted design matrix associated with the s units.
Using a similar approach to the deletion of r units, we consider the coefficient matrix of the reduced MMEs for the full, new design with \(r+s\) units which is given by
It can be shown that
where \({\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}}^{\top }{\textit{\textbf{P}}}_{{\varvec{p;rs}}} + {\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}\). Hence, an updating formula for \({\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}}\) is
where \({\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}} = {\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}^{}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}\). If \({\textit{\textbf{P}}}_{{\varvec{p;ss}}}\) is nonsingular, then \({\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}} = {\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}\). Furthermore, if \(({\textit{\textbf{P}}}_{{\varvec{p;ss}}} + {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}})\) is nonsingular, then (12) becomes
Using the two updating formulae, one for the removal and one for the addition, allows us to compute the new \(\mathcal {A}\)value from \({\varvec{\Lambda }}^{{\varvec{(r+s)}}} = {\textit{\textbf{D}}}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}}{\textit{\textbf{D}}}^{\top }\).
Setting \(s=1\) leads to the fourstep updating scheme proposed by Martin and Eccleston (1992). The fourstep approach can have computational advantages, and additionally, it can be implemented to handle designs in which all of the objective effects are fixed effects and the design contains singletons. Using the updating formulae in (10) and (13), as an example, when \(s=1\), we have:
for removal and addition of a unit, respectively, where
and
for \(p_{p;ss}  {\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}>0\).
The twostep approach is straightforward; however, it is useful to illustrate the fourstep approach in more detail. As a simple example of the fourstep approach, consider interchange of two units in the design, denoted by \((\omega _a,\omega _b)\). The rows of the objective design matrix \({\varvec{W_o}}\) are indexed using \(\Omega \) such that \({\varvec{W_o}} = {\varvec{W_o}}[\Omega ,]\), and let \(\Omega _{a}\) be the set of plots excluding \(\omega _a\) and \(\Omega _{b}\) be the set of plots excluding \(\omega _b\); then, the fourstep approach consists of the following four steps:

1.
Remove \(\omega _a\): \({\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\Omega _{a},]\) and \({\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{\top }= {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\omega _a,]\) and then use (14);

2.
Add \(\omega _b\): \({\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\Omega _{a},]\) and \({\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}{\top }} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\omega _b,]\) and then use (15);

3.
Remove \(\omega _b\): \({\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{*}[\Omega _{b},]\) and \({\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{\top }= {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{*}[\omega _b,]\) and then use (14);

4.
Add \(\omega _a\): \({\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{*}[\Omega _{b},]\) and \({\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}{\top }} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\omega _a,]\) and then use (15)
noting that if a singleton is included in the set \((\omega _a,\omega _b)\), then the steps must be arranged so that the singleton is not removed first.
3.5 Implementation in ODV2
3.5.1 Example
The implementation of modelbased design for earlystage trials is shown here via an example based on one of the breeding programs introduced in Sect. 2. This example also forms the basis of the simulation study presented in Sect. 4. A design is required for testing 256 S1 stage entries from the field peas program with the aim of making selections for progression to S2 stage testing. The trial will also include four check varieties, which is standard practice in most S1 stage field trials. The predictions of interest are the total genetic effects for each of the 260 entries. In the linear mixed model, the total genetic effects will be partitioned into additive and nonadditive effects through the use of the numerator relationship matrix for the 260 entries. Summaries of this matrix included a mean inbreeding coefficient of 0.941, while the minimum, mean and maximum genetic relatednesses were 0.130, 0.346 and 0.532. One of the check varieties had the highest mean genetic relatedness, as it had been widely used as a parent of the test lines.
A prep design will be constructed, with 50% of the breeding lines having two plots each. (Note that percentage replication will be examined in Sect. 4.) The check varieties will also each have two plots so the full design involves 392 plots that will be arranged as 12 columns by 28 rows. The linear mixed model for the design search will include terms for nongenetic effects as determined from historic analyses of relevant trials. In the current example, this relates to a comprehensive set of S1 and S2 stage trials from the field peas breeding program for the years 2013 to 2017 inclusively. This led to the inclusion of random effects for a twolevel blocking factor CRep (corresponding to columns 1–6 and 7–12); random Column and Row effects and finally a separable firstorder autoregressive process for the residuals. The design search requires values of the associated variance parameters, and these will be taken as the median values from the historic analyses.
The design for the example can be constructed in ODV2 (Butler and Cullis 2018) using the following call:
In this call, the data frame data.df has 392 rows and corresponds to an initial (usersupplied) design. The genetic effects are specified in the term ric(Entry, Ainv) and relate to the total genetic effects (\({\varvec{u_g}}\)). The variance function ric() indicates that the variance model for the effects in the factor argument (that is, Entry) is given by \(\sigma ^2_a{\textit{\textbf{A}}} + \sigma _e^2{\textit{\textbf{I}}}_{260}\) where \({\textit{\textbf{A}}}\) is the numerator relationship matrix for the 260 entries. Note that in the form given here, the inverse of the numerator relationship matrix, i.e. \({\textit{\textbf{A}}}^{1}\) is supplied rather than \({\textit{\textbf{A}}}\) as it has been stored in sparse form (see Butler 2013). This specification for the total genetic effects will be termed the direct formulation and is a new feature of ODV2 (Butler and Cullis 2018) that affords computational savings over the alternative, indirect formulation. The latter involves the inclusion of two terms in the model, associated with the additive effects (\({\varvec{u_a}}\)) and nonadditive effects (\({\varvec{u_e}}\)), so that the resultant vector of objective effects (\({\varvec{\beta _o}}\)) has twice as many elements as the direct formulation. The other terms in the random model formula are associated with random CRep, Column and Row effects, respectively. Lastly, the residual model formula specifies that the variance model for the residuals is one associated with a separable first order autoregressive process. The permute model formula specifies that Entry is the factor to be permuted and the use of the direct formulation means that the \(\mathcal {A}\)value will be computed for total genetic effects. The variance parameters are supplied using the G.param and R.param arguments (also see “Appendix A’).
3.5.2 Timings for Updating Schemes
Table 5 presents the elapsed time (seconds) for a full tabu search and 1000 interchanges for ODV2 using the direct formulation for total genetic effects: 1000 interchanges for ODV2 using the indirect formulation for total genetic effects and 1000 interchanges for ODV1 using the indirect formulation for total genetic effects. The timings were conducted on eight trials used in the case study (trials numbered 3, 5, 15, 17, 22, 23, 27 and 31 in Table 5). The trials included an S1 and S2 stage trial for each of the four crops. The results illustrate the substantial reduction in computational load by using the updating formula developed in Sect. 3.4 and approximately agree with the total number of FLOPs involved, that is, \(O(c_{w_o}^3)\) without an updating scheme compared with \(O(c_{w_o})\) with the updating scheme.
4 Accuracy of Designs Using Genetic Relatedness
The performance of modelbased designs constructed using genetic relatedness and specific nongenetic models was assessed using an in silico experiment based on the example presented in Sect. 3.5.1. Of particular interest were comparisons with modelbased designs using specific nongenetic models but no information on genetic relatedness and with more classical approaches to design which use neither specific nongenetic models nor genetic relatedness. Thus, there were three key treatments under investigation, corresponding to three design functions (DF), labelled as DF\(++\), DF\(+\) and DF\({}{}\) where the two final characters indicate the use/nonuse of genetic relatedness and specific nongenetic models in the design construction. Further details are provided later.
The simulation experiment was also designed to cover the range of genetic variance parameters and levels of replication in current use for S1 and S2 stage trials in the pulse breeding programs described in Sect. 2. This led to the construction of a total of 63 ancillary treatments to consider, comprising the factorial combinations of the three factors described in Table 6. These factors are labelled Prep (the percentage of breeding lines which have two plots per trial), Padd (the proportion of additive genetic variance expressed as a ratio of the total genetic variance) and Rsq (the baseline reliability of the data for the design with level 0 of the Prep factor).
Table 7 presents summaries of the layouts for the seven basic field trial configurations arising from the different replication levels used in the simulation experiment. Every trial was assumed to be laid out in a contiguous rectangular array with 12 columns and between 22 and 28 rows. The blocking factor CRep was constructed to span two sets of contiguous columns in each case.
The linear mixed model chosen to generate the nongenetic effects for the data sets used in this study included terms, variance models and variance parameters as determined from the historic analyses discussed in Sect. 3.5.1. The linear mixed model also partitioned the total genetic effects into additive and nonadditive effects. Given the set of variance parameters for the nongenetic effects, the two genetic variance parameters in the data generation model were then chosen to be consistent with the levels of the two ancillary treatment factors Padd and Rsq. Table 8 presents the variance parameters for all of the random and residual variance models in the data generation linear mixed model for each level of Padd and Rsq.
Data sets were generated according to the linear mixed model given by
where \(i = 1, \ldots , 63\) indexes the ancillary treatments, \(j = 1, \ldots , 3\) indexes the design functions and \(k = 1, \ldots , N\) indexes the simulations (\(N=4000\)). For each k (\(=1, \ldots , N\)), \({\textit{\textbf{u}}}^{[k]}_{{\textit{\textbf{a}}},i}\) and \({\textit{\textbf{u}}}^{[k]}_{{\textit{\textbf{e}}},i}\) are sampled from a normal distribution with mean zero and variance matrices given by \(\sigma _{a,i}^2{\textit{\textbf{A}}}\) and \(\sigma _{e,i}^2{\textit{\textbf{I}}}_{260}\), respectively. Note that the genetic variance components although indexed by the level of T are constant on the levels of Padd and Rsq, with only nine unique combinations as listed in Table 8. The vector \({\varvec{\eta }}_{i}^{[k]}, i=1,\ldots ,63; k=1,\ldots ,N\) represents the plot effects that is the sum of all the random nongenetic effects and residuals. These are obtained via sampling from a normal distribution with mean zero and variance matrix given by
where \({\varvec{\Sigma _{c}}}_i\) and \({\varvec{\Sigma _{r}}}_i\) are scaled variance matrices of order \(c_i\) and \(r_i\) for firstorder autoregressive processes where \(c_i\) and \(r_i\) are the numbers of columns and rows in the trial layout. These matrices are functions of autocorrelation parameters, \(\rho _c\) and \(\rho _r\), for the column and row dimensions, respectively. The design matrices for the fixed and nongenetic random effects are given by \({\textit{\textbf{X}}}_i, {\textit{\textbf{Z}}}_{{\textit{\textbf{cr}}},i}, {\textit{\textbf{Z}}}_{{\textit{\textbf{c}}},i}\) and \({\textit{\textbf{Z}}}_{{\textit{\textbf{r}}},i}\) where the latter correspond to the terms in the random model formula, namely CRep, Column and Row, respectively. Values for all the variance parameters are set out in Table 8. There is only one fixed effect associated with an overall mean. Lastly, we consider the design matrix for the genetic effects. A separate design matrix, \({\textit{\textbf{Z}}}_{ij}\), \(i=1,\ldots ,63; j=1,2,3\) is formed for each level of the design function factor as described below

DF\(++\) a design function generated from (16)

DF\(+\) a design function generated from the plot effects component of (16) but with genetic effects assumed to be fixed effects

DF\({}{}\) a design function generated from the model of Williams et al. (2011) with genetic effects assumed to be fixed effects
Hence, there were a total of 63 different designs for DF\(++\) but only seven designs for DF\(+\) and DF\({}{}\), one for each level of Prep. When forming the design function for the DF\(++\) design, for levels 5, 10, 15, 25, 50 of Prep, we used the method of Huang et al. (2013) to subset those breeding lines to be tested with two plots in the trial.
All design functions were obtained using ODV2 (Butler and Cullis 2018), by varying the model formulae for the fixed, random and residual components. The ODV2 call for DF\(++\) designs was as given in Sect. 3.5.1. Details and scripts for all designs are available in the supplementary materials.
The correlations between the true and predicted total genetic effects for all entries were calculated for each ancillary treatment and design function, (a total of 189 combinations), where the predicted genetic effects were the empirical BLUPs obtained from the fitting of the linear mixed model that corresponded to the data generation model.
4.1 Simulation Study Results
The estimation of the variance parameters is considered first. Summaries of the MSEP, simulation variance and bias (expressed as a % of the true value) for all variance parameters in the model are available in the supplementary materials. Only the results for the additive and total genetic variance parameters are presented here. Figures 2 and 3 present plots of the percentage bias for the additive genetic variance and total genetic variance against the levels of Prep. Significant bias occurs for some combinations of k and \(r^2_0\). There is consistent, negative bias for \(k=0.9\) for the additive genetic variance, for all levels of Prep and Rsq. This negative bias is associated with a significant positive bias for the nonadditive genetic variance for these combinations, which then results in bias for the total genetic variance, especially for \(k=0.9\) and \(r^2_0=1/3\). Bias for the total genetic variance is acceptable for the four Padd and Rsq combinations of \((k=0.5, r^2_0=1/2)\), \((k=0.5, r^2_0=2/3)\), \((k=0.7, r^2_0=1/2)\) and \((k=0.7, r^2_0=2/3)\) for all levels of Prep. The remaining combinations of Padd and Rsq all exhibit unacceptable bias for total genetic variance, particularly for small values of Prep. Bias for the additive genetic variance is acceptable for all levels of Prep except when \(k=0.9\). This latter bias is largely a result of the significant positive bias in estimation of the nonadditive genetic variance, when it is such a small component of the total genetic variance and is so close to zero (relative to the residual variance).
Table 9 presents the mean squared error of prediction (MSEPCV) for the REML estimate of the additive genetic variance expressed as a coefficient of variation (relative to the true value) for each level of replication of the test lines, the proportion of additive genetic variance as a ratio to the total genetic variance and for baseline heritabilities of \(r^2_0=1/3\) and \(r^2_0=1/2\). The results for \(r^2_0=2/3\) are similar and are omitted. The MSEPCV for the DF\(++\) design function is consistently lower than the MSEPCV for the other two design functions. Of particular interest is the ability of the DF\(++\) design function to achieve comparable levels of MSEPCV achieved by DF\(+\) and DF\({}{}\) for much lower values of Prep. For example, the MSEPCV for the design functions DF\({}{}\) and DF\(+\) for a Prep of 25% is approximately the same as the MSEPCV for the DF\(++\) design function for a Prep of only 10%.
The correlations between the true and predicted entry effects are summarized in Figs. 4 and 5 for \(r^2_0=1/3\) and \(r^2_0=1/2\), respectively. Each figure presents panel line plots of the mean simulationbased correlations (in blue) and modelbased reliability (in black) versus the levels of Prep for each of the three design functions. The results for \(r^2_0=2/3\) are similar, with the DF\(++\) design function superior to the other design functions for all levels of Prep and Padd. The modelbased reliability was derived from the \(\mathcal{A}\)values obtained from either the final \(\mathcal{A}\)value for the DF\(++\) design function, or the initial \(\mathcal{A}\)values for the DF\(+\) and DF\({}{}\) design functions evaluated under the (true) data model of (16), using the approximation of Cullis et al. (2006).
It is immediately clear that there is a consistent and important increase in correlation for DF\(++\) designs relative to both DF\(+\) and DF\({}{}\) designs. The advantage is larger for higher values of k and lower values of \(r^2_0\). The DF\(++\) designs are superior for all levels of Prep, though the advantage diminishes for designs with Prep less than 10%. It is also noteworthy that the advantage of DF\(++\) designs for values of Prep greater than 25% is stable, suggesting there is little impact of using the subset selection method of Huang et al. (2013) in terms of improving the accuracy of the design.
The plots also highlight the interaction between the increase in simulationbased correlation and modelbased reliability due to increasing the level of replication and the value of k. This suggests that the information coming from the data plays an important role in improving the accuracy of the prediction of total genetic effects when the nonadditive component is present (and can be reliably estimated). The impact of variance parameter estimation on accuracy is clear. All simulationbased correlations are significantly lower than modelbased reliabilities, but particularly so for designs with values of Prep less than 10%. This is consistent with the MSEPCV shown in Table 9.
One final remark is that across all levels of \(k, r^2_0\) and Prep, there is no appreciable difference between design functions DF\(+\) and DF\({}{}\). This may be a result of the choice of variance parameters for the residual process which gave rise to only modest levels of spatial dependence arising from the inclusion of spatial dependence. Chan (1999) suggested that the optimality of a modelbased design is robust to the choice of variance parameter settings as long as the model class remains the same which is not entirely consistent with our findings. Further work in this area is required to examine this issue in more detail.
References
Bailey, R. A. (2008). Design of comparative experiments. Cambridge University Press, Cambridge.
Bueno Filho, J. S. D. S. & Gilmour, S. G. (2007). Block designs for random treatment effects. Journal of Statistical Planning and Inference 137, 1446–1451.
Bueno Filho, J. S. S. & Gilmour, S. G. (2003). Planning incomplete block experiments when treatments are genetically related. Biometrics 59, 375–381.
Butler, D. & Cullis, B. (2018). Optimal Design under the Linear Mixed Model: https://mmade.org/optimaldesign/. Technical report, National Institute for Applied Statistics Research Australia, University of Wollongong.
Butler, D., Eccleston, J., & Cullis, B. (2008). On an approximate optimality criterion for the design of field experiments under spatial dependence. Australian and New Zealand Journal of Statistics 50.
Butler, D. G. (2013). On The Optimal Design of Experiments Under the Linear Mixed Model. PhD thesis, University of Queensland.
Butler, D. G., Cullis, B. R., Gilmour, A. R., & Thompson, R. (2018). ASReml version 4. Technical report, University of Wollongong.
Butler, D. G., Smith, A. B., & Cullis, B. R. (2014). On the Design of Field Experiments with Correlated Treatment Effects. Journal of Agricultural, Biological, and Environmental Statistics 19, 539–555.
Chan, B. (1999). The design of field experiments when the data are spatially correlated. PhD thesis, The University of Queensland.
Coombes, N. E. (2002). The reactive tabu search for efficient correlated experimental designs. PhD thesis, Liverpool John Moores University, Liverpool, U.K.
Coombes, N. E. (2009). DiGGeR, a Spatial Design Program. Biometric bulletin, NSW DPI.
Cullis, B. R., Smith, A. B., & Coombes, N. E. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological, and Environmental Statistics 11, 381–393.
Dunner, S., Checa, M. L., Gutierrez, J. P., Martin, J.P., & Canon, J. (1998). Genetic analysis and management in small populations: The asturcon pony as an example. Genetics Selection Evolution 30, 397–405.
Gilmour, A., Cullis, B., Welham, S., & Gogel, B. (2004). An efficient computing strategy for prediction in mixed linear models. Computational Statistics & Data Analysis 44, 571–586.
Gilmour, A.R., Cullis, B.R., Verbyla, A.P. & Verbyla, A. P. (1997). Accounting for Natural and Extraneous Variation in the Analysis of Field Experiments. Journal of Agricultural, Biological, and Environmental Statistics 2, 269.
Glover, F. (1989). Tabu search–Part 1. Journal on Computing 1, 190–206.
Harville, D. A. (1997). Matrix algebra from a statisticians perspective. SpringerVerlag, New York.
Huang, B. E., Clifford, D. & Cavanagh, C. (2013). Selecting subsets of genotyped experimental populations for phenotyping to maximise genetic diversity. Theoretical and Applied Genetics 126, 379–388.
John, J. A. & Williams, E. R. (1995). Experimental designs. Wiley and Sons, London.
John, J. A. & Williams, E. R. (1998). tLatinized designs. Australian and New Zealand Journal of Statistics 40, 111–118.
Kempton, R. A. (1982). The design and analysis of unreplicated field trials. Vortrage fur Pflanzenzuchtung 7, 219–242.
Martin, R. & Eccleston, J. (1997). Construction of optimal and nearoptimal designs for dependent observations using simulated annealing. Technical report, Dept. Prob. Statist, Unversity of Sheffield.
Martin, R. J. (1986). On the design of experiments under spatial correlation. Biometrika 73, 247–277.
Martin, R. J., Chauhan, N., Eccleston, J. A. & Chan, B. S. P. (2006). Efficient experimental designs when most treatments are unreplicated. Linear Algebra and Its Applications 417, 163–182.
Martin, R. J & Eccleston, J. A. (1992). Recursive formulae for constructing block designs with dependent errors. Biometrika 79, 426–430.
Meuwissen, T. H. E. & Luo, Z. (1992). Computing inbreeding coefficients in large populations. Genetics, Selection and Evolution 24, 305–315.
Mrode, R. A. (1995). Linear models for the prediction of animal breeding values. CABI Publishing, Wallingford.
Mulitze, D. (1990). AGROBASE/4: A microcomputer database management and analysis system for plant breeding and agronomy. Agronomy Journal 82, 1016—1021.
Nelder, J. A. (1977). A reformulation of linear models. Journal of the Royal Statistical Society, Series A 140, 48–76.
Oakey, H., Verbyla, A., Cullis, B., Wei, X., & Pitchford, W. (2007). Joint modeling of additive and nonadditive (genetic line) effects in multienvironment trials. Theoretical and Applied Genetics 114.
Oakey, H., Verbyla, A. P., Cullis, B. R., Pitchford, W. S., & Kuchel, H. (2006). Joint modelling of additive and nonadditive genetic line effects in single field trials. Theoretical and Applied Genetics 113, 809–819.
Patterson, H. D. & Williams, E. R. (1976). A new class of resolvable incomplete block designs. Biometrika 63, 83–92.
Piepho, H.P. & Williams, E. R. (2006). A comparison of experimental designs for selection in breeding trials with nested treatment structure. Theoretical Applied Genetics. 113, 1505–1513.
Piepho, H.P., Williams, E. R. & Michel, V., (2016). Nonresolvable Row–Column Designs with an Even Distribution of Treatment Replications. Journal of Agricultural, Biological, and Environmental Statistics 21, 227–242.
Piepho, H.P., Michel, V., & Williams, E. R. (2018). Neighbor balance and evenness of distribution of treatment replications in rowcolumn designs. Biometrical Journal 60, 1172–1189.
Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects. Statistical Science 6, 15–51.
Searle, S. R. (1982). Matrix algebra useful for Statistics. Wiley Inter Science, London.
Self, S. C. & Liang, K. Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Society 82, 605–610.
Smith, A., Cullis, B. R., & Thompson, R. (2001). Analysing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57, 1138–1147.
Smith, A. B. & Cullis, B. R. (2018). Plant breeding selection tools built on factor analytic mixed models for multienvironment trial data. Euphytica 214, 1–19.
Williams, E., Piepho, H. P., & Whitaker, D. (2011). Augmented prep designs. Biometrical Journal 53, 19–27.
Williams, E. R. & Piepho, H.P. (2018). Optimality and Contrasts in Block Designs with Unequal Treatment Replication Australian and New Zealand Journal of Statistics 57, 203–209.
Williams, E. R. & John, J. A. (2006). Rowcolumn factorial designs for use in agricultural field trials. Applied Statistics 62, 103–108.
Acknowledgements
Funding was provided by Grains Research and Development Corporation (Grant No. UW00010).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Appendix A: OD Timing Syntax
Appendix A: OD Timing Syntax
Below are the calls used to obtain the timings in Table 5. The first sequence of calls provides an estimate of the elapsed time to complete a full tabu search of the design space and 1000 interchanges using ODV2 and the direct formulation for the total genetic effects. The function od.init sets or displays various options that affect the behaviour of ODV2. The first call to OD returns a template from which variance parameters for each of the random and residual terms can be specified. These are assigned in the subsequent two lines of the code, and these are supplied in the call to ODV2 via the R.param and G.param arguments. The permute argument specifies that the column labelled Entry, which contains the entry code, is to be permuted and the optimality criterion is for the total genetic effects for those entries with data. Use of the time argument provides an estimate of the time for a single tabu loop based on a set of random interchanges. If time\(> 0\), the objective criterion is evaluated the specified number of times prior to the optimization step. The elapsed time is reported and used to estimate the execution time for a scavenging tabu loop.
The next sequence of calls estimates the elapsed time for 1000 random interchanges using the indirect model, but in this call to ODV2 the optimality criterion is based on the additive genetic effects. ODV2 does not allow the user to specify the calculation of the optimality criterion for the total genetic effects using the indirect formulation for total effects. The pipe operator () is used to associate the permute factor with other terms in the model that should be permuted in parallel. The performance of the indirect formulation for the total genetic effects was found to be substantially inferior to the direct formulation, and hence it has been deprecated. Therefore, the elapsed times using these calls will not be exactly equivalent to the use of the indirect formulation within the updating scheme, but trials suggest that they are adequate to ascertain the approximate penalty associated with use of the indirect formulation, which is the only form available in ODV1 for total genetic effects.
The final sequence of calls using ODV1 provides the elapsed times for the indirect formulation for the total genetic effects. There is no option to evaluate timings within the call to ODV1, and so crude estimates of elapsed time were estimated from a sequence of two calls to isolate the time for evaluating the optimality criterion for 1000 random interchanges.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cullis, B.R., Smith, A.B., Cocks, N.A. et al. The Design of EarlyStage Plant Breeding Trials Using Genetic Relatedness. JABES 25, 553–578 (2020). https://doi.org/10.1007/s13253020004035
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13253020004035