The Design of Early-Stage Plant Breeding Trials Using Genetic Relatedness

Cullis, Brian R.; Smith, Alison B.; Cocks, Nicole A.; Butler, David G.

doi:10.1007/s13253-020-00403-5

The Design of Early-Stage Plant Breeding Trials Using Genetic Relatedness

Open access
Published: 16 July 2020

Volume 25, pages 553–578, (2020)
Cite this article

Download PDF

You have full access to this open access article

Journal of Agricultural, Biological and Environmental Statistics Aims and scope Submit manuscript

The Design of Early-Stage Plant Breeding Trials Using Genetic Relatedness

Download PDF

Brian R. Cullis ORCID: orcid.org/0000-0001-6319-8155¹,
Alison B. Smith¹,
Nicole A. Cocks¹^nAff2 &
…
David G. Butler¹

7286 Accesses
25 Citations
1 Altmetric
Explore all metrics

Abstract

The use of appropriate statistical methods has a key role in improving the accuracy of selection decisions in a plant breeding program. This is particularly important in the early stages of testing in which selections are based on data from a limited number of field trials that include large numbers of breeding lines with minimal replication. The method of analysis currently recommended for early-stage trials in Australia involves a linear mixed model that includes genetic relatedness via ancestral information: non-genetic effects that reflect the experimental design and a residual model that accommodates spatial dependence. Such analyses have been widely accepted as they have been found to produce accurate predictions of both additive and total genetic effects, the latter providing the basis for selection decisions. In this paper, we present the results of a case study of 34 early-stage trials to demonstrate this type of analysis and to reinforce the importance of including information on genetic relatedness. In addition to the application of a superior method of analysis, it is also critical to ensure the use of sound experimental designs. Recently, model-based designs have become popular in Australian plant breeding programs. Within this paradigm, the design search would ideally be based on a linear mixed model that matches, as closely as possible, the model used for analysis. Therefore, in this paper, we propose the use of models for design generation that include information on genetic relatedness and also include non-genetic and residual models based on the analysis of historic data for individual breeding programs. At present, the most commonly used design generation model omits genetic relatedness information and uses non-genetic and residual models that are supplied as default models in the associated software packages. The major reasons for this are that preexisting software is unacceptably slow for designs incorporating genetic relatedness and the accuracy gains resulting from the use of genetic relatedness have not been quantified. Both of these issues are addressed in the current paper. An updating scheme for calculating the optimality criterion in the design search is presented and is shown to afford prodigious computational savings. An in silico study that compares three types of design function across a range of ancillary treatments shows the gains in accuracy for the prediction of total genetic effects (and thence selection) achieved from model-based designs using genetic relatedness and program specific non-genetic and residual models.

Supplementary materials accompanying this paper appear online.

Statistical Approaches in Plant Breeding: Maximising the Use of the Genetic Information

On the Design of Field Experiments with Correlated Treatment Effects

Article 04 December 2014

QuLinePlus: extending plant breeding strategy and genetic model simulation to cross-pollinated populations—case studies in forage breeding

Article Open access 27 October 2018

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Plant breeding is focused on the objective of genetic improvement, producing new varieties with increased productivity and quality. Most plant breeding programs follow a method of breeding referred to as a pedigree selection method. This method implies that programs are structured around the grouping of breeding lines which have been derived as progeny of a fixed number of crosses between elite parents. Different crosses are made each year, and the cohort of breeding lines then undergoes selection through preliminary and advanced stages of testing.

Traits of interest for selection in the preliminary stage include disease and herbicide tolerance, phenology type and functional grain quality. Selection intensity is high, reflecting the relatively high heritability of these traits or the ability to use marker-assisted selection techniques for simply inherited traits.

Typically, selection in the advanced stages occurs in a sequential manner. These stages are referred to as the S1, S2, S3 and S4 stages. The key selection trait in the advanced stages is grain yield. Yield data for each stage are generated from a series of field trials sown at several locations. Trials typically include the set of breeding lines of interest and also some check varieties, which will collectively be termed entries. This paper focuses on S1 and S2 stage trials, and we refer to these as the early-stage trials. The number of entries which are evaluated at the S1 and S2 stages often exceeds 1,000, while at the S3 and S4 stages, the number of entries tested is generally less than 100. At the S1 and S2 stages, resource and seed limitations reduce the numbers of locations and plots that can be used for each breeding line. Replication within a location is often less than two, and the number of locations is usually less than four. For the S3 and S4 stages, entries are evaluated at up to ten locations, with between two and three replicates per location. Given the relatively low heritability for the trait of grain yield, and the presence of variety (entry) by environment interaction, it is critical to adopt efficient experimental designs and appropriate methods of analysis.

In Australia, the preferred approach for the analysis of multi-environment trial (MET) data sets is the factor analytic linear mixed model of Smith et al. (2001). Their original approach considered modelling of the non-genetic effects within each environment using the spatial approach advocated by Gilmour et al. (1997) and modelling the variety by environment effects using a factor analytic model. Their approach did not include genetic relatedness for the variety effects for each environment. Oakey et al. (2006) addressed this for the analysis of a single trial, and later Oakey et al. (2007) incorporated genetic relatedness through the use of ancestral information for the models advocated by Smith et al. (2001). More recently, Smith and Cullis (2018) developed factor analytic selection tools to assist with selection decisions from a factor analytic linear mixed model analysis of MET data sets. These methods are in widespread use for the analysis of MET data sets in the advanced stages of selection in Australia.

There is an extensive literature on the design of field trials for plant breeding programs. Designs used for these trials fall into two broad categories: classical or optimal model-based designs. The former include complete and incomplete block designs, row–column designs or $\alpha $-designs (see Bailey 2008; John and Williams 1995, 1998; Patterson and Williams 1976, for example). The principle of model-based design is to search the design space for a design function (Bailey 2008), which is near optimal under a prespecified model. The model used for the design of field trials for plant breeding programs is usually a linear mixed model, which is consistent with the linear mixed model used for the analysis. Early work on model-based design for plant breeding selection trials focussed on methods to find optimal designs for spatially dependent data (Martin 1986; Martin et al. 2006; Martin and Eccleston 1992; Chan 1999). More recently, model-based designs have been considered for more general linear mixed models which include spatial dependence and blocking factors (Butler et al. 2008; Coombes 2002; Williams and John 2006).

Cullis et al. (2006) introduced p-rep designs to address the problems associated with the design of S1 and S2 stage trials where the number of plots for each breeding line in a trial is less than two. They showed that p-rep designs improved the accuracy of selection of breeding lines compared to so-called grid-plot designs (Kempton 1982). The p-rep designs are in widespread use in most plant breeding programs in Australia. These designs can be produced efficiently using statistical software such as the DiGGeR package (Coombes 2009). Williams et al. (2011) developed augmented p-rep designs by combining p-rep and augmented check plot designs.

In order to align model-based design with current methods for the analysis of S1, S2 and S3 stage field trials, recent work has focused on the inclusion of genetic relatedness through the use of ancestral information. Bueno Filho and Gilmour (2003) considered the construction of non-resolvable incomplete block designs when the treatments effects are correlated. Their study considered three choices of genetic relatedness in a simplistic setting of six treatments in blocks of size four. Within the limited scope of their study, they concluded that for most situations, it would be reasonable to use a design which is optimal for unrelated treatment effects.

Piepho and Williams (2006) investigated three types of design for a simple genetic structure for the set of treatment effects. On the basis of a simulation experiment, they concluded that the assumption of correlated treatment effects was superior to assuming fixed treatment effects. They also recommended that an unrestricted $\alpha $-design with a simple family-based treatment structure could be used for the design of field trials in plant breeding programs.

Butler et al. (2014) presented a more general approach for the design of field trials in a plant breeding program. Their approach was based on finding designs which were (near) optimal under a linear mixed model which partitioned the total genetic variance into additive and non-additive effects for inbred crops or simpler models for non-inbred crops which only included additive effects. Models for plot effects (Bailey 2008) were either classical (using fixed or random block effects) or based on spatial dependence or a mixture of both. Butler et al. (2014) illustrated their methods using two examples from S1 and S2 stage trials from a canola and sorghum breeding program, respectively. Their designs were generated using the OD-V1 statistical software package (Butler 2013).

There remain at least two issues which hinder the adoption of the Butler et al. (2014) designs for selection experiments in plant breeding programs. Firstly, there is a lack of empirical evidence regarding the improvement in accuracy relative to other classical or model-based designs which are currently in use. Secondly, the computational load for the design search is prohibitive for trials with a larger number of entries. The aim of this paper is to therefore address these issues. The issue of computational burden is addressed by development of an updating algorithm for evaluation of the optimality criterion, which is an extension of the algorithm described by Martin and Eccleston (1992) and Chan (1999), to allow for correlated treatment effects. Secondly, the empirical advantage of model-based designs with correlated effects is compared to two other commonly used designs using an in silico study, based on a large set of field trials from a plant breeding program based in Australia.

The paper is arranged as follows. In Sect. 2, we present a detailed analysis of a case study based on a set of early stage trials grown in 2018 by the four publicly funded pulse breeding programs in Australia. In Sect. 3, we present an approach to model-based design, including derivation of the updating formula for correlated treatment effects which addresses the computational burden associated with calculating the optimality criterion for each interchange in the design search. In this section, we also demonstrate how to obtain a near-optimal design using the R package OD-V2 (Butler and Cullis 2018) and provide some results on the reduction in computing time using the updating formula. We conclude in Sect. 4 with an in silico experiment designed to assess the performance of model-based designs using genetic relatedness and specific non-genetic models.

2 Case Study

Given that the model used for model-based design is “usually chosen to be as close as possible to that expected for the analysis” (Butler et al. 2014), a case study involving the analysis of 34 early stage trials was conducted. Further, the case study is fundamentally important to providing unequivocable evidence of the need to incorporate genetic relatedness in the design of field trials for plant breeding programs. Cullis et al. (2006) developed p-rep designs with the use of genetic relatedness in mind. However, they stated that pedigree information was not used for the analysis of early generation variety trials at that time. Further extensions of p-rep designs do not consider the use of genetic relatedness (Williams et al. 2011), while many other recent approaches to the design of field trials for plant breeding programmes have also ignored the use of pedigree information (see, for example, Piepho et al. 2018, 2016; Williams and Piepho 2018). This is despite the widespread adoption of the approaches of Oakey et al. (2007) for the analysis of both single-site and multi-environment trial plant breeding data sets.

The trials comprised S1 and S2 stage trials from four Australian public pulse breeding programs in 2018. The case study was used to highlight the importance of including information on genetic relatedness in the analysis and to summarize models used for non-genetic effects.

2.1 Genetic Material, Experimental Designs and Phenotyping

Comprehensive ancestral information was provided for all four programs, and this was used to form numerator relationship matrices (Meuwissen and Luo 1992) that are summarized in Table 1. The median generation number was 8, 5, 10 and 6 for chickpeas, fababeans, field peas and lentils, respectively. The median inbreeding coefficient for entries in all programs is consistent with the high level of inbreeding achieved prior to inclusion in yield evaluation trials. The average genetic relatedness (Dunner et al. 1998) is also high, reflecting the strong familial structure of the breeding lines in each of the programs. It is important to note, however, that there is substantial heterogeneity of the genetic relatedness within each program. Those entries with low genetic relatedness were predominantly older commercial varieties which have not been used as parents in recent crosses, while those entries with high genetic relatedness were either recently released commercial varieties which have been used as parents in many crosses or breeding lines from (full sib) families with a large representation in trials.

Table 1 Summary of numerator relationship matrices for each crop in the case study: information includes the minimum, mean and maximum inbreeding coefficient and genetic relatedness

Full size table

Table 2 presents a summary of key features of the 34 trials. All trials were laid out in rectangular arrays of plots, indexed by field columns and field rows. Plot dimensions were long and thin, with columns longer than rows. The trait of interest here is grain yield (t/ha). The mean yield varied substantially between trials, with some trials being severely affected by drought. Generally, S1 stage trials had more entries than S2 stage trials but fewer plots per entry. Partially replicated designs (Cullis et al. 2006) were used for ten out of the 12 S1 stage trials and four out of the 22 S2 stage trials. These so-called p-rep designs were resolvable (or near resolvable) with respect to the entries with more than one plot. The remaining trials were designed as resolvable two or three replicate block designs. In most cases, there were also additional plots of commercially important varieties. The resolvable blocks were aligned either with columns (referred to as column replicate blocks: CRep), aligned with rows (referred to as row replicate blocks: RRep) or in both directions (trials 12 and 13). In all cases, the resolvable column or row replicate blocks spanned multiple columns or rows, respectively. Most designs were constructed by staff within each breeding program using either DiGGeR (Coombes 2009), with default parameter settings, or Agrobase (Mulitze 1990). Pedigree information was not used in the design process for these trials. Designs for trials 1, 11, 14, 15, 22, 27 and 29 were constructed using the methods to be described in the current paper using OD-V2 (Butler and Cullis 2018).

Table 2 Summary of trials in the case study: information includes the number of plots, columns, rows and entries; the number of entries with specified levels of replication (1, 2, 3–4, $\ge $ 5); and the number of columns per column replicate and the number of rows per row replicate where applicable

Full size table

2.2 Statistical Models for Analysis

The approach used for the analysis is similar to that proposed by Oakey et al. (2006). They considered two linear mixed models with differing assumptions regarding the distribution of the (random) genetic effects. They referred to these models as the standard and pedigree models. In both cases, the non-genetic effects are modelled according to the approach devised by Gilmour et al. (1997), who allowed for three possible sources of variation, namely global, extraneous and local. The linear mixed models considered in this paper also include, by default, random terms which respect the design construction process. These included terms for resolvable or near-resolvable column or row replicate blocks for all designs (either one or both as required) and terms for columns and rows when the designs were constructed using model-based approaches. Low-order polynomials in columns and/or rows were not considered. The residual vector was modelled as a separable first-order auto-regressive process (Gilmour et al. 1997).

Following Oakey et al. (2006), the vector of genetic effects is partitioned into additive and non-additive effects given by

$$\begin{aligned} {\varvec{u_g}} = {\varvec{u_a}} + {\varvec{u_e}} \end{aligned}$$

(1)

where it is assumed that

$$\begin{aligned} \left[ \begin{array}{c} {\varvec{u_a}} \\ {\varvec{u_e}} \end{array}\right]\sim & {} \text{ N }\left( \left[ \begin{array}{c} {\textit{\textbf{0}}} \\ {\textit{\textbf{0}}} \end{array}\right] , \left[ \begin{array}{cc} \sigma ^2_a{\textit{\textbf{A}}} &{} {\textit{\textbf{0}}} \\ {\textit{\textbf{0}}} &{} \sigma ^2_e{\textit{\textbf{I}}}_v \end{array}\right] \right) \end{aligned}$$

where ${\textit{\textbf{A}}}$ is the $v\times v$ numerator relationship matrix and v is the number of entries in the pedigree. The pedigree model estimates $\sigma ^2_a$ and $\sigma ^2_e$ by fitting both terms, while the standard model excludes the additive effects, ignoring genetic relatedness. All analyses were conducted using Version 4 of the ASReml-R package (Butler et al. 2018).

2.3 Results of Analysis

To allow for an unambiguous comparison of the fit of the two genetic models, all terms and variance models not associated with the genetic effects were chosen while fitting the pedigree model and thence retained for the fit of the standard model.

Table 3 presents a summary of the residual maximum likelihood (REML) estimates of the variance parameters associated with the non-genetic terms in the random model and the variance model for the residual. Random terms associated with resolvable blocks were always included in the model, while random terms which were non-resolvable, namely columns and rows, were only included as required. The major source of non-genetic variation in the random model terms is due to the (long) columns, being fitted on 33/34 occasions. This result supports the findings in Oakey et al. (2006), who also fitted terms associated with columns on most occasions (11/14 trials). This has important implications for choosing terms to be included in the random model for construction of near-optimal model-based designs (see Sect. 3.5.1).

The REML estimates of the row and column autoregressive parameters are moderate to high for all programs with the estimates for the row dimension being greater than that for the column dimension (which is consistent with the shape of the plots). The REML estimates of these parameters vary between crops but are somewhat lower than those reported by Oakey et al. (2006), since they fitted a measurement error component.

Table 3 Summary of median of the REML estimates of the non-genetic variance parameters for each crop in the case study ($\times 10^{4}$ for all parameters except autoregressive correlations)

Full size table

Table 4 presents a summary of the REML estimates of the genetic variance parameters and the model-based reliability (Mrode 1995) of the prediction of the total genetic effects. Additive genetic variance is the dominant source of genetic variance in all programs, though the percent additive genetic variance does vary considerably between programs. These findings are consistent with those of Oakey et al. (2006) who reported a median percentage additive genetic variance of 84 for the 14 S3 stage wheat trials. The reliability of the predicted total genetic effects is typical of these trials.

Figure 1 presents a scatter plot of the log base 10 of the REML likelihood ratio test to test the hypothesis of zero additive genetic variance against the log base 10 of the number of entries with data for 33 trials (one trial was removed due to a near-zero estimate of genetic variance). Critical values for the REML likelihood ratio test were obtained from the lrt.asreml method within ASReml-R (Butler et al. 2018), which implements the approach described in Self and Liang (1987). The hypothesis of zero additive genetic variance was strongly rejected for the majority of the trials, with the exceptions associated with trials with low values of genetic variance or small numbers of entries. These results are again consistent with those of Oakey et al. (2006).

Table 4 Summary of analysis results for the case study: median of the REML estimates of the genetic variance parameters ($\times 10^{4}$), the percent additive genetic variance relative to the total genetic variance and the reliability of the prediction of the total genetic effect for entries with a single replicate

Full size table

3 Model-Based Design

Bailey (2008) defines a comparative experiment as an experiment in which we are interested only in contrasts between treatments. Early stage trials are comparative experiments which typically have a simple treatment structure. The aim is selection of the subset of breeding lines which are superior and hence will progress to the next stage of testing. An experimental design has three key elements: the plot structure, the treatment structure and the so-called design function. The design function is a function, T which allocates treatments to plots. Bailey (2008) defines the treatment structure as meaningful ways of dividing up the set of treatments ($\mathcal {T}$), the plot structure as meaningful ways of dividing up the set of plots ($\Omega $), ignoring treatments, and the design function as a function from $\Omega $ to $\mathcal {T}$. In classical design, the function T is chosen to satisfy certain combinatorial properties, whereas in model-based design the function T is chosen to result in a design which is optimal or near optimal for a prespecified model. The model considered in this paper is a linear mixed model.

3.1 Linear Mixed Model

In the following, we present a linear mixed model for ${\textit{\textbf{y}}}$, the n-vector of data, which is suitable for the model-based design (and analysis) of a comparative experiment. Nelder (1977) introduced the concept of two aspects of a random effect. He remarked that one kind of random term in a linear model is a component of error, while the other kind of random term represents those effects of interest. The latter type of random term will therefore be in the set of treatment factors, while the former type of random term will be in the set of plot factors. Applying this broad principle, we consider a linear mixed model with four components given by

$$\begin{aligned} {\textit{\textbf{y}}} = {\varvec{X_o}}{\varvec{\tau _o}} +{\varvec{X_p}}{\varvec{\tau _p}} + {\varvec{Z_o}}{\varvec{u_o}} + {\varvec{Z_p}}{\varvec{u_p}} + {\textit{\textbf{e}}} \end{aligned}$$

(2)

where ${\varvec{\tau _o}}$ and ${\varvec{\tau _p}}$ are vectors of fixed effects with associated design matrices ${\varvec{X_o}}$ and ${\varvec{X_p}}$ with $c_{x_o}$ and $c_{x_p}$ columns, respectively; ${\varvec{u_o}}$ and ${\varvec{u_p}}$ are vectors of random effects with associated design matrices ${\varvec{Z_o}}$ and ${\varvec{Z_p}}$ with $c_{z_o}$ and $c_{z_p}$ columns, respectively, and ${\textit{\textbf{e}}}$ is the vector of residuals. The subscripts o and p identify so-called objective and peripheral fixed and random effects, respectively. Equation (2) can be written succinctly as

$$\begin{aligned} {\textit{\textbf{y}}}= & {} {\varvec{W_o}}{\varvec{\beta _o}} + {\varvec{W_p}}{\varvec{\beta _p}} + {\textit{\textbf{e}}} \nonumber \\= & {} {\textit{\textbf{W}}}{\varvec{\beta }} + {\textit{\textbf{e}}} \end{aligned}$$

(3)

where ${\varvec{W_o}} = [{\varvec{X_o}}\; {\varvec{Z_o}}], \; {\varvec{W_p}} = [{\varvec{X_p}}\; {\varvec{Z_p}}], \; {\varvec{\beta _o}} = ({\varvec{\tau }}_{\textit{\textbf{o}}}^{\top }, {\textit{\textbf{u}}}_{\textit{\textbf{o}}}^{\top })^{\top }, \; {\varvec{\beta _p}} = ({\varvec{\tau }}_{\textit{\textbf{p}}}^{\top }, {\textit{\textbf{u}}}_{\textit{\textbf{p}}}^{\top })^{\top }, \; {\textit{\textbf{W}}} = [{\varvec{W_o}} \; {\varvec{W_p}}]$ and ${\varvec{\beta }} = ({\varvec{\beta }}_{\textit{\textbf{o}}}^{\top }, {\varvec{\beta }}_{\textit{\textbf{p}}}^{\top })^{\top }$.

The random effects and residuals in (2) are assumed to follow a normal distribution such that:

$$\begin{aligned} \left[ \begin{array}{c} {\varvec{u_o}} \\ {\varvec{u_p}} \\ {\textit{\textbf{e}}} \end{array}\right]\sim & {} \text{ N }\left( \left[ \begin{array}{c} {\textit{\textbf{0}}} \\ {\textit{\textbf{0}}} \\ {\textit{\textbf{0}}} \end{array}\right] , \left[ \begin{array}{ccc} {\varvec{G_o}} &{}\quad {\textit{\textbf{0}}} &{}\quad {\textit{\textbf{0}}} \\ {\textit{\textbf{0}}} &{}\quad {\varvec{G_p}} &{}\quad {\textit{\textbf{0}}} \\ {\textit{\textbf{0}}} &{}\quad {\textit{\textbf{0}}} &{}\quad {\textit{\textbf{R}}} \end{array}\right] \right) \end{aligned}$$

where ${\varvec{G_o}}, {\varvec{G_p}}$ and ${\textit{\textbf{R}}}$ are positive definite matrices assumed to be functions of vectors of variance parameters ${\varvec{\sigma _{g_o}}}, {\varvec{\sigma _{g_p}}}$ and ${\varvec{\sigma _r}}$, respectively. Model-based design requires values for these parameters so in the following they are regarded as known.

3.2 Predictions of Interest

The aim is to find an optimal or near-optimal design with respect to a d-vector of estimable functions ${\varvec{\pi }} = {\textit{\textbf{D}}}{\varvec{\beta _o}}$ where ${\textit{\textbf{D}}}$ is a known matrix with $c_{w_o}$ columns and $c_{w_o}=c_{x_o} + c_{z_o}$. The vector of estimable functions, ${\varvec{\pi }}$, involves only objective effects, but may involve fixed, random or both fixed and random effects. Gilmour et al. (2004) provide a computationally efficient algorithm for forming predictions from the linear mixed model specified in (3). For brevity in the following, we use the terminology of Gilmour et al. (2004) and refer to ${\varvec{\pi }}$ as the vector of predictions, which we assume are estimable. Gilmour et al. (2004) provide a simple test of estimability of predictions which fit naturally into their prediction algorithm. Briefly, given ${\textit{\textbf{D}}}$, the vector of predictions and associated prediction error variance/covariance matrix are formed by recursive absorption from an extended set of mixed model equations (see Robinson 1991, for example). Full details can be found in Gilmour et al. (2004). The mixed model equations (MMEs) for (3) are given by

$$\begin{aligned} {\textit{\textbf{C}}}\tilde{{\varvec{\beta }}} = {\textit{\textbf{W}}}^{\top }{\textit{\textbf{R}}}^{-1}{\textit{\textbf{y}}} \end{aligned}$$

(4)

where the coefficient matrix in (4) is

$$\begin{aligned} \left[ \begin{array}{cc} {\textit{\textbf{W}}}_{\textit{\textbf{o}}}^{\top }{\textit{\textbf{R}}}^{-1}{\varvec{W_o}} + {\textit{\textbf{G}}}_{\textit{\textbf{o}}}^{*} &{}\quad {\textit{\textbf{W}}}_{\textit{\textbf{o}}}^{\top }{\textit{\textbf{R}}}^{-1}{\varvec{W_p}} \\ {\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{-1}{\varvec{W_o}} &{}\quad {\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{-1}{\varvec{W_p}} + {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*} \end{array}\right] \end{aligned}$$

(5)

with

$$\begin{aligned} {\textit{\textbf{G}}}_{\textit{\textbf{o}}}^{*} = \left[ \begin{array}{cc} {\textit{\textbf{0}}} &{}\quad {\textit{\textbf{0}}}\\ {\textit{\textbf{0}}} &{}\quad {\textit{\textbf{G}}}_{\textit{\textbf{o}}}^{-1} \end{array}\right] \quad \text{ and } \quad {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*} = \left[ \begin{array}{cc} {\textit{\textbf{0}}} &{}\quad {\textit{\textbf{0}}}\\ {\textit{\textbf{0}}} &{}\quad {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{-1} \end{array}\right] \end{aligned}$$

It follows that the reduced set of MMEs for $\tilde{{\varvec{\beta }}}_{{\textit{\textbf{o}}}}$ are given by

$$\begin{aligned} {\varvec{C_{oo}}} \tilde{{\varvec{\beta }}}_{{\textit{\textbf{o}}}} = {\textit{\textbf{W}}}_{\textit{\textbf{o}}}^{\top }{\varvec{P_p}}{\textit{\textbf{y}}} \end{aligned}$$

(6)

where ${\varvec{C_{oo}}} = {\textit{\textbf{W}}}_{\textit{\textbf{o}}}^{\top }{\varvec{P_p}}{\varvec{W_o}} + {\textit{\textbf{G}}}_{\textit{\textbf{o}}}^{*}, {\varvec{P_p}} = {\textit{\textbf{R}}}^{-1} - {\textit{\textbf{R}}}^{-1}{\varvec{W_p}}({\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{-1}{\varvec{W_p}}+ {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*})^{-}{\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{-1}$ and $({\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{-1}{\varvec{W_p}}+ {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*})^{-}$ is any particular generalized inverse of ${\textit{\textbf{W}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{R}}}^{-1}{\varvec{W_p}}+ {\textit{\textbf{G}}}_{\textit{\textbf{p}}}^{*}$. It can be shown that

$$\begin{aligned} {\varvec{P_p}} = {\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{-1} - {\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{-1}{\varvec{X_p}}({\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{-1}{\textit{\textbf{X}}}_{\textit{\textbf{p}}})^{-}{\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{-1} \end{aligned}$$

where ${\varvec{V_p}} = {\varvec{Z_p}}{\varvec{G_p}}{\textit{\textbf{Z}}}_{\textit{\textbf{p}}}^{\top }+ {\textit{\textbf{R}}}$ and $({\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{-1}{\textit{\textbf{X}}}_{\textit{\textbf{p}}})^{-}$ is any particular generalized inverse of ${\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{V}}}_{\textit{\textbf{p}}}^{-1}{\varvec{X_p}}$. The matrix ${\varvec{P_p}}$ has rank $n - \mathrm{rank}\left( {\varvec{X_p}}\right) $ and is unique, and it is the Moore–Penrose inverse of ${\textit{\textbf{T}}} = {\varvec{M_p}}{\varvec{V_p}}{\varvec{M_p}}$, where ${\varvec{M_p}} = {\textit{\textbf{I}}}_n - {\varvec{X_p}}({\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }{\textit{\textbf{X}}}_{\textit{\textbf{p}}})^{-}{\textit{\textbf{X}}}_{\textit{\textbf{p}}}^{\top }$. That is ${\textit{\textbf{T}}} = {\varvec{P_p}}^{+}$.

For known ${\varvec{\sigma _{g_o}}}, {\varvec{\sigma _{g_p}}}$ and ${\varvec{\sigma _r}}$

$$\begin{aligned} {\textit{\textbf{D}}}({\varvec{\beta _o}} - \tilde{{\varvec{\beta }}}_{{\textit{\textbf{o}}}}) \sim \text{ N }({\textit{\textbf{0}}},\; {\varvec{\Lambda }}) \end{aligned}$$

(7)

where ${\varvec{\Lambda }} = {\textit{\textbf{D}}}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{D}}}^{\top }$ and ${\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}$ is a particular generalized inverse of the coefficient matrix of (6).

3.3 Optimal Design Criteria

In the context of early-stage trials, or more generally for comparative experiments, the most widely used and useful optimality criterion is the $\mathcal {A}$-optimality criterion (Martin 1986). Bueno Filho and Gilmour (2007) developed a Bayesian design criterion for selection experiments in plant breeding based on a utility function that minimizes the risk of an incorrect selection. They show that this is in fact the $\mathcal {A}$-optimality criterion based on the PEV matrix for the vector of random entry effects. Bueno Filho and Gilmour (2003) and Cullis et al. (2006) use this criterion for generating optimal or near-optimal designs for plant breeding designs when the treatments are correlated and for so-called p-rep designs used in early-stage trials. In this case, it can be shown that

$$\begin{aligned} \mathcal {A} = 2\sum _{i}^{}\sum _{j<i} \mathrm{pev}\left( \tilde{\pi }_i - \tilde{\pi }_j\right) /d/(d-1) \end{aligned}$$

where $d\le v$ is the number of varieties for which the $\mathcal {A}$-value is computed and $\mathrm{pev}\left( \right) $ refers to the prediction error variance of its scalar (or matrix) argument. A convenient form for computing $\mathcal {A}$ is given by

$$\begin{aligned} \mathcal {A} = \frac{2}{d-1}(\mathrm{tr}\left( {\varvec{\Lambda }}\right) - {\textit{\textbf{1}}}_d^{\top }{\varvec{\Lambda }}{\textit{\textbf{1}}}_d/d) \end{aligned}$$

We seek a design which minimizes $\mathcal {A}$ over all valid design functions of the design.

3.4 Design Search and Updating Formulae

The design search process presented in Butler et al. (2014) and implemented in OD-V2 (Butler and Cullis 2018) can be summarized as follows: given an initial design, the $\mathcal {A}$-value is optimized under the supervision of a search strategy (for example, a tabu search Glover 1989) for (pairwise) interchanges of the rows of ${\varvec{W_o}}$.

Each interchange requires evaluation of the $\mathcal {A}$-value, which is based on the prediction error variance matrix ${\varvec{\Lambda }}$. This requires expensive matrix calculations, and hence, an exhaustive search of the design space for moderate to large problems is implausible. Updating formulae for ${\textit{\textbf{C}}}_{oo}^{-}$ can be used to significantly reduce the computational burden.

Martin and Eccleston (1992) developed an updating method for finding optimal designs under a linear model with correlated errors. Their algorithm is extended here, but allowing the more general setting of correlated objective effects, within the framework of a linear mixed model. The interchange of two rows of ${\varvec{W_o}}$ is equivalent to first removing two rows from $\Omega $, followed by adding the two units back, but in reverse order. Martin and Eccleston (1997) suggested using a four-step approach which involves adding or removing one unit to obtain the new ${\textit{\textbf{C}}}_{oo}^{-}$. Chan (1999) presented a two-step approach which she claims to be simpler and easier to implement. We begin by developing a two-step approach, similar to that proposed by Chan (1999), but a four-step approach is also presented as this has proven to be competitive to the two-step approach in terms of computational load. The four-step approach is also more suitable for use in the context of early-stage trials, where the presence of so-called singleton treatments (that is, those treatments which occur on only one plot) can cause computational problems as discussed by Coombes (2002).

Let the current design contain $n=r+s$ units, which are referred to as plots in the following. Consider the retention of r plots by removing s plots, and both ${\varvec{W_o}}$ and ${\varvec{P_p}}$ are partitioned conformably with the removal of s plots as follows

$$\begin{aligned} {\varvec{W_o}} = \left[ \begin{array}{c} {\varvec{W_{or}}} \\ {\varvec{W_{os}}} \end{array}\right] \quad \text{ and } \quad {\varvec{P_p}} = \left[ \begin{array}{cc} {\varvec{P_{p;rr}}} &{} {\varvec{P_{p;rs}}} \\ {\varvec{P_{p;sr}}} &{} {\varvec{P_{p;ss}}}\end{array}\right] \end{aligned}$$

where, for example, ${\varvec{W_{or}}}$ is the design matrix for the subset of the design corresponding to the r retained plots so has r rows and $c_{w_o}$ columns. The matrix ${\textit{\textbf{T}}}$ is also partitioned conformably with ${\varvec{P_p}}$. We note that in many cases ${\varvec{X_p}}$ is null in which case ${\varvec{P_p}} = {\varvec{V_p}}^{-1}$ and hence ${\textit{\textbf{T}}} = {\varvec{V_p}}$. It follows that the coefficient matrix of the reduced MMEs for the subset of the design corresponding to the r retained plots is

$$\begin{aligned} {\varvec{C_{oo}^{(r)}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}}^{\top }{\textit{\textbf{T}}}_{{\textit{\textbf{rr}}}}^{\sim }{\varvec{W_{or}}} + {\varvec{G_o^*}} \end{aligned}$$

where ${\textit{\textbf{T}}}_{{\textit{\textbf{rr}}}}^{\sim }$ is any particular generalized inverse of ${\varvec{T_{rr}}}$.

Using a similar argument to Martin and Eccleston (1992) and results on the inverse of partitioned matrices (see, for example, Searle 1982), it can be shown that

$$\begin{aligned} {\varvec{C_{oo}}} - {\varvec{C_{oo}^{(r)}}} = {\varvec{F_{rs}}}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}^{-}{\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{\top } \end{aligned}$$

(8)

where ${\varvec{F_{rs}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}}^{\top }{\textit{\textbf{P}}}_{{\varvec{p;rs}}} + {\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}^{\top }{\textit{\textbf{P}}}_{{\varvec{p;ss}}}$. Hence, an updating formula for ${\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}$ can now be derived using (8) as follows. Using Corollary 18.2.15 in Harville (1997) (p.432)

$$\begin{aligned} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} = {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-} + {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}({\textit{\textbf{P}}}_{{\varvec{p;ss}}} - {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}})^{-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-} \end{aligned}$$

(9)

where ${\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}} = {\varvec{F_{rs}}}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}^{-}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}$. If ${\textit{\textbf{P}}}_{{\varvec{p;ss}}}$ is non-singular, then ${\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}} = {\varvec{F_{rs}}}$. Furthermore, if $({\textit{\textbf{P}}}_{{\varvec{p;ss}}} - {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}})$ is non-singular, then (9) becomes

$$\begin{aligned} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} = {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-} + {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}({\textit{\textbf{P}}}_{{\varvec{p;ss}}} - {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}})^{-1}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-} \end{aligned}$$

(10)

Next, we consider adding s units back to the design, but with the appropriate permutation applied to the rows of ${\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}$. In the simple case of $s=2$, then provided that the interchange is legal, the two rows of ${\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}$ are interchanged. The new design matrix for the full set of $n=r+s$ units is therefore given by

$$\begin{aligned} {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{{\varvec{*}}} = \left[ \begin{array}{c} {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} \\ {\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}} \end{array}\right] \end{aligned}$$

where ${\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}}$ is the permuted design matrix associated with the s units.

Using a similar approach to the deletion of r units, we consider the coefficient matrix of the reduced MMEs for the full, new design with $r+s$ units which is given by

$$\begin{aligned} {\varvec{C_{oo}^{(r+s)}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{P}}}_{{\textit{\textbf{p}}}}{\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{{\varvec{*}}} + {\varvec{G_o^*}} \end{aligned}$$

It can be shown that

$$\begin{aligned} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}} - {\varvec{C_{oo}^{(r)}}} = {\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}^{-}{\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }} \end{aligned}$$

(11)

where ${\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}}^{\top }{\textit{\textbf{P}}}_{{\varvec{p;rs}}} + {\textit{\textbf{W}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}$. Hence, an updating formula for ${\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}-}$ is

$$\begin{aligned} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}-} = {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} - {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}({\textit{\textbf{P}}}_{{\varvec{p;ss}}} + {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}})^{-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} \end{aligned}$$

(12)

where ${\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}} = {\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}^{-}{\textit{\textbf{P}}}_{{\varvec{p;ss}}}$. If ${\textit{\textbf{P}}}_{{\varvec{p;ss}}}$ is non-singular, then ${\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}} = {\textit{\textbf{F}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}$. Furthermore, if $({\textit{\textbf{P}}}_{{\varvec{p;ss}}} + {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}})$ is non-singular, then (12) becomes

$$\begin{aligned} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}-} = {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} -{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}({\textit{\textbf{P}}}_{{\varvec{p;ss}}} + {\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}})^{-1}{\textit{\textbf{H}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} \end{aligned}$$

(13)

Using the two updating formulae, one for the removal and one for the addition, allows us to compute the new $\mathcal {A}$-value from ${\varvec{\Lambda }}^{{\varvec{(r+s)}}} = {\textit{\textbf{D}}}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}-}{\textit{\textbf{D}}}^{\top }$.

Setting $s=1$ leads to the four-step updating scheme proposed by Martin and Eccleston (1992). The four-step approach can have computational advantages, and additionally, it can be implemented to handle designs in which all of the objective effects are fixed effects and the design contains singletons. Using the updating formulae in (10) and (13), as an example, when $s=1$, we have:

$$\begin{aligned} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}= & {} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-} + {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}(p_{p;ss} - {\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}})^{-1}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-} \end{aligned}$$

(14)

$$\begin{aligned} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r+s)}}-}= & {} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} - {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}(p_{p;ss} +{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }}{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}})^{-1}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}{\top }} {\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{{\varvec{(r)}}-} \end{aligned}$$

(15)

for removal and addition of a unit, respectively, where

$$\begin{aligned} {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}} = \left[ \begin{array}{c} {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} \\ {\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{\top }\end{array}\right] , \quad {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{{\varvec{*}}} = \left[ \begin{array}{c} {\varvec{W_{or}}} \\ {\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}{\top }}\end{array}\right] \quad \text{ and } \quad {\textit{\textbf{P}}}_{{\textit{\textbf{p}}}} = \left[ \begin{array}{cc} {\textit{\textbf{P}}}_{{\varvec{p;rr}}} &{}\quad {\textit{\textbf{p}}}_{{\varvec{p;rs}}} \\ {\textit{\textbf{p}}}_{{\varvec{p;sr}}}^{\top }&{}\quad p_{p;ss}\end{array}\right] \end{aligned}$$

and

$$\begin{aligned} {\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}= & {} {\textit{\textbf{f}}}_{{\textit{\textbf{rs}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}}^{\top }{\textit{\textbf{p}}}_{{\varvec{p;rs}}} + {\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}p_{p;ss} \\ {\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}}= & {} {\textit{\textbf{f}}}_{{\textit{\textbf{rs}}}}^{{\varvec{*}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{or}}}}^{\top }{\textit{\textbf{p}}}_{{\varvec{p;rs}}} + {\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}}p_{p;ss} \end{aligned}$$

for $p_{p;ss} - {\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}^{\top }{\textit{\textbf{C}}}_{{\textit{\textbf{oo}}}}^{-}{\textit{\textbf{h}}}_{{\textit{\textbf{rs}}}}>0$.

The two-step approach is straightforward; however, it is useful to illustrate the four-step approach in more detail. As a simple example of the four-step approach, consider interchange of two units in the design, denoted by $(\omega _a,\omega _b)$. The rows of the objective design matrix ${\varvec{W_o}}$ are indexed using $\Omega $ such that ${\varvec{W_o}} = {\varvec{W_o}}[\Omega ,]$, and let $\Omega _{-a}$ be the set of plots excluding $\omega _a$ and $\Omega _{-b}$ be the set of plots excluding $\omega _b$; then, the four-step approach consists of the following four steps:

1.
Remove $\omega _a$: ${\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\Omega _{-a},]$ and ${\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{\top }= {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\omega _a,]$ and then use (14);
2.
Add $\omega _b$: ${\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\Omega _{-a},]$ and ${\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}{\top }} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\omega _b,]$ and then use (15);
3.
Remove $\omega _b$: ${\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{*}[\Omega _{-b},]$ and ${\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{\top }= {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{*}[\omega _b,]$ and then use (14);
4.
Add $\omega _a$: ${\textit{\textbf{W}}}_{{\textit{\textbf{or}}}} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}^{*}[\Omega _{-b},]$ and ${\textit{\textbf{w}}}_{{\textit{\textbf{os}}}}^{{\varvec{*}}{\top }} = {\textit{\textbf{W}}}_{{\textit{\textbf{o}}}}[\omega _a,]$ and then use (15)

noting that if a singleton is included in the set $(\omega _a,\omega _b)$, then the steps must be arranged so that the singleton is not removed first.

3.5 Implementation in OD-V2

3.5.1 Example

The implementation of model-based design for early-stage trials is shown here via an example based on one of the breeding programs introduced in Sect. 2. This example also forms the basis of the simulation study presented in Sect. 4. A design is required for testing 256 S1 stage entries from the field peas program with the aim of making selections for progression to S2 stage testing. The trial will also include four check varieties, which is standard practice in most S1 stage field trials. The predictions of interest are the total genetic effects for each of the 260 entries. In the linear mixed model, the total genetic effects will be partitioned into additive and non-additive effects through the use of the numerator relationship matrix for the 260 entries. Summaries of this matrix included a mean inbreeding coefficient of 0.941, while the minimum, mean and maximum genetic relatednesses were 0.130, 0.346 and 0.532. One of the check varieties had the highest mean genetic relatedness, as it had been widely used as a parent of the test lines.

A p-rep design will be constructed, with 50% of the breeding lines having two plots each. (Note that percentage replication will be examined in Sect. 4.) The check varieties will also each have two plots so the full design involves 392 plots that will be arranged as 12 columns by 28 rows. The linear mixed model for the design search will include terms for non-genetic effects as determined from historic analyses of relevant trials. In the current example, this relates to a comprehensive set of S1 and S2 stage trials from the field peas breeding program for the years 2013 to 2017 inclusively. This led to the inclusion of random effects for a two-level blocking factor CRep (corresponding to columns 1–6 and 7–12); random Column and Row effects and finally a separable first-order autoregressive process for the residuals. The design search requires values of the associated variance parameters, and these will be taken as the median values from the historic analyses.

The design for the example can be constructed in OD-V2 (Butler and Cullis 2018) using the following call:

In this call, the data frame data.df has 392 rows and corresponds to an initial (user-supplied) design. The genetic effects are specified in the term ric(Entry, Ainv) and relate to the total genetic effects (${\varvec{u_g}}$). The variance function ric() indicates that the variance model for the effects in the factor argument (that is, Entry) is given by $\sigma ^2_a{\textit{\textbf{A}}} + \sigma _e^2{\textit{\textbf{I}}}_{260}$ where ${\textit{\textbf{A}}}$ is the numerator relationship matrix for the 260 entries. Note that in the form given here, the inverse of the numerator relationship matrix, i.e. ${\textit{\textbf{A}}}^{-1}$ is supplied rather than ${\textit{\textbf{A}}}$ as it has been stored in sparse form (see Butler 2013). This specification for the total genetic effects will be termed the direct formulation and is a new feature of OD-V2 (Butler and Cullis 2018) that affords computational savings over the alternative, indirect formulation. The latter involves the inclusion of two terms in the model, associated with the additive effects (${\varvec{u_a}}$) and non-additive effects (${\varvec{u_e}}$), so that the resultant vector of objective effects (${\varvec{\beta _o}}$) has twice as many elements as the direct formulation. The other terms in the random model formula are associated with random CRep, Column and Row effects, respectively. Lastly, the residual model formula specifies that the variance model for the residuals is one associated with a separable first order autoregressive process. The permute model formula specifies that Entry is the factor to be permuted and the use of the direct formulation means that the $\mathcal {A}$-value will be computed for total genetic effects. The variance parameters are supplied using the G.param and R.param arguments (also see “Appendix A’).

3.5.2 Timings for Updating Schemes

Table 5 presents the elapsed time (seconds) for a full tabu search and 1000 interchanges for OD-V2 using the direct formulation for total genetic effects: 1000 interchanges for OD-V2 using the indirect formulation for total genetic effects and 1000 interchanges for OD-V1 using the indirect formulation for total genetic effects. The timings were conducted on eight trials used in the case study (trials numbered 3, 5, 15, 17, 22, 23, 27 and 31 in Table 5). The trials included an S1 and S2 stage trial for each of the four crops. The results illustrate the substantial reduction in computational load by using the updating formula developed in Sect. 3.4 and approximately agree with the total number of FLOPs involved, that is, $O(c_{w_o}^3)$ without an updating scheme compared with $O(c_{w_o})$ with the updating scheme.

Table 5 Timings for designs for eight trials (trial numbers correspond to those in Table 2): elapsed time (seconds) for a full tabu loop (number of interchanges given in column 3) and for 1000 interchanges for OD-V2 using the direct formulation for total genetic effects; 1000 interchanges for OD-V2 using the indirect formulation for total genetic effects and 1000 interchanges for OD-V1 using the indirect formulation for total genetic effects

Full size table

4 Accuracy of Designs Using Genetic Relatedness

The performance of model-based designs constructed using genetic relatedness and specific non-genetic models was assessed using an in silico experiment based on the example presented in Sect. 3.5.1. Of particular interest were comparisons with model-based designs using specific non-genetic models but no information on genetic relatedness and with more classical approaches to design which use neither specific non-genetic models nor genetic relatedness. Thus, there were three key treatments under investigation, corresponding to three design functions (DF), labelled as DF$++$, DF$-+$ and DF${-}{-}$ where the two final characters indicate the use/nonuse of genetic relatedness and specific non-genetic models in the design construction. Further details are provided later.

The simulation experiment was also designed to cover the range of genetic variance parameters and levels of replication in current use for S1 and S2 stage trials in the pulse breeding programs described in Sect. 2. This led to the construction of a total of 63 ancillary treatments to consider, comprising the factorial combinations of the three factors described in Table 6. These factors are labelled Prep (the percentage of breeding lines which have two plots per trial), Padd (the proportion of additive genetic variance expressed as a ratio of the total genetic variance) and Rsq (the baseline reliability of the data for the design with level 0 of the Prep factor).

Table 6 Summary of the ancillary treatment factors and their levels used in the simulation experiment

Full size table

Table 7 presents summaries of the layouts for the seven basic field trial configurations arising from the different replication levels used in the simulation experiment. Every trial was assumed to be laid out in a contiguous rectangular array with 12 columns and between 22 and 28 rows. The blocking factor CRep was constructed to span two sets of contiguous columns in each case.

Table 7 Summary of the field trial layouts used in the simulation experiment

Full size table

The linear mixed model chosen to generate the non-genetic effects for the data sets used in this study included terms, variance models and variance parameters as determined from the historic analyses discussed in Sect. 3.5.1. The linear mixed model also partitioned the total genetic effects into additive and non-additive effects. Given the set of variance parameters for the non-genetic effects, the two genetic variance parameters in the data generation model were then chosen to be consistent with the levels of the two ancillary treatment factors Padd and Rsq. Table 8 presents the variance parameters for all of the random and residual variance models in the data generation linear mixed model for each level of Padd and Rsq.

Table 8 Variance parameters for the data generation linear mixed models in the simulation experiment

Full size table

Data sets were generated according to the linear mixed model given by

$$\begin{aligned} {\textit{\textbf{y}}}_{ij}^{[k]} = {\textit{\textbf{X}}}_i{\varvec{\tau }}_i + {\textit{\textbf{Z}}}_{ij}{\textit{\textbf{u}}}^{[k]}_{{\textit{\textbf{a}}},i} + {\textit{\textbf{Z}}}_{ij}{\textit{\textbf{u}}}^{[k]}_{{\textit{\textbf{e}}},i} + {\varvec{\eta }}_{i}^{[k]} \end{aligned}$$

(16)

where $i = 1, \ldots , 63$ indexes the ancillary treatments, $j = 1, \ldots , 3$ indexes the design functions and $k = 1, \ldots , N$ indexes the simulations ($N=4000$). For each k ($=1, \ldots , N$), ${\textit{\textbf{u}}}^{[k]}_{{\textit{\textbf{a}}},i}$ and ${\textit{\textbf{u}}}^{[k]}_{{\textit{\textbf{e}}},i}$ are sampled from a normal distribution with mean zero and variance matrices given by $\sigma _{a,i}^2{\textit{\textbf{A}}}$ and $\sigma _{e,i}^2{\textit{\textbf{I}}}_{260}$, respectively. Note that the genetic variance components although indexed by the level of T are constant on the levels of Padd and Rsq, with only nine unique combinations as listed in Table 8. The vector ${\varvec{\eta }}_{i}^{[k]}, i=1,\ldots ,63; k=1,\ldots ,N$ represents the plot effects that is the sum of all the random non-genetic effects and residuals. These are obtained via sampling from a normal distribution with mean zero and variance matrix given by

$$\begin{aligned} \sigma ^2_{cr}{\textit{\textbf{Z}}}_{{\textit{\textbf{cr}}},i}{\textit{\textbf{Z}}}_{{\textit{\textbf{cr}}},i}^{\top }+ \sigma ^2_{c}{\textit{\textbf{Z}}}_{{\textit{\textbf{c}}},i}{\textit{\textbf{Z}}}_{{\textit{\textbf{c}}},i}^{\top }+ \sigma ^2_{r}{\textit{\textbf{Z}}}_{{\textit{\textbf{r}}},i}{\textit{\textbf{Z}}}_{{\textit{\textbf{r}}},i}^{\top }+ \sigma ^2{\varvec{\Sigma _{c}}}_i(\rho _c)\otimes {\varvec{\Sigma _{r}}}_i(\rho _r) \end{aligned}$$

where ${\varvec{\Sigma _{c}}}_i$ and ${\varvec{\Sigma _{r}}}_i$ are scaled variance matrices of order $c_i$ and $r_i$ for first-order autoregressive processes where $c_i$ and $r_i$ are the numbers of columns and rows in the trial layout. These matrices are functions of autocorrelation parameters, $\rho _c$ and $\rho _r$, for the column and row dimensions, respectively. The design matrices for the fixed and non-genetic random effects are given by ${\textit{\textbf{X}}}_i, {\textit{\textbf{Z}}}_{{\textit{\textbf{cr}}},i}, {\textit{\textbf{Z}}}_{{\textit{\textbf{c}}},i}$ and ${\textit{\textbf{Z}}}_{{\textit{\textbf{r}}},i}$ where the latter correspond to the terms in the random model formula, namely CRep, Column and Row, respectively. Values for all the variance parameters are set out in Table 8. There is only one fixed effect associated with an overall mean. Lastly, we consider the design matrix for the genetic effects. A separate design matrix, ${\textit{\textbf{Z}}}_{ij}$, $i=1,\ldots ,63; j=1,2,3$ is formed for each level of the design function factor as described below

DF$++$ a design function generated from (16)
DF$-+$ a design function generated from the plot effects component of (16) but with genetic effects assumed to be fixed effects
DF${-}{-}$ a design function generated from the model of Williams et al. (2011) with genetic effects assumed to be fixed effects

Hence, there were a total of 63 different designs for DF$++$ but only seven designs for DF$-+$ and DF${-}{-}$, one for each level of Prep. When forming the design function for the DF$++$ design, for levels 5, 10, 15, 25, 50 of Prep, we used the method of Huang et al. (2013) to subset those breeding lines to be tested with two plots in the trial.

All design functions were obtained using OD-V2 (Butler and Cullis 2018), by varying the model formulae for the fixed, random and residual components. The OD-V2 call for DF$++$ designs was as given in Sect. 3.5.1. Details and scripts for all designs are available in the supplementary materials.

The correlations between the true and predicted total genetic effects for all entries were calculated for each ancillary treatment and design function, (a total of 189 combinations), where the predicted genetic effects were the empirical BLUPs obtained from the fitting of the linear mixed model that corresponded to the data generation model.

4.1 Simulation Study Results

The estimation of the variance parameters is considered first. Summaries of the MSEP, simulation variance and bias (expressed as a % of the true value) for all variance parameters in the model are available in the supplementary materials. Only the results for the additive and total genetic variance parameters are presented here. Figures 2 and 3 present plots of the percentage bias for the additive genetic variance and total genetic variance against the levels of Prep. Significant bias occurs for some combinations of k and $r^2_0$. There is consistent, negative bias for $k=0.9$ for the additive genetic variance, for all levels of Prep and Rsq. This negative bias is associated with a significant positive bias for the non-additive genetic variance for these combinations, which then results in bias for the total genetic variance, especially for $k=0.9$ and $r^2_0=1/3$. Bias for the total genetic variance is acceptable for the four Padd and Rsq combinations of $(k=0.5, r^2_0=1/2)$, $(k=0.5, r^2_0=2/3)$, $(k=0.7, r^2_0=1/2)$ and $(k=0.7, r^2_0=2/3)$ for all levels of Prep. The remaining combinations of Padd and Rsq all exhibit unacceptable bias for total genetic variance, particularly for small values of Prep. Bias for the additive genetic variance is acceptable for all levels of Prep except when $k=0.9$. This latter bias is largely a result of the significant positive bias in estimation of the non-additive genetic variance, when it is such a small component of the total genetic variance and is so close to zero (relative to the residual variance).

Table 9 presents the mean squared error of prediction (MSEP-CV) for the REML estimate of the additive genetic variance expressed as a coefficient of variation (relative to the true value) for each level of replication of the test lines, the proportion of additive genetic variance as a ratio to the total genetic variance and for baseline heritabilities of $r^2_0=1/3$ and $r^2_0=1/2$. The results for $r^2_0=2/3$ are similar and are omitted. The MSEP-CV for the DF$++$ design function is consistently lower than the MSEP-CV for the other two design functions. Of particular interest is the ability of the DF$++$ design function to achieve comparable levels of MSEP-CV achieved by DF$-+$ and DF${-}{-}$ for much lower values of Prep. For example, the MSEP-CV for the design functions DF${-}{-}$ and DF$-+$ for a Prep of 25% is approximately the same as the MSEP-CV for the DF$++$ design function for a Prep of only 10%.

Table 9 Mean squared error of prediction for the REML estimate of the additive genetic variance expressed as a coefficient of variation (relative to the true value) for each design function, level of replication of the test lines, the proportion of additive genetic variance as a ratio to the total genetic variance and for baseline heritabilities of 1/3 and 1/2

Full size table

The correlations between the true and predicted entry effects are summarized in Figs. 4 and 5 for $r^2_0=1/3$ and $r^2_0=1/2$, respectively. Each figure presents panel line plots of the mean simulation-based correlations (in blue) and model-based reliability (in black) versus the levels of Prep for each of the three design functions. The results for $r^2_0=2/3$ are similar, with the DF$++$ design function superior to the other design functions for all levels of Prep and Padd. The model-based reliability was derived from the $\mathcal{A}$-values obtained from either the final $\mathcal{A}$-value for the DF$++$ design function, or the initial $\mathcal{A}$-values for the DF$-+$ and DF${-}{-}$ design functions evaluated under the (true) data model of (16), using the approximation of Cullis et al. (2006).

It is immediately clear that there is a consistent and important increase in correlation for DF$++$ designs relative to both DF$-+$ and DF${-}{-}$ designs. The advantage is larger for higher values of k and lower values of $r^2_0$. The DF$++$ designs are superior for all levels of Prep, though the advantage diminishes for designs with Prep less than 10%. It is also noteworthy that the advantage of DF$++$ designs for values of Prep greater than 25% is stable, suggesting there is little impact of using the subset selection method of Huang et al. (2013) in terms of improving the accuracy of the design.

The plots also highlight the interaction between the increase in simulation-based correlation and model-based reliability due to increasing the level of replication and the value of k. This suggests that the information coming from the data plays an important role in improving the accuracy of the prediction of total genetic effects when the non-additive component is present (and can be reliably estimated). The impact of variance parameter estimation on accuracy is clear. All simulation-based correlations are significantly lower than model-based reliabilities, but particularly so for designs with values of Prep less than 10%. This is consistent with the MSEP-CV shown in Table 9.

One final remark is that across all levels of $k, r^2_0$ and Prep, there is no appreciable difference between design functions DF$-+$ and DF${-}{-}$. This may be a result of the choice of variance parameters for the residual process which gave rise to only modest levels of spatial dependence arising from the inclusion of spatial dependence. Chan (1999) suggested that the optimality of a model-based design is robust to the choice of variance parameter settings as long as the model class remains the same which is not entirely consistent with our findings. Further work in this area is required to examine this issue in more detail.

References

Bailey, R. A. (2008). Design of comparative experiments. Cambridge University Press, Cambridge.
Book Google Scholar
Bueno Filho, J. S. D. S. & Gilmour, S. G. (2007). Block designs for random treatment effects. Journal of Statistical Planning and Inference 137, 1446–1451.
Article MathSciNet Google Scholar
Bueno Filho, J. S. S. & Gilmour, S. G. (2003). Planning incomplete block experiments when treatments are genetically related. Biometrics 59, 375–381.
Article MathSciNet Google Scholar
Butler, D. & Cullis, B. (2018). Optimal Design under the Linear Mixed Model: https://mmade.org/optimaldesign/. Technical report, National Institute for Applied Statistics Research Australia, University of Wollongong.
Butler, D., Eccleston, J., & Cullis, B. (2008). On an approximate optimality criterion for the design of field experiments under spatial dependence. Australian and New Zealand Journal of Statistics 50.
Butler, D. G. (2013). On The Optimal Design of Experiments Under the Linear Mixed Model. PhD thesis, University of Queensland.
Butler, D. G., Cullis, B. R., Gilmour, A. R., & Thompson, R. (2018). ASReml version 4. Technical report, University of Wollongong.
Butler, D. G., Smith, A. B., & Cullis, B. R. (2014). On the Design of Field Experiments with Correlated Treatment Effects. Journal of Agricultural, Biological, and Environmental Statistics 19, 539–555.
Article MathSciNet Google Scholar
Chan, B. (1999). The design of field experiments when the data are spatially correlated. PhD thesis, The University of Queensland.
Coombes, N. E. (2002). The reactive tabu search for efficient correlated experimental designs. PhD thesis, Liverpool John Moores University, Liverpool, U.K.
Coombes, N. E. (2009). DiGGeR, a Spatial Design Program. Biometric bulletin, NSW DPI.
Cullis, B. R., Smith, A. B., & Coombes, N. E. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological, and Environmental Statistics 11, 381–393.
Article Google Scholar
Dunner, S., Checa, M. L., Gutierrez, J. P., Martin, J.P., & Canon, J. (1998). Genetic analysis and management in small populations: The asturcon pony as an example. Genetics Selection Evolution 30, 397–405.
Article Google Scholar
Gilmour, A., Cullis, B., Welham, S., & Gogel, B. (2004). An efficient computing strategy for prediction in mixed linear models. Computational Statistics & Data Analysis 44, 571–586.
Article MathSciNet Google Scholar
Gilmour, A.R., Cullis, B.R., Verbyla, A.P. & Verbyla, A. P. (1997). Accounting for Natural and Extraneous Variation in the Analysis of Field Experiments. Journal of Agricultural, Biological, and Environmental Statistics 2, 269.
Article MathSciNet Google Scholar
Glover, F. (1989). Tabu search–Part 1. Journal on Computing 1, 190–206.
MATH Google Scholar
Harville, D. A. (1997). Matrix algebra from a statisticians perspective. Springer-Verlag, New York.
Book Google Scholar
Huang, B. E., Clifford, D. & Cavanagh, C. (2013). Selecting subsets of genotyped experimental populations for phenotyping to maximise genetic diversity. Theoretical and Applied Genetics 126, 379–388.
Article Google Scholar
John, J. A. & Williams, E. R. (1995). Experimental designs. Wiley and Sons, London.
Google Scholar
John, J. A. & Williams, E. R. (1998). t-Latinized designs. Australian and New Zealand Journal of Statistics 40, 111–118.
Article MathSciNet Google Scholar
Kempton, R. A. (1982). The design and analysis of unreplicated field trials. Vortrage fur Pflanzenzuchtung 7, 219–242.
Google Scholar
Martin, R. & Eccleston, J. (1997). Construction of optimal and near-optimal designs for dependent observations using simulated annealing. Technical report, Dept. Prob. Statist, Unversity of Sheffield.
Martin, R. J. (1986). On the design of experiments under spatial correlation. Biometrika 73, 247–277.
Article MathSciNet Google Scholar
Martin, R. J., Chauhan, N., Eccleston, J. A. & Chan, B. S. P. (2006). Efficient experimental designs when most treatments are unreplicated. Linear Algebra and Its Applications 417, 163–182.
Article MathSciNet Google Scholar
Martin, R. J & Eccleston, J. A. (1992). Recursive formulae for constructing block designs with dependent errors. Biometrika 79, 426–430.
Article MathSciNet Google Scholar
Meuwissen, T. H. E. & Luo, Z. (1992). Computing inbreeding coefficients in large populations. Genetics, Selection and Evolution 24, 305–315.
Article Google Scholar
Mrode, R. A. (1995). Linear models for the prediction of animal breeding values. CABI Publishing, Wallingford.
Google Scholar
Mulitze, D. (1990). AGROBASE/4: A microcomputer database management and analysis system for plant breeding and agronomy. Agronomy Journal 82, 1016—-1021.
Article Google Scholar
Nelder, J. A. (1977). A reformulation of linear models. Journal of the Royal Statistical Society, Series A 140, 48–76.
Article MathSciNet Google Scholar
Oakey, H., Verbyla, A., Cullis, B., Wei, X., & Pitchford, W. (2007). Joint modeling of additive and non-additive (genetic line) effects in multi-environment trials. Theoretical and Applied Genetics 114.
Oakey, H., Verbyla, A. P., Cullis, B. R., Pitchford, W. S., & Kuchel, H. (2006). Joint modelling of additive and non-additive genetic line effects in single field trials. Theoretical and Applied Genetics 113, 809–819.
Article Google Scholar
Patterson, H. D. & Williams, E. R. (1976). A new class of resolvable incomplete block designs. Biometrika 63, 83–92.
Article MathSciNet Google Scholar
Piepho, H.-P. & Williams, E. R. (2006). A comparison of experimental designs for selection in breeding trials with nested treatment structure. Theoretical Applied Genetics. 113, 1505–1513.
Article Google Scholar
Piepho, H.-P., Williams, E. R. & Michel, V., (2016). Nonresolvable Row–Column Designs with an Even Distribution of Treatment Replications. Journal of Agricultural, Biological, and Environmental Statistics 21, 227–242.
Article MathSciNet Google Scholar
Piepho, H.-P., Michel, V., & Williams, E. R. (2018). Neighbor balance and evenness of distribution of treatment replications in row-column designs. Biometrical Journal 60, 1172–1189.
Article MathSciNet Google Scholar
Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects. Statistical Science 6, 15–51.
Article MathSciNet Google Scholar
Searle, S. R. (1982). Matrix algebra useful for Statistics. Wiley Inter Science, London.
MATH Google Scholar
Self, S. C. & Liang, K. Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under non-standard conditions. Journal of the American Statistical Society 82, 605–610.
Article Google Scholar
Smith, A., Cullis, B. R., & Thompson, R. (2001). Analysing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57, 1138–1147.
Article MathSciNet Google Scholar
Smith, A. B. & Cullis, B. R. (2018). Plant breeding selection tools built on factor analytic mixed models for multi-environment trial data. Euphytica 214, 1–19.
Article Google Scholar
Williams, E., Piepho, H. P., & Whitaker, D. (2011). Augmented p-rep designs. Biometrical Journal 53, 19–27.
Article MathSciNet Google Scholar
Williams, E. R. & Piepho, H.-P. (2018). Optimality and Contrasts in Block Designs with Unequal Treatment Replication Australian and New Zealand Journal of Statistics 57, 203–209.
Article MathSciNet Google Scholar
Williams, E. R. & John, J. A. (2006). Row-column factorial designs for use in agricultural field trials. Applied Statistics 62, 103–108.
Google Scholar

Download references

Acknowledgements

Funding was provided by Grains Research and Development Corporation (Grant No. UW00010).

Author information

Nicole A. Cocks
Present address: BASF, Ludwigshafen, Germany

Authors and Affiliations

Centre for Bioinformatics and Biometrics, National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, Australia
Brian R. Cullis, Alison B. Smith, Nicole A. Cocks & David G. Butler

Authors

Brian R. Cullis
View author publications
You can also search for this author in PubMed Google Scholar
Alison B. Smith
View author publications
You can also search for this author in PubMed Google Scholar
Nicole A. Cocks
View author publications
You can also search for this author in PubMed Google Scholar
David G. Butler
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Brian R. Cullis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 49 KB)

Appendix A: OD Timing Syntax

Below are the calls used to obtain the timings in Table 5. The first sequence of calls provides an estimate of the elapsed time to complete a full tabu search of the design space and 1000 interchanges using OD-V2 and the direct formulation for the total genetic effects. The function od.init sets or displays various options that affect the behaviour of OD-V2. The first call to OD returns a template from which variance parameters for each of the random and residual terms can be specified. These are assigned in the subsequent two lines of the code, and these are supplied in the call to OD-V2 via the R.param and G.param arguments. The permute argument specifies that the column labelled Entry, which contains the entry code, is to be permuted and the optimality criterion is for the total genetic effects for those entries with data. Use of the time argument provides an estimate of the time for a single tabu loop based on a set of random interchanges. If time$> 0$, the objective criterion is evaluated the specified number of times prior to the optimization step. The elapsed time is reported and used to estimate the execution time for a scavenging tabu loop.

The next sequence of calls estimates the elapsed time for 1000 random interchanges using the indirect model, but in this call to OD-V2 the optimality criterion is based on the additive genetic effects. OD-V2 does not allow the user to specify the calculation of the optimality criterion for the total genetic effects using the indirect formulation for total effects. The pipe operator (|) is used to associate the permute factor with other terms in the model that should be permuted in parallel. The performance of the indirect formulation for the total genetic effects was found to be substantially inferior to the direct formulation, and hence it has been deprecated. Therefore, the elapsed times using these calls will not be exactly equivalent to the use of the indirect formulation within the updating scheme, but trials suggest that they are adequate to ascertain the approximate penalty associated with use of the indirect formulation, which is the only form available in OD-V1 for total genetic effects.

The final sequence of calls using OD-V1 provides the elapsed times for the indirect formulation for the total genetic effects. There is no option to evaluate timings within the call to OD-V1, and so crude estimates of elapsed time were estimated from a sequence of two calls to isolate the time for evaluating the optimality criterion for 1000 random interchanges.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Cullis, B.R., Smith, A.B., Cocks, N.A. et al. The Design of Early-Stage Plant Breeding Trials Using Genetic Relatedness. JABES 25, 553–578 (2020). https://doi.org/10.1007/s13253-020-00403-5

Download citation

Received: 06 November 2019
Accepted: 16 June 2020
Published: 16 July 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s13253-020-00403-5

The Design of Early-Stage Plant Breeding Trials Using Genetic Relatedness

Abstract

Similar content being viewed by others

Statistical Approaches in Plant Breeding: Maximising the Use of the Genetic Information

On the Design of Field Experiments with Correlated Treatment Effects

QuLinePlus: extending plant breeding strategy and genetic model simulation to cross-pollinated populations—case studies in forage breeding

1 Introduction

2 Case Study