1 Learning Objectives

  • Understand fundamental experimental design concepts.

  • Describe the structural differences between classical designs.

  • Understand the purpose of model-based design and how it can enhance plant improvement experiments.

2 Introduction

Good experimental design underpins wheat improvement research, whether it is conducted in the field, glasshouse or laboratory. Experimental design theory has developed over the last two decades from the classical designs described in texts like Cochran and Cox [1] to optimal model-based designs introduced in Martin [2] and extended in Butler et al. [3] and Cullis et al. [4]. However, the fundamental design principles of replication, randomisation and controlling for heterogeneity promoted by Fisher [5] remain the same (Sect. 13.3).

Typical plant improvement experiments (PIEs) evaluate treatments in replicated experiments which follow one of a few classical experimental design structures (Sect. 13.4). Examples of treatments include genetic entities such as lines, hybrids or varieties in breeding trials; agronomic factors such as fertilizer or irrigation amounts; and pathotypes in disease rating trials. Classical designs primarily differ in the way they control for expected heterogeneity in the experiment, that is, in their plot, or block, structure (Sect. 13.5). These structures are rigid with respect to the number of treatments and/or replication per treatment and can constrain research outcomes.

In contrast, model-based designs are flexible and directly link to the data analysis. They are enabled by the development of statistical modelling technology and advances in computational power. Hence, it is now possible to design experiments which optimize resource use and improve treatment prediction accuracy. Importantly, classical designs can be generated in the model-based design paradigm, as demonstrated in Sect. 13.5.

It is important to understand that the new technologies of high throughput phenotyping, genomic selection and rapid cycle breeding are as dependent on robust experimental design as older breeding technologies. The success of these technologies will depend on cohesive multi-disciplinary teams which include biometricians. This chapter aims to provide researchers with a good understanding of experimental design concepts and a taste of what is possible in the model-based design paradigm. As such, it is a resource for basic knowledge and a springboard to other resources for out-of-scope topics.

3 Fundamental Design Concepts

The following terms and concepts form the basis for understanding classical and model-based designs in a plant improvement context. These definitions follow two recommended texts: [6, 7].

3.1 Definitions

Experimental Purpose

is the aim of the experiment. Examples include selecting breeding lines for variety release, and testing the hypothesis that two pesticides are equally effective in controlling aphids.

Experimental Unit

is the smallest unit to which a treatment is applied. For example, in a yield trial a field plot is the experimental unit as a variety (the treatment) is allocated to an entire plot. In an agronomy trial where a herbicide treatment is applied along a row (containing 10 field plots, say) then the experimental unit for herbicide is a row.

Treatment Factors

are factors of interest imposed by the researcher. Each treatment factor describes what can be applied to an experimental unit. The treatment structure is a meaningful way to divide up the set of treatments.

Observational Unit

is the smallest unit on which a response (trait) is measured. It is often called a plot but it may not reflect an actual field plot. For example, the observational unit (plot) could represent a tiller or grain sample, sampled from within a field plot. In yield trials, the observational unit (plot) is a physical field plot (i.e. the intersection of a row and column in a field layout) and yield is measured on the whole plot. The term field plot is used for clarity.

Plot Factors

are the non-treatment factors whose structure describes the observational units (plots).

Design Function

describes how the treatments are allocated to plots. The process of randomization which determines this allocation takes many forms and considers the logistical constraints of the experiment and the experimental purpose.

3.2 Replication

A replicate is a copy of a treatment, such that the number of replicates of a treatment is the number of experimental units to which a treatment is applied [7].

A common question from researchers is “how many replicates do I need?” The ability to detect a statistically significant difference between treatments, or power, depends on the underlying population variance (\( {\sigma}^2 \)) and the sample size (replication, n). The formula for the variance of the sample mean is \( {\sigma}^2/n \). Theoretically, it is clear that increasing n should decrease the variance of the sample mean, thereby increasing the power of the experiment, but this is not always the case (see [6, 7] for further details).
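As a rough numerical illustration of the variance formula above, the following sketch tabulates the standard error of a treatment mean for a few replication levels. The plot-level variance of 0.25 and the function name `se_of_mean` are assumptions chosen for illustration only; they do not come from any experiment described in this chapter.

```python
import math


def se_of_mean(sigma2: float, n: int) -> float:
    """Standard error of a treatment mean: sqrt(sigma^2 / n)."""
    return math.sqrt(sigma2 / n)


sigma2 = 0.25  # assumed plot-level variance (illustration only)
for n in (2, 3, 4, 8):
    print(n, round(se_of_mean(sigma2, n), 3))
```

Because the standard error scales with the inverse square root of n, doubling the replication shrinks it by a factor of about 0.71 rather than halving it.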

3.3 Randomization

Randomization is the process of allocating treatments to experimental units. Randomization minimizes bias in the experiment ensuring representative sampling of each treatment. Bailey [6] describes four types of bias, each illustrated here with a plant improvement example:

  • Systematic: allocating the varieties 1, 2, …, 20 to plots 1, 2, …, 20, i.e., variety 1 to plot 1, variety 2 to plot 2 etc. in the first replicate for all trials in a multi-environment trial series.

  • Selection: compositing the grain samples from the varieties with lower plot yields but not those with higher plot yields.

  • Accidental: measuring a grain quality trait on varieties which reach maturity before others.

  • Cheating: allocating an irrigation treatment to a lower lying part of the field than a non-irrigation treatment.

3.4 Blocking: Controlling for Variability

Biologically, individual experimental units (e.g., field plots) vary from one another prior to the application of treatments. Common sources of variability are fertility and moisture gradients in the field; lighting and air conditioning in glasshouse and processing equipment such as mills in laboratories. If this variability is ignored in the design (or analysis) then the measurement error (residual variation) can be inflated which results in less accurate comparisons between the treatments of interest (see Chap. 15).

Blocking the experimental units into groups that are considered to be homogeneous attempts to control known (or anticipated) local variation [5], thereby reducing the residual variance and increasing the precision (power) of the experiment. Complete blocks contain an experimental unit for each treatment, incomplete blocks do not. Spatial variability and experimental logistics determine block size, shape and orientation.

3.5 Pseudo-Replication

Pseudo- (or false) replication occurs when multiple measurements taken from a single experimental unit are treated as independent replicates. It frequently occurs when treatments are allocated to big blocks. For example, two trials of a doubled-haploid population are conducted to assess drought tolerance, one in an irrigated block and the other in a non-irrigated block. There is replication of the breeding lines within each trial (block), but there is no replication of the irrigation treatments, so any apparent replication of irrigation is not ‘real’ [6, 7].

3.6 Orthogonality and Balance

Orthogonality and balance describe the structure of an experiment [7]. Two factors are orthogonal if they can be evaluated independently of each other, i.e. their estimated effects are the same irrespective of the presence (or not) of the other factor in the model [7]. A balanced design (e.g., RCBD) has equal precision on all treatment comparisons [7].

Non-orthogonal designs are possible for situations where resources are limited. In non-orthogonal designs treatment factors are deliberately not equally replicated or deliberately confounded with other factors, such as blocks. Identifying which factors in a design are orthogonal (or not) enables appropriate inference about the key factors of interest. Non-orthogonality can occur between any two factors (treatment or plot) in an experiment. For example, in a randomized complete block design (RCBD, Sect. 13.4.2) the treatment and plot factors are orthogonal but if there is a missing data point then they are not.

Balanced incomplete block designs are an example where there is non-orthogonality between the block and treatment factors but they are balanced because each pair of treatments occurs equally often within the same blocks [7].

3.7 Resolvability

A design is resolvable if its blocks (complete or incomplete) can be grouped into sets such that each treatment occurs exactly once in each set, e.g., a RCBD is resolvable (Sect. 13.4.2).

Resolvability ensures orthogonality between treatment and block factors. It is not necessary for an optimal design. However, near-optimal designs are achieved when near-resolvability is attained.

3.8 Optimality Criterion

An optimal design is selected based on a pre-determined criterion. Two common criteria, A- and D-optimality, seek to minimize a variance. A-optimality minimizes the average pairwise variance of treatment differences, whilst D-optimality minimizes the variance of treatment means. A-optimality is common in plant improvement experiments where the treatment comparisons are of equal interest [2, 3, 8, 9]. A lower value indicates greater optimality.
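To make the A criterion concrete, the sketch below computes the average variance of pairwise treatment differences (in units of the residual variance) for a simple block design. It assumes fixed block effects and a connected design, and uses the standard treatment information matrix; the function name `a_value` and the two toy designs are hypothetical, and this is not the criterion as implemented in any particular design package.

```python
import numpy as np


def a_value(blocks):
    """Average variance of pairwise treatment differences, in units of
    the residual variance, assuming fixed block effects."""
    treatments = sorted({t for block in blocks for t in block})
    index = {t: i for i, t in enumerate(treatments)}
    v, b = len(treatments), len(blocks)
    N = np.zeros((v, b))  # treatment-by-block incidence matrix
    for j, block in enumerate(blocks):
        for t in block:
            N[index[t], j] += 1
    R = np.diag(N.sum(axis=1))            # replication of each treatment
    K_inv = np.diag(1.0 / N.sum(axis=0))  # inverse block sizes
    C = R - N @ K_inv @ N.T               # treatment information matrix
    return 2.0 * np.trace(np.linalg.pinv(C)) / (v - 1)


# Two toy designs, each with 4 treatments replicated 3 times.
rcbd = [[1, 2, 3, 4]] * 3                                  # complete blocks
bibd = [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]    # blocks of size 2
print(a_value(rcbd), a_value(bibd))
```

Under these assumptions the complete block design has the lower value for the same residual variance. In practice smaller blocks are chosen because they are expected to reduce the residual variance, which this comparison deliberately ignores.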

3.9 Model Notation

We use the notation of Wilkinson and Rogers [10] to describe the relationship between factors in treatment and plot structures. Let A and B be two factors, where their structure can be independent (A + B, main effects), interacting (A:B), crossed (A*B, a factorial) or nested (A/B, where B is nested within A). The latter two expand such that,

$$ \mathrm{A}\ast \mathrm{B}=\mathrm{A}+\mathrm{B}+\mathrm{A}:\mathrm{B},\ \mathrm{and} $$
$$ \mathrm{A}/\mathrm{B}=\mathrm{A}+\mathrm{A}:\mathrm{B} $$

See Piepho et al. [11] and Welham et al. [7] for further details. Note that the interaction operator “ : ” is not consistent across statistical packages.
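The two expansions can be sketched as small helper functions. The function names `crossed` and `nested` are hypothetical and only handle the simple cases used in this chapter; they are not a general formula parser.

```python
def crossed(a: str, b: str) -> list[str]:
    """A*B expands to A + B + A:B."""
    return [a, b, f"{a}:{b}"]


def nested(*factors: str) -> list[str]:
    """A/B/C expands to A + A:B + A:B:C (and so on for longer chains)."""
    return [":".join(factors[: i + 1]) for i in range(len(factors))]


print(" + ".join(crossed("Seeding", "Nitrogen")))
print(" + ".join(nested("Replicate", "Block", "Plot")))
```

Chained nesting follows from applying the nesting rule repeatedly, which is how three-level plot structures such as Replicate/Block/Plot are expanded.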

4 Classical Designs

In this section we describe classical designs commonly used in plant improvement experiments. The number of treatment factors, their levels and structure, together with management practices and logistics influence the plot structure and subsequent experimental design. These designs differ primarily in their plot structures, whereas their treatment structures are often similar. The Design Tableau approach of Smith and Cullis [12] helps define the treatment and plot factors, the design function and the resulting treatment and plot structures. Common treatment structures are single factor, factorial and nested (Sect. 13.4.1). For each design we describe the fundamental principles, the plot structure and assume a single factor treatment structure unless otherwise stated.

To assist with reading the text, the names of treatment and plot factors are capitalized (e.g., Variety, Block). The notation for defining factor levels follows John and Williams [13] such that there are v treatments, r replicates, s blocks and k plots within blocks.

4.1 Treatment Structures

The treatment underpins the experimental purpose and informs the experimental hypothesis. The treatment structure describes the relationship between all treatment factors and their allocation to experimental units. Three common treatment structures in PIEs are:

  • Single factor: PIEs often aim to evaluate genetic material for selection or commercialization. The genetic material can be breeding lines, hybrids or varieties, which we call Variety, for simplicity. Variety is the treatment factor and the treatment structure is simply: Variety.

  • Factorial: A factorial treatment structure is possible with two or more factors. A full factorial experiment is when all combinations of all treatment factor levels are evaluated. Partial factorial treatment structures are possible [1]. Agronomy experiments frequently employ factorial treatment structures. For example, an experiment to identify optimal seeding (Seeding) and nitrogen rates (Nitrogen) employs a factorial treatment structure written, using the crossed notation (Sect. 13.3.9), as:

    $$ \mathrm{Seeding}\ast \mathrm{Nitrogen}=\mathrm{Seeding}+\mathrm{Nitrogen}+\mathrm{Seeding}:\mathrm{Nitrogen}. $$

Factorial treatment structures have the following advantages over a series of experiments with single treatment factors:

  1. the presence of interactions between treatment factors can be tested;

  2. if the interaction effects are non-zero, then the optimal combination of treatments can be identified;

  3. there is higher replication for the individual treatment factors.

  • Nested: Nested treatment structures are hierarchical, often due to biology. For example, selecting breeding lines often occurs within families and the treatment structure is written as:

    $$ \mathrm{Family}/\mathrm{Line}=\mathrm{Family}+\mathrm{Family}:\mathrm{Line}. $$

4.2 Plot Structures

The plot structure describes the relationship between all plot factors (e.g. blocks, columns, rows, machines) and fully defines the observational units. The design function links the treatment and plot structures. Any of the following designs can have any of the treatment structures described in Sect. 13.4.1.

4.2.1 Randomized Complete Block Designs (RCBDs)

RCBDs have the following characteristics: all experimental units (e.g., field plots) within a block are considered homogeneous, i.e. similar in all respects that affect plant growth; each block contains a complete set of treatments so that blocks are resolvable for treatments; within a block the treatments are randomly allocated to the experimental units. The plot structure is

$$ \mathrm{Block}/\mathrm{Plot}=\mathrm{Block}+\mathrm{Block}:\mathrm{Plot}, $$

where Block:Plot defines the observational units and represents the residuals (errors). The treatment structure can be any of those described in Sect. 13.4.1.

Blocks are orthogonal to treatments so that the difference between treatments is independent of blocks. Usually, these experiments have a small number of treatments and the block size is not large. RCBDs are not recommended for PIEs with more than 10 treatments because within block homogeneity cannot be assured.
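The RCBD randomization described above can be sketched in a few lines: every block receives the complete treatment set in an independently shuffled order. The function name `rcbd`, the variety labels and the seed are assumptions for illustration.

```python
import random


def rcbd(treatments, n_blocks, seed=None):
    """Return {block: [treatment on plot 1, plot 2, ...]} with an
    independent random permutation of all treatments in each block."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout


varieties = [f"V{i}" for i in range(1, 9)]  # hypothetical variety labels
design = rcbd(varieties, n_blocks=3, seed=1)
for block, plots in design.items():
    print(block, plots)
```

Fixing the seed makes the layout reproducible, which is convenient when the randomization must be documented alongside the trial.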

4.2.2 Alpha-Lattice Designs

The aim of the alpha-design algorithm, introduced by Patterson and Williams [14], is to generate resolvable incomplete block designs for ‘any number of varieties v and block size k such that v is a multiplier of k’. This design function determines how v treatments are allocated to k plots within s blocks within r replicates whilst minimizing the concurrence of treatment pairs within a block. An alpha (0,1)-lattice design has zero or one treatment pair concurrences in a block.

Alpha-lattice designs are suitable whenever the number of treatments, v, is a multiple of the block size, k, and are easily adapted when it is not. A rule of thumb is to choose a block size which is equal to or slightly smaller than the square root of the number of treatments, i.e., \( k\le \sqrt{v} \).
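One way to apply this rule of thumb is to take the largest divisor of v that does not exceed the square root of v. The sketch below does exactly that; the function name `alpha_block_size` is hypothetical, and the sketch does not cover the adapted designs needed when v has no suitable divisor (for a prime v it simply returns a block size of 1).

```python
import math


def alpha_block_size(v: int) -> tuple[int, int]:
    """Largest divisor k of v with k <= sqrt(v), together with the
    number of blocks per replicate, s = v // k."""
    k = max(d for d in range(1, math.isqrt(v) + 1) if v % d == 0)
    return k, v // k


print(alpha_block_size(30))
```

For v = 30 this gives k = 5 plots per block and s = 6 blocks per replicate.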

Figure 13.1 presents an alpha-lattice design for v = 30 varieties with r = 2 replicates, s = 6 blocks within each replicate and k = 5 plots within a block arranged as 4 rows by 15 columns. The plot structure for this design is:

$$ \mathrm{Replicate}/\mathrm{Block}/\mathrm{Plot}=\mathrm{Replicate}+\mathrm{Replicate}:\mathrm{Block}+\mathrm{Replicate}:\mathrm{Block}:\mathrm{Plot}, $$

where Replicate:Block:Plot defines the observational units and represents the residuals (errors).

Fig. 13.1 An alpha-lattice design for v = 30 varieties and r = 2 replicates, s = 6 blocks within each replicate and k = 5 plots within a block arranged as 4 rows by 15 columns, where the replicates are in the row direction. Replicates are delineated by the horizontal bold line and blocks by the dashed lines

The treatment structure contains a single factor, Variety.

4.2.3 Row-Column Designs

Heterogeneity between rows and between columns in PIEs is well known [15, 16]. Row-column designs block in both row and column directions to minimize the effect of spatial heterogeneity. They usually employ incomplete blocks (blocks that do not contain all treatments) and are resolvable when rows and/or columns are grouped together to create single replicate blocks. Piepho et al. [17] provide a concise review of these designs.

Figure 13.2 presents a row-column design for v = 18 varieties with r = 3 replicates arranged as 9 rows by 6 columns. Each row and column is an incomplete block. The design is resolvable in both the row and column directions with 3 row-blocks (RowBlock) and 3 column-blocks (ColBlock). Varieties (treatments) are allocated to field plots such that there is one replicate in each row- and column-block. The plot structure for a row-column design depends on the direction of any resolvable blocks in the design and will contain row and column terms.

Fig. 13.2 A row-column design for v = 18 varieties and r = 3 replicates with s = 3 blocks of size k = 6 plots arranged as 9 rows by 6 columns. Rows and columns are incomplete blocks. Dashed horizontal and bold vertical lines delineate the row (RowBlock) and column replicates (ColBlock), respectively

The plot structure for the design presented in Fig. 13.2 is:

$$ {\displaystyle \begin{array}{l}\mathrm{RowBlock}/\mathrm{Row}+\mathrm{ColBlock}/\mathrm{Column}+\mathrm{Plot}\\ {}=\mathrm{RowBlock}+\mathrm{RowBlock}:\mathrm{Row}+\mathrm{ColBlock}+\mathrm{ColBlock}:\mathrm{Column}+\mathrm{Plot},\end{array}} $$

where Plot is described by RowBlock:Row:ColBlock:Column, defines the observational units and represents the residuals (errors).

The treatment structure contains a single factor, Variety.

4.2.4 Latinized Designs

Layouts with evenly distributed treatments are desirable because they minimize the number of times treatment pairs occur together, which conforms with the concept of blocking to minimize residual variation. The importance of balance and evenness depends on the intended analysis model and a researcher may forego these characteristics in some situations (see Sect. 13.5).

Latinized designs extend the concept of Latin Squares (see [6, 7]) where each treatment occurs exactly once in each row and each column. The popular mind-puzzle Sudoku is an example of a latinized row-column design. The design in Fig. 13.2 is a resolvable, latinized row-column design. No variety is in the same row or column more than once.
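The defining property used here, that no treatment appears more than once in any row or column, is easy to check programmatically. The sketch below is a minimal checker applied to a cyclic Latin square; the function name `is_latinized` and the example layout are hypothetical.

```python
def is_latinized(layout) -> bool:
    """True if no treatment is repeated within any row or any column."""
    rows = [list(r) for r in layout]
    cols = [list(c) for c in zip(*rows)]
    return all(len(set(line)) == len(line) for line in rows + cols)


# Cyclic 4 x 4 Latin square: entry (i, j) is treatment (i + j) mod 4.
square = [[(i + j) % 4 for j in range(4)] for i in range(4)]
print(is_latinized(square))
```

The same check works for rectangular layouts, where a treatment may be absent from some rows or columns but must never be repeated within one.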

4.2.5 Split Plot Designs

Split plot designs are utilized for factorial treatment structures (Sect. 13.4.1) where one factor is applied to main plots and a second factor is applied to sub- (or split) plots. They are advocated in the following scenarios:

  1. There is a factorial treatment structure and the levels of one factor must be applied to large plots (e.g., irrigation, tillage, herbicide application) for practical purposes.

  2. There is a factorial treatment structure, but the aim of the experiment is to investigate the treatment factor allocated to the sub-plots and its interaction with the main plot treatment factor; usually because the differences between the levels of the main plot treatment factor are known (e.g., irrigation).

  3. A long-term experiment is in progress with treatments applied to the main plots. Another treatment which can be allocated to sub-plots within the main plots is of interest.

Figure 13.3 presents a split-plot experiment for evaluating the effect of nitrogen levels (0, 50, 100 kg/ha) on v = 20 varieties with r = 2 replicates and treatment structure,

$$ \mathrm{Nitrogen}\ast \mathrm{Variety}=\mathrm{Nitrogen}+\mathrm{Variety}+\mathrm{Nitrogen}:\mathrm{Variety}. $$

Fig. 13.3 Split plot design for v = 20 varieties and 3 nitrogen treatments (0, 50, 100 kg/ha) in r = b = 2 replicates (blocks), arranged in 6 rows by 20 columns. The bold vertical line delineates the blocks. The dashed lines delineate the main plots within blocks. The nitrogen treatments are allocated to the main plots within blocks. The varieties are allocated to the plots within main plots

The layout of 6 rows by 20 columns is equally divided in the row and column directions into 6 main plots (MainPlot) (Fig. 13.3). The two replicates (Block) contain three MainPlots each, and each MainPlot contains 20 sub-plots (Plot). The plot structure is:

$$ {\displaystyle \begin{array}{l}\mathrm{Block}/\mathrm{MainPlot}/\mathrm{Plot}\\ {}=\mathrm{Block}+\mathrm{Block}:\mathrm{MainPlot}+\mathrm{Block}:\mathrm{MainPlot}:\mathrm{Plot},\end{array}} $$

where Block:MainPlot:Plot defines the observational units and represents the residuals (errors).
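The two-stage randomization implied by the Block/MainPlot/Plot structure can be sketched as follows: nitrogen levels are shuffled across main plots within each block, then varieties are shuffled across plots within each main plot. The function name `split_plot` and the seed are assumptions, and the sketch ignores the physical row and column positions of the plots.

```python
import random


def split_plot(main_levels, sub_levels, n_blocks, seed=None):
    """Return a list of (block, main_plot, plot, main_trt, sub_trt)."""
    rng = random.Random(seed)
    rows = []
    for block in range(1, n_blocks + 1):
        mains = list(main_levels)
        rng.shuffle(mains)  # randomize the main-plot factor within the block
        for main_plot, main_trt in enumerate(mains, start=1):
            subs = list(sub_levels)
            rng.shuffle(subs)  # randomize the sub-plot factor within the main plot
            for plot, sub_trt in enumerate(subs, start=1):
                rows.append((block, main_plot, plot, main_trt, sub_trt))
    return rows


# 3 nitrogen levels (kg/ha), 20 varieties, 2 blocks: 120 plots in total.
design = split_plot([0, 50, 100], range(1, 21), n_blocks=2, seed=7)
print(len(design))
```

Each main plot carries a single nitrogen level and a full set of varieties, so nitrogen is replicated only at the main-plot level while variety is replicated in every main plot.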

Note that factors which accommodate spatial variability, such as row and column, are not included. We will extend this example in Sect. 13.5 to illustrate how model-based design can help to minimize the effects of the expected spatial variability.

Strip-plot designs are a variation of a split-plot design. They are used when two treatment factors need to be applied to large areas, e.g., investigating the response to micronutrient combinations. Suppose there are two treatment factors (A with a levels and B with b levels). Instead of randomizing B within A as in a split-plot, both factors are arranged in strips across the replicates. The experimental area is divided into horizontal and vertical strips (rows and columns). Each level of factor A is allocated to all the plots in a row, and the levels of B are allocated to all plots in a column. This design provides high precision on the interaction between treatments at the expense of the main effects [1].

4.2.6 Augmented Designs

Augmented designs are widely used in the design of early stage variety trials. Early stage variety trials have large treatment numbers (hundreds to thousands of lines) with minimal seed availability for replication within and across environments. Augmented designs contain a combination of replicated and unreplicated treatments [18]. The replicated treatments (a set of check varieties, say) are allocated to a classical plot structure which accounts for spatial heterogeneity and the unreplicated treatments (usually the treatments of interest, the set of breeding lines, say) augment the replicated design. Each unreplicated treatment is allocated to one (incomplete) block only while each replicated treatment appears in each block at least once. The systematic repetition of the replicated treatments enables estimation of the block effects and residual (error) variance resulting in more precise estimates of the treatment comparisons of interest.

An augmented block design for one of twenty-five trials in the preliminary yield trial (PYT) series of the durum wheat breeding program at the International Maize and Wheat Improvement Center (CIMMYT) is presented in Fig. 13.4. The PYT series evaluates 4200 breeding lines grouped into 25 sets of 120. Two checks are evaluated in each trial.

Fig. 13.4 Augmented block design for one trial from the CIMMYT durum breeding program preliminary yield trial series. There are 120 breeding lines (labelled 1–120) and 2 check varieties (C1 and C2). The bold black lines delineate the blocks

Each trial contains 128 field plots arranged in 8 rows by 16 columns. The augmented trial is divided into four equal-sized blocks (4 rows by 8 columns each). The two check varieties (C1 and C2) are allocated to one plot each in each block. The 120 breeding lines allocated to this trial are randomly allocated to the remaining plots (Fig. 13.4). The treatment structure, a single factor structure (Sect. 13.4.1), is Variety. The plot structure for the augmented block design described in Fig. 13.4 is:

$$ \mathrm{Block}/\mathrm{Plot}=\mathrm{Block}+\mathrm{Block}:\mathrm{Plot}, $$

where Block:Plot defines the observational units and represents the residuals (errors). Note this plot structure is dependent on the experimental design of the replicated checks and is specific to this example.
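A layout with the dimensions of this example can be sketched as below: an 8 by 16 grid split into four blocks of 4 rows by 8 columns, each check placed once per block and the 120 unreplicated lines filling the remaining plots at random. The function name `augmented_layout`, the two-by-two arrangement of blocks and the seed are assumptions; this is not the procedure used to generate Fig. 13.4.

```python
import random


def augmented_layout(n_rows=8, n_cols=16, blk_rows=4, blk_cols=8,
                     checks=("C1", "C2"), n_lines=120, seed=None):
    """Return {(row, col): entry}. Each check occurs once per block and
    the unreplicated lines fill the remaining plots at random."""
    rng = random.Random(seed)
    lines = list(range(1, n_lines + 1))
    rng.shuffle(lines)
    layout = {}
    for r0 in range(0, n_rows, blk_rows):
        for c0 in range(0, n_cols, blk_cols):
            plots = [(r, c) for r in range(r0, r0 + blk_rows)
                     for c in range(c0, c0 + blk_cols)]
            rng.shuffle(plots)
            for check in checks:        # one plot per check in this block
                layout[plots.pop()] = check
            for plot in plots:          # remaining plots get test lines
                layout[plot] = lines.pop()
    return layout


layout = augmented_layout(seed=3)
print(len(layout), sum(1 for e in layout.values() if e == "C1"))
```

With these defaults each block holds 2 check plots and 30 test lines, so the 120 lines are used exactly once across the four blocks.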

Traditionally, each trial in a series is analyzed separately. However, this compromises the selection decisions as the spatial variability within and between trials is large. It is advisable to analyze all trials together and model the spatial variation appropriately, following Gilmour et al. [15], for example. In Sect. 13.5 we describe model-based partially replicated trials, which extend these augmented grid designs.

5 Model-Based Designs

Classical designs can constrain comparative experiments, resulting in sub-optimal and costly outcomes [6, 16]. A model-based approach can generate classical designs whilst accommodating less structured design specifications, such that the design is based on the intended analytical model [2,3,4]. Model-based designs uphold the fundamental design concepts described in Sect. 13.3 and can generate and enhance the classical designs described in Sect. 13.4. They can include terms for anticipated peripheral effects such as those induced by trial management practices along rows and columns. Furthermore, correlated structures for the treatment and/or residual effects are easily incorporated into a model-based design. An optimal (or near-optimal) design is determined using a pre-defined optimality criterion (Sect. 13.3.8).

In this section we review two statistical models frequently employed in plant improvement research: analysis of variance (ANOVA) and linear mixed models (LMMs). Next we demonstrate the application of model-based design with two examples: extension of the split plot example to include random row and column terms and introduction of a partially replicated design which models the correlation between residuals and the correlation between breeding lines simultaneously following Cullis et al. [4, 9].

5.1 Statistical Models for Plant Improvement Experiments

5.1.1 Analysis of Variance (ANOVA)

The method of ANOVA partitions experimental observations into their treatment and plot factors, enabling a test of significance to be performed for the difference between treatment means. For example, each observation from a RCBD experiment (Sect. 13.4.2) can be written as:

$$ \mathrm{observation}=\mathrm{overall}\ \mathrm{mean}+\mathrm{treatment}\ \mathrm{effect}+\mathrm{block}\ \mathrm{effect}+\mathrm{residual}. $$

This partitioning is summarized in an ANOVA table (see Welham et al. [7] for details).

The principle of least-squares, employed in ANOVA, seeks to minimize the residual sum of squares, thus obtaining the best estimate of \( {\sigma}^2 \). The residuals (errors) are assumed to be independently, identically, normally distributed with mean zero and variance \( {\sigma}^2 \). The treatment and block factors are fixed effects and have no distribution.
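For a complete RCBD with one observation per treatment in each block, the partition of each observation can be computed directly from row and column means. The sketch below uses a small table of hypothetical yields (not data from this chapter) and checks that the sums of squares add up; the residual mean square is the usual estimate of the residual variance.

```python
import numpy as np

# Hypothetical yields: rows are treatments, columns are blocks.
y = np.array([[4.1, 4.5, 3.9],
              [5.0, 5.4, 4.6],
              [3.8, 4.4, 3.7],
              [4.7, 5.1, 4.4]])

mean = y.mean()
trt = y.mean(axis=1, keepdims=True) - mean  # treatment effects
blk = y.mean(axis=0, keepdims=True) - mean  # block effects
res = y - mean - trt - blk                  # residuals

v, r = y.shape
ss_trt = r * (trt ** 2).sum()
ss_blk = v * (blk ** 2).sum()
ss_res = (res ** 2).sum()
ss_tot = ((y - mean) ** 2).sum()
sigma2_hat = ss_res / ((v - 1) * (r - 1))   # residual mean square
print(ss_trt, ss_blk, ss_res, ss_tot, sigma2_hat)
```

This shortcut relies on the orthogonality of treatments and blocks in a complete RCBD; with missing plots or incomplete blocks the effects must be estimated jointly.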

5.1.2 Linear Mixed Model

A LMM modifies the linear model of ANOVA to allow terms to be fitted as random or fixed, hence mixed. Each random term is assumed to be independent with effects sampled from a normal distribution with a common variance, called the variance component. The residual maximum likelihood (REML) method provides unbiased estimates of the variance components [19] in a LMM. It is the method implemented in LMM software such as ASReml-R [20], REML in GenStat [21] and PROC MIXED in SAS.

Identifying which terms to fit as fixed or random is non-trivial [11, 16, 22]. A sensible starting point is the randomization-based model where all plot structure factors are fitted as random and all treatment factors are fitted as fixed [6, 7]. Smith and Cullis [22] have developed an instructive tool, Design Tableau, to identify the LMM best suited to the design and analysis of an experiment.

LMMs have some significant advantages over ANOVA models. They accommodate non-orthogonality and imbalance arising from missing data or complex experimental designs. When terms are modelled with a variance (i.e., fitted as random), recovery of inter-block information and appropriate modelling of effects representing different sources of variation (e.g., blocks, rows and/or columns) are enabled. There are three characteristics of PIEs which can be accommodated in a LMM: extraneous and residual (plot-to-plot) variability, and complex variance structures between treatments.

Extraneous variation arises from management practices and is modelled by fitting row and/or column effects as random and estimating their variance components. If extraneous variation is expected, then it can be included in a model-based design.

Accurate estimation of the plot-to-plot variability (residual variation) is achieved via spatial modelling, such as the two-dimensional separable auto-regressive models of order 1 (known as AR1⊗AR1) of Gilmour et al. [15]. Spatial models assume that spatial dependence exists between plots, i.e., plots close together are more similar than plots further apart. Accommodating this dependence between field plots in the design [2, 9] is logical and particularly important in trials with minimal replication (see Sect. 13.5.2.2).
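The separable structure can be sketched as a Kronecker product of two one-dimensional AR1 correlation matrices. The function names, the small 3 by 4 grid and the autocorrelation values are assumptions for illustration; in practice the parameters are estimated from data.

```python
import numpy as np


def ar1(n: int, rho: float) -> np.ndarray:
    """n x n AR1 correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def ar1_x_ar1(n_cols, n_rows, rho_col, rho_row):
    """Separable correlation matrix for plots ordered rows within columns."""
    return np.kron(ar1(n_cols, rho_col), ar1(n_rows, rho_row))


# Hypothetical autocorrelations for a 3-column by 4-row layout.
R = ar1_x_ar1(n_cols=3, n_rows=4, rho_col=0.3, rho_row=0.6)
print(R.shape)
```

Under this ordering, the correlation between two plots is the product of the column-direction and row-direction correlations, so it decays with distance in both directions.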

The treatments (breeding lines, varieties or hybrids) in PIEs are often related. Pedigree information, in the form of a numerator relationship matrix, A, captures the genetic similarity between treatments. Inclusion of the A matrix in the analysis enables estimation of additive and non-additive effects [23,24,25]. Using a model-based design approach it is possible to include the pedigree information using A in the design process [3, 4]. Alternatively, if marker data are available then the kinship or genomic relationship matrix can replace the A matrix.

5.2 Examples

These designs were generated using the R library ‘od’, a freely available optimal design package [26].

5.2.1 Accounting for Extraneous Variation

Consider the split plot design experiment (Sect. 13.4.2.5) within the model-based design paradigm of “design how it would be modelled”. The LMM for the analysis of this experiment, using randomization-based theory, would include the plot structure terms Block and Block:MainPlot as random effects, i.e. assign a variance to each of them, \( {\sigma}_{Block}^2 \) and \( {\sigma}_{Block:MainPlot}^2 \), say. The term Block:MainPlot:Plot defines the observational units, i.e. the residuals, which are assumed to be normally distributed with mean zero and variance \( {\sigma}^2 \). In addition, extraneous variation introduced by management practices conducted across rows and columns is accounted for by including random row and column effects with variances \( {\sigma}_{Row}^2 \) and \( {\sigma}_{Column}^2 \), respectively [15]. Thus, the plot structure for this experiment is now:

$$ \mathrm{Block}+\mathrm{Block}:\mathrm{MainPlot}+\mathrm{Block}:\mathrm{MainPlot}:\mathrm{Plot}+\mathrm{Row}+\mathrm{Column}, $$

where Block:MainPlot:Plot defines the observational units and represents the residuals (errors).

The layout presented in Fig. 13.3 was generated using this model. The resulting design is latinized (Sect. 13.4.2.4) with respect to rows such that each variety occurs exactly once in each row and no variety is allocated to a column more than once. The A-optimality criterion increased slightly from 0.362 for the classical design to 0.377 for the model-based design. This is considered acceptable.

5.2.2 Partially Replicated Designs

Partially replicated (p-rep) designs are model-based designs which were introduced as an alternative to augmented grid designs (Sect. 13.4.2.6) for early stage variety trials [9]. The key principle is to replace the replicated check lines in an augmented grid design with test lines. This increases the response to selection due to an increased replication of the lines under selection. The theoretical development underpinning this design is described in Cullis et al. [9] and extended in Cullis et al. [4] to include the use of pedigree information.

A yield evaluation trial is planned for 504 breeding lines (Varieties), but the field layout is limited to 24 columns by 26 rows, i.e. 624 plots. A p-rep trial is designed where 384 varieties are allocated to one field plot and 120 are allocated to two field plots, a p-rep of approximately 24% (120 of 504 varieties). The trial is blocked in the row direction (Fig. 13.5). Extraneous variation in both the column and row directions is known to exist due to irrigation infrastructure and management practices. Thus, the plot factors are Block with 2 levels, Row with 26 levels, Column with 24 levels and Plot, described by Row:Column, with 624 levels.
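The arithmetic behind these numbers is simple when no line is allocated more than two plots: every plot beyond one per line goes to a duplicated line. The helper below is a hypothetical sketch of that calculation.

```python
def p_rep_counts(n_lines: int, n_plots: int) -> tuple[int, int, float]:
    """Lines with two plots, lines with one plot, and the replicated
    proportion, assuming no line has more than two plots."""
    duplicated = n_plots - n_lines
    if not 0 <= duplicated <= n_lines:
        raise ValueError("n_plots must lie between n_lines and 2 * n_lines")
    return duplicated, n_lines - duplicated, duplicated / n_lines


print(p_rep_counts(504, 24 * 26))
```

For 504 lines in 624 plots this returns 120 duplicated lines and 384 single-plot lines, a replicated proportion of about 0.24.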

Fig. 13.5

Partially replicated design for v = 504 varieties in 624 plots, arranged in 26 rows by 24 columns. The bold horizontal line delineates the Blocks. Colors represent different check lines. The gray shaded plots are those allocated to varieties with 2 replicates

The plot structure is:

$$ \mathrm{Block}+\mathrm{Row}+\mathrm{Column}+\mathrm{Row}:\mathrm{Column} $$

Starting values for the variance components of the peripheral random effects, Block, Row and Column were estimated from the previous year’s dataset. The term Row:Column specifies the observational units and represents the residuals. An even spread of replicated treatments was achieved using a separable spatial model with an auto-correlation model fitted in the row direction only (written AR1⊗I). Thus, extraneous and spatial variation is captured in this model.
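
To illustrate what AR1⊗I means for the residuals, the sketch below builds the implied correlation matrix for a toy grid in plain Python. The grid size (3 rows by 2 columns), the autocorrelation value of 0.5 and the plot ordering (columns varying fastest within rows) are assumptions made for this example and are not the values used for the trial.

```python
def ar1(n, rho):
    """AR1 correlation matrix: entry (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kronecker(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


n_rows, n_cols, rho = 3, 2, 0.5  # hypothetical toy values
R = kronecker(ar1(n_rows, rho), identity(n_cols))


def plot_index(row, col):
    """Position of a plot when columns vary fastest within rows."""
    return row * n_cols + col


# Same column, adjacent rows: correlated.
print(R[plot_index(0, 0)][plot_index(1, 0)])
# Same row, different columns: uncorrelated under this structure.
print(R[plot_index(0, 0)][plot_index(0, 1)])
```

Under this structure, residuals are correlated only between plots in the same column, and the correlation decays with separation in the row direction.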

The treatment structure is:

$$ \mathrm{Variety} $$

where Variety is fitted as a random effect and partitioned into an additive component with variance \( {\sigma}_a^2 \) and additive variance-covariance matrix A (Sect. 13.5.1.2) and a non-additive component with variance \( {\sigma}_e^2 \). Starting values for these variances were estimated from the previous year’s dataset.
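
In symbols, this partition can be sketched as follows. The vector names are notation assumed here for illustration, and the additive and non-additive components are assumed to be independent.

$$ {\mathbf{u}}_g={\mathbf{u}}_a+{\mathbf{u}}_e,\quad \mathrm{var}\left({\mathbf{u}}_g\right)={\sigma}_a^2\mathbf{A}+{\sigma}_e^2\mathbf{I}, $$

where \( {\mathbf{u}}_g \) is the vector of total Variety effects, \( {\mathbf{u}}_a \) the additive effects and \( {\mathbf{u}}_e \) the non-additive effects.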

The resulting design (Fig. 13.5) is resolvable with respect to replicated varieties and blocks, thus it is also near-orthogonal. Because the design model includes the pedigree information together with terms for extraneous and residual variation, it ensures a reasonable balance of varieties (treatments) across the layout and matches the anticipated analysis model.
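
Resolvability with respect to replicated varieties can be verified directly from a layout. The sketch below assumes the design is available as one (block, variety) pair per plot. The function name and the toy data are hypothetical and do not reproduce Fig. 13.5.

```python
from collections import Counter


def replicated_varieties_resolvable(plots):
    """True if every variety with more than one plot occurs once in each block."""
    plots = list(plots)
    blocks = {block for block, _ in plots}
    totals = Counter(variety for _, variety in plots)
    per_block = Counter(plots)  # counts of each (block, variety) pair
    replicated = [v for v, n in totals.items() if n > 1]
    return all(per_block[(b, v)] == 1 for v in replicated for b in blocks)


# Hypothetical toy design: V1 and V2 are duplicated, V3 and V4 are not.
toy_plots = [
    (1, "V1"), (1, "V2"), (1, "V3"),
    (2, "V1"), (2, "V2"), (2, "V4"),
]
print(replicated_varieties_resolvable(toy_plots))
```

Unreplicated varieties are ignored by this check, since they cannot appear in more than one block.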

Early generation variety trials are often evaluated in multi-environment trial (MET) series (see Chap. 3). Cullis et al. [9] state, ‘p-rep designs are particularly suited to this setting [MET] since there is potential to balance test line replication across trials’. Near-optimal designs are achieved by aiming for resolvability across locations and can take family, or pedigree, structures into account. It is not necessary to have equal numbers of lines, nor even equal partial-replication at all locations. This is a significant advantage over the classical design approach given the gain in accuracy for prediction of the genetic effects and selection that is achieved by using model-based design methods [4].

6 Summary

Plant improvement datasets are costly and time consuming to collect. It is crucial, then, that the best statistical methods (design and analysis) be employed to ensure that the return on investment is optimized. The fundamental design principles of replication, randomization and blocking need to be understood and upheld in classical and model-based designs. Classical designs provide a rigorous, systematic structure and are important in plant improvement research. Model-based designs are flexible and tailored to the experimental purpose and constraints. Model-based design theory allows an easing of some design concepts, such as orthogonality and resolvability, whilst maintaining optimality for the experimental purpose and intended analysis.

7 Key Concepts

  • Replication, randomisation and blocking are fundamental experimental design concepts required for rigorous plant improvement experiments

  • Understanding and minimizing bias and pseudo-replication in experimental designs enhances plant improvement research outcomes

  • Classical designs primarily differ in their plot structures, which are partly driven by their treatment structures

  • Model-based design theory uses the anticipated statistical model to generate the design

  • Classical experimental designs can be generated within the model-based paradigm

  • Model-based designs enhance plant improvement outcomes by optimising resources and flexibly accommodating logistical constraints whilst improving prediction accuracy for the treatment effects under evaluation.

8 Review Questions

  1. Figure 13.6 presents a field layout for 2 replicates of 24 varieties arranged as 6 rows by 8 columns. Use it to answer the following questions: (a) Where are the blocks in this design? (b) Describe the incomplete and complete blocks. (c) Are the blocks resolvable? Why? (d) Are treatments orthogonal to blocks? (e) Describe an alternate layout and why this one may have been selected.

  2. Describe the difference between a crossed and nested treatment structure. Provide an example of each found in plant improvement experiments.

  3. A yield evaluation trial for 30 varieties with 3 replicates is planned for a field layout with 15 rows and 6 columns. There is a known fertility gradient in the row direction whilst all trial management practices take place across columns. (a) What are the treatment factor(s)? What is the treatment structure? (b) What type of designs could be employed? What are their plot factors and structures? (c) Another possible layout for this design was 30 rows by 3 columns. What design principles were considered in determining the final layout?

  4. An experiment is conducted to investigate variety response to frost events. In order to maximize the opportunity for a variety to experience frost, four trials were sown at different times, two weeks apart. Each trial contained two replicates. Discuss why variety differences across the four trials cannot be attributed to time of sowing only.

  5. An early generation yield trial is planned for a location where sowing and harvesting operations occur along columns in a serpentine pattern. Describe what sources of variation could occur and how the trial could be designed optimally to minimize this variation.

Fig. 13.6

Field layout for v = 24 varieties and r = 2 replicates arranged as 6 rows by 8 columns