Abstract
Nearest-neighbour methods based on first differences are an approach to spatial analysis of field trials with a long history, going back to the early work by Papadakis first published in 1937. These methods are closely related to a geostatistical model that assumes spatial covariance to be a linear function of distance. Recently, P-splines have been proposed as a flexible alternative to spatial analysis of field trials. On the surface, P-splines may appear like a completely new type of method, but closer scrutiny reveals intimate ties with earlier proposals based on first differences and the linear variance model. This paper studies these relations in detail, first focussing on one-dimensional spatial models and then extending to the two-dimensional case. Two yield trial datasets serve to illustrate the methods and their equivalence relations. Parsimonious linear variance and random walk models are suggested as a good point of departure for exploring possible improvements of model fit via the flexible P-spline framework.
1 Introduction
Plant-breeding field trials typically show considerable spatial variation. Blocked experimental designs help to capture some of that spatial trend and provide efficient treatment estimates (Edmondson 2005). If the spatial trend is irregular, however, blocking may only be partly successful. In such cases, spatial adjustment using a suitable statistical model can provide an improvement in accuracy. Designs and modelling methods developed for field trials can also be applied with advantage in greenhouses (Hartung et al. 2019), growth chambers (Lee and Rawlings 1982) and with phenotyping platforms (Brien et al. 2013; Cabrera-Bosquet et al. 2016; van Eeuwijk et al. 2019).
A variety of variance–covariance structures can be used for spatial adjustments, and such structures are readily available in mixed model packages. Most spatial variance–covariance models are nonlinear in the parameters (Stein 1999; Stroup 2002; Schabenberger and Gotway 2004). Our paper focuses on linear models and methods for spatial adjustment. Perhaps the oldest approach of this kind is nearest-neighbour adjustment (NNA) based on differences among neighbouring plots (Papadakis 1937; Wilkinson et al. 1983; Piepho et al. 2008). The first proposals involved one-dimensional adjustment, but extension to two-dimensional spatial models has been subsequently proposed (Green et al. 1985; Kempton et al. 1994). A further option is to employ smoothing splines (Lee et al. 2020). The most recent addition in a field-trial context is the use of P-splines (Eilers and Marx 1996), which has been proposed under the acronym SpATS (Spatial Analysis of field Trials with Splines) (Rodríguez-Álvarez et al. 2018), and is available as an R package on CRAN (https://cran.r-project.org/package=SpATS). Perhaps the most important common feature of these methods is that they can all be identified with a variance–covariance structure that is linear in the parameters and as such can have computational advantages compared to nonlinear structures.
Early work on spatial adjustment using NNA methods focused on second differences (Wilkinson et al. 1983; Green et al. 1985), but it was soon recognized that first differences (Besag and Kempton 1986), and the related linear variance (LV) model (Williams 1986) often provide a good fit. The recently proposed SpATS approach was introduced in terms of second differences. In this paper, we will specifically investigate how a P-spline approach can be formulated based on first differences and how this reduced model compares to the first-difference NNA method considered by Besag and Kempton (1986) and the LV model of Williams (1986). We will also make a connection with random walk (RW) models (Lee et al. 2020). A unified formulation of these different models will be provided to facilitate the comparison.
The paper is structured as follows. In Sect. 2, we will describe our variance–covariance models in detail and establish the equivalence relations for one-dimensional spatial models. In Sect. 3 the models are applied to a breeding trial with oats. Extension to two dimensions is considered in Sect. 4, followed by a second example involving wheat in Sect. 5. The paper ends with a brief Discussion in Sect. 6.
2 Three One-Dimensional Models for Spatial Correlation
Field trials with plants typically have a two-dimensional layout that is indexed by rows and columns, so spatial correlation can be considered in two spatial directions. To set the stage, we will first consider models with spatial correlation in one dimension across n plots, arranged in a single array within a replicate.
Three types of model will be considered: linear variance (LV) models (Sect. 2.1), random walk (RW) models (Sect. 2.2) and P-splines (Sect. 2.3). Our main result will be that equivalence between these models can be established when a fixed effect for the replicate is accounted for. To establish these relations, it will be convenient to sweep out these fixed effects and consider the corresponding reduced models (De Hoog et al. 1990).
It will be convenient to consider a linear mixed model of the form

\(y=X_{g} \beta _{g} +X_{r} \beta _{r} +u+e, \qquad (1)\)
where y is the response vector, ordered by replicates and plots within replicates, \(\beta _{g} \) is a vector of fixed effects for genotypes with associated design matrix \(X_{g} \), \(\beta _{r}\) is a vector of fixed effects for replicates with associated design matrix \(X_{r} =I_{r} \otimes 1_{n}\), where \(\otimes \) denotes the Kronecker product and \(1_{n}\) is an n-vector of ones, n is the number of plots per replicate, u is a vector of spatially correlated plot errors, and e is a vector of independently distributed plot errors, having (nugget) variance \(\sigma ^{2}\). We take genotypic effects as fixed throughout, but our results apply equally with random genotypic effects (see Discussion). Spatial correlation will be modelled among plots in the same replicate, but plots in different replicates are modelled as independent. We note that for the linear spatial covariance models considered here (LV, RW and P-splines), the same result would be obtained if spatial correlation were modelled across replicates. It follows from results in Williams et al. (2006) that the reduced model after sweeping out the fixed replicate effect has independent replicates even if the original model assumes correlation across replicates according to a LV model (the authors show this for columns of plots but their results for column sweeps apply equally to sweeps for replicates). Spatial models differ in the assumptions they make about the covariance function for u. The variance–covariance matrix for the composite plot error, \(f=u+e\), within a replicate will be denoted as V. Thus, the variance–covariance matrix of the data takes the form
\(\text {var}\left( y \right) =I_{r} \otimes V, \qquad (2)\)
where \(I_{r} \) is the r-dimensional identity matrix, \(\otimes \) denotes the Kronecker product, and V is an \(n\times n\) variance–covariance matrix for n plots per replicate. Similarly, the spatial errors have variance–covariance structure
\(\text {var}\left( u \right) =I_{r} \otimes \sigma _{u}^{2} \Lambda _{n}, \qquad (3)\)
where \(\Lambda _{n} \) is a symmetric \(n\times n\) matrix and \(\sigma _{u}^{2} \) is a spatial variance component. Hence, we have \(V=\sigma ^{2}I_{n} +\sigma _{u}^{2} \Lambda _{n} \). It is noted that model (1) does not have incomplete blocks. Such effects can also be added with resolvable incomplete block designs, and we will do so in Sect. 3.
In what follows, we will mainly focus on the form of \(V\) for specific spatial models. Occasionally, where necessary, explicit reference will also be made to model (1).
2.1 Linear Variance Model
Piepho and Williams (2010, Eq. 3) give the one-dimensional LV matrix as
\(V=\sigma ^{2}I_{n} +\eta J_{n} +\phi M_{n}, \qquad (4)\)
where \(M_{n} =\left( {n-1} \right) J_{n} -L_{n} \), \(J_{n} \) is the \(n\times n\) matrix of ones, \(L_{n} \) has \(\left( {i,j} \right) \)th element equal to \(\left| {i-j} \right| \) as given by Williams (1986), and \(\sigma ^{2}>0\), \(\eta >0\) and \(\phi >0\) are variance parameters. The component \(\eta J_{n} \) can be omitted in our case because it is confounded with the replicate effect and hence will cancel when sweeping out the replicate effect as is done in the next step. Thus, in relation to model (1) we have \(M_{n} =\Lambda _{n} \) and \(\phi =\sigma _{u}^{2} \). Note that all elements in \(M_{n} \) are non-negative, ensuring that all pairwise covariances in V will be non-negative. The variance matrix for the reduced model (after sweeping out the fixed replicate effect) becomes
\(V^{*}=Q_{n} VQ_{n}, \qquad (5)\)
where \(Q_{n} =I_{n} -n^{-1}J_{n} \). So from (4) and (5),
\(V^{*}=\sigma ^{2}Q_{n} +\phi \Delta _{n}^{+}, \qquad (6)\)
where \(\Delta _{n}^{+} =-Q_{n} L_{n} Q_{n}\) as given in Eq. (10) of Williams (1986) is the Moore–Penrose inverse of \(\Delta _{n}\), the modified second difference operator as given by Patterson in the discussion of Wilkinson et al. (1983), namely
\(\Delta _{n} ={\textstyle \frac{1}{2}}D_{n}^{T} D_{n}, \qquad (7)\)
where
\(D_{n} =\left( {\begin{array}{cccc} -1 &{} 1 &{} &{} \\ &{} \ddots &{} \ddots &{} \\ &{} &{} -1 &{} 1 \\ \end{array} } \right) \qquad (8)\)
is an \(\left( {n-1} \right) \times n\) matrix generating first differences. On the other hand, for incomplete blocks, Williams (1986) has Eq. (8) as \(V=I_{n} +\gamma P_\mathrm{B} -\phi F\), where \(P_\mathrm{B}\) is the block projection matrix and F is block-diagonal with components \({3L_{k_{i} } } / {\left( {k_{i}^{2} -1} \right) }\), and in the absence of incomplete blocks, the variance matrix for the reduced model becomes equivalent to the form in (6) above. Our exposition in this section is tailored to designs having several complete replicates, where n is the number of plots per replicate, but the sweep operator could also be applied with non-resolvable designs to remove the overall intercept, where n would be the total number of plots, leading to results equivalent to the ones presented in this section.
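The key identity of this section is easy to verify numerically. The following Python sketch (illustrative only; the analyses in this paper were run in SAS) checks that \(-Q_{n} L_{n} Q_{n} \) is the Moore–Penrose inverse of \(\Delta _{n} ={\textstyle \frac{1}{2}}D_{n}^{T} D_{n} \), and that sweeping the replicate mean out of \(M_{n} \) yields the same matrix:

```python
import numpy as np

n = 8
D = np.diff(np.eye(n), axis=0)       # D_n: (n-1) x n first-difference operator
Delta = 0.5 * D.T @ D                # modified second-difference operator Delta_n
L = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # L_n with |i-j| entries
J = np.ones((n, n))
Q = np.eye(n) - J / n                # sweep operator Q_n
M = (n - 1) * J - L                  # LV matrix M_n

# Williams (1986): Delta_n^+ = -Q_n L_n Q_n
assert np.allclose(np.linalg.pinv(Delta), -Q @ L @ Q)
# The J_n component of M_n is annihilated by the sweep, so Q M Q = -Q L Q
assert np.allclose(Q @ M @ Q, -Q @ L @ Q)
```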
2.2 Random Walk Model
Besag and Kempton (1986) propose a model that assumes first differences to remove trend. Let \(u_{i}\) denote the spatially correlated trend value for the ith plot. Then the assumption is that first differences \(u_{i} -u_{i-1} =r_{i}\) are independently distributed with \(r_{i} \sim N\left( {0,\lambda } \right) \), such that \(u_{i} =u_{i-1} +r_{i} \) defines a non-stationary RW (Lee et al. 2006, p. 233). Besag and Kempton (1986) consider the model for first differences, involving \(D_{n} u=r\sim N\left( {0,\lambda I_{n-1} } \right) \), where here u denotes the vector of spatially correlated plot errors for a single replicate. There is a need to introduce an extra (imaginary) boundary plot to start the RW and impose a border constraint. One option is \(u_{0} =0\) for an imaginary plot next to the first, leading to (Lee et al. 2020) \(u_{1} =u_{0} +r_{1}\) and hence \(\text {var}\left( u \right) =\lambda S_{n} \), where \(S_{n} \) has \(\left( {i,j} \right) \)th element \(\min \left( {i,j} \right) \).
In relation to model (1), we have \(S_{n} =\Lambda _{n} \) and \(\lambda =\sigma _{u}^{2} \). Here, the variance increases linearly with increasing index i along the spatial axis. Alternatively, we may start the RW at the other end, imposing the constraint \(u_{n+1} =0\), yielding the RW \(u_{i-1} =u_{i} -r_{i} \), such that (Lee et al. 2020) \(\text {var}\left( u \right) =\lambda T_{n} \), where \(T_{n} \) has \(\left( {i,j} \right) \)th element \(n+1-\max \left( {i,j} \right) \).
Now the variance decreases linearly with i. With these specifications, adding error and a general intercept for the replicate, the variance–covariance matrix of the data may be defined as either \(\text {var}\left( y \right) =I_{r} \otimes \left( {\sigma ^{2}I_{n} +\lambda S_{n} } \right) \) or \(\text {var}\left( y \right) =I_{r} \otimes \left( {\sigma ^{2}I_{n} +\lambda T_{n} } \right) \).
Noting that \(2S_{n} =1_{n} c^{T}+c1_{n}^{T} -L_{n} \) with \(c^{T}=\left( {{\begin{array}{*{20}c} 1 &{} 2 &{} \ldots &{} n \\ \end{array} }} \right) \) and \(2T_{n} =1_{n} c^{T}+c1_{n}^{T} -L_{n} \) with \(c^{T}=\left( {{\begin{array}{*{20}c} n &{} {n-1} &{} \ldots &{} 1 \\ \end{array} }} \right) \), it follows that in both cases \(V^{*}=Q_{n} VQ_{n} \) takes the same form as in Eq. (6) with \(\phi =\frac{1}{2}\lambda \), implying that LV and RW are equivalent when a fixed intercept for replicates is fitted.
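These identities, and the resulting equivalence, can be checked numerically; the following Python sketch (not from the paper) verifies them for a small n:

```python
import numpy as np

n = 6
i = np.arange(1, n + 1)
L = np.abs(np.subtract.outer(i, i))  # L_n with |i-j| entries
Q = np.eye(n) - np.ones((n, n)) / n  # sweep operator Q_n
one = np.ones(n)

S = np.minimum.outer(i, i)           # RW started at the left border
T = n + 1 - np.maximum.outer(i, i)   # RW started at the right border

# 2 S_n = 1 c^T + c 1^T - L_n with c = (1, ..., n), and analogously for T_n
assert np.allclose(2 * S, np.outer(one, i) + np.outer(i, one) - L)
c = n + 1 - i
assert np.allclose(2 * T, np.outer(one, c) + np.outer(c, one) - L)

# After the replicate sweep, both walks reduce to the LV form with phi = lambda/2
assert np.allclose(Q @ S @ Q, -0.5 * Q @ L @ Q)
assert np.allclose(Q @ T @ Q, Q @ S @ Q)
```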
2.3 P-splines
P-splines were introduced by Eilers and Marx (1996) as a general combination of B-splines (De Boor 1978) of arbitrary degree and arbitrary order of difference penalty. They used a B-spline basis with equidistant knots. For example, Fig. 1 gives a first-degree B-spline basis, Fig. 2 a cubic B-spline basis. An important property of B-splines is that they are locally defined, which makes them computationally attractive. For example, the first-degree B-spline in Fig. 1 is nonzero on the interval [0,2], the second one on [1,3], etc. Another important property of B-splines is that their sum is equal to one (De Boor 1978), which makes them attractive as a natural extension of design matrices. P-splines have been used in a wide range of applications; for a recent overview, see Eilers et al. (2015). For the field trials described in this paper, all data are observed on an equidistant grid, and a first-degree B-spline basis amounts to the use of the identity matrix as a basis. See also Fig. 1, where we can place the plot positions at the knots 1, 2, ..., 8. Note that using the identity matrix leads to the Whittaker smoother (Whittaker 1923; Eilers 2003; Eilers et al. 2015).
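To make these two properties concrete, the following Python sketch (not from the paper) builds a first-degree ("hat function") B-spline basis on equidistant knots and checks that the basis functions sum to one, and that with a knot at every plot position the basis matrix reduces to the identity:

```python
import numpy as np

def bspline1_basis(x, knots):
    """First-degree (linear 'hat') B-spline basis on equidistant knots."""
    B = np.zeros((len(x), len(knots)))
    h = knots[1] - knots[0]          # knot spacing (assumed constant)
    for j, t in enumerate(knots):
        B[:, j] = np.clip(1 - np.abs(x - t) / h, 0, None)
    return B

# Plot positions placed at the knots 1, ..., 8: the basis reduces to the identity
knots = np.arange(1, 9, dtype=float)
B = bspline1_basis(knots, knots)
assert np.allclose(B, np.eye(8))

# At intermediate positions the basis functions still sum to one
x = np.linspace(1, 8, 29)
assert np.allclose(bspline1_basis(x, knots).sum(axis=1), 1)
```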
To introduce general B-splines into our mixed model, the random effect u in (1) would be replaced with Zu, where \(Z=I_{r} \otimes B\) and B is an \(n\times q\) matrix of a B-spline basis for q knots (see Figs. 1, 2 for examples), with \(B1_{q} =1_{n} \). This extension includes (1) as a special case, with \(B=I_{n} \) for first-degree B-splines and \(q=n\), and here we restrict attention to this case, for which \(Zu=u\). The regression coefficients u in (1) are assumed to be random with variance–covariance matrix \(\text {var}\left( u \right) =G\) with \(G=I_{r} \otimes \sigma _{u}^{2} \Lambda _{n} \) (see Eq. 3). For penalized spline (P-spline) modelling, G involves a contrast matrix \(D_{n} \), which determines the penalty term (Ruppert et al. 2003). Here, we will use the first difference penalty based on \(D_{n} \) as defined in (8).
A P-spline representation as a mixed model may be based on a spectral decomposition of the matrix \(D_{n}^{T} D_{n} =2\Delta _{n} =U \text {diag}\left( d \right) U^{T}\) with \(U^{T}U=I_{n} \) (Welham et al. 2007; Wand and Ormerod 2008), where d is the vector of eigenvalues (sorted from largest to smallest) and U is a matrix containing the corresponding eigenvectors. For first-order differences, there is one zero eigenvalue, with corresponding eigenvector \(\sqrt{\frac{1}{n}} 1_{n} \). The \(n-1\) positive eigenvalues are denoted by \(d_{z} \), where \(d_{z} \) is a \(\left( {n-1} \right) \times 1\) vector, with corresponding eigenvectors in the columns of the \(n\times \left( {n-1} \right) \) matrix \(U_{z} \). The nonzero eigenvalues are equal to \(2\left[ {1-\cos \left( {{i\pi } / n} \right) } \right] \) for \(i=1,2,\ldots ,n-1\) (Williams 1985). With these results, our mixed model for P-splines is given by Wand and Ormerod (2008) as
\(y=X_{g} \beta _{g} +X_{r} \beta _{r} +\tilde{{Z}}w+e,\)
with \(\tilde{{Z}}=I_{r} \otimes BU_{z} \text {diag}\left( {d_{z}^{-1/2} } \right) \) and \(\text {var}\left( w \right) =I_{r} \otimes \sigma _{u}^{2} I_{n-1} \). The random effect term may now be rewritten as \(u=\tilde{{Z}}w\), where \(u=\left[ {I_{r} \otimes U_{z} \text {diag}\left( {d_{z}^{-1/2} } \right) } \right] w\), such that
\(\text {var}\left( u \right) =I_{r} \otimes {\textstyle \frac{1}{2}}\sigma _{u}^{2} \Delta _{n}^{+}. \qquad (18)\)
In (18) we have made use of the fact that the Moore–Penrose inverse of the singular matrix \(\Delta _{n} \) can be computed from its spectral decomposition by simply inverting the nonzero eigenvalues (Bronson 1989, p. 193). Lee et al. (2020, Results 1 and 2) show that the covariance can be uniquely characterized by the singular form in (18), provided the linear model contains a fixed effect with design matrix \(X_{0} \), such that \(\Lambda _{n}^{+} X_{0} =0\). In the case at hand, a fixed intercept per replicate, i.e. \(X_{0} =1_{n} \), meets this requirement. The P-spline penalty is \({u_{j}^{T} \Lambda _{n}^{+} u_{j} } / {\sigma _{u}^{2} }={2u_{j}^{T} \Delta _{n} u_{j} } / {\sigma _{u}^{2} }={u_{j}^{T} D_{n}^{T} D_{n} u_{j} } / {\sigma _{u}^{2} }\), where \(u_{j} \) is the sub-vector of u corresponding to the jth replicate. From (18), we find that \(V=\sigma ^{2}I_{n} +\sigma _{u}^{2} \Lambda _{n} =\sigma ^{2}I_{n} +\frac{1}{2}\sigma _{u}^{2} \Delta _{n}^{+} \), which after sweeping out the replicate mean, i.e. pre- and post-multiplication with \(Q_{n} \), yields \(V^{*}=\sigma ^{2}Q_{n} +\frac{1}{2}\sigma _{u}^{2} \Delta _{n}^{+} \), showing that this kind of P-spline is equivalent to LV with \(\phi =\frac{1}{2}\sigma _{u}^{2} \) (see Eq. 6) and hence to RW.
Subsequently in this paper, when we use the plain term P-spline for simplicity, it is implied that the P-spline is of this particular kind unless stated otherwise, i.e. it has a first-degree B-spline basis, a knot placed at every plot and a first-difference penalty.
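The spectral construction of this section can be verified numerically. The following Python sketch (illustrative only) checks the eigenvalue formula for \(D_{n}^{T} D_{n} \) and the identity \(\tilde{Z}\tilde{Z}^{T}=\left( {D_{n}^{T} D_{n} } \right) ^{+}={\textstyle \frac{1}{2}}\Delta _{n}^{+} \) per replicate:

```python
import numpy as np

n = 8
D = np.diff(np.eye(n), axis=0)       # first-difference operator D_n
DtD = D.T @ D                        # = 2 * Delta_n

d, U = np.linalg.eigh(DtD)           # eigh sorts eigenvalues in ascending order
assert np.isclose(d[0], 0)           # one zero eigenvalue (eigenvector ~ 1_n)
# Nonzero eigenvalues: 2 * (1 - cos(i * pi / n)), i = 1, ..., n-1 (Williams 1985)
expected = 2 * (1 - np.cos(np.arange(1, n) * np.pi / n))
assert np.allclose(d[1:], expected)

# Mixed-model basis per replicate: Ztilde = U_z diag(d_z^{-1/2}), so that
# var(u_j) = sigma_u^2 * Ztilde Ztilde^T = sigma_u^2 (D^T D)^+ = 1/2 sigma_u^2 Delta_n^+
Uz, dz = U[:, 1:], d[1:]
Ztilde = Uz @ np.diag(dz ** -0.5)
assert np.allclose(Ztilde @ Ztilde.T, np.linalg.pinv(DtD))

# Tie back to the LV form: (D^T D)^+ = 1/2 Delta_n^+ = -1/2 Q_n L_n Q_n
Q = np.eye(n) - np.ones((n, n)) / n
L = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
assert np.allclose(Ztilde @ Ztilde.T, -0.5 * Q @ L @ Q)
```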
3 Application of the One-Dimensional Models
Here we use the spring oats data reported in John and Williams (1995, p. 146) to investigate the one-dimensional models (Fig. 3a). The response is grain yield in kg per hectare. The design was an alpha design with 24 varieties, three replicates and six incomplete blocks of size four per replicate. The 72 plots were arranged in a single linear array. The models fitted always comprised fixed effects for varieties and replicates, but differed with regard to effects for blocks within replicates, which were assumed absent (Sect. 3.1), fitted as fixed (Sect. 3.2), or fitted as random (Sect. 3.3). If no spatial covariance is included, these models constitute a baseline that represents the randomization structure. For resolvable designs, as in this example, replicates would always be fitted as a fixed effect in these models: this ensures the effect cannot drop out, there is no inter-replicate information to be recovered, and any differences between replicates are removed in this way (John and Williams 1995). Fitting a replicate effect removes a portion of variation orthogonally to treatments, which results in a smaller block variance component and hence a better recovery of inter-block information. For example, when replicates need to be harvested at different dates for logistical reasons, any date effects can be removed via the replicate effects. We note that a baseline model with an effect for incomplete blocks would normally be the randomization-based starting point of analysis, followed by an exploration of whether the addition of a spatial variance–covariance structure improves model fit (Piepho and Williams 2010). When the block variance is low, an analysis without a block effect and assuming independent plot errors has a backing in randomization theory (Speed et al. 1985) and so provides a viable baseline in Sect. 3.1. Different spatial models were fitted, i.e. LV, RW and P-splines.
When blocks were fitted, spatial covariance was generally modelled within blocks, whereas different blocks were assumed independent. All models were fitted by residual maximum likelihood (REML) assuming normality of random effects. The fit statistics reported are \(-2\log L_{\mathrm{{R}}} \), where \(L_{\mathrm{{R}}}\) is the maximized residual likelihood, and \(\mathrm{{AIC}}=-2\log L_{\mathrm{{R}}} +2p\), where p is the number of fitted variance parameters with estimates unequal to zero. All variance parameter estimates were constrained to be non-negative. All analyses were done using the GLIMMIX procedure of SAS. For P-splines, the \(\hbox {type}=\hbox {pspline}\) structure was employed.
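As a rough illustration of what such a fit involves, the following Python sketch simulates a small trial and computes generalized least squares (GLS) estimates under the RW covariance with the variance parameters treated as known. This is purely a sketch: the analyses in this paper estimate the variance parameters by REML in SAS GLIMMIX, and all settings below (dimensions, parameter values) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
r, n = 3, 8                      # replicates and plots per replicate
g = n                            # one plot per genotype per replicate (an RCBD)
N = r * n

# Design matrices: genotypes randomized within each replicate, plus replicates
order = np.stack([rng.permutation(g) for _ in range(r)])
Xg = np.zeros((N, g))
Xg[np.arange(N), order.ravel()] = 1
Xr = np.kron(np.eye(r), np.ones((n, 1)))
X = np.hstack([Xg[:, 1:], Xr])   # drop one genotype column for identifiability

# Composite plot-error covariance: V = sigma^2 I_n + lambda S_n per replicate
sigma2, lam = 1.0, 0.5
i = np.arange(1, n + 1)
S = np.minimum.outer(i, i)       # RW spatial component
V = np.kron(np.eye(r), sigma2 * np.eye(n) + lam * S)

# Simulate data and solve the GLS normal equations
beta_true = rng.normal(size=X.shape[1])
y = X @ beta_true + rng.multivariate_normal(np.zeros(N), V)
Vi = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)  # genotype contrasts first
```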
3.1 Results for Model Without Block Effects
The fitted P-spline is shown in Fig. 3b. Fit statistics shown in Table 1 confirm the equivalence of LV, RW and P-splines. These models fit better than the baseline model, which only had an independent plot error with variance \(\sigma ^{2}\). We fitted the same models allowing the spatial covariance to extend across the whole field. As expected from theory (Williams et al. 2006), the fits were identical to those restricting spatial covariance to occur only within replicates (results not shown; full SAS code available in Supplementary Information). To illustrate the flexibility of P-splines, we also fitted a third-degree B-spline with 12 segments, corresponding to knots at half the number of plots per replicate. As can be seen from Fig. 4, the predicted plot effects u for the LV model and cubic P-splines are similar, as are the predicted genotype means (Fig. 5). The fit of the cubic P-spline is slightly inferior to the first-degree models in terms of AIC.
For comparison, we also fitted several nonlinear spatial models, i.e. AR(1), Gaussian, spherical and Matérn (Schabenberger and Gotway 2004), both with and without nugget variance. Results in Table 2 indicate that these nonlinear models gave fits that were comparable in AIC to the linear models in Table 1. It is noteworthy that with the AR(1) model the autocorrelation \(\theta \) converges to a value close to unity, meaning that the spatial covariance is nearly confounded with the fixed effect for replicates (Piepho et al. 2015). This, in turn, explains why the spatial variance \(\sigma _{s}^{2} \) converges to a value that is much larger than for the other spatial models, in particular the AR(1) model without nugget. The Matérn model is afflicted with a similar problem, indicated by the excessively large values for \(\sigma _{s}^{2}\) in the models with and without nugget variance.
3.2 Results for Fixed Block Effects
We here additionally fit a fixed block effect nested within replicates, meaning that inter-block information is not utilized. This analysis is presented mainly for illustration of equivalence; in practice blocks would normally be fitted as random for recovery of information. The spatial covariance is assumed to only extend to pairs of plots in the same block. The fit statistics in Table 3 again confirm the equivalence of all spatial models. In this case, spatial modelling does not provide any improvement in fit, indicating that the blocks did a good job capturing most of the spatial trend.
3.3 Results for Random Block Effects
In the previous sub-section, blocks were modelled as fixed. Here, blocks are fitted as random for recovery of inter-block information. This time, the equivalence of fits is lost as expected (Table 4). Comparison to results in Table 1 indicates that an analysis without effects for incomplete blocks, leaving the recovery of information entirely to the spatial model component, is preferable in this case. In fact, when allowing the spatial correlation to extend across the whole replicate and adding a random block effect to the model, the variance for blocks converges to zero for all spatial models, leading to the fits in Table 1. Obviously, the smooth component and the block component compete in capturing the spatial trend, and in this case the former wins.
4 Extension to Two Dimensions
Next, we will consider extensions to two dimensions, where the variance–covariance structure is additive in components that can be written as \(V_{s} \otimes V_{k} \), where \(V_{s} \) is an \(s\times s\) variance–covariance structure associated with the s columns within a replicate, \(V_{k} \) is a \(k\times k\) variance–covariance structure associated with the k rows within a replicate, and \(\otimes \) denotes the Kronecker product.
4.1 Linear Variance Model
Piepho and Williams (2010, Eqs. 3 and 10) give
So looking again at \(V^{*}\) in (2), we get (following some algebra) after sweeping out the replicates
where \(\phi _\mathrm{{C}}^{*} ={\phi }'_\mathrm{{C}} +\left( {s-1} \right) {\phi }'_\mathrm{{RC}} \) and \(\phi _\mathrm{{R}}^{*} ={\phi }'_\mathrm{{R}} +\left( {k-1} \right) {\phi }'_\mathrm{{RC}} \). We may also consider sweeping out fixed row and column effects, leading to
This form is useful in comparisons with the two-dimensional RW model considered in the next sub-section.
4.2 Random Walk Model
Consider the 'locally quadratic representation' proposed by Besag and Higdon (1999), given by \(u_{ij} -u_{i-1,j} -u_{i,j-1} +u_{i-1,j-1} =r_{ij} \),
where \(r_{ij} \sim N\left( {0,\lambda _\mathrm{{RC}} } \right) \) is a random effect associated with the ith row and jth column. If plots are ordered by columns and by rows within columns, this corresponds to the penalty \(u^{T}\Lambda _\mathrm{{RC}}^{+} u\), where \(\Lambda _\mathrm{{RC}}^{+} ={D_{n}^{T} D_{n} } / {\lambda _\mathrm{{RC}} }\) with \(D_{n} =D_{s} \otimes D_{k} \). Hence, this model is equivalent to fitting the variance–covariance matrix \(\Lambda _\mathrm{{RC}} =\lambda _\mathrm{{RC}} D_{n}^{+} D_{n}^{+T} =\frac{1}{4}\lambda _\mathrm{{RC}} \Delta _{s}^{+} \otimes \Delta _{k}^{+} \), provided we also fit fixed effects \(X_{0} =\left( {1_{sk} ,I_{s} \otimes 1_{k} ,1_{s} \otimes I_{k} } \right) \) (Lee et al. 2020). Thus, in this case we also need to fit fixed row and column effects, and then the RW model is equivalent to the LV model in Eq. (11). With resolvable designs, as considered in our examples, one also needs to fit a fixed replicate effect, and row and column effects are then nested within replicates. We may also replace the matrices \(\Delta _{s}^{+} \) and \(\Delta _{k}^{+} \) with \(S_{s}\) and \(S_{k}\) or \(T_{s} \) and \(T_{k} \) to obtain equivalent fits when the model comprises fixed effects for rows and columns.
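The Kronecker identity used here, \(\lambda _\mathrm{{RC}} D_{n}^{+} D_{n}^{+T} =\frac{1}{4}\lambda _\mathrm{{RC}} \Delta _{s}^{+} \otimes \Delta _{k}^{+} \), can be checked numerically (a Python sketch, not from the paper):

```python
import numpy as np

def delta_pinv(m):
    """Moore-Penrose inverse of Delta_m = (1/2) D_m^T D_m."""
    D = np.diff(np.eye(m), axis=0)
    return np.linalg.pinv(0.5 * D.T @ D)

s, k = 4, 5
Ds = np.diff(np.eye(s), axis=0)
Dk = np.diff(np.eye(k), axis=0)
Dn = np.kron(Ds, Dk)                 # D_n = D_s (x) D_k, plots ordered by columns

# D_n^+ D_n^{+T} = (D_n^T D_n)^+ = (1/4) Delta_s^+ (x) Delta_k^+
Dn_pinv = np.linalg.pinv(Dn)
assert np.allclose(Dn_pinv @ Dn_pinv.T,
                   0.25 * np.kron(delta_pinv(s), delta_pinv(k)))
```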
When fitting random row and column effects, we may also impose RW models for both of these effects, amounting to variance–covariance structures \(\Lambda _\mathrm{{R}} =\frac{1}{2}\lambda _\mathrm{{R}} J_{s} \otimes \Delta _{k}^{+}\) and \(\Lambda _\mathrm{{C}} =\frac{1}{2}\lambda _\mathrm{{C}} \Delta _{s}^{+} \otimes J_{k}\), providing an interesting alternative to the LV\(\otimes \)LV model with random row and column effects. These models will not be equivalent, nor will there be an equivalence when replacing the matrices \(\Delta _{s}^{+} \) and \(\Delta _{k}^{+}\) with \(S_{s} \) and \(S_{k}\) or \(T_{s}\) and \(T_{k}\) in the RW model.
4.3 P-splines
Using the fact that in the one-dimensional case the variance–covariance for a P-spline with \(B=I_{n} \) and \(q=n\) is proportional to \(\Delta _{n}^{+} \), a natural tensor-product extension (Ruppert et al. 2003, p. 240) in two dimensions is to fit variance terms for \(J_{s} \otimes \Delta _{k}^{+} \), \(\Delta _{s}^{+} \otimes J_{k} \) and \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \), which are mutually orthogonal. For a similar decomposition into main effects and interaction effects see Verbyla et al. (2018), Wood et al. (2013) and Wood (2017, p. 233). We will have equivalence between LV, RW and P-splines when fitting fixed row and column effects. With random row and column effects that equivalence will be lost. Here, we have used first differences. If first differences were replaced with second differences, we would obtain an interaction that is similar but not fully identical to the term \(f_{u,v} \left( {u,v} \right) \) in the smooth part of the SpATS model (Rodríguez-Álvarez et al. 2018).
In SpATS, there are two options to model the smooth part. The first one is based on so-called overlapping penalties, based on a proposal laid down in detail in Rodríguez-Álvarez et al. (2015) and further extended in Rodríguez-Álvarez et al. (2019). These overlapping penalties are not directly available in linear model packages and so are not considered here. Another option is the P-spline ANOVA decomposition (Lee and Durban 2011; Rodríguez-Álvarez et al. 2018), which leads to five variance components for second differences, and three variance components for first differences as used here.
4.4 Equivalence When the Interaction Term Is Omitted
When the interaction terms are dropped, i.e. \(S_{s} \otimes S_{k} \) or \(T_{s} \otimes T_{k} \) for RW, \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \) for P-splines, and \(M_{s} \otimes M_{k} \) for LV, we obtain equivalence also with random row and column effects. For example, it can be shown that \(Q_{n} \left( {J_{s} \otimes S_{k} } \right) Q_{n} =Q_{n} \left( {J_{s} \otimes T_{k} } \right) Q_{n} ={\textstyle \frac{1}{2}}Q_{n} \left( {J_{s} \otimes \Delta _{k}^{+} } \right) Q_{n} ={\textstyle \frac{1}{2}}Q_{n} \left( {J_{s} \otimes M_{k} } \right) Q_{n} \), with analogous identities holding for the column terms.
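Identities of this type are again easy to verify numerically. The following Python sketch (illustrative only) checks that, after sweeping out the replicate mean, the RW, LV and P-spline main-effect terms for rows coincide up to a factor of one half:

```python
import numpy as np

s, k = 4, 5
n = s * k
Q = np.eye(n) - np.ones((n, n)) / n          # sweep for the replicate mean
idx = np.arange(1, k + 1)
Sk = np.minimum.outer(idx, idx)              # RW row component (left border)
Tk = k + 1 - np.maximum.outer(idx, idx)      # RW row component (right border)
Lk = np.abs(np.subtract.outer(idx, idx))
Mk = (k - 1) * np.ones((k, k)) - Lk          # LV row component
Dk = np.diff(np.eye(k), axis=0)
Dpk = np.linalg.pinv(0.5 * Dk.T @ Dk)        # Delta_k^+, P-spline row component
Js = np.ones((s, s))

A = Q @ np.kron(Js, Sk) @ Q
assert np.allclose(A, Q @ np.kron(Js, Tk) @ Q)
assert np.allclose(A, 0.5 * Q @ np.kron(Js, Mk) @ Q)
assert np.allclose(A, 0.5 * Q @ np.kron(Js, Dpk) @ Q)
```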
5 Application of the Two-Dimensional Models
We here use the wheat trial of Gilmour et al. (1997) that was also considered in Piepho and Williams (2010, Example 2). The trial comprised three replicates, 22 rows and five columns within each replicate. The models fitted always comprised fixed effects for varieties and replicates, but differed with regard to effects for rows and columns nested within replicates, which were either fitted as fixed (Sect. 5.1) or as random (Sect. 5.2). Spatial covariance was assumed among all plots in a replicate, but replicates were independent. All variance parameter estimates were constrained to be non-negative. All analyses were done using the GLIMMIX procedure of SAS. For P-splines, the specification \(\hbox {type}=\hbox {pspline}\) could not be employed. Instead, we provided \(\frac{1}{2}J_{s} \otimes \Delta _{k}^{+} \), \(\frac{1}{2}\Delta _{s}^{+} \otimes J_{k} \) and \(\frac{1}{4}\Delta _{s}^{+} \otimes \Delta _{k}^{+} \) via \(\hbox {type}=\hbox {LIN(p)}\) structures.
5.1 Results for Fixed Row and Column Effects
The results in Table 5 confirm the equivalence of all models when fitting fixed effects for rows and columns.
5.2 Results for Random Row and Column Effects
Here we fitted spatial structures not only for the plots, but also for rows and columns within replicates as described in Sect. 4. Thus, for example, the RW model based on \(T_{n} \) contained variance terms for \(J_{s} \otimes T_{k} \), \(T_{s} \otimes J_{k} \) and \(T_{s} \otimes T_{k} \). For the P-spline models, the components were \(J_{s} \otimes \Delta _{k}^{+} \), \(\Delta _{s}^{+} \otimes J_{k} \) and \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \). The LV model contained terms for \(J_{s} \otimes M_{k} \), \(M_{s} \otimes J_{k} \) and \(M_{s} \otimes M_{k} \). We also considered additive models, where the interaction term was dropped, that is, we dropped \(S_{s} \otimes S_{k} \) or \(T_{s} \otimes T_{k} \) for RW, \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \) for P-splines, and \(M_{s} \otimes M_{k} \) for LV. Results in Table 6 demonstrate that the full models are not equivalent when fitting random row and column effects, but equivalence is achieved when the interaction term is dropped, as expected. The nonlinear AR(1)\(\otimes \)AR(1) model with nugget yields a deviance of 2538.41 and an AIC of 2548.41 (Piepho and Williams 2010), which is slightly better than the linear models with interaction terms. When the nugget was dropped, the deviance and AIC rose to 2554.95 and 2564.95, respectively, indicating that the nugget is clearly needed. For several variations of this model applied to the same data, including purely spatial models for within-field heterogeneity that do not have randomization-based replicate effects, see Verbyla (2019).
6 Discussion
This paper has demonstrated that there are very close links between the LV model as proposed by Williams (1986), RW models as introduced by Besag and Kempton (1986) and first-degree P-splines with first difference penalty. Thus, what may seem like an entirely new methodology for field trials (i.e. P-splines) is in fact rooted in similar proposals made several decades ago.
When the model is extended to two dimensions and when random effects are fitted for rows and columns, full equivalence holds when the interaction term is excluded (Table 6) but not otherwise, and the choice between these models may make a slight difference. The complexity of these models is comparable, and so there should be little practical difference between them.
P-splines provide a very rich class of smoothing models, which hold much promise for application in large plant breeding trials. An advantage of P-splines over LV and RW models (shared by many nonlinear spatial models) is that they can also be applied when plots are not equally spaced. Also, the number of knots can be smaller than the number of plots. The recently proposed modelling approach SpATS (Rodríguez-Álvarez et al. 2018) demonstrates the great versatility of the framework. It uses second differences rather than first differences as penalties and also uses higher-degree B-splines with a limited number of knots. The main challenge with P-splines is that some choices need to be made, such as the number of segments and the degree of the B-spline basis. Rodríguez-Álvarez et al. (2018) show that the results from SpATS based on second differences (five parameters) are very similar to those from the LV model, which uses first differences (three parameters). Hence it is possible that a SpATS model based on first differences could provide a simpler but still effective approach. This suggestion would be consistent with the early findings on spatial adjustment mentioned in the Introduction. Note that the performance of P-splines is deemed relatively insensitive to the specific choice of penalty and spline basis, as well as the number of knots, and that there is usually some compensating effect between these components, so long as the choice provides sufficient flexibility to accommodate the underlying trends (Wood 2006, p. 161).
A major computational advantage of all spatial models considered in this paper is that they are linear in the variance parameters, meaning that convergence is usually quick and stable. This is in contrast to nonlinear spatial models, where numerical problems such as lack of convergence to the maximum of the likelihood, dependence on good starting values and large numbers of iterations are common with some models (Diggle and Ribeiro 2007, p. 114; Slaets et al. 2020). In our experience, convergence issues are particularly pertinent when a nugget effect, i.e. an independent error term e, is fitted in addition to a spatially correlated component u (Piepho et al. 2015; Velazco et al. 2017; Rodríguez-Álvarez et al. 2018). For example, it may happen with nonlinear models that the correlation structure for u converges to either an identity matrix (I) or to a square matrix of ones (J), in which case there is confounding with either the nugget or the intercept (replicate effect), thus causing numerical problems. It is a common experience, however, that a nugget effect is frequently needed (H. D. Patterson in the discussion of Wilkinson et al. 1983; Pilarcyk 2009; Piepho and Williams 2010), so omitting this effect, even if only to achieve easier convergence, is not advisable. As a conservative modelling approach, we suggest that a nugget should always be included. An attractive feature of all linear models considered in this paper is that a nugget is necessarily included.
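Linearity in the variance parameters can be illustrated with a random-walk covariance plus nugget. In this numpy sketch (the size n = 8 and the component values 2.0 and 0.5 are arbitrary), the RW kernel has cov(u_i, u_j) = min(i, j), so the full matrix V is a plain linear combination of a known kernel and the identity, with the nugget always present:

```python
import numpy as np

n = 8
idx = np.arange(1, n + 1)
K = np.minimum.outer(idx, idx)  # random-walk kernel: cov(u_i, u_j) = min(i, j)

sigma2_u, sigma2_e = 2.0, 0.5            # arbitrary illustrative values
V = sigma2_u * K + sigma2_e * np.eye(n)  # linear in both variance parameters,
                                         # nugget sigma2_e included by construction
```

A side effect of this kernel is that Var(u_i - u_j) = sigma2_u * |i - j|, i.e. the variance of plot differences grows linearly with distance, which is the defining property of the linear variance model.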
LV and RW models have the advantage that the inverses of the variance–covariance matrices are sparse (see “Appendix”). Figure 6 gives the nonzero elements of the mixed model coefficient matrix for the LV and P-splines model, showing that the LV one is sparser than the P-spline one. Algorithms from sparse matrix algebra can be used to find the REML estimates in an efficient way (Misztal and Perez-Enciso 1993; Smith 1995; Meyer and Smith 1996).
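The sparsity claim is easy to verify numerically for the random-walk kernel used above: although the covariance matrix min(i, j) is fully dense, its inverse is tridiagonal. A minimal numpy check (n = 8 is arbitrary):

```python
import numpy as np

n = 8
idx = np.arange(1, n + 1)
K = np.minimum.outer(idx, idx)  # RW covariance: every entry non-zero
Q = np.linalg.inv(K)            # precision matrix: tridiagonal, hence sparse

# entries beyond the first off-diagonal vanish (up to floating-point error)
off_band = np.abs(np.subtract.outer(idx, idx)) > 1
```

It is this banded precision structure that sparse-matrix REML algorithms exploit.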
Our results have focused on models with fixed genotypic effects but equivalence relations hold equally when genotypes are modelled as random, which is becoming increasingly common in breeding programs (Cullis et al. 2020; Heslot and Feoktistov 2020). Note that our equivalence relations rely on a model reduction for fixed block effects, and the model reduction does not alter the estimates of any other effects in the model (De Hoog et al. 1990). This general result means that our findings apply equally to models with fixed or random genotypic effects.
We would like to reiterate that for the linear covariance models considered in this paper, model fits are identical regardless of whether covariance is allowed to extend only within replicates or across replicates. The reason for this equivalence is that the presence of a fixed effect for replicates absorbs any correlation at the replicate boundary (Williams et al. 2006). The same equivalence relations apply with fixed incomplete block effects, but they do not hold for models with random block effects. In practice, analysis assuming random block effects will usually be preferred as this allows inter-block information to be recovered. Our empirical results suggest that the different linear covariance models provide comparable fits, and where necessary model choice may be guided by likelihood-based criteria as usual (Verbyla 2019).
The equivalence, in the presence of a fixed replicate effect, between linear covariance models with correlation extending across the whole trial vs. correlation confined within replicates does not carry over to nonlinear models such as AR(1). It may be noted, however, that LV, and hence RW and P-splines, can be seen as a first-order approximation of the most commonly used nonlinear AR(1) model (Piepho and Williams 2010) and are in fact a limiting form of AR(1) as the correlation parameter approaches one; Williams et al. (2006) note that in practice, estimates of this parameter are often quite high. This was also evident in the oats data. When correlation was confined within replicates, the AR(1) autocorrelation converged to 0.9962, and the likelihood was indistinguishable from that of the linear models (Tables 1, 2). When the correlation is allowed to extend across the whole field, the correlation converges to the boundary of 1.0, and again the likelihood is identical to that of the linear models in Table 1.
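The first-order relation between AR(1) and the linear models can be checked numerically. For high autocorrelation, the AR(1) correlation \(\rho^{|i-j|}\) is close to its linear (in distance) first-order expansion \(1 - (1-\rho)|i-j|\), which is the linear variance form. A numpy sketch (n = 6 plots is arbitrary; rho = 0.996 is in the range of the oats-data estimate):

```python
import numpy as np

n = 6
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # plot distances
rho = 0.996                      # high autocorrelation, as often estimated
ar1 = rho ** d                   # AR(1) correlation: rho^{|i-j|}
lv = 1.0 - (1.0 - rho) * d       # first-order (linear variance) approximation
max_gap = np.abs(ar1 - lv).max()
```

For rho this close to one, the two correlation structures are numerically almost indistinguishable, consistent with the near-identical likelihoods reported above.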
All SAS code used to perform the analyses for the two examples is provided as Supplementary Electronic Material, along with the two published datasets.
References
Besag, J. and Higdon, D. (1999). Bayesian analysis of agricultural field trials, Journal of the Royal Statistical Society, Series B, 61, 691–746.
Besag, J. and Kempton, R. (1986). Statistical analysis of field experiments using neighbouring plots, Biometrics, 42, 231–251.
Brien, C. J., Berger, B., Rabie, H., and Tester, M. (2013). Accounting for variation in designing greenhouse experiments with special reference to greenhouses containing plants on conveyor systems, Plant Methods, 9, 5.
Bronson, R. (1989). Matrix operations. New York: McGraw-Hill.
Cabrera-Bosquet, L., Fournier, C., Brichet, N., Welcker, C., Suard, B., and Tardieu, F. (2016). High-throughput estimation of incident light, light interception and radiation-use efficiency of thousands of plants in a phenotyping platform, New Phytologist, 212, 269–81.
Cullis, B. R., Smith, A. B., Cocks, N. A., and Butler, D. G. (2020). The design of early-stage plant breeding trials using genetic relatedness, Journal of Agricultural, Biological, and Environmental Statistics. https://doi.org/10.1007/s13253-020-00403-5
De Boor, C. (1978). A practical guide to splines. New York: Springer.
De Hoog, F. R., Speed, T. P., and Williams, E. R. (1990). On a matrix identity associated with generalized least squares, Linear Algebra and its Applications, 127, 449–456.
Diggle, P., and Ribeiro, P. J. (2007). Model-based geostatistics with R. Berlin: Springer.
Edmondson, R. N. (2005). Past developments and future opportunities in the design and analysis of crop experiments, Journal of Agricultural Science, 143, 27–33.
Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties, Statistical Science, 11, 89–102.
Eilers, P. H. C. (2003). A perfect smoother, Analytical Chemistry, 75, 3631–3636.
Eilers, P. H. C., Marx, B. D., and Durban, M. (2015). Twenty years of P-splines, SORT 39 (2), 149–186.
Gilmour, A. R., Cullis, B. R., and Verbyla, A. P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments, Journal of Agricultural, Biological, and Environmental Statistics, 2, 269–293.
Green, P., Jennison, C., and Seheult, A. (1985). Analysis of field experiments by least squares smoothing, Journal of the Royal Statistical Society, Series B, 47, 299–315.
Hartung, J., Wagener, J., Ruser, R., and Piepho, H. P. (2019). Is it helpful to periodically rearrange pots in a greenhouse experiment?, Plant Methods, 15, 143.
Heslot, N. and Feoktistov, V. (2020). Optimization of selective phenotyping and population design for genomic prediction, Journal of Agricultural, Biological, and Environmental Statistics. https://doi.org/10.1007/s13253-020-00415-1.
John, J. A., and Williams, E. R. (1995). Cyclic and computer generated designs. London: Chapman & Hall.
Kempton, R. A., Seraphin, J. C., and Sword, A. M. (1994). Statistical analysis of two-dimensional variation in variety yield trials, Journal of Agricultural Science Cambridge, 122, 335–342.
Lee, C. S., and Rawlings, J. O. (1982). Design of experiments in growth chambers, Crop Science, 22, 551–558.
Lee, D.-J., and Durban, M. (2011). P-spline ANOVA-type interaction models for spatio-temporal smoothing, Statistical Modelling, 11, 49–69.
Lee, W., Piepho, H. P., and Lee, Y. (2020). Resolving the ambiguity of random-effects models with singular precision matrix, Statistica Neerlandica (in revision).
Lee, Y., Nelder, J. A., and Pawitan, Y. (2006). Generalized linear models with random effects. London: Chapman & Hall/CRC.
Meyer, K., and Smith, S. P. (1996). Restricted maximum likelihood estimation for animal models using derivatives of the likelihood, Génétique Sélection and Evolution, 28, 23–49.
Misztal, I., and Perez-Enciso, M. (1993). Sparse matrix inversion for restricted maximum likelihood estimation of variance components by expectation-maximization, Journal of Dairy Science, 76, 1479–1483.
Papadakis, J. S. (1937). Méthode statistique pour des expériences sur champ, Bulletin de l'Institut d'Amélioration des Plantes à Salonique, 23.
Piepho, H. P., Möhring, J., Pflugfelder, M., Hermann, W., and Williams, E. R. (2015). Problems in the parameter estimation for power and AR(1) models of spatial correlation in designed field experiments, Communications in Biometry and Crop Science, 10, 3–16.
Piepho, H. P., Richter, C., and Williams, E. R. (2008). Nearest neighbour adjustment and linear variance models in plant breeding trials, Biometrical Journal, 50, 164–189.
Piepho, H. P., and Williams, E. R. (2010). Linear variance models for plant breeding trials, Plant Breeding, 129, 1–8.
Pilarcyk, W. (2009). The extent and prevailing shape of spatial relationships in Polish variety testing trials on wheat, Plant Breeding, 138, 411–415.
Rodríguez-Álvarez, M. X., Boer, M. P., van Eeuwijk, F. A., and Eilers, P. H. C. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines, Spatial Statistics, 23, 52–71.
Rodríguez-Álvarez, M. X., Durban, M., Lee, D.-J., and Eilers, P. H. C. (2019). On the estimation of variance parameters in non-standard generalised linear mixed models: Application to penalised smoothing, Statistics and Computing, 29, 483–500.
Rodríguez-Álvarez, M. X., Lee, D.-J., Kneib, T., Durban, M., and Eilers, P. H. C. (2015). Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm, Statistics and Computing, 25, 941–957.
Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric regression. Cambridge: Cambridge University Press.
Schabenberger, O., and Gotway, C. A. (2004). Statistical methods for spatial data analysis. Boca Raton: CRC Press.
Slaets, J., Boeddinghaus, R., and Piepho, H. P. (2020). Linear mixed models and geostatistics for designed experiments in soil science - two entirely different methods or two sides of the same coin?, European Journal of Soil Science. https://doi.org/10.1111/ejss.12976
Smith, S. P. (1995). Differentiation of the Cholesky algorithm, Journal of Computational and Graphical Statistics, 4, 134–147.
Speed, T. P., Williams, E. R., and Patterson, H. D. (1985). A note on the analysis of resolvable block designs, Journal of the Royal Statistical Society B, 47, 357–361.
Stein, M. L. (1999). Interpolation of spatial data: Some theory for kriging. New York: Springer.
Stroup, W. W. (2002). Power analysis based on spatial effects mixed models: A tool for comparing design and analysis strategies in the presence of spatial variability, Journal of Agricultural Biological and Environmental Statistics, 7, 491–501.
van Eeuwijk, F. A., Bustos-Korts, D., Millet, E. J., Boer, M. P., Kruijer, W., Thompson, A., Malosetti, M., Iwata, H., Quiroz, R., Kuppe, C., Muller, O., Blazakis, K. N., Yu, K., Tardieu, F., and Chapman, S. C. (2019). Modelling strategies for assessing and increasing the effectiveness of new phenotyping techniques in plant breeding, Plant Science, 282, 23–39.
Velazco, J. G., Rodríguez-Álvarez, M. X., Boer, M. P., Jordan, D. R., Eilers P. H. C., Malosetti, M., and van Eeuwijk F. A. (2017). Modelling spatial trends in sorghum breeding field trials using a two-dimensional P-spline mixed model, Theoretical and Applied Genetics, 130, 1375–1392.
Verbyla, A. R. (2019). A note on model selection using information criteria for general linear models estimated using REML, Australian and New Zealand Journal of Statistics, 61, 39–50.
Verbyla, A.P., De Faveri, J., Wilkie, J.D., and Lewis, T. (2018). Tensor cubic smoothing splines in designed experiments requiring residual modelling, Journal of Agricultural, Biological, and Environmental Statistics, 23, 478–508.
Wand, M. P., and Ormerod, J. T. (2008). On semiparametric regression with O’Sullivan penalized splines, Australian and New Zealand Journal of Statistics, 50, 179–198.
Welham, S. J., Cullis, B. R., Kenward, M. G., and Thompson, R. (2007). A comparison of mixed model splines for curve fitting, Australian and New Zealand Journal of Statistics, 49, 1–23.
Whittaker, E. (1923). On a new method of graduation, Proceedings of the Edinburgh Mathematical Society, 41, 63–75.
Wilkinson, G. N., Eckert, S. R., Hancock, T. W., and Mayo, O. (1983). Nearest neighbour (NN) analysis of field experiments (with discussion), Journal of the Royal Statistical Society, Series B, 45, 151–211.
Williams, E. R. (1985). A criterion for the construction of optimal neighbour designs, Journal of the Royal Statistical Society, Series B, 47, 489–497.
Williams, E. R. (1986). A neighbour model for field experiments, Biometrika, 73, 279–287.
Williams, E. R., John, J. A., and Whitaker, D. (2006). Construction of resolvable spatial row–column designs, Biometrics, 62, 103–108.
Wood, S. N. (2006). Generalized additive models: An introduction with R. Boca Raton: Chapman & Hall/CRC.
Wood, S. N. (2017). Generalized additive models: An introduction with R. Second edition. Boca Raton: Chapman & Hall/CRC.
Wood, S. N., Scheipl, F., and Faraway, J. J. (2013). Straightforward intermediate rank tensor product smoothing in mixed models, Statistics and Computing, 23, 341–360.
Funding
Open Access funding provided by Projekt DEAL.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This research involves no human participants or animals.
Informed consent
The authors agree to transfer the copyright to Springer.
Appendix
The precision matrix of the LV and RW models is of the general form
where \(\alpha \) is a constant and \(c_{n} \) is a vector of length n, with condition \(c_{n} \ne 0\). For the LV model, we have
Similarly, for the two RW models we find
and
Obviously, the precision matrices for LV and RW are sparse because of the sparsity of both \(\Delta _{n} \) and \(c_{n} \).
Cite this article
Boer, M. P., Piepho, H. P., and Williams, E. R. (2020). Linear variance, P-splines and neighbour differences for spatial adjustment in field trials: How are they related?, Journal of Agricultural, Biological, and Environmental Statistics, 25, 676–698. https://doi.org/10.1007/s13253-020-00412-4