1 Introduction

Plant-breeding field trials typically show considerable spatial variation. Blocked experimental designs help to capture some of that spatial trend and provide efficient treatment estimates (Edmondson 2005). If the spatial trend is irregular, however, blocking may only be partly successful. In such cases, spatial adjustment using a suitable statistical model can provide an improvement in accuracy. Designs and modelling methods developed for field trials can also be applied with advantage in greenhouses (Hartung et al. 2019), growth chambers (Lee and Rawlings 1982) and with phenotyping platforms (Brien et al. 2013; Cabrera-Bosquet et al. 2016; van Eeuwijk et al. 2019).

A variety of variance–covariance structures can be used for spatial adjustment, and such structures are readily available in mixed model packages. Most spatial variance–covariance models are nonlinear in the parameters (Stein 1999; Stroup 2002; Schabenberger and Gotway 2004). Our paper focuses on linear models and methods for spatial adjustment. Perhaps the oldest approach of this kind is nearest-neighbour adjustment (NNA) based on differences among neighbouring plots (Papadakis 1937; Wilkinson et al. 1983; Piepho et al. 2008). The first proposals involved one-dimensional adjustment, but extension to two-dimensional spatial models has subsequently been proposed (Green et al. 1985; Kempton et al. 1994). A further option is to employ smoothing splines (Lee et al. 2020). The most recent addition in a field-trial context is the use of P-splines (Eilers and Marx 1996), which has been proposed under the acronym SpATS (Spatial Analysis of field Trials with Splines) (Rodríguez-Álvarez et al. 2018) and is available as an R package on CRAN (https://cran.r-project.org/package=SpATS). Perhaps the most important common feature of these methods is that they can all be identified with a variance–covariance structure that is linear in the parameters and as such can have computational advantages compared to nonlinear structures.

Early work on spatial adjustment using NNA methods focused on second differences (Wilkinson et al. 1983; Green et al. 1985), but it was soon recognized that first differences (Besag and Kempton 1986) and the related linear variance (LV) model (Williams 1986) often provide a good fit. The recently proposed SpATS approach was introduced in terms of second differences. In this paper, we will specifically investigate how a P-spline approach can be formulated based on first differences and how this reduced model compares to the first-difference NNA method considered by Besag and Kempton (1986) and the LV model of Williams (1986). We will also make a connection with random walk (RW) models (Lee et al. 2020). A unified formulation of these different models will be provided to facilitate the comparison.

The paper is structured as follows. In Sect. 2, we will describe our variance–covariance models in detail and establish the equivalence relations for one-dimensional spatial models. In Sect. 3, the models are applied to a breeding trial with oats. Extension to two dimensions is considered in Sect. 4, followed by a second example involving wheat in Sect. 5. The paper ends with a brief Discussion in Sect. 6.

2 Three One-Dimensional Models for Spatial Correlation

Field trials with plants typically have a two-dimensional layout that is indexed by rows and columns, so spatial correlation can be considered in two spatial directions. To set the stage, we will first consider models with spatial correlation in one dimension across n plots, arranged in a single array within a replicate.

Three types of model will be considered: linear variance (LV) models (Sect. 2.1), random walk (RW) models (Sect. 2.2) and P-splines (Sect. 2.3). Our main result will be that equivalence between the models can be established when a fixed effect for the replicate is accounted for. To establish these relations, it will be convenient to sweep out these fixed effects and consider the corresponding reduced models (De Hoog et al. 1990).

It will be convenient to consider a linear mixed model of the form

$$\begin{aligned} y=X_{g} \beta _{g} +X_{r} \beta _{r} +u+e \end{aligned}$$
(1)

where y is the response vector, ordered by replicates and plots within replicates, \(\beta _{g} \) is a vector of fixed effects for genotypes with associated design matrix \(X_{g} \), \(\beta _{r}\) is a vector of fixed effects for replicates with associated design matrix \(X_{r} =I_{r} \otimes 1_{n}\), where \(\otimes \) denotes the Kronecker product and \(1_{n}\) is an n-vector of ones, n is the number of plots per replicate, u is a vector of spatially correlated plot errors, and e is a vector of independently distributed plot errors, having (nugget) variance \(\sigma ^{2}\). We take genotypic effects as fixed throughout, but our results apply equally with random genotypic effects (see Discussion). Spatial correlation will be modelled among plots in the same replicate, but plots in different replicates are modelled as independent. We note that for the linear spatial covariance models considered here (LV, RW and P-splines), the same result would be obtained if spatial correlation were modelled across replicates. It follows from results in Williams et al. (2006) that the reduced model after sweeping out the fixed replicate effect has independent replicates even if the original model assumes correlation across replicates according to a LV model (the authors show this for columns of plots but their results for column sweeps apply equally to sweeps for replicates). Spatial models differ in the assumptions they make about the covariance function for u. The variance–covariance matrix for the composite plot error, \(f=u+e\), within a replicate will be denoted as V. Thus, the variance–covariance matrix of the data takes the form

$$\begin{aligned} \text {var}~ \left( y \right) =I_{r} \otimes V \end{aligned}$$
(2)

where \(I_{r} \) is the r-dimensional identity matrix, \(\otimes \) denotes the Kronecker product, and V is an \(n\times n\) variance–covariance matrix for n plots per replicate. Similarly, the spatial errors have variance–covariance structure

$$\begin{aligned} \text {var}~\left( u \right) =G=I_{r} \otimes \Lambda _{n} \sigma _{u}^{2} \end{aligned}$$
(3)

where \(\Lambda _{n} \) is a symmetric \(n\times n\) matrix and \(\sigma _{u}^{2} \) is a spatial variance component. Hence, we have \(V=\sigma ^{2}I_{n} +\sigma _{u}^{2} \Lambda _{n} \). It is noted that model (1) does not have incomplete blocks. Such effects can also be added for resolvable incomplete block designs, and we will do so in Sect. 3.
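As a small numerical illustration of the blocked structure in (2) and (3), the following NumPy sketch assembles \(\text{var}(y)\) via the Kronecker product and confirms that plots in different replicates are uncorrelated. It is not part of the analyses reported in this paper; the values of r, n, the variance components and the symmetric matrix \(\Lambda_n\) are purely illustrative.

```python
import numpy as np

r, n = 3, 4                      # replicates and plots per replicate (illustrative)
sigma2, sigma2_u = 1.0, 0.5      # nugget and spatial variance (illustrative)

# Any symmetric Lambda_n will do here; the spatial models below specify it.
idx = np.arange(1, n + 1)
Lambda = np.minimum.outer(idx, idx).astype(float)

V = sigma2 * np.eye(n) + sigma2_u * Lambda   # within-replicate covariance
var_y = np.kron(np.eye(r), V)                # Eq. (2): block-diagonal over replicates

# plots in different replicates are uncorrelated
assert np.allclose(var_y[:n, n:2 * n], 0)
```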

In what follows, we will mainly focus on the form of \(V\) for specific spatial models. Occasionally, where necessary, explicit reference will also be made to model (1).

2.1 Linear Variance Model

Piepho and Williams (2010, Eq. 3) give the one-dimensional LV matrix as

$$\begin{aligned} V=\sigma ^{2}I_{n} +\eta J_{n} +\phi M_{n} \end{aligned}$$
(4)

where \(M_{n} =\left( {n-1} \right) J_{n} -L_{n} \), \(J_{n} \) is the \(n\times n\) matrix of ones, \(L_{n} \) has \(\left( {i,j} \right) \)th element equal to \(\left| {i-j} \right| \) as given by Williams (1986), and \(\sigma ^{2}>0\), \(\eta >0\) and \(\phi >0\) are variance parameters. The component \(\eta J_{n} \) can be omitted in our case because it is confounded with the replicate effect and hence will cancel when sweeping out the replicate effect as is done in the next step. Thus, in relation to model (1) we have \(M_{n} =\Lambda _{n} \) and \(\phi =\sigma _{u}^{2} \). Note that all elements in \(M_{n} \) are non-negative, ensuring that all pairwise covariances in V will be non-negative. The variance matrix for the reduced model (after sweeping out the fixed replicate effect) becomes

$$\begin{aligned} V^{*}=Q_{n} VQ_{n} \end{aligned}$$
(5)

where \(Q_{n} =I_{n} -n^{-1}J_{n} \). So from (4) and (5),

$$\begin{aligned} V^{*}=\sigma ^{2}Q_{n} +\phi \Delta _{n}^{+} \end{aligned}$$
(6)

where \(\Delta _{n}^{+} =-Q_{n} L_{n} Q_{n}\), as given in Eq. (10) of Williams (1986), is the Moore–Penrose inverse of \(\Delta _{n}\), the modified second difference operator as given by Patterson in the discussion of Wilkinson et al. (1983), namely

$$\begin{aligned} \Delta _{n} =\frac{1}{2}\left( {{\begin{array}{*{20}c} 1 &{} {-1} &{} 0 &{} \cdots &{} \cdots &{} 0 \\ {-1} &{} 2 &{} {-1} &{} &{} &{} \vdots \\ 0 &{} {-1} &{} 2 &{} {-1} &{} \ddots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \ddots &{} 0 \\ \vdots &{} &{} \ddots &{} {-1} &{} 2 &{} {-1} \\ 0 &{} \cdots &{} \cdots &{} 0 &{} {-1} &{} 1 \\ \end{array} }} \right) =\frac{1}{2}D_{n}^{T} D_{n} \end{aligned}$$
(7)

where

$$\begin{aligned} D_{n} =\left( {{\begin{array}{*{20}c} 1 &{} {-1} &{} 0 &{} \cdots &{} \cdots &{} 0 \\ 0 &{} 1 &{} {-1} &{} &{} &{} \vdots \\ 0 &{} 0 &{} 1 &{} {-1} &{} \ddots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \ddots &{} 0 \\ \vdots &{} &{} \ddots &{} 1 &{} {-1} &{} 0 \\ 0 &{} \cdots &{} \cdots &{} 0 &{} 1 &{} {-1} \\ \end{array} }} \right) \end{aligned}$$
(8)

is an \(\left( {n-1} \right) \times n\) matrix generating first differences. For incomplete blocks, Williams (1986, Eq. 8) gives \(V=I_{n} +\gamma P_\mathrm{B} -\phi F\), where \(P_\mathrm{B}\) is the block projection matrix and F is block-diagonal with components \({3L_{k_{i} } } / {\left( {k_{i}^{2} -1} \right) }\); in the absence of incomplete blocks, the variance matrix for the reduced model becomes equivalent to the form in (6) above. Our exposition in this section is tailored to designs having several complete replicates, where n is the number of plots per replicate, but the sweep operator could also be applied with non-resolvable designs to remove the overall intercept, where n would be the total number of plots, leading to results equivalent to the ones presented in this section.
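The matrix identities of this section are easy to verify numerically. The following NumPy sketch (with an arbitrary illustrative n, and not part of the SAS analyses reported later) builds \(L_n\), \(M_n\), \(Q_n\) and \(D_n\) and checks that all entries of \(M_n\) are non-negative and that \(-Q_n L_n Q_n\) is indeed the Moore–Penrose inverse of \(\Delta_n =\frac{1}{2}D_n^T D_n\):

```python
import numpy as np

n = 7
idx = np.arange(n)
L = np.abs(np.subtract.outer(idx, idx)).astype(float)   # (i,j)th element |i - j|
M = (n - 1) * np.ones((n, n)) - L                        # M_n = (n-1)J_n - L_n
Q = np.eye(n) - np.ones((n, n)) / n                      # sweep operator Q_n
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)             # first-difference matrix D_n, Eq. (8)
Delta = 0.5 * D.T @ D                                    # Eq. (7)

assert np.all(M >= 0)                                    # non-negative covariances
Delta_plus = -Q @ L @ Q                                  # claimed Moore-Penrose inverse
assert np.allclose(Delta_plus, np.linalg.pinv(Delta))
```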

2.2 Random Walk Model

Besag and Kempton (1986) propose a model that uses first differences to remove trend. Let \(u_{i}\) denote the spatially correlated trend value for the ith plot. Then the assumption is that first differences \(u_{i} -u_{i-1} =r_{i}\) are independently distributed with \(r_{i} \sim N\left( {0,\lambda } \right) \), such that \(u_{i} =u_{i-1} +r_{i} \) defines a non-stationary RW (Lee et al. 2006, p. 233). Besag and Kempton (1986) consider the model for first differences, involving \(D_{n} u=r\sim N\left( {0,\lambda I_{n-1} } \right) \), where here u denotes the vector of spatially correlated plot errors for a single replicate. There is a need to introduce an extra (imaginary) boundary plot to start the RW and impose a border constraint. One option is \(u_{0} =0\) for an imaginary plot next to the first, leading to (Lee et al. 2020) \(u_{1} =u_{0} +r_{1}\) and hence

$$\begin{aligned} u_{i}= & {} r_{1} +\cdots +r_{i} \hbox { for }i=1,\ldots ,n \hbox { and } \end{aligned}$$
(9)
$$\begin{aligned} \text {var}\left( u \right)= & {} \lambda S_{n} \hbox {, where} \end{aligned}$$
(10)
$$\begin{aligned} S_{n}= & {} \left( {{\begin{array}{*{20}c} 1 &{} 1 &{} 1 &{} \cdots &{} \cdots &{} 1 \\ 1 &{} 2 &{} 2 &{} &{} &{} \vdots \\ 1 &{} 2 &{} \ddots &{} \ddots &{} \ddots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \ddots &{} {n-2} \\ \vdots &{} &{} &{} &{} {n-1} &{} {n-1} \\ 1 &{} \cdots &{} \cdots &{} {n-2} &{} {n-1} &{} n \\ \end{array} }} \right) \end{aligned}$$
(11)

In relation to model (1), we have \(S_{n} =\Lambda _{n} \) and \(\lambda =\sigma _{u}^{2} \). Here, the variance increases linearly with increasing index i along the spatial axis. Alternatively, we may start the RW at the other end, imposing the constraint \(u_{n+1} =0\), yielding the RW \(u_{i-1} =u_{i} -r_{i} \), such that (Lee et al. 2020)

$$\begin{aligned} u_{i}= & {} -r_{n} -\cdots -r_{i} \hbox { and } \end{aligned}$$
(12)
$$\begin{aligned} \text {var} \left( u \right)= & {} \lambda T_{n} \hbox {, where} \end{aligned}$$
(13)
$$\begin{aligned} T_{n}= & {} \left( {{\begin{array}{*{20}c} n &{} {n-1} &{} {n-2} &{} \cdots &{} \cdots &{} 1 \\ {n-1} &{} {n-1} &{} {n-2} &{} &{} &{} \vdots \\ {n-2} &{} {n-2} &{} \ddots &{} \ddots &{} \ddots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \ddots &{} 1 \\ \vdots &{} &{} &{} &{} 2 &{} 1 \\ 1 &{} \cdots &{} \cdots &{} 1 &{} 1 &{} 1 \\ \end{array} }} \right) \end{aligned}$$
(14)

Now the variance decreases linearly with i. With these specifications, adding error and a general intercept for the replicate, the variance–covariance matrix of the data may be defined as either

$$\begin{aligned} V= & {} \sigma ^{2}I_{n} +\eta J_{n} +\lambda S_{n} \hbox { or } \end{aligned}$$
(15)
$$\begin{aligned} V= & {} \sigma ^{2}I_{n} +\eta J_{n} +\lambda T_{n} \end{aligned}$$
(16)

Noting that \(2S_{n} =1_{n} c^{T}+c1_{n}^{T} -L_{n} \) with \(c^{T}=\left( {{\begin{array}{*{20}c} 1 &{} 2 &{} \ldots &{} n \\ \end{array} }} \right) \) and \(2T_{n} =1_{n} c^{T}+c1_{n}^{T} -L_{n} \) with \(c^{T}=\left( {{\begin{array}{*{20}c} n &{} {n-1} &{} \ldots &{} 1 \\ \end{array} }} \right) \), it follows that in both cases \(V^{*}=Q_{n} VQ_{n} \) takes the same form as in Eq. (6) with \(\phi =\frac{1}{2}\lambda \), implying that LV and RW are equivalent when a fixed intercept for replicates is fitted.
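This equivalence can be confirmed numerically. The sketch below (NumPy, with illustrative values for n, \(\sigma^2\) and \(\lambda\); the \(\eta J_n\) terms of (15) and (16) are omitted since they vanish under the sweep anyway) builds both RW covariances and the LV form in (6) and checks that all three coincide after pre- and post-multiplication with \(Q_n\):

```python
import numpy as np

n = 6
idx = np.arange(1, n + 1)
L = np.abs(np.subtract.outer(idx, idx)).astype(float)
Q = np.eye(n) - np.ones((n, n)) / n
S = np.minimum.outer(idx, idx).astype(float)                    # Eq. (11)
T = np.minimum.outer(n + 1 - idx, n + 1 - idx).astype(float)    # Eq. (14)

sigma2, lam = 1.0, 0.7                                          # illustrative values
V_S = sigma2 * np.eye(n) + lam * S
V_T = sigma2 * np.eye(n) + lam * T
V_LV = sigma2 * np.eye(n) + (lam / 2) * (-Q @ L @ Q)            # Eq. (6), phi = lambda/2

# after sweeping out the replicate mean, all three coincide
assert np.allclose(Q @ V_S @ Q, Q @ V_T @ Q)
assert np.allclose(Q @ V_S @ Q, Q @ V_LV @ Q)
```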

2.3 P-splines

P-splines were introduced by Eilers and Marx (1996) as a general combination of B-splines (De Boor 1978) of arbitrary degree and arbitrary order of difference penalty. They used a B-spline basis with equidistant knots. For example, Fig. 1 gives a first-degree B-spline basis and Fig. 2 a cubic B-spline basis. An important property of B-splines is that they are locally defined, which makes them computationally attractive. For example, the first-degree B-spline in Fig. 1 is nonzero on the interval [0,2], the second one on [1,3], etc. Another important property of B-splines is that their sum is equal to one (De Boor 1978), which makes them attractive as a natural extension of design matrices. P-splines have been used in a wide range of applications; for a recent overview see Eilers et al. (2015). For the field trials described in this paper, all data are observed on an equidistant grid, and a first-degree B-spline basis amounts to the use of the identity matrix as a basis. See also Fig. 1, where we can place the plot positions at the knots 1, 2, ..., 8. Note that using the identity matrix leads to the Whittaker smoother (Whittaker 1923; Eilers 2003; Eilers et al. 2015).
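With the identity basis, the Whittaker smoother obtains the smooth z by minimizing \(\Vert y-z\Vert ^{2}+\lambda \Vert D_{n} z\Vert ^{2}\), which amounts to solving \((I_{n} +\lambda D_{n}^{T} D_{n} )z=y\). A minimal NumPy sketch with simulated data and an illustrative smoothing parameter (none of this is taken from the paper's analyses):

```python
import numpy as np

# toy data on an equidistant grid of n "plots"
n = 50
x = np.arange(n)
rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * x / n) + rng.normal(scale=0.3, size=n)

D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)   # first-difference matrix
lam = 10.0                                      # smoothing parameter (illustrative)
# penalized least squares: minimize ||y - z||^2 + lam * ||D z||^2
z = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# the smooth is less wiggly than the data
assert np.sum((D @ z) ** 2) < np.sum((D @ y) ** 2)
```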

Fig. 1

Example of a first-degree B-spline basis for a continuous coordinate x

Fig. 2

Cubic B-spline basis for a continuous coordinate x

To introduce general B-splines into our mixed model, the random effect u in (1) would be replaced with Zu, where \(Z=I_{r} \otimes B\) and B is an \(n\times q\) matrix of a B-spline basis for q knots (see Figs. 1, 2 for examples), with \(B1_{q} =1_{n} \). This extension includes (1) as a special case, with \(B=I_{n} \) for first-degree B-splines and \(q=n\), and here we restrict attention to this case, for which \(Zu=u\). The regression coefficients u in (1) are assumed to be random with variance–covariance matrix \(\text {var}\left( u \right) =G\) with \(G=I_{r} \otimes \sigma _{u}^{2} \Lambda _{n} \) (see Eq. 3). For penalized spline (P-spline) modelling, G involves a contrast matrix \(D_{n} \), which determines the penalty term (Ruppert et al. 2003). Here, we will use the first-difference penalty based on \(D_{n} \) as defined in (8).

A P-spline representation as a mixed model may be based on a spectral decomposition of the matrix \(D_{n}^{T} D_{n} =2\Delta _{n} =U \text {diag}\left( d \right) U^{T}\) with \(U^{T}U=I_{n} \) (Welham et al. 2007; Wand and Ormerod 2008), where d is the vector of eigenvalues (sorted from largest to smallest) and U is a matrix containing the corresponding eigenvectors. For first-order differences, there is one zero eigenvalue, with corresponding eigenvector \(\sqrt{\frac{1}{n}} 1_{n} \). The \(n-1\) positive eigenvalues are denoted by \(d_{z} \), where \(d_{z} \) is a \(\left( {n-1} \right) \times 1\) vector, with corresponding eigenvectors in the columns of the \(n\times \left( {n-1} \right) \) matrix \(U_{z} \). The nonzero eigenvalues are equal to \(2\left[ {1-\cos \left( {{i\pi } / n} \right) } \right] \) for \(i=1,2,\ldots ,n-1\) (Williams 1985). With these results, our mixed model for P-splines is given by Wand and Ormerod (2008) as

$$\begin{aligned} y=X_{g} \beta _{g} +X_{r} \beta _{r} +\tilde{{Z}}w+e \end{aligned}$$
(17)

with \(\tilde{{Z}}=I_{r} \otimes BU_{z} \text {diag}\left( {d_{z}^{-1/2} } \right) \) and \(\text {var}\left( w \right) =I_{r} \otimes \sigma _{u}^{2} I_{n-1} \). The random effect term may now be re-written as \(u=\tilde{{Z}}w\), where \(u=\left[ {I_{r} \otimes U_{z} \text {diag}\left( {d_{z}^{-1/2} } \right) } \right] w\), such that

$$\begin{aligned} \Lambda _{n} =U_{z} \text {diag}\left( {d_{z}^{-1} } \right) U_{z}^{T} =\frac{1}{2}\Delta _{n}^{+} \end{aligned}$$
(18)

In (18) we have made use of the fact that the Moore–Penrose inverse of the singular matrix \(\Delta _{n} \) can be computed from its spectral decomposition by simply inverting the nonzero eigenvalues (Bronson 1989, p. 193). Lee et al. (2020, Results 1 and 2) show that the covariance can be uniquely characterized by the singular form in (18), provided the linear model contains a fixed effect with design matrix \(X_{0} \) such that \(\Lambda _{n}^{+} X_{0} =0\). In the case at hand, a fixed intercept per replicate, i.e. \(X_{0} =1_{n} \), meets this requirement. The P-spline penalty is \({u_{j}^{T} \Lambda _{n}^{+} u_{j} } / {\sigma _{u}^{2} }={2u_{j}^{T} \Delta _{n} u_{j} } / {\sigma _{u}^{2} }={u_{j}^{T} D_{n}^{T} D_{n} u_{j} } / {\sigma _{u}^{2} }\), where \(u_{j} \) is the sub-vector of u corresponding to the jth replicate. From (18), we find that \(V=\sigma ^{2}I_{n} +\sigma _{u}^{2} \Lambda _{n} =\sigma ^{2}I_{n} +\frac{1}{2}\sigma _{u}^{2} \Delta _{n}^{+} \), which after sweeping out the replicate mean, i.e. pre- and post-multiplication with \(Q_{n} \), yields \(V^{*}=\sigma ^{2}Q_{n} +\frac{1}{2}\sigma _{u}^{2} \Delta _{n}^{+} \), showing that this kind of P-spline is equivalent to LV with \(\phi =\frac{1}{2}\sigma _{u}^{2} \) (see Eq. 6) and hence to RW.
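Both the eigenvalue formula and the identity (18) can be checked numerically. The NumPy sketch below (illustrative n, outside the paper's SAS toolchain) computes the spectral decomposition of \(D_n^T D_n =2\Delta_n\), verifies that its eigenvalues are \(2\left[ 1-\cos \left( i\pi /n \right) \right]\) for \(i=0,1,\ldots ,n-1\), and confirms \(\Lambda_n =U_z \text{diag}(d_z^{-1})U_z^T =\frac{1}{2}\Delta_n^+\):

```python
import numpy as np

n = 6
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)     # first-difference matrix D_n
Delta = 0.5 * D.T @ D                             # Delta_n from Eq. (7)

d, U = np.linalg.eigh(D.T @ D)                    # eigenvalues in ascending order
# eigenvalues are 2[1 - cos(i*pi/n)], i = 0, ..., n-1; i = 0 gives the zero eigenvalue
assert np.allclose(d, 2 * (1 - np.cos(np.arange(n) * np.pi / n)))

U_z, d_z = U[:, 1:], d[1:]                        # drop the null space (constant vector)
Lambda = U_z @ np.diag(1.0 / d_z) @ U_z.T
assert np.allclose(Lambda, 0.5 * np.linalg.pinv(Delta))   # Eq. (18)
```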

Subsequently in this paper, when we use the plain term P-spline for simplicity, it is implied that the P-spline is of this particular kind unless stated otherwise, i.e. it has a first-degree B-spline basis, a knot placed at every plot and a first-difference penalty.

3 Application of the One-Dimensional Models

Here we use the spring oats data reported in John and Williams (1995, p. 146) to investigate the one-dimensional models (Fig. 3a). The response is grain yield in kg per hectare. The design was an alpha design with 24 varieties, three replicates and six incomplete blocks of size four per replicate. The 72 plots were arranged in a single linear array. The models fitted always comprised fixed effects for varieties and replicates, but differed with regard to effects for blocks within replicates, which were assumed absent (Sect. 3.1), fitted as fixed (Sect. 3.2), or fitted as random (Sect. 3.3). If no spatial covariance is included, these models constitute a baseline that represents the randomization structure. For resolvable designs, as in this example, replicates would always be fitted as a fixed effect in these models: this ensures the effect cannot drop out, there is no inter-replicate information to be recovered, and any differences between replicates are removed in this way (John and Williams 1995). Fitting a replicate effect removes a portion of variation orthogonally to treatments, which results in a smaller block variance component and hence a better recovery of inter-block information. For example, when replicates need to be harvested at different dates owing to logistical reasons, any date effects can be removed via the replicate effects. We note that a baseline model with an effect for incomplete blocks would normally be the randomization-based starting point of analysis, followed by an exploration of whether the addition of a spatial variance–covariance structure improves model fit (Piepho and Williams 2010). When the block variance is low, analysis without a block effect and assuming independent plot errors has a backing in randomization theory (Speed et al. 1985) and so provides a viable baseline in Sect. 3.1. Different spatial models were fitted, i.e. LV, RW and P-splines.
When blocks were fitted, spatial covariance was generally modelled within blocks, whereas different blocks were assumed independent. All models were fitted by residual maximum likelihood assuming normality of random effects. The fit statistics reported are \(-2\log L_{\mathrm{{R}}} \), where \(L_{\mathrm{{R}}}\) is the maximized residual likelihood, and \(\mathrm{{AIC}}=-2\log L_{\mathrm{{R}}} +2p\), where p is the number of fitted variance parameters with estimates unequal to zero. All variance parameter estimates were constrained to be non-negative. All analyses were done using the GLIMMIX procedure of SAS. For P-splines, the type\(=\)pspline structure was employed.

3.1 Results for Model Without Block Effects

The fitted P-spline is shown in Fig. 3b. Fit statistics shown in Table 1 confirm the equivalence of LV, RW and P-splines. These models fit better than the baseline model, which only had an independent plot error with variance \(\sigma ^{2}\). We fitted the same models allowing the spatial covariance to extend across the whole field. As expected from theory (Williams et al. 2006), the fits were identical to those restricting spatial covariance to occur only within replicates (results not shown; full SAS code available in Supplementary Information). To illustrate the flexibility of P-splines, we also fitted a third-degree B-spline with 12 segments, corresponding to knots at half the number of plots per replicate. As can be seen from Fig. 4, the predicted plot effects u for the LV model and cubic P-splines are similar, as are the predicted genotype means (Fig. 5). The fit of the cubic P-spline is slightly inferior to the first-degree models in terms of AIC.

Fig. 3

a Raw data of the oats data set. The dashed red horizontal lines are the replicate means. b The black points show the spatial trend for the P-spline model, modelled as replicate effect plus correlated plot effects within replicates. The dashed red lines are the replicate effects

Fig. 4

Comparison of plot effects u for the oats data. The red points are the estimates for the LV model, equivalent to first-degree P-splines. The black curve is based on P-splines, using third-degree B-splines with 12 segments, i.e. half the number of plots per replicate

Fig. 5

Comparison of estimated genotypic effects for the oats data, for LV model (equivalent to RW models and first degree P-splines) and cubic P-splines using 12 segments per replicate (24 plots per replicate)

For comparison, we also fitted several nonlinear spatial models, i.e. AR(1), Gaussian, spherical and Matérn (Schabenberger and Gotway 2004), both with and without nugget variance. Results in Table 2 indicate that these nonlinear models gave fits that were comparable in AIC to the linear models in Table 1. It is noteworthy that with the AR(1) model the autocorrelation \(\theta \) converges to a value close to unity, meaning that the spatial covariance is nearly confounded with the fixed effect for replicates (Piepho et al. 2015). This, in turn, explains why the spatial variance \(\sigma _{s}^{2} \) converges to a value that is much larger than for the other spatial models, in particular for the AR(1) model without nugget. The Matérn model is afflicted with a similar problem, indicated by the excessively large values for \(\sigma _{s}^{2}\) in the models with and without nugget variance.

Table 1 Fit statistics for one-dimensional LV, RW and P-spline models fitted to spring oats data in John and Williams (1995, p. 146) when no effects are fitted for blocks within replicates
Table 2 Fit statistics for one-dimensional nonlinear spatial covariance models fitted to spring oats data in John and Williams (1995, p. 146) when no effects are fitted for blocks within replicates

3.2 Results for Fixed Block Effects

Here we additionally fit a fixed block effect nested within replicates, meaning that inter-block information is not utilized. This analysis is presented mainly to illustrate equivalence; in practice, blocks would normally be fitted as random for recovery of information. The spatial covariance is assumed to extend only to pairs of plots in the same block. The fit statistics in Table 3 again confirm the equivalence of all spatial models. In this case, spatial modelling does not provide any improvement in fit, indicating that the blocks captured most of the spatial trend.

Table 3 Fit statistics for one-dimensional LV, RW and P-spline models fitted to spring oats data in John and Williams (1995, p. 146) when fixed effects are fitted for blocks within replicates

3.3 Results for Random Block Effects

Table 4 Fit statistics for one-dimensional LV, RW and P-spline models fitted to spring oats data in John and Williams (1995, p. 146) when random effects are fitted for blocks within replicates

In the previous sub-section, blocks were modelled as fixed. Here, blocks are fitted as random for recovery of inter-block information. This time, the equivalence of fits is lost as expected (Table 4). Comparison to results in Table 1 indicates that an analysis without effects for incomplete blocks, leaving the recovery of information entirely to the spatial model component, is preferable in this case. In fact, when allowing the spatial correlation to extend across the whole replicate and adding a random block effect to the model, the variance for blocks converges to zero for all spatial models, leading to the fits in Table 1. Obviously, the smooth component and the block component compete in capturing the spatial trend, and in this case the former wins.

4 Extension to Two Dimensions

Next, we will consider extensions to two dimensions, where the variance–covariance structure is additive in components that can be written as \(V_{s} \otimes V_{k} \), where \(V_{s} \) is an \(s\times s\) variance–covariance structure associated with the s columns within a replicate, \(V_{k} \) is a \(k\times k\) variance–covariance structure associated with the k rows within a replicate, and \(\otimes \) denotes the Kronecker product.

4.1 Linear Variance Model

Piepho and Williams (2010, Eqs. 3 and 10) give

$$\begin{aligned} V=\sigma ^{2}I_{n} +\kappa J_{s} \otimes J_{k} +{\phi }'_\mathrm{{C}} J_{s} \otimes M_{k} +{\phi }'_\mathrm{{R}} M_{s} \otimes J_{k} +{\phi }'_\mathrm{{RC}} M_{s} \otimes M_{k} \end{aligned}$$
(19)

So looking again at \(V^{*}\) in (5), we get (following some algebra) after sweeping out the replicates

$$\begin{aligned} V^{*}=Q_{n} VQ_{n} =\sigma ^{2}Q_{n} +\phi _\mathrm{{C}}^{*} J_{s} \otimes \Delta _{k}^{+} +\phi _\mathrm{{R}}^{*} \Delta _{s}^{+} \otimes J_{k} +{\phi }'_\mathrm{{RC}} Q_{n} \left( {L_{s} \otimes L_{k} } \right) Q_{n} \end{aligned}$$
(20)

where \(\phi _\mathrm{{C}}^{*} ={\phi }'_\mathrm{{C}} +\left( {s-1} \right) {\phi }'_\mathrm{{RC}} \) and \(\phi _\mathrm{{R}}^{*} ={\phi }'_\mathrm{{R}} +\left( {k-1} \right) {\phi }'_\mathrm{{RC}} \). We may also consider sweeping out fixed row and column effects, leading to

$$\begin{aligned} V^{**}=\left( {Q_{s} \otimes Q_{k} } \right) V\left( {Q_{s} \otimes Q_{k} } \right) =\sigma ^{2}Q_{s} \otimes Q_{k} +{\phi }'_\mathrm{{RC}} \Delta _{s}^{+} \otimes \Delta _{k}^{+} \end{aligned}$$
(21)

This form is useful in comparisons with the two-dimensional RW model considered in the next sub-section.
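The reduction from (19) to (21) can be verified numerically: under the row-and-column sweep, all terms involving \(J_{s}\) or \(J_{k}\) vanish and only the interaction survives. A NumPy sketch (illustrative dimensions and variance parameters, not fitted values):

```python
import numpy as np

def L_mat(m):
    idx = np.arange(m)
    return np.abs(np.subtract.outer(idx, idx)).astype(float)

def Q_mat(m):
    return np.eye(m) - np.ones((m, m)) / m

def M_mat(m):
    return (m - 1) * np.ones((m, m)) - L_mat(m)

def Delta_plus(m):
    return -Q_mat(m) @ L_mat(m) @ Q_mat(m)

s, k = 4, 5
n = s * k
sigma2, kappa, phiC, phiR, phiRC = 1.0, 0.3, 0.4, 0.5, 0.2   # illustrative
J = lambda m: np.ones((m, m))

V = (sigma2 * np.eye(n) + kappa * np.kron(J(s), J(k))         # Eq. (19)
     + phiC * np.kron(J(s), M_mat(k)) + phiR * np.kron(M_mat(s), J(k))
     + phiRC * np.kron(M_mat(s), M_mat(k)))

QQ = np.kron(Q_mat(s), Q_mat(k))                              # row-and-column sweep
lhs = QQ @ V @ QQ
rhs = sigma2 * QQ + phiRC * np.kron(Delta_plus(s), Delta_plus(k))   # Eq. (21)
assert np.allclose(lhs, rhs)
```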

4.2 Random Walk Model

Consider the ‘locally quadratic representation’ proposed by Besag and Higdon (1999), given by

$$\begin{aligned} u_{ij} -u_{i-1,j} -u_{i,j-1} +u_{i-1,j-1} =r_{ij} \end{aligned}$$
(22)

where \(r_{ij} \sim N\left( {0,\lambda _\mathrm{{RC}} } \right) \) is a random effect associated with the ith row and jth column. If plots are ordered by columns and by rows within columns, this corresponds to the penalty \(u^{T}\Lambda _\mathrm{{RC}}^{+} u\), where \(\Lambda _\mathrm{{RC}}^{+} ={D_{n}^{T} D_{n} } / {\lambda _\mathrm{{RC}} }\) with \(D_{n} =D_{s} \otimes D_{k} \). Hence, this model is equivalent to fitting the variance–covariance matrix \(\Lambda _\mathrm{{RC}} =\lambda _\mathrm{{RC}} D_{n}^{+} D_{n}^{+T} =\frac{1}{4}\lambda _\mathrm{{RC}} \Delta _{s}^{+} \otimes \Delta _{k}^{+} \), provided we also fit fixed effects \(X_{0} =\left( {1_{sk} ,I_{s} \otimes 1_{k} ,1_{s} \otimes I_{k} } \right) \) (Lee et al. 2020). Thus, in this case we also need to fit fixed row and column effects, and then the RW model is equivalent to the LV model in Eq. (21). With resolvable designs, as considered in our examples, one also needs to fit a fixed replicate effect, and row and column effects are then nested within replicates. We may also replace the matrices \(\Delta _{s}^{+} \) and \(\Delta _{k}^{+} \) with \(S_{s}\) and \(S_{k}\) or \(T_{s} \) and \(T_{k} \) to obtain equivalent fits when the model comprises fixed effects for rows and columns.
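The identity \(\left( D_{n}^{T} D_{n} \right) ^{+}=\frac{1}{4}\Delta _{s}^{+} \otimes \Delta _{k}^{+}\) underlying this equivalence follows from \(D_{n}^{T} D_{n} =\left( D_{s}^{T} D_{s} \right) \otimes \left( D_{k}^{T} D_{k} \right) =4\Delta _{s} \otimes \Delta _{k}\) and can be checked numerically (NumPy sketch, illustrative dimensions, \(\lambda_\mathrm{RC} =1\) for simplicity):

```python
import numpy as np

def diff_mat(m):
    return np.eye(m - 1, m) - np.eye(m - 1, m, k=1)

def Delta_plus(m):
    Q = np.eye(m) - np.ones((m, m)) / m
    idx = np.arange(m)
    L = np.abs(np.subtract.outer(idx, idx)).astype(float)
    return -Q @ L @ Q

s, k = 4, 5
D = np.kron(diff_mat(s), diff_mat(k))        # D_n = D_s x D_k
# covariance implied by the two-dimensional RW penalty (lambda_RC = 1)
cov = np.linalg.pinv(D.T @ D)
assert np.allclose(cov, 0.25 * np.kron(Delta_plus(s), Delta_plus(k)))
```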

When fitting random row and column effects, we may also impose RW models for both of these effects, amounting to variance–covariance structures \(\Lambda _\mathrm{{R}} =\frac{1}{2}\lambda _\mathrm{{R}} J_{s} \otimes \Delta _{k}^{+}\) and \(\Lambda _\mathrm{{C}} =\frac{1}{2}\lambda _\mathrm{{C}} \Delta _{s}^{+} \otimes J_{k}\), providing an interesting alternative to the LV\(\otimes \)LV model with random row and column effects. These models will not be equivalent, nor will there be an equivalence when replacing the matrices \(\Delta _{s}^{+} \) and \(\Delta _{k}^{+}\) with \(S_{s} \) and \(S_{k}\) or \(T_{s}\) and \(T_{k}\) in the RW model.

4.3 P-splines

Using the fact that in the one-dimensional case the variance–covariance for a P-spline with \(B=I_{n} \) and \(q=n\) is proportional to \(\Delta _{n}^{+} \), a natural tensor-product extension (Ruppert et al. 2003, p. 240) in two dimensions is to fit variance terms for \(J_{s} \otimes \Delta _{k}^{+} \), \(\Delta _{s}^{+} \otimes J_{k} \) and \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \), which are mutually orthogonal. For a similar decomposition into main effects and interaction effects see Verbyla et al. (2018), Wood et al. (2013) and Wood (2017, p. 233). We will have equivalence between LV, RW and P-splines when fitting fixed row and column effects. With random row and column effects that equivalence will be lost. Here, we have used first differences. If first differences were replaced with second differences, we would obtain an interaction that is similar but not fully identical to the term \(f_{u,v} \left( {u,v} \right) \) in the smooth part of the SpATS model (Rodríguez-Álvarez et al. 2018).
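The mutual orthogonality of the three tensor-product components follows because \(\Delta _{m}^{+} J_{m} =0\) (the constant vector lies in the null space of \(\Delta _{m}^{+}\)), and is easily confirmed numerically (NumPy sketch, illustrative dimensions):

```python
import numpy as np

def Delta_plus(m):
    Q = np.eye(m) - np.ones((m, m)) / m
    idx = np.arange(m)
    L = np.abs(np.subtract.outer(idx, idx)).astype(float)
    return -Q @ L @ Q

s, k = 4, 5
A = np.kron(np.ones((s, s)), Delta_plus(k))   # J_s x Delta_k^+
B = np.kron(Delta_plus(s), np.ones((k, k)))   # Delta_s^+ x J_k
C = np.kron(Delta_plus(s), Delta_plus(k))     # interaction term

# the three components annihilate each other
for X, Y in [(A, B), (A, C), (B, C)]:
    assert np.allclose(X @ Y, 0)
```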

In SpATS, there are two options to model the smooth part. The first one is based on so-called overlapping penalties, based on a proposal laid down in detail in Rodríguez-Álvarez et al. (2015) and further extended in Rodríguez-Álvarez et al. (2019). These overlapping penalties are not directly available in linear model packages and so are not considered here. Another option is the P-spline ANOVA decomposition (Lee and Durban 2011; Rodríguez-Álvarez et al. 2018), which leads to five variance components for second differences, and three variance components for first differences as used here.

4.4 Equivalence when the Interaction Term is Omitted

When the interaction term is dropped, i.e. \(S_{k} \otimes S_{s} \) or \(T_{k} \otimes T_{s} \) for RW, \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \) for P-splines, and \(M_{k} \otimes M_{s} \) for LV, we obtain equivalence also with random row and column effects. For example, it can be shown that

$$\begin{aligned} Q_{n} \left( {J_{k} \otimes M_{s} } \right) Q_{n}&=2Q_{n} \left( {J_{k} \otimes S_{s} } \right) Q_{n} =2Q_{n} \left( {J_{k} \otimes T_{s} } \right) Q_{n}\\&=Q_{n} \left( {J_{k} \otimes \Delta _{s}^{+} } \right) Q_{n} =J_{k} \otimes \Delta _{s}^{+} \end{aligned}$$
(23)
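
The chain of equalities in (23) can be checked numerically. As a minimal numpy sketch we take \(M_{s}\) in its Brownian-motion representation, with elements \(\min (i,j)\); this is an assumed form of the LV covariance on our part, but any version differing only in row and column effects gives the same result after centering by the projection \(Q_{n} =I_{n} -n^{-1}J_{n}\):

```python
import numpy as np

s, k = 5, 3
n = s * k
Qn = np.eye(n) - np.ones((n, n)) / n          # centering projection Q_n
D = np.diff(np.eye(s), axis=0)                # (s-1) x s first-difference matrix
Delta_plus = np.linalg.pinv(D.T @ D)          # Δ_s^+
idx = np.arange(1, s + 1)
M = np.minimum.outer(idx, idx).astype(float)  # assumed Brownian-motion form of M_s
Jk = np.ones((k, k))

lhs = Qn @ np.kron(Jk, M) @ Qn                # Q_n (J_k ⊗ M_s) Q_n
rhs = np.kron(Jk, Delta_plus)                 # J_k ⊗ Δ_s^+
assert np.allclose(lhs, rhs)
# Δ_s^+ annihilates the ones vector, so Q_n leaves J_k ⊗ Δ_s^+ unchanged:
assert np.allclose(Qn @ rhs @ Qn, rhs)
```

The centering by \(Q_{n}\) corresponds to the absorption of fixed intercept (replicate) effects, which is why the particular parameterization of \(M_{s}\) does not matter here.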

5 Application of the Two-Dimensional Models

Here we use the wheat trial of Gilmour et al. (1997) that was also considered in Piepho and Williams (2010, Example 2). The trial comprised three replicates, with 22 rows and five columns within each replicate. The models fitted always comprised fixed effects for varieties and replicates, but differed with regard to the effects for rows and columns nested within replicates, which were fitted either as fixed (Sect. 5.1) or as random (Sect. 5.2). Spatial covariance was assumed among all plots in a replicate, but replicates were independent. All variance parameter estimates were constrained to be non-negative. All analyses were done using the GLIMMIX procedure of SAS. For P-splines, the specification \(\hbox {type}=\hbox {pspline}\) could not be employed. Instead, we provided \(\frac{1}{2}J_{s} \otimes \Delta _{k}^{+} \), \(\frac{1}{2}\Delta _{s}^{+} \otimes J_{k} \) and \(\frac{1}{4}\Delta _{s}^{+} \otimes \Delta _{k}^{+} \) via \(\hbox {type}=\hbox {LIN(p)}\) structures.
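
The three coefficient matrices supplied to the \(\hbox {type}=\hbox {LIN(p)}\) structure for a single replicate can be sketched in numpy as follows; the scalings 1/2, 1/2 and 1/4 follow the text, while the plot ordering (rows varying slowest) is an assumption for illustration:

```python
import numpy as np

def delta_pinv(n):
    """Pseudoinverse of the first-difference penalty D'D."""
    D = np.diff(np.eye(n), axis=0)
    return np.linalg.pinv(D.T @ D)

s, k = 22, 5   # rows and columns within one replicate (Gilmour et al. 1997)
J_s, J_k = np.ones((s, s)), np.ones((k, k))

# Coefficient matrices of a structure linear in the variance parameters,
# V = theta1*A1 + theta2*A2 + theta3*A3 (+ nugget):
A1 = 0.50 * np.kron(J_s, delta_pinv(k))               # (1/2) J_s ⊗ Δ_k^+
A2 = 0.50 * np.kron(delta_pinv(s), J_k)               # (1/2) Δ_s^+ ⊗ J_k
A3 = 0.25 * np.kron(delta_pinv(s), delta_pinv(k))     # (1/4) Δ_s^+ ⊗ Δ_k^+
```

Each of the three symmetric 110 x 110 matrices enters the fit with its own variance parameter, which is what makes the structure linear in the parameters.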

5.1 Results for Fixed Row and Column Effects

The results in Table 5 confirm the equivalence of all models when fitting fixed effects for rows and columns.

Table 5 Fit statistics for two-dimensional LV, RW and P-spline models fitted to wheat data of Gilmour et al. (1997) when fixed effects are fitted for rows and columns within replicates

5.2 Results for Random Row and Column Effects

Here we fitted spatial structures not only for the plots, but also for rows and columns within replicates as described in Sect. 4. Thus, for example, the RW model for \(T_{n} \) contained variance terms for \(J_{k} \otimes T_{s} \), \(T_{k} \otimes J_{s} \) and \(T_{k} \otimes T_{s} \). For the P-spline models, the components were \(J_{s} \otimes \Delta _{k}^{+} \), \(\Delta _{s}^{+} \otimes J_{k} \) and \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \). The LV model contained terms for \(J_{k} \otimes M_{s} \), \(M_{k} \otimes J_{s} \) and \(M_{k} \otimes M_{s} \). We also considered additive models, where the interaction term was dropped; that is, we dropped \(S_{k} \otimes S_{s} \) or \(T_{k} \otimes T_{s} \) for RW, \(\Delta _{s}^{+} \otimes \Delta _{k}^{+} \) for P-splines, and \(M_{k} \otimes M_{s} \) for LV. Results in Table 6 demonstrate that the full models are not equivalent when fitting random row and column effects, but, as expected, equivalence is achieved when the interaction term is dropped. The nonlinear AR(1)\(\otimes \)AR(1) model with nugget yields a deviance of 2538.41 and an AIC of 2548.41 (Piepho and Williams 2010), which is slightly better than the linear models with interaction terms. When the nugget was dropped, the deviance and AIC rose to 2554.95 and 2564.95, respectively, indicating that the nugget is clearly needed. For several variations of this model applied to the same data, including purely spatial models for within-field heterogeneity that do not have randomization-based replicate effects, see Verbyla (2019).

Table 6 Fit statistics for two-dimensional LV, RW and P-spline models fitted to wheat data of Gilmour et al. (1997) when random effects are fitted for rows and columns within replicates

6 Discussion

This paper has demonstrated that there are very close links between the LV model as proposed by Williams (1986), RW models as introduced by Besag and Kempton (1986) and first-degree P-splines with first difference penalty. Thus, what may seem like an entirely new methodology for field trials (i.e. P-splines) is in fact rooted in similar proposals made several decades ago.

When the model is extended to two dimensions and when random effects are fitted for rows and columns, full equivalence holds when the interaction term is excluded (Table 6) but not otherwise, and the choice between these models may make a slight difference. The complexity of these models is comparable, and so there should be little practical difference between them.

P-splines provide a very rich class of smoothing models, which hold much promise for application in large plant breeding trials. An advantage of P-splines over LV and RW models (shared by many nonlinear spatial models) is that they can also be applied when plots are not equally spaced. Also, the number of knots can be smaller than the number of plots. The recently proposed modelling approach called SpATS (Rodríguez-Álvarez et al. 2018) is a case demonstrating the great versatility of the framework. It uses second differences rather than first differences as penalties and also uses higher-degree B-splines with a limited number of knots. The main challenge with P-splines is that some choices need to be made, such as the number of segments and the degree of the B-spline basis. Rodríguez-Álvarez et al. (2018) show that the results from SpATS based on second differences (five parameters) are very similar to those from the LV model, which uses first differences (three parameters). Hence, it is possible that a SpATS model based on first differences could provide a simpler but still effective approach. This suggestion would be consistent with the early findings on spatial adjustment mentioned in the Introduction. Note that the performance of P-splines is deemed relatively insensitive to the specific choice of penalty and spline basis, as well as the number of knots, and that there is usually some compensating effect between these components, so long as the choice provides sufficient flexibility to accommodate the underlying trends (Wood 2006, p. 161).

A major computational advantage of all spatial models considered in this paper is that they are linear in the variance parameters, meaning that convergence is usually quick and stable. This is in contrast to nonlinear spatial models, where numerical problems such as lack of convergence to the maximum of the likelihood, dependence on good starting values and large numbers of iterations are common with some models (Diggle and Ribeiro 2007, p. 114; Slaets et al. 2020). In our experience, convergence issues are particularly pertinent when a nugget effect, i.e. an independent error term e, is fitted in addition to a spatially correlated component u (Piepho et al. 2015; Velazco et al. 2017; Rodríguez-Álvarez et al. 2018). For example, it may happen with nonlinear models that the correlation structure for u converges to either an identity matrix (I) or a square matrix of ones (J), in which case there is confounding with either the nugget or the intercept (replicate effect), thus causing numerical problems. It is a common experience, however, that a nugget effect is frequently needed (H. D. Patterson in the discussion of Wilkinson et al. 1983; Pilarcyk 2009; Piepho and Williams 2010), so omitting this effect, even if only to achieve easier convergence, is not advisable. As a conservative modelling approach, we suggest that a nugget should always be included. An attractive feature of all the linear models considered in this paper is that a nugget is necessarily included.

LV and RW models have the advantage that the inverses of the variance–covariance matrices are sparse (see “Appendix”). Figure 6 gives the nonzero elements of the mixed model coefficient matrix for the LV and P-splines model, showing that the LV one is sparser than the P-spline one. Algorithms from sparse matrix algebra can be used to find the REML estimates in an efficient way (Misztal and Perez-Enciso 1993; Smith 1995; Meyer and Smith 1996).
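
The sparsity argument can be illustrated with a small numpy check (illustrative only, not the SAS code used for the analyses): the first-difference precision matrix D'D underlying the RW structure is tridiagonal with only 3n - 2 nonzero entries, whereas its pseudoinverse, the covariance \(\Delta _{n}^{+}\), is dense.

```python
import numpy as np

n = 100
D = np.diff(np.eye(n), axis=0)               # (n-1) x n first-difference matrix
prec = D.T @ D                               # tridiagonal RW precision matrix
assert np.count_nonzero(prec) == 3 * n - 2   # sparse: band of width 3

cov = np.linalg.pinv(prec)                   # covariance Δ_n^+ is dense:
assert abs(cov[0, n - 1]) > 1e-10            # even the corner entry is nonzero
```

It is this sparsity of the precision (rather than the covariance) that sparse-matrix REML algorithms exploit.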

Fig. 6

Graphical representation of the mixed model equation coefficient matrix for (a) the one-dimensional LV model and (b) the one-dimensional P-spline for the oats data

Our results have focused on models with fixed genotypic effects, but the equivalence relations hold equally when genotypes are modelled as random, which is becoming increasingly common in breeding programs (Cullis et al. 2020; Heslot and Feoktistov 2020). Note that our equivalence relations rely on a model reduction for fixed block effects, and this model reduction does not alter the estimates of any other effects in the model (De Hoog et al. 1990). This general result means that our findings apply equally to models with fixed or random genotypic effects.

We would like to reiterate that for the linear covariance models considered in this paper, model fits are identical regardless of whether covariance is allowed to extend only within replicates or across replicates. The reason for this equivalence is that the presence of a fixed effect for replicates absorbs any correlation at the replicate boundary (Williams et al. 2006). The same equivalence relations apply with fixed incomplete block effects, but they do not hold for models with random block effects. In practice, analysis assuming random block effects will usually be preferred, as this allows inter-block information to be recovered. Our empirical results suggest that the different linear covariance models provide comparable fits, and where necessary model choice may be guided by likelihood-based criteria as usual (Verbyla 2019).

The equivalence, in the presence of a fixed replicate effect, between linear covariance models with correlation extending across the whole trial vs. correlation confined within replicates does not carry over to nonlinear models such as AR(1). It may be noted, however, that LV, and hence RW and P-splines, can be seen as a first-order approximation of the most commonly used nonlinear AR(1) model (Piepho and Williams 2010) and are in fact a limiting form of AR(1) as the correlation parameter approaches one; Williams et al. (2006) note that in practice, estimates of this parameter are often quite high. This was also evident in the oats data. When correlation was confined within replicates, the AR(1) autocorrelation converged to 0.9962, and the likelihood was then indistinguishable from that of the linear models (Tables 1, 2). When the correlation is allowed to extend across the whole field, the correlation converges to the boundary of 1.0, and again the likelihood is identical to that of the linear models in Table 1.

All SAS code used to perform the analyses for the two examples is provided as Supplementary Electronic Material, along with the two published datasets.