1 Introduction

Spatial regression models are commonly used in applied statistics to model data collected at different spatial locations across a geographical region. Such models use spatial random effects to account for spatial correlation in the response variable that cannot be explained by the covariates in the model. However, as spatial random effects are not independent of the spatially dependent covariates, they can interfere with the effect estimates of interest. This issue, known as spatial confounding, was first identified by Clayton et al. (1993) and has since been extensively studied (Reich et al. 2006; Hodges and Reich 2010; Hughes and Haran 2013; Hanks et al. 2015; Paciorek 2010; Page et al. 2017; Khan and Calder 2020; Reich et al. 2021; Nobre et al. 2021; Dupont et al. 2022b). A well-known example in Reich et al. (2006) illustrates the problem: An initial regression without spatial random effects estimates the effect of socio-economic status on stomach cancer incidence across Slovenia to be negative and significant. But when spatial random effects are added to the model, the estimated effect becomes close to zero and not significant. Such behaviour makes it difficult to draw reliable conclusions.

The problem of spatial confounding is complex, and while several methods have been suggested for dealing with it (Dupont et al. 2022b; Marques et al. 2022; Guan et al. 2022; Schnell and Papadogeorgou 2020; Thaden and Kneib 2018), so far all analysis and methods have assumed that the covariate effects of interest are linear. Linear covariate effects, however, are not always sufficient for modelling complex dependencies between the covariates and the response.

Our work has been motivated by a spatial model for forest health. Recently, climate change has contributed to the decline in forest health, and European forest health monitoring data are increasingly being used to investigate the effects of climate change on forests in order to decide on forest management strategies for mitigation. Forests in Germany have been badly affected, and climate change is now a major cause of defoliation (Eickenscheidt et al. 2019). We model yearly forest health monitoring data of the main species Norway spruce from South Western Germany in order to identify the site characteristics (topography, soil and climate) which are associated with damage. South-western Germany experienced high water deficits in the drought year 2003 and also more recently in 2015, 2018 and 2020 and is the region with the highest defoliation in Germany. The analysis in Eickenscheidt et al. (2019) showed that the intensity and duration of defoliation following drought stress varies between species of tree. Tree health is quantified by defoliation measured as a percentage of defoliation. There are many covariates with nonlinear effects due to there often being an optimal range at which the effect occurs (for example for number of days with frost).

Generalised additive models (GAMs), developed by Hastie and Tibshirani (1990), are a flexible nonparametric extension of linear models and generalised linear models (GLMs) that allow estimation of unknown and possibly nonlinear relationships between response and predictors. Each covariate effect is represented by a smooth function defined on the domain of the covariate. The functions are then estimated using a penalised version of maximum likelihood estimation. Spatial random effects can be added to the model in the form of a spatial thin plate regression spline. Using this formulation of the spatial model, we build on the analysis in Dupont et al. (2022b) to investigate spatial confounding for models with nonlinear covariate effects. As shown in Paciorek (2010), the bias for linear covariate effects depends on the spatial scales of confounding, where confounding at high spatial frequencies is more likely to lead to bias than low frequencies. Based on this, we investigate the issue for nonlinear effects using a simulation study that considers confounding at both high and low frequencies. Moreover, we show that the spatial+ methodology presented in Dupont et al. (2022b) for reducing spatially induced bias in estimates of linear covariate effects can be adapted to the estimation of nonlinear covariate effects in the GAM framework. In practice, spatial+ can then be used as a diagnostic tool to check whether the estimate of a given nonlinear effect is affected by spatial confounding, and if so, the method can be used to alleviate it. Finally, we illustrate our findings by applying spatial+ to our forest health example.

2 Data

We model defoliation data from the Terrestrial Crown Condition Inventory (TCCI), a forest health monitoring survey which has been carried out yearly in the forests of Baden-Württemberg since 1983. The survey is in alignment with the International Co-operative Programme on Assessment and Monitoring of Air Pollution Effects on Forests and thus uses the same survey protocol (Eichhorn et al. 2017). The TCCI includes sampling points from the large-scale monitoring programme (level I) and long-term intensive monitoring areas (level II). The background and description of these two parts of the survey is given in Damman et al. (2001); de Vries et al. (2003) and Eichhorn et al. (2017). The level I data are essentially yearly repeated measures on a regular spatial grid with different subsets of locations missing depending on the year, resulting in sampling points on either a \(4\times 4\), \(8\times 8\) or \(16\times 16\) km grid. At each sampling grid point 24 sample trees with minimum height of 60 cm and a dominant and subdominant position within the forest stand are randomly selected using a protocol ensuring good spatial coverage within a 50 m radius. The trees are permanently marked and re-assessed during subsequent surveys. Trees that are removed are replaced by newly sampled trees. The level II sampling areas are of size 0.25 ha, predominantly in monoculture stands representing typical forest landscapes with the main species spruce, beech, fir, pine or oak. Only stands with trees of age 60 years or older were selected, and all trees are permanently marked. We combine the level I (90% of trees) and II (10% of trees) of the year 2019 for our analysis, and this results in 203 sampling grid points.

Percentage of crown defoliation is recorded in both surveys and this is our response variable as it is a good indicator of tree vitality and hence forest health. It is estimated by eye for individual trees in 5% intervals. Defoliation can be classified into light, medium and severe damage (11–25%, 26–60% and 61–99% defoliation). We focus on the most common tree species Norway Spruce (Picea abies). As 24 trees are surveyed at each location, we model the mean defoliation of spruce trees at each survey location in the year 2019. This is appropriate because we are interested in mean defoliation, the forest is heavily managed, and the trees will all be of the same age. The average defoliation rate in 2019 was 30%.

The explanatory variables are available at sampling point level only. Besides spatial coordinates and mean age, the explanatory variables are related to topography, soil and climate characteristics. The variables on soil properties are available once by sampling point location and are derived from the National Forest Soil Inventory carried out once between 2006 and 2008. The climate variables are available yearly, and we use different indicators for mean annual drought stress and aridity indices derived from daily weather variables (temperature, precipitation, solar radiation, humidity and wind speed) which were predicted on a 250-m grid using climate models. The methodology for the generation of the weather grid data is described in Dietrich et al. (2019).

3 Methods

3.1 Spatial Models with Nonlinear Covariate Effects

Suppose \({\textbf{y}}=(y_1,\ldots ,y_n)^T\) and \({\textbf{x}}=(x_1,\ldots ,x_n)^T\) are observations of a response variable and a covariate, respectively, measured at spatial locations \({\textbf{t}}_1,\ldots ,{\textbf{t}}_n\). The “null model” is given by the following GAM:

$$\begin{aligned} y_i=\alpha +f_{\text {co}}(x_i)+\epsilon _i,\quad \epsilon _i\underset{\text {iid}}{\sim }N(0,\sigma ^2) \end{aligned}$$
(1)

where \(\alpha \) is an unknown intercept and \(f_{\text {co}}\) is an unknown smooth function of the covariate, defined on a domain \(\Omega _{\text {co}}\) that contains the observations. Thus, \(f_{\text {co}}\) captures the (possibly nonlinear) effect of the covariate on the response across this domain. \(\alpha \) and \(f_{\text {co}}\) are estimated through penalised maximum likelihood estimation, a method used to obtain fitted values close to the observed data values but with a smoothing penalty applied to \(f_{\text {co}}\) in order to avoid overly wiggly functions that are likely to overfit the data. The level of smoothing applied is controlled by a smoothing parameter \(\lambda _{\text {co}}>0\) which is also estimated from the data.

In practice, the model is fitted by writing \(f_{\text {co}}\) as a linear combination of \(k_{\text {co}}\) smooth basis functions and then estimating the corresponding \(k_{\text {co}}\) coefficients. Let \(b_1^{\text {co}}, \ldots , b_k^{\text {co}}\) be a set of smooth basis functions defined on \(\Omega _{\text {co}}\). Writing

$$\begin{aligned} f_{\text {co}}(x)=\sum _{j=1}^{k_{\text {co}}} \beta _j^{\text {co}}b_j^{\text {co}}(x) \end{aligned}$$
(2)

for \(x\in \Omega _{\text {co}}\), the covariate effect \(f_{\text {co}}\) can be estimated by estimating the unknown coefficients \(\varvec{\beta }_{\text {co}}=(\beta _1^{\text {co}}, \ldots , \beta _{k_{\text {co}}}^{\text {co}})^T\). Thus, estimates in model (1) are obtained through a penalised version of linear least squares for the model

$$\begin{aligned} {\textbf{y}}={\textbf{X}}\varvec{\beta }+\varvec{\epsilon },\quad \varvec{\epsilon }\sim N({\varvec{0}},\sigma ^2{\textbf{I}}) \end{aligned}$$
(3)

with coefficients \(\varvec{\beta }=(\alpha ,\varvec{\beta }_{\text {co}})^T\) and model matrix \({\textbf{X}}=[{\varvec{1}}|{\textbf{B}}_{\textrm{co}}]\) where \({\varvec{1}}\) denotes a column vector of ones and \({\textbf{B}}_{\textrm{co}}=[{\textbf{b}}_1^{\text {co}}|\cdots |{\textbf{b}}_{k_{\text {co}}}^{\text {co}}]\) has columns \({\textbf{b}}_j^{\text {co}}=(b_j^{\text {co}}(x_1), \ldots , b_j^{\text {co}}(x_n))^T\) for \(j=1, \ldots , k_{\text {co}}\). The estimated \(\varvec{\beta }\) minimises

$$\begin{aligned} \Vert {\textbf{y}}-{\textbf{X}}\varvec{\beta }\Vert ^2+\lambda _{\text {co}}\varvec{\beta }_{\text {co}}^T{\textbf{S}}_{\text {co}}\varvec{\beta }_{\text {co}} \end{aligned}$$

where \(\lambda _{\text {co}}>0\) is the smoothing parameter and \({\textbf{S}}_{\text {co}}\) a penalty matrix. The resulting estimated function \({\hat{f}}_{\text {co}}\) is known as a “smooth”. Note that there is some flexibility in model specification as different choices of basis functions and penalty structures lead to different types of smooth structures for the estimated effects. A detailed account of GAM theory can be found in Wood (2017).

Spatial random effects are added to model (1) in the form of a spatial thin plate regression spline (see Dupont et al. (2022b)). This is just another smooth term in the GAM, and the resulting spatial model is given by

$$\begin{aligned} y_i=\alpha + f_{\text {co}}(x_i)+f_{\text {sp}}({\textbf{t}}_i)+\epsilon _i,\quad \epsilon _i\underset{\text {iid}}{\sim }N(0,\sigma ^2) \end{aligned}$$
(4)

where \(\alpha \) is the unknown intercept and \(f_{\text {co}}\) and \(f_{\text {sp}}\) are unknown smooth functions on \(\Omega _{\text {co}}\) and the spatial domain \(\Omega _{\text {sp}}\), respectively. Thus, \(f_{\text {sp}}\) captures the residual spatial variation in the response variable that cannot be explained by the covariate effect \(f_{\text {co}}\).

Model (4) is once again fitted using penalised least squares to estimate the coefficients in basis expansions of the functions \(f_{\text {co}}\) and \(f_{\text {sp}}\). That is, writing \(f_{\text {sp}}\) as a linear combination of \(k_{\text {sp}}\) spatial basis functions, we estimate the corresponding \(k_{\text {sp}}\) coefficients in addition to the \(k_{\text {co}}\) coefficients for \(f_{\text {co}}\). Let \(b_1^{\text {sp}}, \ldots , b_{k_{\text {sp}}}^{\text {sp}}\) denote the thin plate regression spline basis on the spatial domain \(\Omega _{\text {sp}}\) and write \(f_{\text {sp}}({\textbf{t}})=\sum _{j=1}^{k_{\text {sp}}} \beta _j^{\text {sp}} b_j^{\text {sp}}({\textbf{t}})\) for \({\textbf{t}}\in \Omega _{\text {sp}}\). Model (4) is then fitted through penalised least squares for the model (3) but with coefficients \(\varvec{\beta }=(\alpha ,\varvec{\beta }_{\text {co}},\varvec{\beta }_{\text {sp}})^T\) for \(\varvec{\beta }_{\text {sp}}=(\beta _1^{\text {sp}}, \ldots , \beta _{k_{\text {sp}}}^{\text {sp}})^T\), and model matrix \({\textbf{X}}=[{\varvec{1}}|{\textbf{B}}_{\textrm{co}}|{\textbf{B}}_{\textrm{sp}}]\) where \({\textbf{B}}_{\textrm{sp}}=[{\textbf{b}}_1^{\text {sp}}|\cdots |{\textbf{b}}_{k_{\text {sp}}}^{\text {sp}}]\) with \({\textbf{b}}_j^{\text {sp}}=(b_j^{\text {sp}}({\textbf{t}}_1), \ldots , b_j^{\text {sp}}({\textbf{t}}_n))^T\) for \(j=1, \ldots , k_{\text {sp}}\). The estimated \(\varvec{\beta }\) minimises

$$\begin{aligned} \Vert {\textbf{y}}-{\textbf{X}}\varvec{\beta }\Vert ^2+\lambda _{\text {co}}\varvec{\beta }_{\text {co}}^T{\textbf{S}}_{\text {co}}\varvec{\beta }_{\text {co}}+\lambda _{\text {sp}}\varvec{\beta }_{\text {sp}}^T{\textbf{S}}_{\text {sp}}\varvec{\beta }_{\text {sp}} \end{aligned}$$

for smoothing parameters \(\lambda _{\text {co}},\lambda _{\text {sp}} >0\) and penalty matrices \({\textbf{S}}_{\text {co}}\), \({\textbf{S}}_{\text {sp}}\).

3.2 Spatial+

For spatial models with linear covariate effects, that is, models of the form

$$\begin{aligned} y_i=\alpha + \beta x_i+f_{\text {sp}}({\textbf{t}}_i)+\epsilon _i,\quad \epsilon _i\underset{\text {iid}}{\sim }N(0,\sigma ^2) \end{aligned}$$
(5)

where \(\alpha \) and \(\beta \) are unknown parameters and \(f_{\text {sp}}\) a spatial thin plate regression spline, Dupont et al. (2022b) show how spatial confounding manifests itself as a smoothing-induced bias in the effect estimate for the (unsmoothed) covariate part of the model. Spatial smoothing introduces bias in return for lower variance in order to obtain a better model fit by lowering the mean squared error (MSE) of fitted values. However, spatial dependence of the covariate means that although the smoothing is only applied to the spatial effect, it can also interfere with the covariate effect estimation and cause unwanted bias here.

Spatial+ is a two-step regression procedure developed in Dupont et al. (2022b) for alleviating such bias. In the first step, a spatial thin plate regression spline is fitted to the covariate \({\textbf{x}}\). The fitted values \(\widehat{{\textbf{f}}}^x=({\hat{f}}^x_1, \ldots , {\hat{f}}^x_n)^T\) identifies the spatial part of \({\textbf{x}}\), and we obtain the decomposition

$$\begin{aligned} {\textbf{x}}=\widehat{{\textbf{f}}}^x+{\textbf{r}}^x \end{aligned}$$
(6)

where \({\textbf{r}}^x=(r^x_1, \ldots , r^x_n)^T\) are the residuals. The second step regression then uses the original model (5) but with \({\textbf{x}}\) replaced by \({\textbf{r}}^x\) in the model matrix.

The key idea is to decouple the estimation of the covariate and spatial effects so that the covariate effect estimate is less sensitive to spatial smoothing. More precisely, the residuals \({\textbf{r}}^x\) are largely independent of the spatial part of the model but has the same effect \(\beta \) on \({\textbf{y}}\) as \({\textbf{x}}\) (because \(\beta {\textbf{x}}=\beta \widehat{{\textbf{f}}}^x+\beta {\textbf{r}}^x\)). Therefore, using \({\textbf{r}}^x\) instead of \({\textbf{x}}\) for the estimation means that the same effect is estimated but in a way that is largely unaffected by spatial smoothing.

In model (4), the covariate term \(f_{\text {co}}\) depends on the covariate and, as such, inherits any spatial dependence. Hence, spatial smoothing can once again lead to unwanted interference with the estimation of this effect. It would be natural to try to extend the spatial+ method to nonlinear covariate effects by simply replacing the covariate \({\textbf{x}}\) in model (4) by the residuals \({\textbf{r}}^x\) from the decomposition (6). This would once again decouple the estimation of the covariate and spatial effects; however, unlike the linear case, \({\textbf{r}}^x\) may not have the same effect on \({\textbf{y}}\) as \({\textbf{x}}\). More precisely, the relationship between \({\textbf{x}}\) and \({\textbf{y}}\) is described by the function \(f_{\text {co}}\), but since \(f_{\text {co}}\) may be nonlinear, we cannot assume that \(f_{\text {co}}(x_i)=f_{\text {co}}({\hat{f}}^x_i)+f_{\text {co}}(r_i^x)\) for \(i=1, \ldots , n\). Therefore, the function that describes the relationship between \({\textbf{r}}^x\) and \({\textbf{y}}\) may be quite different to \(f_{\text {co}}\).

In the GAM framework, however, although the covariate effect itself may be nonlinear, it is decomposed into a sum of basis components, each of which have a linear effect on the response variable. In fact, the model is fitted by estimating exactly these linear effects. More precisely, in the notation of Sect. 3.1, we can view \(f_{\text {co}}\) as an additive effect of \(k_{\text {co}}\) linear “covariates” \({\textbf{x}}^1={\textbf{b}}_1^{\text {co}},\ldots , {\textbf{x}}^{k_{\text {co}}}={\textbf{b}}_{k_{\text {co}}}^{\text {co}}\) and, as such, we can apply spatial+ to each component \({\textbf{x}}^j\) to avoid spatially induced smoothing bias in the estimate of the corresponding coefficient \(\beta _j^{\text {co}}\). We note that for different coefficients, the bias could be different. Therefore, applying spatial+ may change both the size and the shape of the estimated covariate effect.

In summary, spatial+ can be applied to model (4) using the following two-step procedure.

  1. 1)

    For the columns \({\textbf{x}}^1,\ldots , {\textbf{x}}^{k_{\text {co}}}\) of the covariate part \({\textbf{B}}_{\textrm{co}}\) of the model matrix for model (4), decompose each as \({\textbf{x}}^j=\widehat{{\textbf{f}}}^j+{\textbf{r}}^j\) as in (6).

  2. 2)

    Replace each \({\textbf{x}}^j\) by \({\textbf{r}}^j\) in the model matrix for model (4).

The estimated covariate effect is then obtained by using the resulting estimates of \(\beta _j^{\text {co}}\) as the coefficients in the basis expansion (2).

One difference here to the analysis in Dupont et al. (2022b) is that, in addition to spatial smoothing, we also have a covariate smoothing penalty. This will induce bias in the estimates of the coefficients \(\beta _j^{\text {co}}\), even when there is no spatial smoothing. However, the purpose of the covariate smoothing is to avoid overfitting, penalising overly wiggly estimates of the effect \(f_{\text {co}}\) when viewed as a function of the covariate. Therefore, this bias is, in that sense, not unwanted as it should be compensated by a more realistic shape for the estimated covariate effect. Additional bias in the covariate effect estimate arising from the smoothing of the spatial effect, however, seems more arbitrary, and it is this bias that we can avoid by using spatial+.

Although we could apply the above procedure directly, the \(k_{\text {co}}\) separate spatial regressions required for the first step can be time consuming. In practice, we therefore apply a simpler and faster implementation. Details are given in Sect. 1 of the supplementary material.

3.3 The Forest Health Example

We fit the following additive model to mean defoliation \(\texttt{ratio}_{i}\) of \(\texttt{nobs}_i\) trees per plot i

$$\begin{aligned} \texttt{ratio}_{i}= f_{\text {sp}}({\textbf{t}}_i) + f_1(\texttt{age}_{i}) + \sum _k f_k(x_{ik}) + \epsilon _{i} \end{aligned}$$

where \({\varvec{\epsilon }} \sim N(\textbf{0},{\varvec{\Lambda }})\) with \({\varvec{\Lambda }}=\sigma ^2{\textbf{W}}\) and \({\textbf{W}}=\text {diag}(\texttt{nobs})\) a diagonal matrix of weights. \(f_{\text {sp}}(\cdot )\) is a thin plate regression spline defined on the spatial domain of the data. The \(x_{ik}\) are predictor variables. Note that the \(x_{ik}\) include factors fitted as random effects, because the mathematical structure of random effects and smooths are the same when a Bayesian representation is used (see for example Section 5.8 in Wood (2017)). Parameter estimation is done using the approach described in Sect. 3 with smoothing parameters estimated using the restricted maximum likelihood (REML) method (for further details see Wood (2011)). The implementation of spatial+ has been slightly adapted due to the weights in the regression (details are given in Sect. 2 of the supplementary material).

The estimation scheme is implemented in the gam() and bam() functions of the R-package mgcv.

The data have a large number of missing values in many explanatory variables, and this was taken into consideration during the pre-selection of variables. The variables with over 5% missing data were immediately removed from consideration. A further set of variables were removed from the initial modelling data set based on recommendations from forest health experts which suggested these variables are difficult to collect in practice. If pairs of variables have a strong association and are proxies of each other, we included the variable with fewer missing values and/or the variable which is more useful in terms of identifying causes of tree damage and predictor value ranges for critical conditions.

Finally, only complete observations were used in the modelling. This pre-processing of data reduced the originally 80 explanatory variables to 61.

Regarding the variable selection, we have the problem that in the full model, the 61 variables result in more parameters than observations so backward selection is not possible. Instead we use a “boost forward penalise backward” model selection scheme (Augustin et al. 2022) based on combining component-wise gradient boosting (Schmid and Hothorn 2008; Mayr et al. 2012) with integrated backward selection (Marra and Wood 2011).

4 Simulation Study

4.1 Simulation Data and Implementation

We generate 100 independent replicates of covariate data \({\textbf{x}}=(x_1,\ldots ,x_n)^T\) and response data \({\textbf{y}}=(y_1,\ldots ,y_n)^T\), observed at \(n=1000\) randomly selected locations in the spatial domain \([0,10]\times [0,10]\) in \({\mathbb {R}}^2\). Let \({\textbf{z}}=(z_1,\ldots ,z_n)^T\) and \({\textbf{z}}'=(z'_1,\ldots ,z'_n)^T\) denote the values at the selected locations of independently generated spatial fields. As in Dupont et al. (2022b), these have initially been generated as spatial Gaussian processes with a spherical and exponential covariance structure, respectively. That is, each spatial field is sampled from a multivariate normal distribution centred at \({\varvec{0}}\) with covariance structure defined by \(C(h)=-1-1.5h/R+0.5(h/R)^3\) for \(h\le R\), \(C(h)=0\) for \(h>R\) (with \(R=1\)) for the spherical field and \(C(h)=\exp (-(h/R))\) (with \(R=5\)) for the exponential field (where h denotes Euclidean distance). We then ensure that \({\textbf{z}}\) and \({\textbf{z}}'\) are high- and low-frequency processes, respectively, by fitting a spatial thin plate spline to each with basis sizes \(k_{\text {sp}}=300\) and \(k_{\text {sp}}=10\), respectively. We then let

$$\begin{aligned} {\textbf{y}}= & {} {\textbf{f}}_{\text {co}}+{\textbf{f}}_{\text {sp}}+\varvec{\epsilon }^y,\quad \varvec{\epsilon }^y{\sim }N({\varvec{0}},\sigma _y^2{\textbf{I}}) \end{aligned}$$

with \(\sigma _y=1.5\) and true covariate effect given by \({\textbf{f}}_{\text {co}}=(f_{\text {co}}(x_1), \ldots , f_{\text {co}}(x_n))^T\) with \(f_{\text {co}}(x)=2x^3\). The covariate data \({\textbf{x}}\) and the true spatial effect \({\textbf{f}}_{\text {sp}}\) are simulated to reflect four different scenarios. These consist of two confounding scenarios: “confounding at high spatial frequencies” and “confounding at low spatial frequencies”, where for each of these scenarios we simulate two different levels of overall correlation between the covariate and the true spatial effect (“high” and “low”):

  1. 1.

    Confounding at high frequencies: \({\textbf{x}}={\textbf{z}}+\varvec{\epsilon }^x,\quad \varvec{\epsilon }^x{\sim }N({\varvec{0}},\sigma _x^2{\textbf{I}})\),

    • \({\textbf{f}}_{\text {sp}}=-5{\textbf{z}}-5{\textbf{z}}'\) (high correlation),

    • \({\textbf{f}}_{\text {sp}}=-5{\textbf{z}}-50{\textbf{z}}'\) (low correlation).

  2. 2.

    Confounding at low frequencies: \({\textbf{x}}={\textbf{z}}'+\varvec{\epsilon }^x,\quad \varvec{\epsilon }^x{\sim }N({\varvec{0}},\sigma _x^2{\textbf{I}})\),

    • \({\textbf{f}}_{\text {sp}}=-5{\textbf{z}}'-5{\textbf{z}}\) (high correlation),

    • \({\textbf{f}}_{\text {sp}}=-5{\textbf{z}}'-50{\textbf{z}}\) (low correlation).

Here, \(\sigma _x=0.1\).

Using the R-package mgcv, we fit the null model (1), the spatial model (4) and the spatial+ model as described in Sect. 3.2 to each of the 100 data replicates described above. The spatial model and spatial+ model are fitted both without spatial smoothing (that is, where the spatial smoothing parameter is fixed at \(\lambda _{\text {sp}}=0\)) and with spatial smoothing (where \(\lambda _{\text {sp}}\) is estimated). Basis sizes used are \(k_{\text {co}}=10\) for the covariate effect and \(k_{\text {sp}}=300\) for the spatial effect. Estimates of smoothing parameters are obtained through the generalised cross-validation (GCV) criterion.

We have also repeated the simulations for a number of additional simulation scenarios: data generated with positive correlations between the covariate and the true spatial effect, data where the underlying spatial processes \({\textbf{z}}\) and \({\textbf{z}}'\) are Gaussian processes (i.e. where the fitted model is mis-specified) and scenarios with more than one covariate. Details of these additional simulations can be found in the supplementary material.

4.2 Simulation Results

Figure 1 shows the results for the scenario in which the confounding takes place at high spatial frequencies and where the overall correlation between the covariate and the true spatial effect is high. When the spatial model and spatial+ model are fitted without spatial smoothing, they do not appear to be biased. However, when spatial smoothing is applied, the spatial model is no longer able to capture the true covariate effect, while spatial+ remains broadly unbiased. The null model is biased as well.

Figure 1 also shows the MSE of fitted values for all the models in this scenario, calculated as \(\Vert {\hat{{\textbf{y}}}}-({\textbf{f}}_{\text {co}}+{\textbf{f}}_{\text {sp}})\Vert ^2\) where \({\hat{{\textbf{y}}}}\) is the fitted values and \({\textbf{f}}_{\text {co}}\) and \({\textbf{f}}_{\text {sp}}\) are the true effects in the data. We see that the null model has the highest MSE of fitted values indicating a poorer fit, which is not surprising as it has lower explanatory power than the other models. For the spatial and spatial+ models, the fitted values are similar, and the MSE of fitted values drops further when spatial smoothing is applied. (Indeed, this is the purpose of smoothing.)

Fig. 1
figure 1

Results for the scenario of confounding at high frequencies with high overall correlation: Estimated covariate effect \({\hat{f}}_{\text {co}}\) (in grey) in the spatial model (left), spatial+ model (middle) and null model (right) fitted to 100 data replicates, where the true covariate effect is \(f_{\text {co}}(x)=2x^3\) (shown in black). The spatial and spatial+ models have been fitted with (top) and without (bottom) spatial smoothing. The bottom right plot shows the \(\log (\text {MSE})\) for the null, spatial (Sp) and spatial+ (Sp+) models fitted to the same 100 data replicates where superscript 0 refers to a model with no spatial smoothing. Cor(\(x,f_{\text {sp}}\)) denotes the average correlation between the covariate and the true spatial effect

Figure 2 summarises the results for the remaining three scenarios. We see that for the spatial model, the covariate effect estimate is biased in the scenario where confounding happens at high spatial frequencies, even when correlation between the covariate and the true spatial effect is low. However, bias in the spatial model is negligible when confounding happens at low frequencies, irrespective of the size of the overall correlation. In contrast, for the null model, the size of the bias appears to be directly related to this correlation, with higher correlation leading to larger bias. When correlation is low, the null model does not appear to be heavily biased. However, the effect estimate has relatively high uncertainty and, in some cases, this model fails to capture the true shape of the covariate effect. Finally, the spatial+ model appears to give broadly unbiased estimates in all scenarios.

Results of our additional simulation scenarios are given in the supplementary material. When data were generated with positive correlations between the covariate and the true spatial effect, we see the same behaviour but with the bias (in both the spatial and the null model) going in the opposite direction to the bias observed in Figs. 1 and 2. For mis-specified models (i.e. where data was generated using Gaussian process spatial fields but the spatial effects in the fitted models were thin plate splines), there is a small mis-specification bias, but this seems negligible, and the overall behaviour is broadly similar to the correctly specified models. For scenarios with two covariates, we see that the overall behaviour is similar to that of the single covariate case, although it appears that bias in the spatial model may be distributed slightly differently between the two covariate smooths when the covariates are highly correlated (with one covariate estimate becoming more biased and the other less biased). However, in all scenarios, spatial+ appears to eliminate the bias and recover the true covariate effects.

Fig. 2
figure 2

Results for the scenarios of confounding at high frequencies with low overall correlation (top row), confounding at low frequencies with high overall correlation (middle row) and confounding at low frequencies with low overall correlation (bottom row). Plots in each row show the estimated covariate effect \({\hat{f}}_{\text {co}}\) (in grey) in the spatial model (left), spatial+ model (middle) and null model (right). In the spatial and spatial+ models, spatial smoothing has been applied. The models have been fitted to 100 data replicates, where the true covariate effect is \(f_{\text {co}}(x)=2x^3\) (shown in black)

4.3 Simulation Analysis

Our simulation confirms that unwanted bias in the spatial model arises as a direct result of spatial smoothing as bias does not occur when spatial smoothing is not applied. As described in Paciorek (2010), the size of this bias depends on the spatial frequency at which the confounding takes place. Confounding at high frequencies (those that are smoothed more) are likely to lead to larger bias, whereas confounding at lower frequencies is less problematic. Our simulations confirm that confounding at high frequencies lead to bias in the spatial model irrespective of the level of overall correlation between the covariate and the true spatial effects, whereas confounding at lower frequencies did not appear to induce bias. In contrast, for the null model, our simulation shows that bias is directly related to the correlation between the covariate and the true spatial effects. This is what we would expect as, in this model, the estimated covariate effect must reflect, not only the effect of the covariate, but also any part of the unmeasured spatial effects that is correlated with the covariate. We also see that the null model at times has difficulty estimating the true shape of the covariate effect. This is not surprising as the presence of unmeasured spatial variation in the response variable violates the assumption of iid residuals in the null model. Therefore, this model is not an appropriate model to use, which is also confirmed by the consistently large MSE of fitted values. Finally, in the spatial+ model, estimation of the covariate effect is broadly decoupled from the estimation of the spatial effect and, as a result, is less sensitive to spatial smoothing. Our simulation results confirm that the covariate effect estimates in spatial+ are broadly unbiased and capture the true effect.

Our additional simulations outlined in the supplementary material show that the same overall behaviour persists in scenarios with positive correlations, model mis-specification and when there are multiple covariates. Moreover, in the case of two correlated covariates, the bias in the spatial model may be distributed between the two covariate smooths in different ways, making it even more uncertain whether the spatial model estimates capture the true covariate effects. We note that in all of the additional simulations, spatial+ was able to broadly eliminate the biases and recover the true effect.

In practice, as the true spatial effect is unknown, we do not know which of the scenarios from the simulation apply in a given data set. Hence, while the spatial model may well produce reasonable results, equally, the bias may be non-negligible. In this context, spatial+ can be used as a diagnostic tool for assessing the sensitivity of estimates in the spatial model to spatial confounding. More precisely, if a covariate has largely the same estimated effect in the spatial model and the spatial+ model, it suggests there is no evidence of spatial confounding affecting this covariate. On the other hand, if the covariate effect estimates differ, then the spatial+ estimate can be used to avoid the spatially induced bias.

Fig. 3
figure 3

Forest health example: The estimated covariate effects from the spatial () and the spatial+ model are shown. The black lines correspond to spatial, and the grey lines denote the spatial+ model. There is a horizontal dotted line at zero to aid interpretation. The dashed lines correspond to 95% confidence bands

5 Results for the Forestry Example

The estimated effects of the spatial and spatial+ model for the covariates in the forest health example are shown in Fig. 3. We see that the spatial model is an improvement compared to the null model both in terms of higher R\(^2\) and lower AIC. As the true residual spatial effects are unmeasured, we cannot know whether the correlation between the covariate and the true spatial effects is high or low, or which spatial frequencies, if any, are confounded. However, as the estimated effects in spatial+ are not very different from those in the spatial model, this suggests that spatial confounding is not a major concern here.

In terms of fit, spatial+ is similar to the spatial model with a similar R\(^2\) explaining 72% and 71%, respectively, of the variation (Table 1). In spatial+, some of the explanatory power between the covariate and spatial terms has been redistributed. This is apparent in Table 2 where we see that the covariate d_tmin is no longer significant in spatial+. Also the p-values of soil depth and twi25 are increased in spatial+ compared to the spatial model. Note that in the spatial model, the spatial effect is the residual spatial effect, whereas in spatial+, it is the total spatial effect. It is therefore not surprising that the spatial effect in spatial+ has higher edf than in the spatial model, as it is taking on more of the explanatory power in the model. This is slightly compensated by a small reduction in edf of the covariate effects.

Table 1 Forestry example: R squared, residual standard deviation and AIC for the three models
Table 2 Forestry example: results of fitting models to the data

6 Discussion

In this paper, we assume that the covariates of interest are spatially dependent but not fully determined by their spatial pattern. In practice, this means that the first stage regressions of spatial+ return nonzero residuals. In this situation, there is non-spatial information in each covariate which can be used by the spatial model to identify the corresponding effects, but spatial smoothing may distort the effect estimation. Spatial+ is a practical and easily implementable method for detecting and avoiding this distortion. We note that when a covariate of interest is completely determined by spatial location or it has very little non-spatial information, spatial+ cannot be applied. In this case, the spatial model is essentially unidentifiable, that is, the model cannot distinguish the covariate from unmeasured spatial variation. Therefore, the covariate effect could equally well be modelled as part of the spatial effects, and any apportionment of the effect between the covariate and spatial terms during model fitting may be spurious. In this situation, identifiability can only be achieved through further assumptions, e.g. assumptions about the spatial scales at which confounding takes place (see Guan et al. (2022)). As is also reflected in the published discussions around the spatial+ paper (Dupont et al. 2022b, a; Reich et al. 2022; Marques and Kneib 2022; Papadogeorgou 2022; Schmidt 2022), spatial confounding is an area of much recent interest in the literature, and there are many aspects of this complex issue that are yet to be explored. We note that when the main goal is interpolation or prediction of the outcome variable, the allocation of explanatory power within the model is not of major concern and therefore spatial confounding is not an issue. In this case, there is no need to use methods like spatial+ as the spatial model could be used without adjustment.

Here, we have shown how spatial+ can be extended to spatial models with covariate effects modelled under the GAM framework. As such, this paper presents a first step in the investigation into spatial confounding when there is the added complication of unknown and possibly nonlinear covariate effects. Our simulation study shows that spatial confounding is only likely to be an issue when confounding occurs at high spatial frequencies. It is also possible that the additional complexity of the model, which includes separate smoothing applied to the covariate term, means that the effects of spatial smoothing become less pronounced. Therefore, in practice, the spatial model may well produce reasonable effect estimates. As demonstrated in our forest health example, spatial+ provides a useful tool for identifying whether this is the case. In the example, there could have been both high- and low-frequency spatial confounders, but the comparison between estimates from spatial+ and the spatial model shows that spatial confounding is not a major concern here. If spatial confounding bias had been present, however, the method could also have been used to alleviate it.