1 Introduction

Research in spatial and spatio-temporal disease mapping has mainly focused on models for smoothing risks in space and time. These models include spatially and temporally correlated random effects as proxies of spatially and temporally structured unobserved covariates, with the goal of discovering spatial patterns and their evolution in time. This information is very valuable in epidemiology and public health to highlight regions with high risk and as a first step to discover potential risk factors that may be related to the response of interest. However, this information is somewhat preliminary, and there is currently increasing interest in finding associations between hypothetical risk factors and the phenomenon under study. Including potential risk factors (covariates) in a spatial model allows making inference on the strength of the relationship between the response and the covariate. This is usually known as ecological regression.

Spatial regression models including covariates seem a simple and intuitive mechanism to account for the variability that can be explained by the covariates and the spatially structured variability that remains unexplained, but they present important challenges that remain unsolved (or at least only partially solved). The most important one is the so-called “spatial confounding”. This concept has been commonly used to explain the difference between the fixed effect estimates in spatial models and in simpler models, such as ordinary regression, that do not consider spatial correlation (see for example [15, 25]). However, there is neither a unique general definition of spatial confounding nor a definitive solution. This might be the reason why it has been ignored in practice despite its important implications.

Clayton et al. [5] comment that when “the pattern of variation of the covariate is similar to the disease risk, the location may act as a confounder”. Consequently, we should not be surprised if the fixed effect estimates change when a spatial term is included in the regression. This might be one of the first references to spatial confounding. Later, Zadnik and Reich [35] conjecture that the change in fixed effect estimates can be due to collinearity between the fixed effects and the conditional autoregressive (CAR) spatial random effects. This collinearity between the fixed effects and the spatial random effects has probably become the standard definition of spatial confounding in spatial linear models in general, and in disease mapping in particular (see for example [1, 14,15,16, 22, 25]).

Recently, Gilbert et al. [11] state that spatial confounding is seldom defined explicitly and they point to four phenomena related to this concept. Namely, 1) bias in the fixed effect estimates due to unobserved variables with spatial pattern; 2) change in fixed effect estimates due to collinearity between fixed and random effects; 3) bias in the fixed effect estimates due to the use of functions to control for spatial dependence, such as Gaussian Markov random fields or splines; 4) the challenge of assessing the effect of a covariate with a smooth spatial distribution. Although these appear to be different ideas at first sight, they are closely connected. It is widely accepted that spatial random effects (spatial functions) are introduced in the model to adjust for unobserved covariates and hence improve model fitting. However, they may also compete with the observed covariates and thereby change the fixed effect estimates as an effect of collinearity. The main difference between these four notions is probably the area of statistics in which they appear. For example, the first notion is compatible with the definition of confounding in causal inference, and there are some examples in the literature (see for example [23, 28]) where spatial confounding is understood as the presence of unmeasured variables with a spatial structure that influence both an observed covariate and the outcome of interest. In this paper, and to avoid misleading interpretations, we do not pursue the estimation of causal effects, but a rather modest goal: estimating linear associations between a covariate (potential risk factor) and the response of interest in different (spatial) Poisson regression models. We implicitly assume that spatial random effects are included in the model as an approximation to the overall effect of the unobserved covariates [6, 19], and that this provokes changes in the fixed effect estimates. Then, we investigate which model provides the fixed effect estimate closest to the true value.

Research on spatial confounding has focused on existing spatial models to clarify under which conditions they give valid fixed effect estimates [21]. Probably the most widespread method for dealing with spatial confounding is restricted spatial regression (RSR), proposed by Reich et al. [25]. RSR is intended to remove collinearity between the covariate of interest and the spatial random effects by restricting the latter to the orthogonal complement of the space spanned by the fixed effects. Hence, the method preserves the fixed effect estimates obtained in a simple regression model without spatial random effects (henceforth the null model). Reich et al. [25] and Hodges and Reich [15] analyse the association between stomach cancer incidence in Slovenia and a socioeconomic indicator (covariate) and justify RSR because they observe a large change in the fixed effect estimate when a spatial random effect is included in the model. They also explain how the variance of the fixed effect estimator is inflated in the spatial model with respect to the null model. The variance obtained with RSR lies between the variance of the null model and that of the spatial model. However, RSR has been recently criticised. Khan and Calder [17] show that in linear spatial models with normal responses the variance of the RSR fixed effect estimator is always less than or equal to the variance of the null model, and hence RSR leads to overly liberal inference. For count data they show through simulations that, in certain scenarios, the null model and RSR perform worse than the spatial model if there is spatial variation not explained by observed covariates. Additionally, Gilbert et al. [11] affirm that RSR presumes no confounding bias. This can be understood because RSR assigns all the variability in the direction of the fixed effects to the observed covariate, assuming that the rest of the variability is orthogonal to the observed covariate. Consequently, RSR does not consider the possibility of unobserved covariates overlapping with the observed one, and hence the fixed effect estimate should be equal to that of the null model. Moreover, for these authors, collinearity between fixed and random effects should not be a problem, as we would expect a change in fixed effect estimates if we presume there are unobserved covariates. Consequently, the spatial model would account for confounding bias. However, Hodges and Reich [15] show that even if the unobserved covariates are orthogonal to the observed ones, the random effects still provoke changes in the fixed effect estimates.

In the literature there are other methods to alleviate spatial confounding. For example, Thaden and Kneib [29] propose a geoadditive structural equation model (gSEM) based on structural equation techniques to account for spatial dependence in both the response and the covariate. This method is introduced for Gaussian responses and it is not clear how to extend it to non-normal cases, because two likelihood functions are modeled together, one for the covariate and one for the response. Additionally, it requires more than one observation per area, precluding its use in disease mapping. Recently, Dupont et al. [9] propose a method called spatial+, which is a modification of the spatial model. Spatial+ removes spatial dependence from the covariates by fitting spatial spline models to them. The residuals of these fits are then used as explanatory covariates in the spatial regression model for the outcome. The method seems a promising and simple technique to obtain correct fixed effect estimates. A different approach, based on transformed Gaussian Markov random fields and Gaussian copulas, has been proposed by Prates et al. [24]. The advantage of the method is that the spatial dependence does not interfere with the fixed effects, thus avoiding spatial confounding. None of these methods is free from drawbacks, and the main difficulty is to show when and under what circumstances they alleviate confounding effectively.

The main goal of this work is to assess how well recent methods designed to alleviate spatial confounding estimate the fixed effects when there is additional spatially structured variability unexplained by the observed covariates. In particular, we focus on areal count data. To this aim, we simulate several scenarios using different data generating mechanisms that include one observed covariate and additional variability, and fit the different models to compare the fixed effect estimates. We also use the different approaches to revisit real data. Model fitting and inference are carried out from a fully Bayesian approach using two main techniques: integrated nested Laplace approximations (INLA) and Markov chain Monte Carlo (MCMC) methods.

The rest of the paper is organized as follows. Section 2 briefly introduces the methods used in this work to alleviate spatial confounding. Section 3 illustrates the methods by analysing dowry deaths in Uttar Pradesh registered in 2001 [32], the Slovenian stomach cancer data for the period 1995-2001 [35], and the well-known Scottish lip cancer data for the years 1975-1980 (see for example [4]). Section 4 is devoted to an extensive simulation study. Finally, the paper closes with a discussion.

2 Methods to alleviate spatial confounding

Throughout this section we assume a large domain (e.g. a country) divided into n small areas (e.g. provinces or districts) labelled as \(i=1, 2,\ldots , n\). Denote by \(Y_{i}\) the number of deaths (or incident cases) in the ith small area. Then, conditional on the relative risk \(r_{i}\), \(Y_{i}\) is assumed to be Poisson distributed with mean \(\mu _{i}=e_{i}r_{i}\), where \(e_{i}\) represents the number of expected cases for area i. That is

$$\begin{aligned} Y_{i}\arrowvert r_{i} \sim Poisson(\mu _{i}=e_{i}r_{i}),\;\;\; \text {and} \;\;\; \log \mu _{i}=\log e_{i} + \log r_{i}. \end{aligned}$$

In the following, we review some models for \(\log r_{i}\) that have been proposed in the literature to deal with confounding.

2.1 Spatial model

Spatial regression models include spatial effects to account for the similarity of nearby observations and hence induce spatial smoothness. In disease mapping, Gaussian Markov random fields (GMRF) are used to model spatial random effects (see for example [26]). In particular, conditional autoregressive (CAR) spatial random effects have been widely adopted to capture the spatial dependence that remains unexplained in the model after accounting for covariates. Here, the vector containing the log risks, \(\log {\varvec{r}}\), is modeled as

$$\begin{aligned} \log {\varvec{r}}={\textbf{1}} _{n}\alpha + \textbf{X}{{{\varvec{\beta }}}}+ {{{\varvec{\xi }}}}\end{aligned}$$
(1)

where \({\varvec{r}}=(r_1, r_2, \dots , r_{n})^{'}\) is the vector of relative risks, \({\textbf{1}} _{n}\) is a column vector of ones of length n, \(\alpha \) can be interpreted as an overall risk, \({{\varvec{X}}}=({{\varvec{X}}}_1,\ldots , {{\varvec{X}}}_p)\) is an \(n\times p\) matrix whose columns \({{\varvec{X}}}_j\), \(j=1,\ldots ,p\) are the observed covariates, and \({{{\varvec{\beta }}}}=(\beta _1, \beta _2,\ldots , \beta _{p})^{'}\) is the vector of regression coefficients corresponding to the p observed covariates. Finally, \({{{\varvec{\xi }}}}= (\xi _1, \xi _2,\ldots , \xi _{n}){'}\) is the vector of spatial random effects which is assumed to follow an intrinsic conditional autoregressive (ICAR) prior [2], that is, an improper distribution with Gaussian kernel \(p({{{\varvec{\xi }}}}) \propto \exp (-\frac{1}{2\sigma _{\xi }^2}{{{\varvec{\xi }}}}^{'}{\varvec{Q}}_{\xi } {{{\varvec{\xi }}}})\). Here, \({\varvec{Q}}_{\xi }\) is the neighbourhood matrix defined as \({\varvec{Q}}_{\xi (ij)}=-1\) if areas i and j are neighbours and 0 otherwise, and \({\varvec{Q}}_{\xi (ii)}\) is equal to the number of neighbours of the ith region. Alternatively, spatial effects can be modelled using a smooth function of the coordinates longitude and latitude, that is

$$\begin{aligned} \log {\varvec{r}}={\textbf{1}} _{n}\alpha + \textbf{X}{{{\varvec{\beta }}}}+ {\varvec{f}}({\varvec{s}}_1,{\varvec{s}}_2), \end{aligned}$$
(2)

where \(({\varvec{s}}_1,{\varvec{s}}_2)\) are the coordinates (longitude and latitude) of the centroid of the small areas, and \({\varvec{f}}({\varvec{s}}_1,{\varvec{s}}_2)=(f(s_{11}, s_{12}), f(s_{21}, s_{22}), \dots , f(s_{n1}, s_{n2}))^{'}\) is a smooth function to be estimated using, for example, P-splines with a B-spline basis (see for example [13, 30, 31]).
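To make the structure of the ICAR prior concrete, the following R sketch (ours, not the authors' code) builds the precision matrix \({\varvec{Q}}_{\xi }\) of model (1) from a hypothetical binary adjacency matrix; the toy map of four areas is purely illustrative.

```r
## Illustrative construction of the ICAR precision matrix Q_xi from a binary
## adjacency matrix A, where A[i, j] = 1 if areas i and j share a border.
build_icar_Q <- function(A) {
  stopifnot(isSymmetric(A), all(A %in% c(0, 1)))
  Q <- -A                  # off-diagonal entries: -1 for neighbours, 0 otherwise
  diag(Q) <- rowSums(A)    # diagonal entries: number of neighbours of each area
  Q
}

## Toy example: 4 areas arranged on a line (1-2, 2-3, 3-4 are neighbours)
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1
Q <- build_icar_Q(A)
## For a connected map Q has rank n - 1: the constant vector spans its null
## space, which is why the ICAR prior is improper.
qr(Q)$rank    # 3
```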

Ignoring the spatial dependence \({{{\varvec{\xi }}}}\) or \({\varvec{f}}\) in (1) and (2) we obtain the null model, that is, the model without spatial effects. In our case, a simple Poisson regression model, i.e.

$$\begin{aligned} \log {\varvec{r}}={\textbf{1}}_{n}\alpha + \textbf{X}{{{\varvec{\beta }}}}. \end{aligned}$$
(3)

The null model implicitly assumes that all the variability in the response is explained by the observed covariates and that there is no confounding bias due to unobserved covariates. Note that spatial models would lead to a change in the fixed effect estimates in comparison to the null model due to the collinearity between the fixed and the random effects. According to Gilbert et al. [11], it is precisely this change that alleviates confounding. Here we understand collinearity between the fixed and the CAR random effects as a collinearity problem between the covariates with spatial structure and the eigenvector of the CAR precision matrix corresponding to the lowest non-null eigenvalue. For a more explicit reformulation of the spatial model (1) highlighting the collinearity issue, see for example Reich et al. [25] or Goicoa et al. [12].
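As a small illustration of this view of collinearity, the following hedged R sketch computes the correlation between a spatially structured covariate and the eigenvector of \({\varvec{Q}}_{\xi }\) associated with its lowest non-null eigenvalue; the matrix Q comes from the sketch above and the covariate values are invented.

```r
## Eigendecomposition of the ICAR precision matrix built above.
eig <- eigen(Q, symmetric = TRUE)       # eigenvalues returned in decreasing order
n   <- nrow(Q)
v_smooth <- eig$vectors[, n - 1]        # eigenvector of the lowest non-null
                                        # eigenvalue (the n-th one is ~0)
X1 <- c(0.9, 0.4, -0.2, -1.1)           # toy covariate with a spatial trend
cor(X1, v_smooth)                       # values near +/- 1 signal potential
                                        # collinearity with the spatial effect
```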

2.2 Restricted spatial regression model

Restricted spatial regression (RSR) is probably the most popular method to deal with spatial confounding and was first proposed by Reich et al. [25] to avoid collinearity between fixed and spatial random effects. These authors studied the association between a socioeconomic indicator and stomach cancer incidence in Slovenia. At first sight, they observed that the standardized incidence ratios (SIR), defined as the number of observed cases in one area divided by the number of expected cases in the same area, and the socioeconomic status exhibited strong spatial patterns. Moreover, a clear negative association between the SIR and the socioeconomic status was detected. The authors first fitted a Poisson regression model (null model) with the socioeconomic status as a single covariate. Second, they fitted a spatial model adding spatial random effects that follow the convolution prior proposed by Besag et al. [3]. They observed that the estimate of the fixed effect in the null and the spatial model changed dramatically: the posterior mean of the fixed effect changed from \(-0.137\) (null) to \(-0.022\) (spatial) and the posterior variance changed from 0.0004 (null) to 0.0016 (spatial). That is, in the case of the Slovenia data, after including the spatial random effects in the model, the negative association between the socioeconomic indicator and stomach cancer disappeared.

To solve this problem, Reich et al. [25] proposed the RSR model, which consists of restricting the spatial random effects to the space orthogonal to the fixed effects. For count data, the RSR model is expressed as

$$\begin{aligned} \log {\varvec{r}}={\textbf{1}}_{n}\alpha + {\varvec{X}}{{{\varvec{\beta }}}}+ \hat{\textbf{W}}^{-1/2}{\varvec{L}}{\varvec{L}}^{'}\hat{\textbf{W}}^{1/2}{{{\varvec{\xi }}}}\end{aligned}$$
(4)

where the columns of \({\varvec{L}}\) are the eigenvectors with non-null eigenvalues of the projection matrix \({\varvec{I}}_{n}-\hat{\textbf{W}}^{1/2}{\varvec{X}}_{*}({\varvec{X}}_{*}^{'}\hat{\textbf{W}}{\varvec{X}}_{*})^{-1}{\varvec{X}}_{*}^{'}\hat{\textbf{W}}^{1/2}\), which projects onto the orthogonal complement of the space spanned by \(\hat{\textbf{W}}^{1/2}{\varvec{X}}_{*}\), where \({\varvec{X}}_{*}=[{\textbf{1}}_{n}, {\varvec{X}}]\) and \({\varvec{W}}\) is a diagonal matrix of weights with \(W_{ii}=Var(Y_{i}\,\arrowvert \, \alpha , {{{\varvec{\beta }}}}, {{{\varvec{\xi }}}})=\mu _i\). In practice, the matrix \(\hat{\textbf{W}}\) is obtained by fitting the spatial model (1). Note that the RSR model (4) removes collinearity between the fixed and random effects because the component of \(\hat{\textbf{W}}^{1/2}{{{\varvec{\xi }}}}\) in the span of \(\hat{\textbf{W}}^{1/2}{\varvec{X}}_{*}\) is removed.
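A hedged R sketch of this construction is given below: it returns the matrix \({\varvec{L}}\) from the fitted means of the spatial model (1). The covariate matrix and fitted means are hypothetical inputs, and the code is only meant to mirror the algebra in (4).

```r
## Columns of L: eigenvectors with non-null eigenvalues of the weighted
## projection onto the orthogonal complement of W^{1/2} X_*.
restricted_basis <- function(X, mu_hat, tol = 1e-8) {
  n      <- nrow(X)
  X_star <- cbind(1, X)                             # X_* = [1_n, X]
  W_half <- diag(sqrt(mu_hat))                      # W^{1/2}, with W_ii = mu_i
  WX     <- W_half %*% X_star
  P_orth <- diag(n) - WX %*% solve(crossprod(WX)) %*% t(WX)
  eig    <- eigen(P_orth, symmetric = TRUE)
  eig$vectors[, eig$values > tol, drop = FALSE]     # n x (n - p - 1) matrix L
}
```

The restricted random effect in (4) is then \(\hat{\textbf{W}}^{-1/2}{\varvec{L}}{\varvec{L}}^{'}\hat{\textbf{W}}^{1/2}{{{\varvec{\xi }}}}\), with \({\varvec{L}}\) as returned by this function.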

RSR removes collinearity, but all the variability in the direction of the fixed effects is attributed to the observed covariate; consequently, it implicitly assumes that there is no unobserved covariate that may produce confounding bias. Then, according to Gilbert et al. [11], RSR is not a method to alleviate spatial confounding. Additionally, Khan and Calder [17] and Zimmerman and Ver Hoef [36] have demonstrated that in spatial models with normal responses the variances of the fixed effect estimates obtained with RSR are less than or equal to the variances obtained with the null model. Consequently, the credible intervals are narrower, leading to low coverage rates and an increase in Type-S error rates. The Type-S error is the Bayesian analogue of the frequentist Type I error (see for example [14]). That is, a Type-S error occurs if a 95% equal-tailed credible interval for the regression parameter does not contain zero when the regression parameter is truly zero.

2.3 Spatial+ method

Very recently, Dupont et al. [9] have proposed a novel approach to reduce spatial confounding when the covariate of interest is spatially structured. These authors show that the bias in the fixed effect estimate is due to spatial smoothing. The spatial+ method is a modification of the spatial model and reduces bias by eliminating the spatial dependence of the covariate. The method consists of two steps: first, the spatial dependence of the covariate is removed through a model that we will denote as the covariate model. Second, the spatial model is fitted replacing the covariate by the residuals obtained in the first step. We will call this model the spatial+ final model. The authors introduce the method using thin plate splines for the spatial effects in both the covariate model and the spatial+ final model. Here we also deal with the spatial dependence in the covariate model using P-splines, or by including the eigenvectors of the precision matrix \({\varvec{Q}}_{\xi }\) corresponding to a given number of the lowest non-null eigenvalues as covariates in a linear model where the observed covariate is now the response. Note that these eigenvectors (in particular the one corresponding to the lowest non-null eigenvalue) are responsible for the collinearity between the fixed and random effects [25]. In more detail, the spatial+ method starts from the spatial model (2),

$$\begin{aligned} \log {\varvec{r}}={\textbf{1}} _{n}\alpha + \textbf{X}{{{\varvec{\beta }}}}+ {\varvec{f}} \end{aligned}$$

where \({\varvec{f}}\) is a spatial term originally modeled with splines (see [9]). Given the jth covariate \({\varvec{X}}_{j}\), \(j=1,\ldots , p\), we consider the covariate model

$$\begin{aligned} {\tilde{{\varvec{X}}}}_{j}={\tilde{{{{\varvec{\psi }}}}}}_{j} + {\tilde{{{{\varvec{\epsilon }}}}}}_{j} \end{aligned}$$
(5)

where \({\tilde{{\varvec{X}}}}_{j}={\hat{{\varvec{W}}}}^{1/2}{\varvec{X}}_{j}\), \({\tilde{{{{\varvec{\psi }}}}}}_{j}={\hat{{\varvec{W}}}}^{1/2}{{{\varvec{\psi }}}}_{j}\), \({\tilde{{{{\varvec{\epsilon }}}}}}_{j}={\hat{{\varvec{W}}}}^{1/2}{{{\varvec{\epsilon }}}}_{j}\), and \({{{\varvec{\epsilon }}}}_{j} \sim N({\textbf{0}}, \sigma _{{\varvec{X}}_{j}}^2 {\varvec{I}}_n)\). Here, \(\sigma _{{\varvec{X}}_{j}}\) is the standard deviation of the independent and identically distributed errors in the jth covariate model, \({\varvec{I}}_n\) is an \(n\times n\) identity matrix, and \({\varvec{W}}\) is the same diagonal matrix of weights as in Model (4). Finally, \({{{\varvec{\psi }}}}_{j}\) is a vector of spatial effects that can be modeled in two ways. The first one consists of including the eigenvectors of the precision matrix \({\varvec{Q}}_{\xi }\) corresponding to the k lowest non-null eigenvalues as covariates, so that model (5) is a weighted linear regression model. Here we choose k so that it is at least \(5\%\) and at most \(30\%\) of the total number of eigenvectors. The second option uses P-splines or thin plate splines to model the spatial dependence of the covariate.

The residuals of each covariate j are \({\tilde{{\varvec{Z}}}}_{j}={\tilde{{\varvec{X}}}}_{j}-{\tilde{{{{\varvec{\psi }}}}}}_{j}\). Once the weighted residuals are computed, they are transformed to the original scale \({\varvec{Z}}_{j}={\hat{{\varvec{W}}}}^{-1/2}{\tilde{{\varvec{Z}}}}_{j}\) (see [9], for details). The residuals \({\varvec{Z}}_{j}\) are standardized before including them in the spatial+ model.

Finally, the spatial+ final model is fitted replacing the matrix of covariates \({\varvec{X}}\) in (2) by the matrix of residuals \({\varvec{Z}}\) as

$$\begin{aligned} \log {\varvec{r}}={\textbf{1}}_{n}\alpha + {\varvec{Z}}{{{\varvec{\beta }}}}+ {\varvec{f}}. \end{aligned}$$
(6)

Note that in this paper the spatial term \({\varvec{f}}\) is modeled using ICAR random effects or using splines.
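To fix ideas, the following R sketch outlines the two-step spatial+ procedure with the eigenvector-based covariate model for a single covariate. For simplicity the weighting by \({\hat{{\varvec{W}}}}^{1/2}\) is omitted, the final model uses a thin plate spline for \({\varvec{f}}\) fitted with mgcv, and the objects Q (the ICAR precision matrix of the map under study), dat and the column names (X1, Y, lon, lat, e) are hypothetical; this is not the authors' implementation.

```r
library(mgcv)

## Step 1: covariate model. Regress X1 on the k eigenvectors of Q_xi associated
## with the k lowest non-null eigenvalues and keep the (standardized) residuals.
k    <- 5
eig  <- eigen(Q, symmetric = TRUE)
n    <- nrow(Q)
E_k  <- eig$vectors[, (n - k):(n - 1)]                 # k lowest non-null ones
dat$Z <- as.numeric(scale(resid(lm(dat$X1 ~ E_k))))    # residuals Z

## Step 2: spatial+ final model. Replace X1 by Z and keep the spatial term,
## here a thin plate spline over the centroid coordinates as in Dupont et al. [9].
fit <- gam(Y ~ Z + s(lon, lat, bs = "tp", k = 17) + offset(log(e)),
           family = poisson, data = dat)
coef(fit)["Z"]    # estimate of the regression coefficient
```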

2.4 Transformed Gaussian Markov Random Field (TGMRF) model

Transformed Gaussian Markov Random Fields (TGMRF) were introduced by Prates et al. [24] and are based on the general Gaussian graphical model proposed by Dobra and Lenkoski [8]. The interpretation of the fixed effects is the same as in the previous methods and the main advantage is that the spatial dependence does not interfere with the fixed effects.

In the previous models (spatial model, RSR, and spatial+), the main idea is to connect the covariates and the spatial effects with the relative risks through a given link function g(); in our case, \(g({\varvec{r}})=\log {\varvec{r}}\). The dependence between the relative risks \(r_i\) is then induced by the prior distribution of the spatial effects. TGMRFs provide an alternative: any positive continuous distribution may be specified for the marginal distributions of the relative risks, the covariate effects are introduced through the parameters of these marginal distributions, and the spatial dependence structure is captured by a Gaussian copula. Copulas are functions that join multivariate distribution functions to their one-dimensional marginal distribution functions [20]. Sklar’s theorem illustrates the role that copulas play in the relationship between multivariate distribution functions and their univariate margins (see Section 2.3 of [20]).

Assuming that areal count data follow a Poisson distribution, the TGMRF model is expressed as,

$$\begin{aligned} {\varvec{r}} \sim TGMRF({\varvec{F}}, \varvec{\Omega }), \end{aligned}$$
(7)

where \({\varvec{r}}=(r_1, r_2, \dots , r_{n})^{'}\) is the vector of relative risks, \({\varvec{F}}=(F_1, F_2, \dots , F_{n})^{'}\), \(F_{i}\) is the marginal distribution of \(r_{i}\), and \(\varvec{\Omega }\) is a correlation matrix that determines the spatial dependence structure in the Gaussian copula. Details about how the marginal distributions for the relative risks are defined in this work, as well as the way of specifying the spatial correlation matrix \(\varvec{\Omega }\), are available in Appendix B. In short, the TGMRF method defines the n-dimensional distribution function of the vector of relative risks \({\varvec{r}}\), denoted as H, in two steps. First, a marginal distribution \(F_{i}\) is chosen for each \(r_{i}\). Then, the multivariate distribution function of \({\varvec{r}}\) is defined as

$$\begin{aligned}{} & {} p(r_1 \le a_1, \,\dots ,\, r_{n} \le a_{n})= H(a_1,\,\dots ,\, a_{n} \,\arrowvert \, \varvec{\Omega },\, F_1,\, \dots ,\, F_{n})\\{} & {} \quad =C(F_1(a_1), \, \dots ,\, F_{n}(a_{n}) \,\arrowvert \, \varvec{\Omega }) \end{aligned}$$

where \(C(u_1, \, \ldots , \, u_n \,\arrowvert \, \varvec{\Omega })=\Phi _n(\Phi ^{-1}(u_1), \, \ldots ,\,\Phi ^{-1}(u_n) \,\arrowvert \, \varvec{\Omega }): [0,\,1]^{n} \rightarrow [0,\,1]\) is a Gaussian copula, \(\Phi _{n}(\cdot )\) is the cumulative distribution function of the multivariate normal distribution \(N({\textbf{0}}, \, \varvec{\Omega })\) [8], and \(\Phi ^{-1}\) is the quantile function (inverse cumulative distribution function) of the standard normal distribution. TGMRFs avoid spatial confounding since the covariates are included in the parameters of the marginal distributions \(F_{i}\), and, in a second step, the spatial dependence is introduced with the Gaussian copula.

In Poisson models, the most common choice for the marginal distribution of each \(r_{i}\) is the Gamma distribution. If the covariates are included in the scale parameter, the marginal distribution \(F_{i}\) is of the form

$$\begin{aligned} \Gamma (1/ \upsilon , \upsilon \exp ({\varvec{X}}_{i, \cdot }\, {{{\varvec{\beta }}}})) \end{aligned}$$

where \(\upsilon >0\) and \({\varvec{X}}_{i, \cdot }\) is the ith row of the covariate matrix \({\varvec{X}}\). When the covariates are included in the shape parameter, the marginal distribution \(F_{i}\) takes the form

$$\begin{aligned} \Gamma (\exp ({\varvec{X}}_{i, \cdot }\, {{{\varvec{\beta }}}}) / \upsilon , \upsilon ). \end{aligned}$$
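To make the copula construction tangible, the following hedged R sketch simulates a vector of relative risks with the gamma marginals above (covariates in the scale parameter, so that \(E(r_i)=\exp ({\varvec{X}}_{i,\cdot }\,{{{\varvec{\beta }}}})\)) and Gaussian-copula spatial dependence. The correlation matrix \(\varvec{\Omega }\), the covariates and \(\upsilon \) are hypothetical inputs, and this is only an illustration of the mechanism, not the TGMRF package code.

```r
library(MASS)   # for mvrnorm

simulate_copula_risks <- function(X, beta, Omega, upsilon = 0.5) {
  n <- nrow(X)
  z <- mvrnorm(1, mu = rep(0, n), Sigma = Omega)   # correlated Gaussian field
  u <- pnorm(z)                                    # uniform marginals (the copula)
  ## F_i^{-1}(u_i): gamma with shape 1/upsilon and scale upsilon * exp(X_i beta),
  ## so each r_i has the marginal above and E(r_i) = exp(X_i beta).
  qgamma(u, shape = 1 / upsilon, scale = upsilon * exp(drop(X %*% beta)))
}
## Counts could then be drawn as Y_i ~ Poisson(e_i * r_i).
```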

The TGMRF model is fitted within a fully Bayesian framework using Markov chain Monte Carlo (MCMC) algorithms to draw samples from the posterior distribution of the parameters of interest. The authors of the method have created an R package called TGMRF which implements the method using NIMBLE [7], and it is available at https://github.com/DouglasMesquita/TGMRF. The rest of the models are fitted using INLA [27]. Note that INLA provides posterior distributions of the quantities of interest but does not rely on MCMC algorithms, thus reducing computing time.

3 Real data analyses

In this section, three real data sets are used for illustration purposes: dowry deaths data in Uttar Pradesh in 2001 (see [32]), stomach cancer incidence data in Slovenia over the period 1995-2001 [35], and lip cancer incidence data in Scotland during 1975-1980 [4].

All the methods introduced in Sect. 2 are fitted to each dataset to estimate the relationship between the relative risks and the covariate of interest, namely the null model, the spatial model, RSR, the spatial+ model and the TGMRF model. A CAR prior for the spatial random effects has been considered in all models. Additionally, the spatial dependence has been modelled using P-splines in the spatial+ method. Regarding the spatial+ technique, two main approaches have been considered in the covariate model to remove the spatial dependence. In the first one, we fit a linear model where the covariate of interest is the response and the k eigenvectors corresponding to the k lowest non-null eigenvalues of the precision matrix \({\varvec{Q}}_{\xi }\) are the regressors. In the second one, we model the spatial dependence in the covariate using P-splines or thin plate splines. The number of eigenvectors depends on the dimension of the matrix \({\varvec{Q}}_{\xi }\), i.e., the size of the map. Here a minimum of 5 eigenvectors has been chosen for all data sets, whereas the maximum number ranges between 15 and 40. The spatial dependence in the second step of the spatial+ approach has been modelled using an ICAR prior or P-splines. Finally, we fit TGMRF models with gamma marginal distributions including the covariates in both the scale (TGMRF1) and the shape (TGMRF2) parameters. Table 1 displays the notation of the different proposals for the spatial+ approach depending on how we deal with the spatial dependence in the covariate model and in the spatial+ final model.

Table 1 Different proposals for the spatial+ approach depending on how we deal with the spatial dependence in the covariate model and in the spatial+ final model

We fit all the models with R, version 4.0.4. For the TGMRF models, we ran three MCMC chains for each model with 10000 iterations each, discarding the first 2000 as a burn-in period. One out of every 20 iterations was saved, leading to a total of 1200 iterations. For these models we use the TGMRF package. The rest of the models were fitted using the R-INLA package [18], version 21.02.23 (dated 2021-04-08), with the full Laplace strategy. As recommended by Gelman [10], a vague uniform prior on the standard deviation \(\sigma _{\xi }\) was considered in the spatial, the RSR, and the spatial+ models with ICAR spatial random effects. A vague normal prior with mean 0 and precision 0.001 is considered for the regression coefficients.

Regarding the dimension of the spline bases in the spatial+ method, the dimension of the thin plate spline basis is 17 for the Uttar Pradesh and the Scotland data. For the Slovenia data, we use dimension 30 as there are more areas. For the P-splines, a total of 11 internal knots were chosen for the marginal bases (longitude and latitude), leading to bases of dimension 13 for the Uttar Pradesh and Scotland data. For the Slovenia data, 28 internal knots were considered, giving rise to bases of dimension 30. Finally, cubic polynomials were chosen for the marginal B-spline bases and an RW2 prior distribution on the unknown coefficients was used. The mgcv package (version 1.8-40) was used to fit the covariate model with thin plate splines in the spatial+ approach [34]. Finally, to compare the models in terms of goodness of fit and complexity, we compute the Watanabe-Akaike information criterion (WAIC) [33].
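For reference, the kind of R-INLA call used for the spatial model with an ICAR prior and WAIC computation looks roughly as follows. The data frame columns, the neighbourhood graph g and the prior settings shown here are illustrative simplifications and do not reproduce the exact specification (e.g. the uniform prior on \(\sigma _{\xi }\)) used in the paper.

```r
library(INLA)

## Hypothetical inputs: Y (observed counts), e (expected counts), X1 (covariate),
## and g, a neighbourhood graph of the n areas in a format accepted by INLA.
dat  <- data.frame(obs = Y, cov = X1, region = 1:n)
form <- obs ~ cov + f(region, model = "besag", graph = g)
fit  <- inla(form, family = "poisson", data = dat, E = e,
             control.fixed   = list(mean = 0, prec = 0.001),
             control.compute = list(waic = TRUE),
             control.inla    = list(strategy = "laplace"))
fit$summary.fixed   # posterior mean, sd and credible interval of the coefficient
fit$waic$waic       # WAIC for model comparison
```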

3.1 Dowry death data in Uttar Pradesh

Very succinctly, dowry is the amount of money, property or goods that the bride’s family gives to the groom’s relatives before or after the marriage. The dowry was first designed to protect women from unfair traditions, but it has evolved into a practice of extortion and female exploitation. In brief, the groom or the groom’s relatives use physical and psychological violence against the woman as a means to obtain a greater dowry. This violence can extend over time, ultimately ending in the death of the woman. This is known as a dowry death. Although any form of dowry is prohibited in India, it is still a widespread practice in that country. For more precise details about dowry and dowry deaths, the reader is referred to Vicente et al. [32].

Fig. 1 Standardized sex ratio covariate in Uttar Pradesh in 2001

In this section, we analyze the number of dowry deaths in the 70 districts of Uttar Pradesh in the year 2001. Uttar Pradesh is the Indian state with the highest population and the highest rate of dowry deaths. The goal is to assess whether there is a linear association between the covariate sex ratio, defined as the number of females per 1000 males, and the risk of dowry death. Figure 1 shows that the standardized sex ratio has a clear spatial pattern, and hence a collinearity problem with the spatial random effects may exist. Additionally, given the complexity of the dowry death problem, it is very plausible that other unobserved covariates (potential risk factors) may be associated with dowry deaths, and hence confounding bias may appear. Table 2 provides the posterior means of the sex ratio coefficient, their posterior standard deviations, and \(95\%\) credible intervals obtained with the different models. The last column of the table shows the WAIC. The differences in the estimates are clear. According to the credible intervals, only two models, the null and the RSR, point towards a significant negative linear association between sex ratio and dowry death relative risk. The spatial and TGMRF models also indicate a negative association, but the 95% credible intervals contain 0. The rest of the models (spatial+ models) provide posterior mean estimates of the sex ratio coefficient around zero, indicating that the variable is not significant. Regarding standard errors, the TGMRF models lead to higher posterior standard deviations than the spatial model. The spatial+ approach provides posterior standard deviations somewhere between those of the null and RSR models and that of the spatial model. According to WAIC, all the spatial models but SpatPlusP2 and SpatPlusTP2 lead to similar fits. Clearly, the null model provides the least satisfactory fit.

Table 2 Dowry death analysis in Uttar Pradesh: posterior means of the sex ratio coefficient, posterior standard deviations and 95% credible intervals obtained with different models

3.2 Stomach cancer incidence data in Slovenia

This data set was first analysed by Zadnik and Reich [35]. The objective is to assess the association between a socioeconomic indicator and stomach cancer incidence in different regions of Slovenia during the period 1995-2001. Reich et al. [25] and Hodges and Reich [15] display the maps of the standardized incidence ratios and the socioeconomic indicator and observe a negative association. Table 3 shows the posterior mean estimates of the socioeconomic indicator coefficient, their posterior standard deviations, and \(95\%\) credible intervals, as well as the WAIC, obtained with the different models. The null model, RSR and the TGMRF methods estimate a negative regression coefficient for socioeconomic status and the \(95\%\) credible interval does not include 0. In contrast, the spatial and spatial+ models estimate regression coefficients very close to 0 that are not significant. Similar to the dowry death data, the TGMRF models provide posterior standard deviations similar to those of the spatial model. The null and RSR models lead to the lowest posterior standard deviations, and the spatial+ methods give posterior standard deviations somewhere in between. Again, all the spatial models but SpatPlusP2 and SpatPlusTP2 lead to similar fits.

Table 3 Stomach cancer incidence analysis in Slovenia: posterior means of the socioeconomic coefficient, posterior standard deviations and 95% credible intervals obtained with different models

3.3 Lip cancer incidence data in Scotland

Finally, lip cancer incidence data in Scotland during 1975-1980 are analyzed. A covariate indicating the proportion of the population engaged in agriculture, fishing, or forestry, hereafter named AFF, is included in the models (see for example [4]). Table 4 provides the posterior estimates of the regression coefficient of AFF with their posterior standard deviations, \(95\%\) credible intervals and WAIC values. All the methods estimate a positive regression coefficient for AFF. However, all the spatial+ models, except the one with 5 eigenvectors as regressors and the ones that model the spatial dependence in the spatial+ final model with splines, provide 95% credible intervals that include 0, hence ruling out an association between AFF and lip cancer incidence relative risks.

Table 4 Lip cancer incidence analysis in Scotland: regression coefficient estimates of AFF with their standard deviations and 95% credible intervals

In summary, depending on the model used to analyse the data, different estimates of the fixed effects and standard errors are obtained. We note that the standard errors seem to be too high in several models. In terms of goodness of fit, the null model presents larger WAIC values than the rest of the methods, so it is not an adequate model for smoothing the risks. Differences among the rest of the models are minor, indicating that the procedures lead to a similar smoothing. The SpatPlusP2 and SpatPlusTP2 models provide larger values of WAIC than the others. This probably happens because they oversmooth the risks. Due to the observed discrepancies in the estimates, a simulation study is performed to evaluate which model best recovers the true value of the fixed effects in several scenarios of spatial confounding. Additionally, we also evaluate which model provides appropriate estimates of the standard error.

4 Simulation study

In this section, we conduct a thorough simulation study to evaluate how the different models estimate the fixed effects in the presence of spatial confounding. For the simulation, we use the geographical setup of Uttar Pradesh, consisting of 70 connected districts, and the standardized observed covariate sex ratio, denoted as \({{\varvec{X}}}_1\). To simulate the log risks, we use \({{\varvec{X}}}_1\) and an additional covariate \({{\varvec{X}}}_2\), which is generated to have high, intermediate or low correlation with \({{\varvec{X}}}_1\). The variable \({{\varvec{X}}}_2\) will play the role of an unobserved covariate.

We consider two different scenarios named Simulation study 1 and Simulation study 2.

Simulation study 1: The goal of this simulation study is to assess how well the different models estimate the fixed effect of \({{\varvec{X}}}_1\) when there is spatial confounding. To do this, the data generating model includes both covariates \({{\varvec{X}}}_1\) and \({{\varvec{X}}}_2\), and additional spatial variability is added in some scenarios. Then we fit the models without the covariate \({{\varvec{X}}}_2\). Note that \({{\varvec{X}}}_2\) is treated as an unobserved covariate in the fitted models that may produce spatial confounding. In more detail, we first generate the logarithm of the relative risks and then we simulate the counts using the Poisson distribution, that is

$$\begin{aligned}{} & {} \log \, {\varvec{r}}= {\varvec{X}}{{{\varvec{\beta }}}}+ {\varvec{S}} \end{aligned}$$
(8)
$$\begin{aligned}{} & {} {\varvec{Y}}^{k}\arrowvert {\varvec{r}} \sim Poisson(\varvec{\mu }=\varvec{er}), \end{aligned}$$
(9)

where \(k=1,\ldots , K\), \({\varvec{X}}=({{\varvec{X}}}_1, {{\varvec{X}}}_2)\), \({\varvec{e}}\) is the vector of expected cases taken from the real case study (dowry deaths data), and \({{{\varvec{\beta }}}}=(\beta _1, \beta _2)^{'}\). Here, \({{{\varvec{\beta }}}}=(0.2, 0.3)^{'}\). Note that the generating model includes both covariates \({{\varvec{X}}}_1\) and \({{\varvec{X}}}_2\) to simulate the log risks. Finally, \({\varvec{S}}\) is a term that introduces additional spatial variability. Three different scenarios are considered depending on how we generate the term \({\varvec{S}}\).

  • Scenario 1: Here we do not include additional spatial variability. That is \({\varvec{S}}={\textbf{0}}\).

  • Scenario 2: The spatial variability is generated using an ICAR model, that is \({\varvec{S}}={{{{\varvec{\xi }}}}}\) with \(p({{{\varvec{\xi }}}}) \propto \exp (-\frac{1}{2\sigma _{\xi }^2}{{{\varvec{\xi }}}}^{'}{\varvec{Q}}_{\xi } {{{\varvec{\xi }}}})\) and \(\sigma _{\xi }^2=0.2\).

  • Scenario 3: The spatial variability is a smooth surface built using P-splines. That is, \({\varvec{S}}={\varvec{f}}({\varvec{s}}_{1}, {\varvec{s}}_{2})={\varvec{B}}_{s}\varvec{\theta }\) defined as in Ugarte et al. [30], where \({\varvec{s}}_{1}\) and \({\varvec{s}}_{2}\) are vectors containing the longitude and latitude of the centroids of the small areas, \({\varvec{B}}_{s}\) is a two-dimensional B-spline basis of dimension \(n\times k_1 k_2\), and \(\varvec{\theta }=(\theta _1, \theta _2, \dots , \theta _{k_1 k_2})'\) is the vector of coefficients. Here, the number of elements of the marginal B-spline bases for longitude and latitude is set to \(k_1=k_2=13\), leading to 169 elements in the spatial B-spline basis \({{\varvec{B}}}_s\). To generate a smooth surface, the following prior is considered for the coefficients: \(\varvec{\theta } \sim N({\varvec{0}}, {\varvec{P}})\), where \({\varvec{P}}=\lambda _1{\varvec{I}}_{k_1}\otimes {\varvec{D}}_1'{\varvec{D}}_1+\lambda _2 {\varvec{D}}_2'{\varvec{D}}_2\otimes {\varvec{I}}_{k_2}\) is a precision matrix and \({\varvec{D}}_{1}\) and \({\varvec{D}}_{2}\) are difference matrices of order 2. Different degrees of smoothing are considered for longitude and latitude (see [30]). In particular, the hyperparameters that control the amount of smoothing in longitude and latitude are set at \(\lambda _{1}=1.22\) and \(\lambda _{2}=8.87\).

For each of these scenarios, three subscenarios have been generated according to high, medium or low correlation between the covariates \({{\varvec{X}}}_1\) and \({{\varvec{X}}}_2\). Namely, subscenario 1 with \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.8\), subscenario 2 with \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.5\), and subscenario 3 with \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.2\). Figure 2 displays the spatial patterns of the covariates, the ICAR effect and the smooth spatial surface. The first row shows the spatial patterns of the covariates when the correlation is 0.8, the second row when the correlation is 0.5, and the third row when the correlation is 0.2. Note that the ICAR effect and the smooth spatial surface are simulated only once and are the same in the three rows. The correlations between the sex ratio and the spatial effects simulated with an ICAR or using P-splines are \(cor({{\varvec{X}}}_1, {{{\varvec{\xi }}}})=0.5865\) and \(cor({{\varvec{X}}}_1, {\varvec{f}}({\varvec{s}}_1, {\varvec{s}}_2))=0.1998\), respectively. In total we have 9 scenarios, and for each one we generate \(K=100\) data sets. Table 5 summarizes the details of all the scenarios in Simulation study 1.
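For concreteness, the following R sketch shows one hedged way to generate a single data set under Scenario 2 with \(cor({{\varvec{X}}}_1,{{\varvec{X}}}_2)=0.8\). The construction of \({{\varvec{X}}}_2\) and the proper approximation used to draw the ICAR effect are our own illustrative choices, not necessarily those used in the paper; X1, e and Q are assumed to be available (sex ratio, expected cases and the ICAR precision matrix of the Uttar Pradesh map).

```r
library(MASS)
set.seed(1)

rho <- 0.8                                            # target cor(X1, X2)
X2  <- as.numeric(scale(rho * X1 + sqrt(1 - rho^2) * rnorm(n)))

## ICAR effect with sigma_xi^2 = 0.2, drawn from a slightly ridged (proper)
## approximation of the intrinsic prior and centred to sum to zero.
Sigma <- 0.2 * solve(Q + diag(1e-6, n))
xi    <- drop(mvrnorm(1, rep(0, n), Sigma))
xi    <- xi - mean(xi)

log_r <- 0.2 * X1 + 0.3 * X2 + xi                     # model (8), beta = (0.2, 0.3)'
Y     <- rpois(n, lambda = e * exp(log_r))            # model (9), one simulated dataset
```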

Fig. 2 From left to right, spatial patterns of the covariate sex ratio (\({{\varvec{X}}}_1\)), the simulated covariate \({{\varvec{X}}}_2\), and spatial effects simulated with an ICAR (Scenario 2) or using P-splines (Scenario 3). In the top row \(cor({{\varvec{X}}}_1,{{\varvec{X}}}_2) = 0.8\), in the middle row \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.5\), and in the bottom row \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.2\)

Table 5 Different scenarios considered in Simulation study 1 depending on the data generating model

Simulation study 2: The goal of this simulation study is to assess Type-S error rates to complement the information in Simulation study 1. In this simulation study, the log risks are simulated using \({{\varvec{X}}}_1\) and additional spatial variability. Then, the models are fitted including \({{\varvec{X}}}_2\) to see whether any of the models tend to identify this covariate as significant when in fact it is not part of the generating model. The generating process is similar to the one in Simulation study 1, but now \(\beta _2=0\) to remove the covariate \({{\varvec{X}}}_2\).

All the methods introduced in Sect. 2 are fitted to the simulated data. The goal of the study is to assess how well the different methods recover the true value of the fixed effect coefficient and how well the posterior standard deviation approximates the true standard error of the estimator. In addition, a method with low Type-S error rates is preferred. Regarding the TGMRF approach, both TGMRF1 and TGMRF2 provide very similar results, so to conserve space we only report TGMRF1.

4.1 Simulation study 1: Results

The goal of the simulation study is two-fold. On the one hand, we evaluate how well the different methods estimate the fixed effects, something crucial to establish the linear relationship between the response and the covariates. On the other hand, we also investigate whether the models recover the true risk surface, something relevant to identify potential risk factors.

Table 6 Posterior means and standard deviations of \(\beta _1\) based on 100 simulated datasets for Simulation study 1, Scenarios 1, 2 and 3 and \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.8\), 0.5 and 0.2

Table 6 provides the average over the 100 simulated data sets of the posterior means and posterior standard deviations of the regression coefficient \(\beta _1\) obtained with the different models in each simulated scenario. The results are interesting. In Scenario 1, we observe highly biased fixed effect estimates for the null, the spatial, the RSR and the TGMRF methods when the correlation between the observed (\({{\varvec{X}}_1}\)) and the unobserved (\({{\varvec{X}}_2}\)) covariates is high. In this situation, it appears that the estimated \(\beta _1\) captures the effect of both covariates \({\varvec{X}}_1\) and \({\varvec{X}}_2\). The bias reduces when the correlation between the two covariates decreases. In Scenario 1, the spatial+ method with 15 eigenvectors recovers the true value of \(\beta _1\) quite well if the correlation is high. With moderate correlation, 5 or 10 eigenvectors give nearly unbiased estimates. When the correlation is low, the null, the spatial, the RSR, and the TGMRF models lead to fixed effect estimates with the lowest bias. Additionally, we observe that the spatial model leads to the highest posterior standard deviation of the fixed effects, whereas the null and RSR models provide the lowest posterior standard deviations. The rest of the models provide posterior standard deviations somewhere in between. Results for Scenarios 2 and 3 are somewhat different, as additional variability is included through an ICAR model and P-splines, respectively. In both scenarios, the null, the spatial, the RSR, and the TGMRF models lead to highly biased fixed effect estimates irrespective of the correlation between \({{\varvec{X}}_1}\) and \({{\varvec{X}}_2}\), though the bias reduces when the correlation decreases. In Scenario 2, the spatial+ methods again recover the \(\beta _1\) coefficient quite well, though now we need to increase the number of eigenvectors in the covariate model. The number of eigenvectors needed is smaller when the correlation between the covariates is low. In this scenario, the TGMRF model produces the highest posterior standard deviations. Similar results are observed in Scenario 3. Here, the highest posterior standard deviations correspond to the spatial model, whereas the smallest come from the null and the RSR models. In this scenario, the posterior standard deviations obtained with the TGMRF models are very similar to those of the spatial model.

To visually inspect the different methods, Fig. 3 shows the boxplots of the posterior means of \(\beta _1\) over the 100 simulated data sets for Scenario 1. The first row shows the boxplots when the correlation between \({\varvec{X}}_1\) and \({\varvec{X}}_2\) is 0.8, the second row when the correlation is 0.5, and the third row when the correlation is 0.2. Figures A1 and A2 in Appendix A display the same boxplots for Scenarios 2 and 3, respectively. Interestingly, the bias of the null, RSR, spatial and TGMRF models reduces when the correlation between the covariates decreases. This reduction is particularly remarkable in Scenario 1. Additionally, Table A1 in Appendix A provides the mean absolute relative bias (MARB) and the mean root relative mean squared error (MRRMSE) of the fixed effect estimates to complement this information. For the null, the spatial, the RSR and the TGMRF models, both the MARB and the MRRMSE reduce when the correlation between the covariates decreases. This is expected because spatial confounding is more severe if the unobserved covariate is correlated with the observed one. For the rest of the models there is no clear pattern. In general, when the correlation between \({{\varvec{X}}}_1\) and \({{\varvec{X}}}_2\) is small, a spatial+ model with a small number of eigenvectors provides the lowest MARB and MRRMSE. If the correlation is high, a spatial+ model with a larger number of eigenvectors is better.

Fig. 3 Boxplots of the estimated means of \(\beta _1\) based on 100 simulated datasets for Simulation study 1, Scenario 1 and \(cor({{\varvec{X}}}_1,{{\varvec{X}}}_2) = 0.8\) (top row), 0.5 (middle row), and 0.2 (bottom row)

Table 6 (and Figs. 3, A1 and A2) gives an idea of the magnitude of the bias of the fixed effect estimate, as we can compare the average of the posterior means with the true value of \(\beta _1\), but none of them gives information about the posterior standard deviation. To see whether the posterior standard deviation is a good measure of the variability of the fixed effect estimate, Table 7 compares the true simulated standard error (\(s.e._{sim}\)) with the estimated standard error (\(s.e._{est}\)). They are defined as follows

$$\begin{aligned} s.e._{sim}=\sqrt{\frac{1}{100} \sum _{k=1}^{100}\left( {\hat{\beta }}_1^k-\overline{{\hat{\beta }}}_1\right) ^2}\quad \quad s.e._{est}=\frac{1}{100} \sum _{k=1}^{100} sd({\hat{\beta }}_1^k) \end{aligned}$$

where \({\hat{\beta }}_1^k\) is the posterior mean of \(\beta _1\) in simulation k, \(\overline{{\hat{\beta }}}_1\) is the average of all the posterior estimates, and \(sd({\hat{\beta }}_1^k)\) is the posterior standard deviation of \(\beta _1\) in simulation k. That is, the true simulated standard error is the sample standard deviation of the posterior mean estimates, and the estimated standard error is the average of the posterior standard deviations. If the estimated standard error is higher than the simulated standard error, then we are overestimating the posterior standard deviation of the fixed effects. Conversely, if the estimated standard error is lower than the simulated standard error, we are underestimating it. According to Table 7, the null and the RSR models provide estimated standard errors very similar to the simulated ones in all scenarios. On the contrary, the spatial and the TGMRF models lead to estimated standard errors much higher than the simulated ones in all the scenarios. All the spatial+ models tend to overestimate the posterior standard deviation, but to a lesser extent than the spatial and the TGMRF models. It is worth noting that the spatial+ models SpatPlusP2 and SpatPlusTP2 give very similar values of estimated and simulated standard errors.
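As a minimal sketch, assuming the 100 posterior means and posterior standard deviations of \(\beta _1\) are stored in the hypothetical vectors beta_hat and beta_sd, the two summaries are computed as:

```r
## Note the divisor K = 100 (not K - 1), matching the definition above.
se_sim <- sqrt(mean((beta_hat - mean(beta_hat))^2))   # sd of the point estimates
se_est <- mean(beta_sd)                               # average posterior sd
c(se_sim = se_sim, se_est = se_est)
```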

Table 7 Estimated standard errors (\(s.e._{est}\)) and simulated standard errors (\(s.e._{sim}\)) for \(\beta _1\) based on 100 simulated datasets for Simulation Study 1, Scenarios 1, 2 and 3 and \(cor({\varvec{X}}_1,{\varvec{X}}_2) = 0.8\), 0.5 and 0.2

In addition to the posterior mean and standard deviation, and to have a complete view of the inference about the fixed effects, we are interested in credible intervals. Table 8 displays the empirical coverage of credible intervals for \(\beta _1\) at the \(95\%\) nominal level. In general, the empirical coverage obtained with the null and the RSR models is very low, in many cases 0. This is expected because of the high bias. Regarding the spatial model, the empirical coverage is also very low, again explained by the high bias. However, in Scenarios 1 and 3, when the correlation is 0.2, the coverage is 100%, and in Scenario 2 with \(cor({{\varvec{X}}}_1,{{\varvec{X}}}_2)=0.2\) the coverage is 92%, close to the nominal value. The performance of the TGMRF is similar to that of the spatial model. Regarding the spatial+ method using eigenvectors of the precision matrix, we generally observe overcoverage. This can be explained because the method reduces the bias but overestimates the standard error. In some cases we observe clear undercoverage, which arises because the overestimation of the standard error does not compensate for the bias. In general, overcoverage is due to large standard errors, whereas undercoverage can be attributed to a large bias. To have a complete picture of the coverages, Table A2 in Appendix A provides the length of the 95% credible intervals for the parameter \(\beta _1\) obtained with the different methods. The most remarkable point is that the null and RSR models give substantially shorter credible intervals than the other models. The widest credible intervals are obtained with the spatial and the TGMRF models, and the spatial+ models give credible intervals wider than those of the null and RSR models but narrower than those of the spatial and the TGMRF models.

Table 8 Empirical \(95\%\) coverage probabilities of the true value of \(\beta _1\) based on 100 simulated datasets for Scenarios 1, 2, and 3 and \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.8, 0.5\) and 0.2

To complete this simulation study, we now look at risk smoothing and goodness of fit. Table A3 in Appendix A displays averages over the 100 simulations of the WAIC values. The null model is clearly insufficient for risk smoothing and presents larger WAIC values than the other methods. The spatial+ models SpatPlusP2 and SpatPlusTP2 also provide larger values of WAIC than the other models, probably because they oversmooth the risks. Differences among the rest of the models are minor, indicating that the procedures lead to a similar smoothing. This is corroborated in Table A4 of Appendix A, where the MARB and MRRMSE of the relative risks are provided. In general, the null model and the spatial+ models SpatPlusP2 and SpatPlusTP2 give the largest MARB and MRRMSE, indicating a worse fit. The rest of the models provide MARBs below 10%.

Finally, as suggested by one reviewer, we have simulated a Scenario 4 where the additional term \({{\varvec{S}}}\) has been generated from a multivariate normal distribution \(N({{\varvec{0}}},\sigma ^2{{\varvec{I}}}_n)\) with \(\sigma ^2=0.2\). That is, the additional variability is not spatially structured. Results are rather similar to those of Scenario 3 and are not shown to save space. The reason is probably that the correlation between the generated random effects and the covariate \({{\varvec{X}}}_1\) in Scenario 4 (0.1438) is very similar to the correlation between the spatial surface and the covariate \({{\varvec{X}}}_1\) in Scenario 3 (0.1998).

4.2 Simulation study 2: Results

To complete the study, we now turn to the Type-S error rates of the different methods considered in this paper.

Table 9 displays the Type-S error rates for \(\beta _2\) based on the 100 simulated datasets for each scenario. Type-S error rates should be around the nominal value of 5%. In Scenario 1, where there is no more variability than that introduced by the covariate \({\varvec{X}}_1\), the Type-S error rate is small (less than 10%) for all the methods. This agrees with the results of Khan and Calder [17]. In Scenarios 2 and 3, where additional variability is introduced in the generating model through an ICAR model and P-splines, respectively, the Type-S error rates are very high for the null and the RSR models. This is in line with some results in Hanks et al. [14]. Overall, the spatial+ models do not produce high Type-S error rates. The exception is Scenario 2 with high correlation between the covariates, where the models SpatPlusP2 and SpatPlusTP2 exhibit rates over 30%. To better understand the Type-S error rates in Table 9, Figures A3, A4, and A5 in Appendix A display the posterior mean estimates of the parameter \(\beta _2\). The bias of the null and the RSR models in Scenarios 2 and 3 helps to understand the high Type-S error rates in some subscenarios.

Table 9 Type-S error rates (%) of \(\beta _2\) based on 100 simulated datasets for Scenarios 1, 2 and 3 and \(cor({{\varvec{X}}}_1, {{\varvec{X}}}_2)=0.8, 0.5\) and 0.2

5 Discussion

Spatial confounding is a problem that remains unsolved, or at least only partially solved. One of the main difficulties is that there is no unique and general definition. Traditionally, spatial confounding has been considered a collinearity problem between the fixed and the random effects; in other words, the fixed and random effects “compete” for the same variability. Then, when random effects with a spatial correlation structure are included in a linear or generalized linear model, the fixed effect estimates change. The question is whether we should expect such a change or not.

One of the most popular methods to deal with spatial confounding, restricted spatial regression, was proposed to avoid the change in fixed effect estimates relative to the model without spatial random effects. Restricted spatial regression simply restricts the random effects to lie in the orthogonal complement of the fixed effects; consequently, the fixed effect estimates do not change. The idea underlying restricted regression is to assign all the variability in the direction of the covariates to the covariates themselves. This seems a good idea if we assume that the estimates we obtain in the null model (the one without spatial random effects) are correct. If this is not the case, and a spatial random effect is introduced in the model to deal with the remaining spatial variability that the observed covariates do not account for, some issues arise. The main one is collinearity, because the spatial random effects also compete to explain the same variability as the observed covariates. Restricted spatial regression implicitly assumes that there are no other covariates overlapping with the observed ones, something that might not be very realistic in practice. On the other hand, the standard errors of the fixed effect estimates in the null model are known to be too small, and they are inflated when the spatial random effect is included in the model. Restricted regression was supposed to provide standard errors for the fixed effect estimates somewhere in between. However, recent research (see for example [17, 36]) shows that with normal responses, restricted regression provides standard errors less than or equal to those obtained with the null model. Consequently, inference is liberal and Type-S error rates can be high. However, with Poisson responses, no clear results have been provided yet. Along these lines, and assuming that spatial random effects play the role of unobserved covariates with spatial structure, recent research [11] suggests that a change in the fixed effect estimates is expected and that collinearity between fixed and random effects is not a problem, because this collinearity represents the overlap between observed and unobserved covariates. These authors study spatial confounding from a causal inference perspective, where the change in the fixed effect estimates is due to the existence of spatially structured unmeasured variables.

Given the controversy about spatial confounding, in this paper we analyse three data sets to illustrate how different techniques yield different estimates and posterior standard deviations and hence produce different conclusions about the fixed effects. Then, we run a simulation study to evaluate how some of the existing methods designed to alleviate spatial confounding estimate the fixed effects in different scenarios. Namely, we compare a simple Poisson regression model, a Poisson spatial mixed model, restricted spatial regression, TGMRFs and spatial+ models. Spatial confounding is introduced by using generating models with two covariates, \({{\varvec{X}}}_1\) and \({{\varvec{X}}}_2\), where the first one plays the role of the observed covariate and the second one acts as an unobserved covariate that is not included in the fitting process. Additional spatial variability is added in the generating process using an ICAR spatial random effect or a spatial surface generated using P-splines. More precisely, in Scenario 1 all the variability is introduced through the covariates. In Scenario 2, additional spatial variability is included with an ICAR random effect, and finally, in Scenario 3, we use a spatial surface generated using P-splines to introduce additional spatial variability in the generating process. The results of the simulation study are very informative. Overall, the method that best recovers the true value of the fixed effects is the spatial+ model using eigenvectors of the spatial precision matrix as regressors in the covariate model. The number of eigenvectors depends on the correlation between the two covariates \({{\varvec{X}}}_1\) and \({{\varvec{X}}}_2\), and on the way we generate additional spatial variability (ICAR or P-splines). In general, the higher the correlation between the covariates, the larger the number of eigenvectors. When the correlation is high (0.8), 14-21% of the eigenvectors associated with the lowest eigenvalues of the spatial precision matrix are required. If the correlation is medium (0.5), 7-14% of the eigenvectors are needed if the generating model only includes the covariates, whereas if the generating model includes additional variability (ICAR or P-splines), 14-21% of the eigenvectors seem to produce good results. Finally, when the correlation between the covariates is low (0.2), 7-14% of the eigenvectors are needed in Scenarios 2 and 3. However, the spatial+ model does not provide good results in Scenario 1, where there is no additional spatial variability other than that included in the covariates.

In terms of standard errors, the posterior standard deviation in the null and in the RSR models seems to be a good estimator of the true standard error, whereas the rest of the models tend to overestimate the true standard error, notably the spatial and the TGMRF models. Regarding coverage rates, it seems that the spatial+ method leads to overcoverage, something expected as it also overestimates the standard error. In addition, the Type-S error rates are very low in several scenarios. Therefore, the spatial+ method with a suitable number of eigenvectors seems to recover the true fixed effects quite well but could inflate standard errors. In our opinion, Scenarios 2 and 3 are the most realistic as they include additional spatial variability other than that captured by the covariates and a number of eigenvectors between 14% and 21% of the total could be a good choice in general.

Regarding risk estimation, the null model is clearly insufficient, whereas similar estimates are obtained with the rest of the models, with the exception of the spatial+ models using splines (P-splines or thin plate splines) to smooth the risks. This agrees with the work by Adin et al. [1], where identical risk estimates were observed with the spatial and the restricted spatial regression models. As suggested by one reviewer, we have also generated a Scenario 4 where the additional variability is spatially unstructured. It is worth noting that results in this scenario are rather similar to those of Scenario 3, so they have been omitted to save space. We remark that if researchers are interested in risk prediction, the fixed effect estimates are probably not so important, given that all the spatial methods including ICAR random effects lead to essentially identical risk surfaces; that is, irrespective of the estimated value of the fixed effect, the risk predictions do not change. However, if researchers are interested in identifying potential risk factors by looking at the spatial map of the unexplained variability, it is crucial to provide unbiased estimates of the fixed effects, otherwise the map of the remaining variability would not be correct.

To conclude this paper, we provide some guidelines for practitioners in light of our simulation results. Our advice is to fit the null and the spatial model first. If there is no change in the fixed effect estimates, spatial confounding is probably not an issue. If a substantial change is observed, the spatial+ method with a number of eigenvectors between 14% and 21% of the total could lead to nearly unbiased fixed effect estimates. However, inference could be too conservative, as the method seems to inflate standard errors. This might be what we observe in the real data analyses of this paper. In any case, caution is always recommended, as our results depend on the generating models, and different data generating mechanisms could lead to different conclusions.