1 Introduction

The issue of spatial confounding within linear regression models is particularly relevant in environmental applications, where the aim is to assess the effect of spatially varying environmental variables, such as pollutants, on other environmental outcomes. In this context, we delve into the issue of confounding and its far-reaching implications, providing insights that are useful to overcome this problem in the framework of complex environmental data.

The main focus of this paper is to evaluate the impact of unobserved relevant information on the estimation of regression parameters, an issue known in the literature as confounding. It arises when the relationship between the covariate and the response variable is influenced by an unmeasured confounder associated with both (Fig. 1). This can result in severely biased estimates of the regression coefficients of the measured covariates. Such deviation from the true values takes place because the posited statistical model, which omits the confounder, tries to describe data arising from a generative mechanism characterized by confounding. It is crucial to understand the consequences of confounding when the main objective is to estimate the relationship between the response and the covariates through regression coefficients.

Confounding can occur in various statistical models and research areas, such as epidemiology, environmental sciences, public health and physics. Spatial models are often employed in these fields, leading to extensive research on spatial confounding. Previous studies have discussed the impact of spatially varying covariates and of the introduction of spatial random effects on estimation (Clayton et al. 1993; Bernardinelli et al. 1995; Reich et al. 2006). However, it is now widely recognized that confounding can persist even with suitable correlation structures for the residuals. The literature on spatial confounding has focused on assessing the strength of spatial association between the covariate, the confounder and their interaction, in order to understand its impact on the sampling properties of regression coefficient estimators. The parameters influencing the spatial autocorrelation of the covariate and of the confounder are found to be of major relevance. In line with Paciorek (2010), it is commonly asserted that a confounder that is smoother than the covariate leads to lower bias and less confounding. In this paper we discuss this topic, investigating the dependence of confounding on several features of the data generative mechanism.

The existing literature can be divided into two strands. The first strand aims to quantify and evaluate the impact that spatial confounding has on regression coefficients (Paciorek 2010; Page et al. 2017; Nobre et al. 2021), while the second focuses on developing methods that account for spatial confounding to obtain accurate estimates of the parameters of interest (Reich et al. 2006; Hodges and Reich 2010; Hughes and Haran 2013; Hanks et al. 2015; Hefley et al. 2017; Thaden and Kneib 2018; Papadogeorgou et al. 2018; Yang 2021; Reich et al. 2021; Dupont et al. 2022; Marques et al. 2022; Hui and Bondell 2022; Guan et al. 2023).

The main objective of this article is to evaluate the sampling properties of parameter estimators in a simple setup with one covariate. A formal study of these sampling properties in the presence of confounding is conducted by defining the Data Generating Process (DGP) and a separate model for parameter estimation. This is intended to mimic the workflow of statistical analysis, where data are interpreted as realizations of a random mechanism that the researcher tries to infer through statistical modeling. In this spirit, confounding is about the bias affecting the regression coefficient estimates when the postulated model misses some relevant features of the DGP. The hypotheses underlying the posited model lead to different estimators of the regression coefficients, such as Ordinary Least Squares (OLS, in regression with spherical disturbances), Generalized Least Squares (GLS) and, more generally, maximum likelihood estimators in mixed linear regression models. These estimators can be cast as the same linear function of the data with appropriate weighting matrices: this allows a unified treatment of the sampling distribution of all these estimators, which is provided in Sect. 3. The sampling properties of the estimators, such as bias, variance and mean square error conditionally on the covariate process, are derived: they are random variables expressed as ratios of dependent quadratic forms (QFs) in Gaussian random variables (Provost and Mathai 1992). Following Paolella (2018), it is possible to obtain their expected values, providing an analytic expression of the marginal sampling properties of the estimator by means of Carlson’s function (Carlson 1963; Lauricella 1893). These sampling properties serve as indicators of the effect that confounding has on the target parameters.

Confounding has been a focal point in numerous environmental applications, where different methods have been proposed for adjusting the estimates of the regression coefficients. To name a few, Dominici et al. (2004) assess the pollution-mortality relationship in the presence of unmeasured time-varying factors, Paciorek (2010) examines the association between ambient air pollution and birthweight in eastern Massachusetts, Papadogeorgou et al. (2018) investigate the effects of power plant emission reduction technologies on ambient ozone pollution, Dupont et al. (2022) explore the impact of temperature on the crown defoliation of trees, and Marques et al. (2022) analyze the monthly precipitation variations in Germany. A shared characteristic among these methods is the reliance on unverifiable assumptions about the data generating process when information regarding the confounding variables is absent. Although the issues addressed in this paper may not offer direct guidance for estimating regression coefficients in the presence of unobserved relevant information, we believe that the theoretical insights presented in the following sections, elucidating the connections between the parameters of the data generating process and the bias of the estimates, can contribute to the development of novel methods. This will be the object of future research.

The article is structured as follows. In Sect. 2, we introduce the data generating process and the statistical model adopted for inference. In Sect. 3, we elucidate the fundamental aspects of confounding in terms of quadratic forms, which enable us to furnish the marginal sampling properties of the estimator in closed form. Section 4 is dedicated to the introduction of measures concerning the marginal variability and smoothness of Gaussian random vectors, and to an approximation of the estimator bias based on them. To show the relationship between these measures and the structure of the DGP, an application is presented in Sect. 5, concerning two models that are commonly applied in geostatistical analysis and areal data modeling.

2 Analytic framework

Informally, confounding occurs when the regression coefficient of a response variable on a covariate is estimated without information on a third variable, the confounder, which is associated with both. In this section we introduce the data generating process and the posited statistical model for estimation: sampling properties of the regression coefficient estimators are derived, conditionally on the covariate distribution. A formal definition of confounding concludes the section.

2.1 The data generating process

To introduce the problem of confounding, a stochastic generative model, i.e. the DGP, is considered. Specification of the DGP starts from the following conditional distribution of the n-dimensional response vector \(\varvec{Y}\) given the covariates \(\varvec{X}\) and \(\varvec{Z}\):

$$\begin{aligned} \varvec{Y}|\varvec{X},\varvec{Z} \sim \mathcal {N}_n \left( \mathcal {B}_{y\cdot 0(xz)}{} {\textbf {1}}_n + \mathcal {B}_{y\cdot x(z)} \varvec{X} + \mathcal {B}_{y\cdot z(x)} \varvec{Z},\;\;\varvec{\Sigma }_{y|x,z} \right) , \end{aligned}$$
(1)

where \({\textbf {1}}_n\) is the n-dimensional unit vector and \(\varvec{\Sigma }_{y|x,z}\) is the covariance matrix expressing the variability of the dependent variable \(\varvec{Y}\) that is not explained by the linear relationship with the regressors \(\varvec{X}\) and \(\varvec{Z}\). Moreover, \(\mathcal {B}_{y\cdot 0(xz)}\) denotes the intercept term, while \(\mathcal {B}_{y\cdot x(z)}\) and \(\mathcal {B}_{y\cdot z(x)}\) are the partial regression coefficients that determine the strength and direction of the corresponding covariate’s influence. The subscripts of the partial regression coefficients aim at pointing out that they quantify the relationship between the response (before the dot) and the covariate which is referred to (after the dot), in the presence of the other variable (within brackets) in the conditional mean.

This paper aims at discussing how the features of the joint distribution of \(\varvec{X}\) and \(\varvec{Z}\) affect the sampling distribution of the estimators of \(\mathcal {B}_{y\cdot x(z)}\). Without loss of generality, we consider zero mean processes for both the covariate \(\varvec{X}\) and the confounder \(\varvec{Z}\), so that \(\left( \varvec{X}^{\top },\varvec{Z}^{\top }\right) ^{\top }\sim \mathcal {N}_{2n} \left( \varvec{0}, \varvec{\Sigma }_{x,z}\right)\) where

$$\begin{aligned} \varvec{\Sigma }_{x,z}= \begin{bmatrix} \varvec{\Sigma }_x &{} \varvec{\Sigma }_{xz}\\ \varvec{\Sigma }_{zx}&{} \varvec{\Sigma }_z \end{bmatrix} =\begin{bmatrix} \sigma ^2_x{\textbf {R}}_x &{} \sigma _{xz}{} {\textbf {R}}_{xz}\\ \sigma _{xz}{} {\textbf {R}}_{zx}&{} \sigma ^2_z{\textbf {R}}_z \end{bmatrix} \end{aligned}$$
(2)

is the 2n-dimensional joint covariance matrix of \(\varvec{X}\) and \(\varvec{Z}\), \(\varvec{\Sigma }_{x}\) is the marginal covariance matrix of \(\varvec{X}\) and \(\varvec{\Sigma }_{xz}=\varvec{\Sigma }_{zx}^{\top }\) is the cross-covariance matrix. Expressing each block of \(\varvec{\Sigma }_{x,z}\) as the product of a scalar and a structure matrix \({\textbf {R}}\) is useful for the developments that follow. As observed by Paciorek (2010) and Page et al. (2017), treating \(\varvec{X}\) and \(\varvec{Z}\) as random processes allows for the derivation of analytic results that can give insights on confounding: in this paper we extend such analytic results by leveraging the theory of quadratic forms in Gaussian variables.
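For concreteness, the following minimal sketch in base R simulates one draw from the DGP (1)-(2). All numerical values, the exponential structure matrices and the Cholesky-based cross-covariance (which guarantees positive definiteness and anticipates the construction used in Sect. 5) are illustrative assumptions, not choices made in the paper.

```r
# Hypothetical draw from the DGP (1)-(2); all settings are illustrative.
set.seed(1)
n <- 50
D <- as.matrix(dist(cbind(runif(n), runif(n))))   # pairwise distances between n sites
s2x <- 1; s2z <- 1; rho <- 0.5
Rx <- exp(-D / 0.3); Rz <- exp(-D / 0.5)          # structure matrices R_x and R_z
Lx <- t(chol(s2x * Rx)); Lz <- t(chol(s2z * Rz))  # lower Cholesky factors
Sigma_xz <- rbind(cbind(s2x * Rx,           rho * Lx %*% t(Lz)),
                  cbind(rho * Lz %*% t(Lx), s2z * Rz))   # joint covariance (2)
XZ <- drop(t(chol(Sigma_xz)) %*% rnorm(2 * n))    # (X', Z')' ~ N(0, Sigma_xz)
X <- XZ[1:n]; Z <- XZ[(n + 1):(2 * n)]
B0 <- 0; Bx <- 1; Bz <- 1                         # intercept and partial coefficients
Y <- B0 + Bx * X + Bz * Z + rnorm(n)              # draw from (1), Sigma_{y|x,z} = I_n
```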

To understand the consequences of lack of information concerning the unobserved variable \(\varvec{Z}\), it is customary to consider the conditional distribution \(\varvec{Y}|\varvec{X}\) marginalized over \(\varvec{Z}\),

$$\begin{aligned} \varvec{Y}|\varvec{X} \sim \mathcal {N}_n(\mathcal {B}_{y\cdot 0(x)} {\textbf {1}}_n + {\textbf {A}}_{y\cdot x}\varvec{X},\;\; \mathcal {B}_{y\cdot z(x)}^2\varvec{\Sigma }_{z|x} + \varvec{\Sigma }_{y|x,z}), \end{aligned}$$

where

$$\begin{aligned} {\textbf {A}}_{y\cdot x} = \varvec{\Sigma }_{yx} \varvec{\Sigma }_x^{-1}= \mathcal {B}_{y \cdot x(z)} {\textbf {I}}_n + \mathcal {B}_{y \cdot z(x)}{} {\textbf {A}}_{z\cdot x} \end{aligned}$$
(3)

is the regression matrix of \(\varvec{Y}\) on \(\varvec{X}\) which depends on \(\varvec{Z}\) through the partial regression coefficient of \(\varvec{Y}\) on \(\varvec{Z}\), \(\mathcal {B}_{y \cdot z(x)}\), and through the regression matrix of \(\varvec{Z}\) on \(\varvec{X}\), \({\textbf {A}}_{z\cdot x}\). A well-understood case in terms of confounding is the spherical DGP, obtained by considering diagonal structure matrices \({\textbf {R}}={\textbf {I}}_n\) in Equation (2), i.e.:

$$\begin{aligned} \varvec{\Sigma }_{x,z}= \begin{bmatrix} \sigma ^2_x{\textbf {I}}_n &{} \sigma _{xz}{} {\textbf {I}}_{n}\\ \sigma _{xz}{} {\textbf {I}}_{n}&{} \sigma ^2_z{\textbf {I}}_n \end{bmatrix}. \end{aligned}$$
(4)

Under this DGP, the regression matrix \({\textbf {A}}_{z\cdot x}\) corresponds to the scalar matrix \(\mathcal {B}_{z\cdot x}{} {\textbf {I}}_n\). This is due to the fact that cross-correlations \(\text {cor}(X_i,Z_j)=0\;\;\forall i\ne j\). As a consequence, the regression matrix in Equation (3) reduces to the scalar matrix:

$$\begin{aligned} {\textbf {A}}_{y\cdot x} =\left( \mathcal {B}_{y \cdot x(z)} +\mathcal {B}_{y \cdot z(x)}\mathcal {B}_{z\cdot x}\right) {\textbf {I}}_n. \end{aligned}$$

The conditional distribution defined in Equation (1), combined with the joint distribution (2), delivers the following joint distribution of \(\varvec{Y}\), \(\varvec{X}\) and \(\varvec{Z}\):

$$\begin{aligned} \begin{pmatrix} \varvec{Y}\\ \varvec{X}\\ \varvec{Z} \end{pmatrix}\sim \mathcal {N}_{3n} \left( \begin{pmatrix} \mathcal {B}_{y\cdot 0(xz)}{} {\textbf {1}}_n \\ \varvec{0} \\ \varvec{0} \end{pmatrix}, \begin{bmatrix} \varvec{\Sigma }_{y}&{} \varvec{\Sigma }_{yx} &{} \varvec{\Sigma }_{yz}\\ \varvec{\Sigma }_{xy} &{} \varvec{\Sigma }_{x}&{} \varvec{\Sigma }_{xz}\\ \varvec{\Sigma }_{zy} &{} \varvec{\Sigma }_{zx} &{} \varvec{\Sigma }_{z} \end{bmatrix}\right) \end{aligned}$$

where the joint covariance matrix \(\varvec{\Sigma }_{y,x,z}\) can be expressed as a function of the regression matrices and of the partial regression coefficients as

$$\begin{aligned} \begin{bmatrix} \varvec{\Sigma }_{y|x,z}+\begin{bmatrix} \mathcal {B}_{y\cdot x(z)}{} {\textbf {I}}_n \,:\, \mathcal {B}_{y\cdot z(x)}{} {\textbf {I}}_n \end{bmatrix} \varvec{\Sigma }_{x,z} \begin{bmatrix} \mathcal {B}_{y\cdot x(z)}{} {\textbf {I}}_n \,:\, \mathcal {B}_{y\cdot z(x)}{} {\textbf {I}}_n \end{bmatrix}^{\top } \;\;\; &{} {\textbf {A}}_{y\cdot x}\varvec{\Sigma }_{x}\;\;\; &{} {\textbf {A}}_{y\cdot z}\varvec{\Sigma }_{z}\;\;\;\\ {\textbf {A}}_{x\cdot y}\varvec{\Sigma }_{y}\;\;\; &{} \varvec{\Sigma }_{x}\;\;\; &{} {\textbf {A}}_{x\cdot z}\varvec{\Sigma }_{z}\;\;\;\\ {\textbf {A}}_{z\cdot y}\varvec{\Sigma }_{y}\;\;\; &{} {\textbf {A}}_{z\cdot x}\varvec{\Sigma }_{x}\;\;\; &{} \varvec{\Sigma }_{z} \end{bmatrix}. \end{aligned}$$

The sufficient conditions that ensure its positive definiteness are \(\varvec{\Sigma }_{x,z} \succ 0\) and \(\varvec{\Sigma }_{y|x,z}\succ 0\). Hence, regression matrices \({\textbf {A}}_{y\cdot x}\) and \({\textbf {A}}_{y\cdot z}\) do not impact positive definiteness of \(\varvec{\Sigma }_{y,x,z}\).

Fig. 1 Inter-dependencies among variables that characterize the data generating process (1)-(2)

Developments proposed in this paper are aimed at illustrating how the degree of confounding changes with varying levels and structure of interdependence between the observed and unobserved variables.

2.2 The statistical model

After defining the DGP, this subsection introduces the posited statistical model for parameter estimation when only the realizations of \(\varvec{Y}\) and \(\varvec{X}\) are observed.

Given a phenomenon of interest, different model specifications can be proposed, reflecting the researcher’s beliefs and assumptions. Starting from the following model:

$$\begin{aligned} \varvec{Y}=\beta _0{\textbf {1}}_n+\beta _x\varvec{X}+ \varvec{\varepsilon } \qquad \varvec{\varepsilon } \sim \mathcal {N}_n\left( \varvec{0},{\textbf {S}}\right) , \end{aligned}$$

one obtains the generalized least squares estimator

$$\begin{aligned} \hat{\varvec{\beta }}=\left( \hat{\beta }_0,\hat{\beta }_x\right) ^{\top } ={\textbf {J}}\varvec{Y}, \end{aligned}$$
(5)

where \({\textbf {J}}= \left( \tilde{{\textbf {X}}}^{\top }{} {\textbf {S}}^{-1} \tilde{{\textbf {X}}}\right) ^{-1}\tilde{{\textbf {X}}}^{\top }{} {\textbf {S}}^{-1}\) and \(\tilde{{\textbf {X}}}=\begin{bmatrix} {\textbf {1}}_n \,:\, {\textbf {X}} \end{bmatrix}\) is the design matrix. When \({\textbf {S}}={\textbf {I}}_n\), estimator (5) corresponds to the OLS estimator. More complex models, i.e. linear mixed effect models, lead to different estimators that, conditionally on the other model parameters, share the same functional form as estimator (5). For the sake of simplicity, we do not consider such estimators in this paper. In line with the works of Paciorek (2010) and Page et al. (2017), we study the sampling properties of \(\hat{\beta }_x\) as an estimator of \(\mathcal {B}_{y\cdot x(z)}\) to investigate confounding.
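A minimal base-R sketch of estimator (5) follows; the function name is ours, and \(\varvec{Y}\), \(\varvec{X}\) are the draws simulated in Sect. 2.1.

```r
# Sketch of estimator (5); the default S = I_n yields OLS.
gls_beta <- function(Y, X, S = diag(length(Y))) {
  Xt   <- cbind(1, X)                                    # design matrix [1_n : X]
  Sinv <- solve(S)
  J    <- solve(t(Xt) %*% Sinv %*% Xt, t(Xt) %*% Sinv)   # J as defined above
  drop(J %*% Y)                                          # (beta0_hat, betax_hat)
}
beta_hat <- gls_beta(Y, X)
```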

2.3 Conditional sampling properties of \(\hat{\beta }_x\)

We start by presenting the sampling distribution of estimator (5) conditionally on \(\varvec{X}\); in Sect. 3 the marginal sampling properties with respect to \(\varvec{X}\) will be obtained. As a first step, the following proposition introduces a well-known result adopted to study confounding in a spatial framework (see Paciorek 2010; Page et al. 2017; Nobre et al. 2021; Marques et al. 2022).

Proposition 1

The estimator \(\hat{\varvec{\beta }}\) in (5) conditional on \(\varvec{X}\) has the following sampling distribution:

$$\begin{aligned} \hat{\varvec{\beta }}|\varvec{X}\sim \mathcal {N}_2\left( {\textbf {J}} (\mathcal {B}_{y\cdot 0(xz)}\varvec{1}_n + {\textbf {A}}_{y\cdot x} \varvec{X}), {\textbf {J}} \varvec{\Sigma }_{y|x} {\textbf {J}}^{\top }\right) . \end{aligned}$$

Proof

See the Appendix. \(\square\)

With reference to the estimator \(\hat{\beta }_x\), it follows from Proposition 1 that:

$$\begin{aligned} \text {Bias}_{\,Y}\left[ \hat{\beta }_x|\varvec{X}\right] =\mathcal {B}_{y\cdot z(x)}{} {\textbf {J}}_{2\bullet } \,\varvec{\Sigma }_{zx}\varvec{\Sigma }_x^{-1}\varvec{X} \end{aligned}$$

and

$$\begin{aligned} \mathbb {V}_{\,Y}\left[ \hat{\beta }_x|\varvec{X}\right] ={\textbf {J}}_{2\bullet }\varvec{\Sigma }_{y|x}{} {\textbf {J}}_{2\bullet }^{\top } \end{aligned}$$

where \({\textbf {J}}_{2\bullet }\) indicates the second row of \({\textbf {J}}\). We emphasize that the subscript \(Y\) stresses that bias and variance are obtained by integrating over \(\varvec{Y}\), conditionally on \(\varvec{X}\).

The main novelties introduced in this paper are based on the expression of conditional bias and variance as ratios of quadratic forms in Gaussian random variables. This is formalized in the following proposition.

Proposition 2

Considering the DGP in (1)-(2), bias and variance of \(\hat{\beta }_x\) can be expressed in terms of ratios of quadratic forms as:

$$\begin{aligned}&\textrm{Bias}_{\,Y}\left[ \hat{\beta }_x|\varvec{X}\right] =\mathcal {B}_{y\cdot z(x)}\dfrac{\varvec{X}^{\top } \varvec{\Delta }{} {\textbf {A}}_{z\cdot x}\varvec{X}}{\varvec{X}^{\top } \varvec{\Delta }\varvec{X}}, \end{aligned}$$
(6)
$$\begin{aligned}&{\mathbb {V}}_{\,Y} \left[ \hat{\beta }_x|\varvec{X}\right] =\dfrac{\varvec{X}^{\top }\varvec{\Delta }\varvec{\Sigma }_{y|x} \varvec{\Delta }\varvec{X}}{(\varvec{X}^{\top }\varvec{\Delta }\varvec{X})^2}, \end{aligned}$$
(7)

where \(\varvec{\Delta }={\textbf {S}}^{-1}-\dfrac{{\textbf {S}}^{-1} {\textbf {1}}_n{\textbf {1}}_n^{\top }{} {\textbf {S}}^{-1}}{{\textbf {1}}_n^{\top } {\textbf {S}}^{-1}{} {\textbf {1}}_n}\) is the weighted centering matrix and \({\textbf {S}}\) is a covariance matrix depending upon the posited model.

Proof

See the Appendix. \(\square\)

When \({\textbf {S}}={\textbf {I}}_n\), \(\varvec{\Delta }\) reduces to the centering matrix \({\textbf {M}}={\textbf {I}}_n-\varvec{1}_n \varvec{1}_n^{\top }/n\). Equation (6) highlights that when both \(\mathcal {B}_{y\cdot z(x)}\ne 0\) and \({\textbf {A}}_{z\cdot x}\ne {\textbf {0}}\), the estimator \(\hat{\beta }_x\) is biased. Moreover, it is worth noting that the conditional variance of \(\hat{\beta }_x\) depends on the structure of all the processes included in the DGP. Indeed, given that

$$\begin{aligned} \varvec{\Sigma }_{y|x}=\varvec{\Sigma }_{y|x,z}+\mathcal {B}_{y\cdot z(x)}^2 \varvec{\Sigma }_{z|x}=\varvec{\Sigma }_{y|x,z}+\mathcal {B}_{y\cdot z(x)}^2\left( \varvec{\Sigma }_{z}-\varvec{A}_{z\cdot x} \varvec{\Sigma }_{x}\varvec{A}_{z\cdot x}^{\top }\right) , \end{aligned}$$

Equation (7) can be re-written as

$$\begin{aligned} {\mathbb {V}}_Y\left[ \hat{\beta }_x|\varvec{X}\right] = \dfrac{\varvec{X}^{\top }\varvec{\Delta }\varvec{\Sigma }_{y|x,z} \varvec{\Delta }\varvec{X}}{(\varvec{X}^{\top }\varvec{\Delta }\varvec{X})^2} +\mathcal {B}_{y\cdot z(x)}^2\left( \dfrac{\varvec{X}^{\top } \varvec{\Delta }\varvec{\Sigma }_{z}\varvec{\Delta }\varvec{X}}{(\varvec{X}^{\top } \varvec{\Delta }\varvec{X})^2} - \dfrac{\varvec{X}^{\top }\varvec{\Delta } \left( \varvec{A}_{z\cdot x}\varvec{\Sigma }_{x}\varvec{A}_{z\cdot x}^{\top }\right) \varvec{\Delta }\varvec{X}}{(\varvec{X}^{\top }\varvec{\Delta }\varvec{X})^2}\right) . \end{aligned}$$

The first term expresses the conditional variance when \(\varvec{Y}\) does not depend on \(\varvec{Z}\). When \(\varvec{Y}\) depends on \(\varvec{Z}\), i.e. \(\mathcal {B}_{y\cdot z(x)}\ne 0\), the first term between brackets inflates the variance with a magnitude depending on \(\varvec{\Sigma }_{z}\) while the second term, which is non-null when \({\textbf {A}}_{z\cdot x}\ne \varvec{0}\), deflates the conditional variance. In fact, correlation between \(\varvec{X}\) and \(\varvec{Z}\) produces a reduction of \({\mathbb {V}}_Y\left[ \hat{\beta }_x|\varvec{X}\right]\), as stated in the following theorem.
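The following hedged sketch evaluates the conditional bias (6) and variance (7) for the illustrative DGP simulated in Sect. 2.1 (so that \({\textbf {S}}={\textbf {I}}_n\) and \(\varvec{\Delta }={\textbf {M}}\)), and checks them by Monte Carlo over \((\varvec{Z}, \varvec{Y})\) given \(\varvec{X}\).

```r
# Conditional bias (6) and variance (7), illustrative setup of Sect. 2.1.
M    <- diag(n) - matrix(1, n, n) / n              # centering matrix
Szx  <- rho * Lz %*% t(Lx)                         # Sigma_zx
Azx  <- Szx %*% solve(s2x * Rx)                    # A_{z.x} = Sigma_zx Sigma_x^{-1}
Szgx <- s2z * Rz - Azx %*% (s2x * Rx) %*% t(Azx)   # Sigma_{z|x}
Sygx <- diag(n) + Bz^2 * Szgx                      # Sigma_{y|x}, Sigma_{y|x,z} = I_n
qf   <- function(A) drop(t(X) %*% M %*% A %*% X)
bias_cond <- Bz * qf(Azx) / qf(diag(n))                              # Equation (6)
var_cond  <- drop(t(X) %*% M %*% Sygx %*% M %*% X) / qf(diag(n))^2   # Equation (7)
Lzgx <- t(chol(Szgx))
bx <- replicate(5000, {                            # Monte Carlo over (Z, Y) | X
  Zc <- drop(Azx %*% X + Lzgx %*% rnorm(n))
  gls_beta(B0 + Bx * X + Bz * Zc + rnorm(n), X)[2]
})
c(bias = bias_cond, bias_mc = mean(bx) - Bx, var = var_cond, var_mc = var(bx))
```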

Theorem 1

The conditional variance of the estimator \(\hat{\beta }_x\) assumes the maximum value when \(\varvec{X}\) and \(\varvec{Z}\) are independent:

$$\begin{aligned} {\mathbb {V}}_Y\left[ \hat{\beta }_x|\varvec{X}\right] \le {\mathbb {V}}_Y \left[ \hat{\beta }_x|\varvec{X},\varvec{A}_{z\cdot x}=\varvec{0} \right] . \end{aligned}$$

Proof

See the Appendix. \(\square\)

Summarizing, as the dependences expressed via \({\textbf {A}}_{z\cdot x}\) and \(\mathcal {B}_{y\cdot z(x)}\) increase, the conditional variance decreases and the conditional bias increases. The interpretation of bias and variance of \(\hat{\beta }_x\) as ratios of quadratic forms allows for a formal treatment of the relationship between the magnitude of confounding and some features of the DGP. In particular, Sect. 3 frames the confounding problem within the context of the theory of quadratic forms, delivering analytic results on the marginal bias that, to the best of our knowledge, had previously been obtained only by simulation.

2.4 Formal definition of confounding

Both Propositions 1 and 2 show that the bias of the estimator \(\hat{\beta }_x\) depends on the correlation structure of the DGP through the relationship that ties the confounder to the response variable and to the observed covariate. The following definition of confounding is coherent with the one proposed by Thaden and Kneib (2018), with adapted notation.

Definition 1

Let \((\varvec{Y}^{\top }, \varvec{X}^{\top }, \varvec{Z}^{\top })^{\top }\) follow the DGP of Equations (1)-(2). Then, the regression of \(\varvec{Y}\) on \(\varvec{X}\) is said to be confounded by \(\varvec{Z}\) if both of the following conditions are verified:

  (i) \(\varvec{Y}\) and \(\varvec{Z}\) are conditionally dependent given \(\varvec{X}\) (\(\varvec{Y} \not \perp \varvec{Z}|\varvec{X}\)), i.e. \(\mathcal {B}_{y \cdot z(x)}\ne 0\);

  (ii) \(\varvec{X}\) and \(\varvec{Z}\) are dependent (\(\varvec{X} \not \perp \varvec{Z}\)), i.e. \({\textbf {A}}_{z \cdot x} \ne {\textbf {0}}\).

An alternative and equivalent way to define confounding, explicitly related to the joint covariance matrix of the distribution of \(\varvec{Y}\), \(\varvec{X}\) and \(\varvec{Z}\), follows.

Definition 2

Let \((\varvec{Y}^{\top }, \varvec{X}^{\top }, \varvec{Z}^{\top })^{\top }\) follow the DGP of Equations (1)-(2). The regression of \(\varvec{Y}\) on \(\varvec{X}\) is confounded by \(\varvec{Z}\) if both \(\varvec{\Sigma }_{yz}\ne {\textbf {0}}\) and \(\varvec{\Sigma }_{xz} \ne {\textbf {0}}\).

In other words, confounding occurs if the unobserved variable is related to both the response and the covariate. As shown in Fig. 1, the confounder \(\varvec{Z}\) influences the response and the covariate simultaneously, via \(\mathcal {B}_{y\cdot z(x)}\) and \({\textbf {A}}_{z\cdot x}\) respectively. In the same spirit as Thaden and Kneib (2018), we highlight that the overall effect of the confounder can be expressed through the regression matrix \({\textbf {A}}_{y\cdot z}\): the so-called indirect effect is linked to the regression matrix \({\textbf {A}}_{x\cdot z}\), while the direct one is linked to the partial regression coefficient \(\mathcal {B}_{y\cdot z(x)}\).

Confounding pertains neither to the DGP alone nor to the posited model itself, but rather to the characteristics of the DGP that the model fails to capture. The following section frames the confounding problem within the context of the theory of quadratic forms, offering marginal results concerning bias and variance of \(\hat{\beta }_{x}\) that are exploited to investigate the links between the DGP, the posited model and confounding.

3 Main features of confounding in terms of quadratic forms

Given the importance of quadratic forms in this work, a brief introduction to the subject is given below.

3.1 Quadratic forms in Gaussian random variables

Considering the random vector \(\varvec{X}\sim \mathcal {N}_n ({\textbf {0}}, \varvec{\Sigma }_x)\), it is possible to define the quadratic form (QF, see Provost and Mathai 1992, for a comprehensive overview of the topic) associated with a matrix \({\textbf {A}}\in \mathbb {R}^{n\times n}\) as:

$$\begin{aligned} Q_A(\varvec{X})=\varvec{X}^{\top }{} {\textbf {A}}\varvec{X}. \end{aligned}$$

Decomposing the covariance matrix as \(\varvec{\Sigma }_x=\varvec{\Sigma }_x^{1/2}\varvec{\Sigma }_x^{1/2}\), we note that \(Q_A(\varvec{X})\) can be expressed as a function of a standard multivariate normal vector \(\varvec{\nu }=\varvec{\Sigma }_x^{-1/2}\varvec{X}\):

$$\begin{aligned} Q_{A}(\varvec{X})=\varvec{\nu }^{\top }\tilde{{\textbf {A}}}\varvec{\nu } =Q_{\tilde{A}}(\varvec{\nu }), \end{aligned}$$

where \(\tilde{{\textbf {A}}}= \varvec{\Sigma }_x^{1/2}{} {\textbf {A}}\varvec{\Sigma }_x^{1/2}\). Many properties of \(Q_A(\varvec{X})\), such as moments and distribution function, are strictly related to the eigenvalues of the matrix \(\tilde{{\textbf {A}}}\). We indicate them with

$$\begin{aligned} \varvec{\lambda }\left( \tilde{{\textbf {A}}}\right) =\left( \lambda \left( \tilde{{\textbf {A}}}\right) _1,\dots , \lambda \left( \tilde{{\textbf {A}}}\right) _n\right) ^{\top }, \end{aligned}$$

and, if \(\tilde{{\textbf {A}}}\) is symmetric, they are real and such that \(\lambda \left( \tilde{{\textbf {A}}}\right) _1\ge \lambda \left( \tilde{{\textbf {A}}}\right) _2\ge \dots \ge \lambda \left( \tilde{{\textbf {A}}}\right) _n\). For instance, the expected value is

$$\begin{aligned} \mathbb {E}_{\, X}[Q_A(\varvec{X})] =\sum _{i=1}^n\lambda ({\textbf {A}}\varvec{\Sigma }_x)_i =\sum _{i=1}^n\lambda \left( \tilde{{\textbf {A}}}\right) _i =\textrm{tr}\left( \tilde{{\textbf {A}}}\right) , \end{aligned}$$
(8)

and the moment generating function is

$$\begin{aligned} \phi _{Q_A(\varvec{X})}(t)=\mathbb {E}_{\, X} \left[ e^{tQ_A(\varvec{X})}\right] =|{\textbf {I}}_n-2t{\textbf {A}} \varvec{\Sigma }_x|^{-1/2}=\prod _{i=1}^n\left( 1-2t\lambda \left( \tilde{{\textbf {A}}}\right) _i\right) ^{-1/2}. \end{aligned}$$
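As a quick numerical illustration of Equation (8), the following sketch (reusing the illustrative \(\varvec{\Sigma }_x=\sigma ^2_x{\textbf {R}}_x\) and \({\textbf {M}}\) defined above) compares a Monte Carlo estimate of the expected QF with the trace formula.

```r
# Check of Equation (8): Monte Carlo mean of Q_A(X) versus tr(A Sigma_x).
A <- M                                    # e.g. the centering matrix
Q <- replicate(2000, {
  x <- drop(t(chol(s2x * Rx)) %*% rnorm(n))
  drop(t(x) %*% A %*% x)
})
c(mc = mean(Q), exact = sum(diag(A %*% (s2x * Rx))))
```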

Once QFs are defined, our attention moves to ratios of powers of dependent QFs. Considering a further positive semi-definite matrix \({\textbf {B}}\in \mathbb {R}^{n \times n}\), we introduce the following ratio of QFs

$$\begin{aligned} R_{A,B}^{p,q}(\varvec{X}) = \dfrac{(\varvec{X}^{\top }{} {\textbf {A}} \varvec{X})^p}{(\varvec{X}^{\top }{} {\textbf {B}}\varvec{X})^q} =\dfrac{(\varvec{\nu }^{\top }\tilde{{\textbf {A}}}\varvec{\nu })^p}{(\varvec{\nu }^{\top }\tilde{{\textbf {B}}}\varvec{\nu })^q} =R_{\tilde{A},\tilde{B}}^{p,q}(\varvec{\nu }) \end{aligned}$$
(9)

where \(p\ge 0\), \(q\ge 0\) are integers and \(\tilde{{\textbf {B}}}= \varvec{\Sigma }_x^{1/2}{} {\textbf {B}}\varvec{\Sigma }_x^{1/2}\). Computing the expectation of this random variable is of primary interest for the developments in the paper. It is a well-known problem in numerical probability, addressed in several works, such as Magnus (1986), Roberts (1995) and Bao and Kan (2013); the latter provides an up-to-date review and includes most of the results exploited here. Firstly, \(\mathbb {E}_{\,X}\left[ R_{A,B}^{p,q}(\varvec{X})\right]\) exists if and only if \(\text {rank}(\tilde{{\textbf {B}}})>2q\), and it can be numerically evaluated as

$$\begin{aligned} \frac{1}{\Gamma (q)}\int _0^{\infty }t^{q-1} \frac{\partial ^p}{\partial t_1^p}\phi (t_1, t_2) \bigg |_{t_1=0, t_2=-t} \textrm{d}t, \end{aligned}$$
(10)

where \(\phi (t_1, t_2)=|{\textbf {I}}_n-2t_1\tilde{{\textbf {A}}} -2t_2\tilde{{\textbf {B}}}|^{-1/2}\) is the joint moment generating function of \(\varvec{X}^{\top }{} {\textbf {A}}\varvec{X}\) and \(\varvec{X}^{\top }{} {\textbf {B}}\varvec{X}\), and \(|\cdot |\) denotes the determinant.

Since many statistical quantities can be written as ratios of quadratic forms, the computation of their expected values has long been important for statisticians. The most popular method for numerical evaluation relies on the results of Sawa (1978) and Cressie et al. (1981); this approach is by far the most widespread in the literature, and Xiao-Li (2005) provides a thorough review of the subject. Here, we are concerned with obtaining computationally efficient expressions for the expectation of the ratio of dependent QFs defined in (9). Relatively simple expressions are available for the moments of a QF in spherical normal variables: these moments appear as one-dimensional integrals which can be evaluated numerically in a straightforward manner.
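For the case \(p=q=1\), one derivative of the joint moment generating function in Equation (10) gives \(\mathbb {E}\left[ \varvec{X}^{\top }{} {\textbf {A}}\varvec{X}/\varvec{X}^{\top }{} {\textbf {B}}\varvec{X}\right] =\int _0^{\infty }|{\textbf {I}}_n+2t\tilde{{\textbf {B}}}|^{-1/2}\,\textrm{tr}\left( ({\textbf {I}}_n+2t\tilde{{\textbf {B}}})^{-1}\tilde{{\textbf {A}}}\right) \textrm{d}t\). The following sketch implements this special case in base R (the function name is ours) and checks it by Monte Carlo on the ratio appearing in the bias (6) for the illustrative DGP.

```r
# E[X'AX / X'BX] for p = q = 1 via a one-dimensional integral.
qf_ratio_mean <- function(A, B, Sigma) {
  L  <- t(chol(Sigma))                             # Sigma = L L'
  At <- t(L) %*% A %*% L                           # A~ in standard-normal coordinates
  Bt <- t(L) %*% B %*% L
  f  <- function(t) sapply(t, function(u) {
    Minv <- solve(diag(nrow(A)) + 2 * u * Bt)
    sum(diag(Minv %*% At)) * sqrt(det(Minv))       # tr(M^{-1} A~) |M|^{-1/2}
  })
  integrate(f, 0, Inf)$value
}
r_mc <- replicate(2000, {                          # Monte Carlo over X
  x <- drop(t(chol(s2x * Rx)) %*% rnorm(n))
  drop(t(x) %*% M %*% Azx %*% x) / drop(t(x) %*% M %*% x)
})
c(exact = qf_ratio_mean(M %*% Azx, M, s2x * Rx), mc = mean(r_mc))
```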

3.2 Marginal sampling properties

The marginal moments of \(\hat{\beta }_x\) can be retrieved using the law of iterated expectations and the law of total variance. We find it convenient to represent the marginal sampling properties of the estimator in terms of a hypergeometric function, Carlson’s function \(R(a;\varvec{b},\varvec{z})\) (see Carlson 1963 for more details). Its extensive use is justified by two distinctive properties: symmetry and homogeneity. The former means that the function is invariant under simultaneous permutations of the parameters \(b_i\) and variables \(z_i\), \(i=1,\dots , m\); the latter implies:

$$\begin{aligned} R(a;b_1,\dots ,b_m ;sz_1,\dots ,sz_m) =s^{-a}R(a;b_1,\dots ,b_m;z_1,\dots ,z_m). \end{aligned}$$

We consider the special case in which

$$\begin{aligned} \int _0^{\infty }t^{a-1}\prod _{i=1}^m(1+sz_it)^{-b_i} \,dt =B(a,a')R(a;\varvec{b}, s\varvec{z}), \end{aligned}$$
(11)

where \(B(a,a')=\Gamma (a)\Gamma (a')/\Gamma (a+a')\) is the beta function, \(\Gamma (\cdot )\) denotes the gamma function, and \(a'\) is defined by \(a+a'=b =\sum _{i=1}^m b_i \in \mathbb {Q}{\setminus } \{ 0\}.\)

Let us consider an n-dimensional vector \(\varvec{\lambda }\). Assuming \(s\varvec{z}= 2\varvec{\lambda }\), \(b_i=\frac{1}{2}\) \(\forall \, i=1, \dots , m\), and setting:

$$\begin{aligned} a = q, \quad a' = \frac{n}{2}+ p -q, \quad m= n + 2p, \end{aligned}$$

the right-hand side of Equation (11) can be re-written as follows:

$$\begin{aligned}&\int _0^{\infty }t^{q-1}\prod _{i=1}^{n+2p}(1+2\lambda _i t)^{-1/2} \,dt \\&\quad =B\left( q,\frac{n}{2}+ p -q \right) R\left( q;\frac{1}{2} {\textbf {1}}_{n+2p}, 2\varvec{\lambda }\right) = I^{p,q}(\varvec{\lambda }), \end{aligned}$$

where \(I^{p,q}(\varvec{\lambda })\) denotes the integral characterized by the powers p and q of the QFs’ ratio and by the n-dimensional vector of the eigenvalues of the denominator matrix. Carlson (1963) states that the R function reduces to another function of the same type with one less variable if one of its variables \(z_i\) vanishes. This property is helpful for the operative computation. The next statement formalizes the analytical results enabling the exact computation of the marginal sampling properties of the estimator \(\hat{\beta }_x\) without resorting to simulation.

Theorem 2

The expected value and variance of the estimator \(\hat{\beta }_x\) defined in (5) may be expressed in terms of Carlson’s R functions as follows:

$$\begin{aligned} {\mathbb {E}}_{\,Y, X}\left[ \hat{\beta }_{x}\right]&= \mathcal {B}_{y\cdot x(z)} +\mathcal {B}_{y\cdot z(x)} \sum _{j=1}^n c_{1,jj} I^{1,1}_{h_j}\left( \varvec{\lambda }\right) \\ \mathbb {V}_{\,Y, X}\left[ \hat{\beta }_{x}\right]&= \sum _{j=1}^n c_{2,jj}I^{1,2}_{h_j}\left( \varvec{\lambda }\right) -\textrm{Bias}^2_{\,Y, X}\left[ \hat{\beta }_{x}\right] + \\&\quad + \mathcal {B}_{y\cdot z(x)}^2 \sum _{i=1}^n \sum _{j=1}^n (c_{1, ii}c_{1, jj}+2c^2_{1, ij}) I^{2,2}_{h_{ij}} \left( \varvec{\lambda }\right) , \end{aligned}$$

where

$$\begin{aligned} h_{ij} = {\left\{ \begin{array}{ll} 1 &{} i,j=1, \dots , n-1, \\ 3 &{} i=n,\, j=1,\dots ,n-1 \text { and } j=n,\, i=1,\dots ,n-1, \\ 5 &{} i=j=n. \end{array}\right. } \end{aligned}$$

The eigenvalues \((\lambda _1, \dots , \lambda _n)\), such that \(\varvec{\Lambda }=\textrm{diag}(\varvec{\lambda })\), are derived from the spectral decomposition \(\varvec{\Sigma }_x^{1/2}\varvec{\Delta } \varvec{\Sigma }_x^{1/2} ={\textbf {P}}\varvec{\Lambda }{} {\textbf {P}}^{\top }\) and \(c_{1,ij}\), \(c_{2,ij}\) are the (ij)-th entries of the matrices

$$\begin{aligned} {\textbf {C}}_1={\textbf {P}}^{\top }\varvec{\Sigma }_x^{1/2} \varvec{\Delta \Sigma }_{zx}\varvec{\Sigma }_x^{-1/2}{} {\textbf {P}} \end{aligned}$$

and

$$\begin{aligned} {\textbf {C}}_2={\textbf {P}}^{\top }\varvec{\Sigma }_x^{1/2} \varvec{\Delta \Sigma }_{y|x}\varvec{\Delta }\varvec{\Sigma }_x^{1/2}{} {\textbf {P}}, \end{aligned}$$

respectively.

Proof

See the Appendix. \(\square\)

The evaluation of the integrals involved in the previous theorem is carried out by computing Carlson’s R functions: such computations exploit algorithms implemented in the R-package QF (Gardini et al. 2022).

To highlight the effect of confounding on the marginal variance, we rewrite it as:

$$\begin{aligned} \mathbb {V}_{\,Y, X}\left[ \hat{\beta }_{x}\right] =\mathbb {V}_{\,Y, X} \left[ \hat{\beta }_{x}|\varvec{A}_{z\cdot x}=\,{\textbf {0}} \right] +\mathbb {V}^{cd}_{\,Y, X}\left[ \hat{\beta }_{x}\right], \end{aligned}$$

where \(\mathbb {V}_{\,Y, X}\left[ \hat{\beta }_{x}|\varvec{A}_{z\cdot x}=\,{\textbf {0}} \right]\) denotes the marginal variance in the absence of confounding. It can be shown (Narcisi 2023) that the confounding-dependent (cd) part of the marginal variance, \(\mathbb {V}^{cd}_{\,Y, X}\left[ \hat{\beta }_{x}\right]\), corresponds to

$$\begin{aligned} \mathbb {E}_{\,X}\left[ \textrm{Bias}^2_{\,Y}\left[ \hat{\beta }_{x}| \varvec{X}\right] \right] - \textrm{Bias}^2_{\,Y, X} \left[ \hat{\beta }_{x}\right] - \mathcal {B}_{y\cdot z(x)}^2 \mathbb {E}_{\,X}\left[ \dfrac{\varvec{X}^{\top }\varvec{\Delta } \varvec{A}_{z\cdot x}\varvec{\Sigma }_{x}\varvec{A}_{z\cdot x}^{\top } \varvec{\Delta }\varvec{X}}{(\varvec{X}^{\top }\varvec{\Delta }\varvec{X})^2}\right] . \end{aligned}$$

Note that this component can take both negative and positive values, i.e. confounding can increase or decrease the marginal variance of the estimator with respect to the case \(\varvec{X} \bot \varvec{Z}\). Following the results above, the marginal Mean Square Error (MSE) can be expressed as

$$\begin{aligned} \text {MSE}_{\,Y, X}\left[ \hat{\beta }_x\right] =\textrm{Bias}^2_{\,Y, X}\left[ \hat{\beta }_x\right] +\mathbb {V}^{cd}_{\,Y, X} \left[ \hat{\beta }_{x}\right] +\mathbb {V}_{\,Y, X} \left[ \hat{\beta }_{x}|\varvec{A}_{z\cdot x}=\,{\textbf {0}} \right] . \end{aligned}$$
(12)

Exact computation of these quantities allows us to study the impact of some relevant features of the DGP on the sampling properties of \(\hat{\beta }_x\), as will be shown in Sect. 5. In the following section, we provide some approximations to the exact formulae, aimed at highlighting the role of the correlation and cross-correlation structures.

4 Links between DGP structure and confounding

We start by considering the spherical DGP (4): in this case, the conditional bias is constant with respect to \(\varvec{X}\). As a consequence, marginal and conditional biases coincide:

$$\begin{aligned} \textrm{Bias}_{\,Y,X}\left[ \hat{\beta }_{x}\right] = \text {Bias}_{\,Y} \left[ \hat{\beta }_{x}|\varvec{X}\right] = \mathcal {B}_{y\cdot z(x)} \mathcal {B}_{z\cdot x}=\mathcal {B}_{y\cdot z(x)} \frac{\sigma _{zx}}{\sigma _x^2}. \end{aligned}$$
(13)

This simplification with respect to the bias formula (6) is due to the fact that the regression matrix of \(\varvec{Z}\) on \(\varvec{X}\) is scalar, i.e. \({\textbf {A}}_{z \cdot x} = \mathcal {B}_{z \cdot x}{} {\textbf {I}}_n\). This is the key feature that makes the bias in (13) deterministic and allows a clear-cut interpretation of the effect of the DGP structure on the bias, which is an increasing function of the marginal covariance \(\sigma _{zx}\) and a decreasing function of the marginal variance \(\sigma _x^2\) of the random vector \(\varvec{X}\). Note that the marginal variance of \(\varvec{Z}\), \(\sigma ^2_z\), does not contribute explicitly to Equation (13).

When \({\textbf {A}}_{z \cdot x}\) is not a scalar matrix, it is not immediate to measure the sources of confounding. Understanding the links between the DGP structure and the bias is a more complicated task that can be addressed by leveraging the theory of quadratic forms: in Sect. 4.1 we introduce a measure of marginal variability of a Gaussian random vector and a measure of marginal covariance between Gaussian random vectors that deliver the simple counterparts \(\sigma _x^2\) and \(\sigma _{xz}\) as special cases when the DGP is spherical. Additionally, we introduce a measure of similarity between the components of a Gaussian random vector, aimed at describing the smoothness of the DGP.

4.1 Variability and smoothness of Gaussian random vectors

Let \(\varvec{X}\sim \mathcal {N}_n\left( \varvec{0}, \varvec{\Sigma }_{x}\right)\), with \(\varvec{\Sigma }_{x}=\sigma ^2_x{\textbf {R}}_x\), where \({\textbf {R}}_x\) reflects the covariance structure of the random vector \(\varvec{X}\), while \(\sigma _x^2\) acts as a scale parameter. The random variable

$$\begin{aligned} V_x=\frac{\varvec{X}^{\top }{} {\textbf {M}}\varvec{X}}{n-1}=\frac{1}{n-1} \sum _{i=1}^{n}(X_i-\bar{X})^2, \end{aligned}$$

i.e. the sampling variance of \(\varvec{X}\), is a quadratic form in Gaussian random variables. Its expected value, which we dub the expected sampling variance, is, as indicated in Equation (8),

$$\begin{aligned} EV_x =\mathbb {E}_X[V_x] =\bar{\lambda }_{M\Sigma _x} = \sigma _x^2\bar{\lambda }_{MR_x}, \end{aligned}$$
(14)

where \(\bar{\lambda }_{MR_x}\) is the mean of the positive eigenvalues of \({\textbf {MR}}_x\). When \({\textbf {R}}_x={\textbf {I}}_n\), as in the case of the spherical DGP, \(\bar{\lambda }_{MR_x}=1\) and \(\mathbb {E}_X[V_x]=\sigma ^2_x\). When \({\textbf {R}}_x\ne {\textbf {I}}_n\), \(\mathbb {E}_X[V_x]\ne \sigma ^2_x\) and depends on the eigenvalues of \({\textbf {MR}}_x\): this must be taken into account when studying the effect of confounding under non-spherical (or structured) DGPs.

The same logic holds for the covariance between two random vectors \(\varvec{X}\) and \(\varvec{Z}\), through the expected value of their sampling covariance defined as:

$$\begin{aligned} EV_{xz} ={\mathbb {E}}_{\,X,Z}\left[ V_{xz}\right] ={\mathbb {E}}_{\,X,Z}\left[ \frac{\varvec{X}^{\top }{} {\textbf {M}}\varvec{Z}}{n-1}\right] =\bar{\lambda }_{M\Sigma _{xz}} =\sigma _{xz}\bar{\lambda }_{MR_{xz}} \end{aligned}$$
(15)

where \(V_{xz}\) is the sampling covariance of \(\varvec{X}\) and \(\varvec{Z}\). It equals \(\sigma _{xz}\) only when \({\textbf {R}}_{xz}={\textbf {I}}_n\).

The relevance of these QFs can be emphasized by noting that, in a linear regression model, the variability of the response variable \(\varvec{Y}\) is decomposed as follows:

$$\begin{aligned} EV_y =\mathcal {B}_{y\cdot x(z)}^2 EV_x+ \mathcal {B}_{y\cdot z(x)}^2 EV_z+ 2\mathcal {B}_{y\cdot x(z)}\mathcal {B}_{y\cdot z(x)} EV_{zx} +\sigma ^2_{y|x,z}. \end{aligned}$$
(16)

Thus, the variability of the response variable can be obtained as a function of the expected sampling variances and covariances of the explanatory variables. By construction, such quantities show that the eigenvalues of the variance and covariance matrices of the DGP are suitable quantities for quantifying the explanatory power of the covariates.

As a further tool to investigate the features of the DGP that determine the magnitude of confounding, we introduce an indicator for measuring the level of similarity between components of a random vector, which depends upon the correlation structure implied by the covariance matrix: this is strictly related to the level of smoothness of the DGP. Recalling that the correlation matrix \({\textbf {C}}_x\) of a random vector \(\varvec{X}\) is obtained from the covariance matrix as

$$\begin{aligned} {\textbf {C}}_x =\textrm{diag}\left( \varvec{\Sigma }_x\right) ^{-\frac{1}{2}} \varvec{\Sigma }_x \, \textrm{diag}\left( \varvec{\Sigma }_x\right) ^{-\frac{1}{2}} =\textrm{diag}\left( {\textbf {R}}_x\right) ^{-\frac{1}{2}}{} {\textbf {R}}_x \, \textrm{diag}\left( {\textbf {R}}_x\right) ^{-\frac{1}{2}}, \end{aligned}$$

we define the random vector \(\varvec{s}=\textrm{diag}\left( \varvec{\Sigma }_x\right) ^{-\frac{1}{2}}\varvec{X}\) and the random variable

$$\begin{aligned} IS_x=\frac{\varvec{s}^{\top }{} {\textbf {M}}\varvec{s}}{n-1}. \end{aligned}$$

This stochastic variable is designated as the sampling inverse smoothness of \(\varvec{X}\), a terminology rooted in the fact that its expected value \({\mathbb {E}}_{\,X}\left[ IS_x\right] =\bar{\lambda }_{MC_x}\) is inversely related to the smoothness of the underlying process. Note that \({\mathbb {E}}_{\,X}\left[ IS_x\right] =1\) if \({\textbf {R}}_x={\textbf {I}}_n\).
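A short sketch computing the expected sampling variance (14), the expected sampling covariance (15) and the expected inverse smoothness via traces, for the illustrative matrices defined in the previous sketches:

```r
# Expected sampling variance/covariance and inverse smoothness via traces.
esv <- function(S) sum(diag(M %*% S)) / (n - 1)   # mean positive eigenvalue of M S
EV_x  <- esv(s2x * Rx)                            # Equation (14)
EV_zx <- esv(Szx)                                 # Equation (15)
EIS_x <- esv(cov2cor(s2x * Rx))                   # E[IS_x], eigenvalues of M C_x
c(EV_x = EV_x, EV_zx = EV_zx, EIS_x = EIS_x)
```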

In order to illustrate how marginal variability and inverse smoothness are related to the DGP structure, we consider two widely adopted models in spatial statistics: the Gaussian random field with exponential covariance function, a special case of the Matérn covariance function (Matérn 1986) largely used in geostatistical analysis, and the Conditional AutoRegressive (CAR) model (Besag 1974; Rue and Held 2005), widely used for areal data modeling. Both models are parameterized by a spatial correlation parameter, denoted as \(\theta\) in what follows.

With regard to the first model, \(\varvec{\Sigma }_x=\sigma _x^2{\textbf {R}}_x\), with the ij-th entry of \({\textbf {R}}_x\) being

$$\begin{aligned} {\textbf {R}}_x\left( i,j\right) =\exp \left( -\frac{d_{ij}}{\theta }\right) \end{aligned}$$

where \(d_{ij}\) is the Euclidean distance between points i and j. The range parameter \(\theta\) regulates the decay of the spatial correlation as a function of distance. The smoothness of the process, understood as the strength of correlation between nearby points in space, grows with \(\theta\).

In this case \({\textbf {R}}_x\) is a correlation matrix (\({\textbf {R}}_x ={\textbf {C}}_x\)): as a consequence, \(\bar{\lambda }_{MR_x}=\bar{\lambda }_{MC_x}\) and \(EV_x =\sigma _x^2{\mathbb {E}}_{\,X}\left[ IS_x\right]\), i.e. expected inverse smoothness and expected sampling variance are proportional, with proportionality constant \(\sigma _x^2\).

Consider now the CAR model, where \(\varvec{\Sigma }_x=\sigma ^2_x{\textbf {R}}_x\), with \({\textbf {R}}_x=({\textbf {I}}_n-\theta \varvec{W})^{-1}\). Here, the spatial lattice structure is described by the \(n\times n\) neighborhood matrix \(\varvec{W}\), specified as \(w_{ii}=0\), \(w_{ij}=1\) if area i is a neighbor of area j and \(w_{ij}=0\) otherwise. The sufficient condition ensuring positive definiteness of the covariance matrix is \(\theta \in (\theta _{\textrm{min}}, \theta _{\textrm{max}})\), where \(\theta _{\textrm{min}}\) and \(\theta _{\textrm{max}}\) are the inverses of the smallest and largest eigenvalues of \(\varvec{W}\), respectively (Cressie 1993). Moreover, \(\theta\) is a measure of (conditional) spatial autocorrelation. In what follows, we adopt the lattice of the 115 counties of Missouri to build \(\varvec{W}\); a minimal construction on a toy lattice is sketched below.
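Since the Missouri adjacency matrix is not reproduced here, the following sketch builds \(\varvec{W}\) on a hypothetical toy grid and derives the admissible range for \(\theta\) together with the CAR structure matrix:

```r
# CAR structure matrix on a toy g x g lattice (stand-in for the county map).
g <- 8; nc <- g * g
W <- matrix(0, nc, nc)                    # rook-neighborhood matrix
for (i in 1:g) for (j in 1:g) {
  k <- (i - 1) * g + j
  if (i > 1) W[k, k - g] <- 1
  if (i < g) W[k, k + g] <- 1
  if (j > 1) W[k, k - 1] <- 1
  if (j < g) W[k, k + 1] <- 1
}
ew <- eigen(W, symmetric = TRUE, only.values = TRUE)$values
theta_rng <- c(1 / min(ew), 1 / max(ew))  # (theta_min, theta_max)
theta  <- 0.9 * theta_rng[2]              # a value inside the admissible range
Rx_car <- solve(diag(nc) - theta * W)     # R_x = (I - theta W)^{-1}
```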

In the CAR case, \({\textbf {R}}_x\ne {\textbf {C}}_x\), hence expected sampling variance and expected inverse smoothness show different behaviors with respect to \(\theta\).

Fig. 2 Left panel: marginal inverse smoothness and variance, exponential model. Middle panel: marginal inverse smoothness, CAR model. Right panel: marginal variance, CAR model

Figure 2, left panel, shows the behavior of \({\mathbb {E}}_{\,X}\left[ IS_x\right]\) as a function of \(\theta\) in the exponential model: coherently with the well-known theory concerning the exponential correlation function, smoothness is an increasing function (\({\mathbb {E}}_{\,X}\left[ IS_x\right]\) is a decreasing function) of \(\theta\). On the other hand, the expected sampling variance \(EV_x\) is a decreasing function of \(\theta\). Figure 2, middle panel, displays the relationship between \(\theta\) and \({\mathbb {E}}_{\,X}\left[ IS_x\right]\) for the CAR model: it can be noticed that smoothness increases with \(\theta\) (inverse smoothness decreases), confirming the interpretation of \(\theta\) as a spatial correlation parameter. Moreover, when \(\theta <0\), implying negative spatial correlation, \({\mathbb {E}}_{\,X}\left[ IS_x\right] >1\), meaning that the DGP is less smooth than the spherical DGP, while positive values of \(\theta\) deliver a smoother process with respect to the spherical DGP. In the right panel of Fig. 2, it can be noticed that the expected sampling variance \(EV_x\) decreases when \(\theta <0\) and increases when \(\theta >0\). Hence, in the case of the CAR model, expected smoothness and expected sampling variance show different behaviors as functions of the spatial correlation parameter \(\theta\).

To summarize, a sample from the exponential model is expected to be smoother and less variable as \(\theta\) increases; on the other hand, a sample from the CAR model is expected to be smoother as \(\theta\) increases, while its variability increases as \(\theta\) approaches the boundaries of \(\left( \theta _{\textrm{min}}, \theta _{\textrm{max}}\right)\).

A thorough discussion of the relationships between smoothness and variability is beyond the scope of this paper: here, we are interested in showing how these features of the DGP relate to confounding. In particular, in the following subsection we provide an approximation to the marginal bias that is strictly related to the quantities introduced in this section and that will be useful to discuss the relevance of variability and smoothness as determinants of bias.

4.2 Approximation of estimator bias

Equations (14)-(15) can be exploited for obtaining the first-order Taylor series approximation of the marginal bias, i.e. the expected value of a ratio is approximated by the ratio of expected values:

$$\begin{aligned} \textrm{Bias}_{\,Y, X}\left[ \hat{\beta }_{x}\right] \approx \mathcal {B}_{y\cdot z(x)}\frac{{\mathbb {E}}_{\,X} \left[ \varvec{X}^{\top } \varvec{\Delta } {\textbf {A}}_{z\cdot x} \varvec{X}\right] }{{\mathbb {E}}_{\,X}\left[ \varvec{X}^{\top } \varvec{\Delta } \varvec{X}\right] }=\mathcal {B}_{y\cdot z(x)} \frac{\bar{\lambda }_{\Delta \Sigma _{zx}}}{\bar{\lambda }_{\Delta \Sigma _{x}}}=E_T. \end{aligned}$$
(17)

In the case of the spherical DGP, \(E_T\) coincides with the exact value of the bias.

When \(\varvec{\Delta }={\textbf {M}}\), i.e. when the OLS estimator is considered, the approximation coincides with the ratio between \(EV_{zx}\) and \(EV_{x}\): the approximate bias is an increasing function of the expected sampling covariance and a decreasing function of the expected sampling variance. The marginal variability of the confounder, \(EV_z\), has no direct impact on the bias, with the only caveat that \(\varvec{\Sigma }_z\) must satisfy the conditions that guarantee positive definiteness of the joint covariance matrix of the DGP. Note that the smoothness of the DGP does not enter explicitly into Equation (17), suggesting that this feature of the DGP is not a relevant determinant of bias, as will be discussed in Sect. 5.
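Continuing the illustrative example, the approximation (17) with \(\varvec{\Delta }={\textbf {M}}\) reduces to the ratio of the expected sampling covariance and variance computed above, and can be compared with the exact marginal bias obtained via the \(p=q=1\) routine sketched in Sect. 3:

```r
# Taylor approximation (17) versus the exact marginal bias (illustrative DGP).
E_T     <- Bz * esv(Szx) / esv(s2x * Rx)               # approximation (17)
E_exact <- Bz * qf_ratio_mean(M %*% Azx, M, s2x * Rx)  # exact marginal bias
c(approx = E_T, exact = E_exact)
```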

5 Applications

This section provides a study of the marginal sampling properties of \(\hat{\beta }_{x}\) under several DGPs. Concerning the conditional distribution \(\varvec{Y}|\varvec{X},\varvec{Z}\) in Equation (1), we set \(\mathcal {B}_{y\cdot x(z)}=\mathcal {B}_{y\cdot z(x)}=1\) and \(\varvec{\Sigma }_{y|x,z}={\textbf {I}}_n\). As for the specification of the joint covariance matrix \(\varvec{\Sigma }_{x,z}\) in Equation (2), we consider the three DGPs summarized in Table 1, where the regression matrices \({\textbf {A}}_{z\cdot x}\) are reported as well. Moreover, we fix \(\sigma _x^2=\sigma _z^2=\sigma _{zx}=1\).

Table 1 Joint confounder-covariate covariance matrices and regression matrices of \(\varvec{Z}\) on \(\varvec{X}\) specified for DGPs A-C

The parameter \(\rho\) governs the strength of the cross-correlation between \(\varvec{X}\) and \(\varvec{Z}\) and affects the positive definiteness of \(\varvec{\Sigma }_{x,z}\). A sufficient condition to ensure positive definiteness in DGP-A is \(\lambda _{\textrm{min}}\left( {\textbf {R}} (\theta _{z})^{-1}{} {\textbf {R}}(\theta _x){\textbf {R}}(\theta _{z})^{-1} {\textbf {R}}(\theta _{\tilde{z}})\right) \ge \rho ^2\), while the condition \(|\rho |<1\) is sufficient for positive definiteness in DGPs B and C.

In DGP-A, the cross-covariance matrix is parameterized independently of the parameters of the marginal covariance matrices of \(\varvec{X}\) and \(\varvec{Z}\). This construction is intended to show that, as expected from Theorem 2, the bias depends on the structure of \(\varvec{Z}\) only when the cross-covariance matrix is obtained as a function of \(\varvec{\Sigma }_z\) by construction, which happens in both DGP-B and DGP-C.

In DGP-B, which is a modification of the DGP discussed in Paciorek (2010), parameter \(\theta _z\) governs both the covariance matrix of \(\varvec{Z}\) and the cross-covariance matrix, while the covariance matrix of \(\varvec{X}\) is a function of both \(\theta _z\) and \(\theta _x\).

In DGP-C, which corresponds to the specification adopted in Page et al. (2017), the cross-covariance matrix is obtained as the product of the lower Cholesky factors of the covariance matrices of \(\varvec{X}\) and \(\varvec{Z}\), indexed by spatial correlation parameters \(\theta _x\) and \(\theta _z\) respectively. Note that this is the only DGP that can cover the case of asymmetric cross-covariances.

In what follows, we study the marginal sampling properties of \(\hat{\beta }_x\) in the cases where the structure matrices \({\textbf {R}}\) are specified via the exponential and the CAR model. The results provided in Theorem 2 allow the computation of exact quantities with no need for simulation; a reduced-scale illustration is sketched below. We focus on the study of the sampling properties of the OLS estimator: results obtained using the GLS estimator (not shown) are very similar.
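As a reduced-scale illustration (using the hypothetical n = 50 configuration of the earlier sketches rather than the 64-unit grid analyzed below), the exact marginal bias surface for a DGP-C-type specification under the exponential model can be computed as follows:

```r
# Exact marginal bias over a coarse grid of range parameters, DGP-C style.
thetas <- seq(0.1, 1, by = 0.3)
bias_grid <- outer(thetas, thetas,
  Vectorize(function(tx, tz) {
    Rxt <- exp(-D / tx); Rzt <- exp(-D / tz)        # exponential structure matrices
    Lxt <- t(chol(Rxt)); Lzt <- t(chol(Rzt))
    Azxt <- (rho * Lzt %*% t(Lxt)) %*% solve(Rxt)   # A_{z.x} under DGP-C
    Bz * qf_ratio_mean(M %*% Azxt, M, Rxt)          # exact marginal bias
  }))
bias_grid
```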

5.1 Results

With regard to the exponential model, a regular grid of \(n=64\) spatial units located on the unit square is considered, and the sampling properties of \(\hat{\beta }_x\) are studied by letting all the spatial correlation parameters vary in the interval [0, 1]. Figure 3 reports, for each considered DGP, the expected sampling covariance \(EV_{zx}\) (first row), the expected sampling variance \(EV_x\) (second row) and the exact marginal bias \(\textrm{Bias}_{\,Y, X}\left[ \hat{\beta }_{x}\right]\) (third row). The relative error of approximation (17) is reported in the last row. It is worth noting that, with reference to the bias, the parameter \(\rho\) constitutes a mere scaling constant: Fig. 3 displays the behavior of the marginal bias as a function of the spatial correlation parameters, rather than its magnitude; for this purpose, the value of \(\rho\) is irrelevant.

Fig. 3 Expected sampling covariance, expected sampling variance, marginal bias and relative error of the Taylor approximation with respect to \(\theta _x\) and \(\theta _{z}\) for DGPs A-C. Exponential model

It can be noticed that, with reference to DGPs A and B, \(EV_{zx}\) does not depend on \(\theta _x\) and is a decreasing function of \(\theta _z\); with reference to DGPs A and C, \(EV_{x}\) does not depend on \(\theta _z\) and is a decreasing function of \(\theta _x\). Moreover, in DGP-B, \(EV_{x}\) is a decreasing function of both spatial correlation parameters, while, in DGP-C, \(EV_{zx}\) is a non-monotonic function of both \(\theta _x\) and \(\theta _z\). These behaviors are transferred to the marginal bias through the ratio of the expected sampling covariance to the expected sampling variance of the observed covariate \(\varvec{X}\), which delivers the approximation provided in Equation (17). The last row of Fig. 3 shows a satisfactory accuracy of the approximation for all the considered DGPs: the relative error is mostly less than \(\pm 5\%\); higher errors are observed for low values of \(\theta _z\) and high values of \(\theta _x\) in DGPs A and C.

Regarding DGPs A and B, it turns out that the marginal bias is an increasing function of \(\theta _x\) (keeping \(\theta _z\) constant) and a decreasing function of \(\theta _z\) (keeping \(\theta _x\) constant). Note that the spatial correlation parameter indexing the covariance matrix of \(\varvec{Z}\) in DGP-A (\(\theta _{\tilde{z}}\) in Table 1) has no impact on the bias: the dependence of the bias on the spatial correlation parameter of the confounder is actually an artifact generated when the same parameter appears both in the cross-covariance matrix and in the marginal covariance matrix of \(\varvec{Z}\), as observed in DGPs B and C. The interpretation of the bias trend in DGP-C is less immediate because of the product involved in the construction of the cross-covariance matrix. This gives rise to the non-monotonic behavior of the marginal bias with respect to \(\theta _x\) and \(\theta _z\); such behavior is inherited from \(EV_{zx}\).

In summary, the findings presented in Fig. 3 indicate that the primary contributors to bias are the expected sampling variance and covariance. Notably, the similarity in behavior between smoothness and expected sampling variance when employing an exponential correlation function (as depicted in the left panel of Fig. 2) complicates our ability to discern whether the key drivers of bias are predominantly associated with the strength of spatial autocorrelation (smoothness) or with the marginal variability. A more in-depth analysis of the CAR model provides enlightening insights into this matter.

With regard to the CAR model, the lattice of the 115 counties of Missouri is considered. The spatial correlation parameters vary in the interval \([\theta _{\textrm{min}}+0.01, \theta _{\textrm{max}}-0.01]\) to ensure positive definiteness of the variance and covariance matrices. Figure 4 is the counterpart of Fig. 3 for the CAR model.

Fig. 4 Expected sampling covariance, expected sampling variance, marginal bias and relative error of the Taylor approximation with respect to \(\theta _x\) and \(\theta _{z}\) for DGPs A-C. CAR model

It can be noticed that, with reference to DGPs A and B, \(EV_{zx}\) does not depend on \(\theta _x\) and is a convex function of \(\theta _z\); with reference to DGPs A and C, \(EV_{x}\) does not depend on \(\theta _z\) and is a convex function of \(\theta _x\), coherently with what was observed in Fig. 2, right panel. Moreover, \(EV_{x}\) in DGP-B and \(EV_{zx}\) in DGP-C are convex functions of both spatial correlation parameters. In this instance, the Taylor approximation given by Equation (17) exhibits a high degree of accuracy, as evidenced by the relative error falling within the range of \(-4\%\) to \(1\%\) across all scenarios.

The key insight gleaned from Fig. 4 is the lack of significance of the DGP smoothness in influencing the bias. Despite the fact that smoothness rises as a function of the spatial autocorrelation parameters (as shown in Fig. 2, middle panel), the bias trends exhibit a convex pattern, primarily shaped by the characteristics of \(EV_x\) and \(EV_{zx}\). In conclusion, marginal variance and covariance are the main determinants of bias. This is strictly tied to the decomposition of the response marginal variance reported in Equation (16): when \(EV_x\) increases, keeping the other factors fixed, the explanatory power of \(\varvec{X}\) increases and the bias decreases.

Figure 5 shows, in the case of the exponential model, the contribution of each of the components of the estimator mean square error in (12) to the total, as a function of the parameters \(\theta _x\) and \(\theta _z\), fixing \(\rho =0.5\).

It is evident that the mean square error is primarily determined by the bias. It can be seen that the bias share is a decreasing function of \(\theta _z\), while the component of the variance that does not depend on confounding shows the opposite behavior, which is related to the role of \(\theta _z\) in the distribution of the confounder. Actually, the most relevant role of the smoothness and marginal variability of the confounder concerns the marginal variance, rather than the bias of the estimator. The confounding-dependent share of the variance predominantly assumes negative values, showing that confounding tends to generate a reduction in variance. Similar results (not shown) are obtained in the case of the CAR model.

Fig. 5 The contributions of the three components of the estimator mean square error to the total, as functions of the range parameters \(\theta _x\), \(\theta _z\) with \(\rho =0.5\) (exponential model)

6 Conclusions

In this work, the problem of confounding in linear regression models is addressed. In particular, we study, through the evaluation of the estimator sampling properties, how confounding affects the estimation of the inferential target.

The spatial literature has extensively dealt with this issue. To assess the impact of confounding on the sampling properties of the regression coefficient estimators, research has focused on the strength of the autocorrelation characterizing the covariate and the confounder, both spatially varying. To date, what is clear from previous studies is that the parameters influencing the spatial autocorrelation of the covariate and confounder processes are of great importance. We provide further insight into the effect of confounding on coefficient estimates by generalizing the theory discussed by Paciorek (2010) and Page et al. (2017), who introduced the widely accepted idea that the smoothness of the covariate and confounder processes is an important factor impacting the estimator sampling properties. In particular, Paciorek (2010) affirms that a confounder smoother than the covariate leads to a lower bias, and subsequently to less confounding. Actually, one may agree with this belief only in specific situations, such as when the parameters governing the confounder covariance matrix contribute to the cross-covariance matrix, e.g. assuming an exponential correlation function and DGPs B-C. In other cases, as demonstrated by the application involving the CAR model, the connection to smoothness does not hold.

In this regard, we introduce the expected sampling variance and covariance, expressing the variability of a process and the variability of the interaction between two processes, respectively. When considering the estimator bias as the principal marker of confounding, we point out that the confounder smoothness is not the most relevant measure determining bias. Indeed, the cross-covariance matrix characterizing the covariate-confounder interaction plays the most prominent role, as the bias mainly hinges on the covariate variability and on the expected sampling covariance between covariate and confounder. Moreover, we note that the confounder structure does not affect the estimator bias; rather, it influences the variance of the estimator.

While the primary focus of the paper is theoretical and does not offer immediate implications for practical applications, certain aspects of the presented theory could potentially guide advancements in environmental contexts. For instance, one could attempt to estimate the bias resulting from confounding by devising an estimator for the approximate bias (17). Given the unobservable nature of the confounder, such estimation would necessitate relying on assumptions about the DGP, which may be challenging to verify.