Modifying the linear two-step Windmeijer correction for the presence of spatial error dependence

Fingleton, Bernard

doi:10.1007/s43071-022-00029-4

Original Paper
Open access
Published: 02 October 2022

Volume 3, article number 10, (2022)
Journal of Spatial Econometrics

Bernard Fingleton ORCID: orcid.org/0000-0001-7359-643X¹

Abstract

The aim of the paper is to show how the presence of spatial dependence affects the often-adopted Windmeijer (J Econom 126:25–51, 2005) finite sample correction, which corrects the downward bias in estimated parameter standard errors. (For example, the correction is an option facilitating robust estimation in the software package Stata, which is used by many applied econometricians and data analysts.) Windmeijer (2005) explains why, with numerous instruments, the estimated asymptotic standard errors of the efficient, two-step, GMM estimator are downward biased in small samples. GMM estimation is based on an estimated optimal weight matrix, which is the inverse of the covariance of the sample moments, and the bias results from the weight matrix being evaluated at estimated, rather than true, values of the parameters. Hwang et al. (J Econom 229(2):276–298, 2022) provide a correction to the Windmeijer (2005) finite sample correction to allow for over-identification bias. The novel contribution of the current paper is to show how the Windmeijer (2005) correction can be modified given spatial dependence in the error term of a model with moments conditions that are linear in parameters estimated by GMM, leading to corrected standard errors and therefore more accurate inference. Monte Carlo simulations are used to demonstrate the effect of the modification, and two examples using real data show how inference may be affected by ignoring the effect of spatial error dependence.

1 Introducing spatial error dependence into a dynamic panel data model

The generalized method of moments (GMM) estimator is a very general approach to estimation with widespread application, which includes two-stage least squares (2SLS) as one member of the class of GMM estimators of the linear model, but in this paper, without loss of generality, we focus on a specific version of GMM applicable to dynamic panel data modelling. More generally, GMM can be applied to estimate parameters of linear or nonlinear models, with parameters chosen so as to satisfy moments conditions as closely as possible. So parameters are chosen which give the best fit to a set of equations, each of which sets a sample moment to zero. Invariably with over-identification these moments conditions cannot be satisfied simultaneously, so the approach involves minimising a quadratic objective function to achieve the best fit. While GMM can be applied to non-linear moments conditions, when all the moments conditions are linear in parameters, one typically obtains an improvement in the variance estimate (Windmeijer 2005, p. 29). This paper adopts a linear in parameters approach. GMM can be applied to linear cross-sectional models in which the sample covariance of regression error and exogenous variables is as small as possible, in other words one is minimising a quadratic function which is in terms of the regression residuals per se, the exogenous variables and an appropriate weight matrix. In the case of dynamic panel data models in which there is a compound error process which captures both individual heterogeneity and a remainder idiosyncratic error component, a commonly used approach is the GMM estimator in the model in first differences (see Arellano and Bond 1991). In this case the moments conditions are in terms of differences in residuals (to avoid correlation between regressors and residuals per se), exogenous variables and the weight matrix, which define the quadratic objective function. 
This naturally extends to the dynamic panel data model under consideration here, with the additional extension to accommodate spatial error dependence. One-step and two-step variants, in which the second step utilises residuals from the first-step, are also options. While the focus for the paper is on a particular two-step GMM estimator, the Windmeijer correction, which seeks to avoid the severe downward bias in the estimated asymptotic standard errors of the efficient two-step GMM estimator, and the modification proposed in this paper, it should be emphasised that spatial error dependence is undoubtedly present in data across the range of GMM estimators. For example in the cross-sectional regression context, Kelejian and Prucha (1998) give a generalized spatial 2SLS procedure for estimating a spatial autoregressive model with autoregressive disturbances. Fingleton and Le Gallo (2008) extend their feasible spatial 2SLS estimator to allow for several endogenous regressors, and also introduce spatial moving average (SMA) error dependence.

Consider first a simple dynamic panel data model

$$y_{it} = \gamma y_{it - 1} + \beta x_{it} + \varepsilon_{it} ;\;i = 1, \ldots ,N,t = 1, \ldots ,T$$

(1)

in which there are $N$ individuals/regions/locations and $T$ times, $x$ is an exogenous variable, and $\gamma ,\beta$ are scalar parameters to be estimated. The error term is compound, thus

$$\varepsilon_{it} = \mu_{i} + \nu_{it}$$

(2)

where $\mu_{i}$ is a set of individual effects, one for each of the $N$ individuals, controlling for unobserved time-invariant heterogeneity across individuals. The term $\nu_{it}$ varies both by region and by time, and represents other, unpredictable, random effects. The assumption is that each $\mu_{i}$ and $\nu_{it}$ is a random draw from independent and identically distributed distributions thus $\mu_{i} \sim iid(0,\sigma_{\mu }^{2} )$ and $\nu_{it} \sim iid(0,\sigma_{\nu }^{2} )$ with $\mu_{i}$ and $\nu_{it}$ independent of each other and among themselves. Given $\sigma_{\mu }^{2} > 0$ there is individual heterogeneity with $\mu_{i}$ capturing individual effects and also regional variation in unobserved effects.

A more general specification written in matrix terms is

$${\mathbf{y}}_{t} = \gamma {\mathbf{y}}_{t - 1} + {\tilde{\mathbf{x}}}_{t} {{\varvec{\upbeta}}} + {{\varvec{\upvarepsilon}}}_{t}$$

(3)

in which ${\mathbf{y}}_{t}$ is an $N$ by 1 vector, ${\tilde{\mathbf{x}}}_{t}$ is an $N$ by $k$ matrix of exogenous regressors, ${{\varvec{\upbeta}}}$ is a $k$ by 1 vector of coefficients, and ${{\varvec{\upvarepsilon}}}_{t}$ is an $N$ by 1 vector of errors at time $t$.

Spatial dependence^{Footnote 1} could be introduced into the errors by a spatial autoregressive error process, which is the most widely adopted in the spatial econometrics literature. Given an $N$ by $N$ connectivity matrix ${\mathbf{M}}$, spatial autoregressive (SAR) error dependence is represented by ${\mathbf{H}}_{1} = \left( {{\mathbf{I}}_{N} - \hat{\rho }{\mathbf{M}}} \right)^{ - 1}$, where ${\mathbf{I}}_{N}$ is an $N$ by $N$ identity matrix, and SMA error dependence by ${\mathbf{H}}_{2} = \left( {{\mathbf{I}}_{N} - \hat{\lambda }{\mathbf{M}}} \right)$, as in Baltagi et al. (2019). Below we use ${\mathbf{S}} \in \left\{ {{\mathbf{H}}_{1} ,{\mathbf{H}}_{2} } \right\}$ to refer to either version of ${\mathbf{H}}$.

Accordingly, given $N$ locations, for SAR error dependence it is assumed that in each period,

$$\begin{aligned} {{\varvec{\upvarepsilon}}}_{t} & = \rho {\mathbf{M\varepsilon }}_{t} + {{\varvec{\upxi}}}_{t} \\ {{\varvec{\upvarepsilon}}}_{t} & = \left( {{\mathbf{I}}_{N} - \rho {\mathbf{M}}} \right)^{ - 1} {{\varvec{\upxi}}}_{t} \\ {{\varvec{\upvarepsilon}}}_{t} & = {\mathbf{H}}_{1} ({{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t} ) \\ \end{aligned}$$

(4)

The diagonal elements of ${\mathbf{M}}$ are zeros since a location cannot be a neighbour of itself, $\left( {{\mathbf{I}}_{N} - \rho {\mathbf{M}}} \right)$ is non-singular and ${\mathbf{M}}$ is uniformly bounded in absolute value. This error process implies that a shock at location $j$ is transmitted to all $N$ locations, as shown, assuming $\left| \rho \right| < 1$ and the rows of ${\mathbf{M}}$ sum to 1, by the expansion

$$\left( {{\mathbf{I}}_{N} - \rho {\mathbf{M}}} \right)^{ - 1} {{\varvec{\upxi}}}_{t} = \left( {\sum\limits_{i = 0}^{\infty } {{\mathbf{M}}^{i} } \rho^{i} } \right){{\varvec{\upxi}}}_{t} = {{\varvec{\upxi}}}_{t} + \rho {\mathbf{M\xi }}_{t} + \rho^{2} {\mathbf{M}}^{2} {{\varvec{\upxi}}}_{t} + \rho^{3} {\mathbf{M}}^{3} {{\varvec{\upxi}}}_{t} + \cdots$$

(5)

In Eq. (5), ${\mathbf{M}}^{0} = {\mathbf{I}}_{N}$, ${\mathbf{M}}^{2}$ is the matrix product of ${\mathbf{M}}$ and ${\mathbf{M}}$, and ${\mathbf{M}}^{i}$ is the matrix product of ${\mathbf{M}}$ and ${\mathbf{M}}^{i - 1}$. A shock at $j$ is felt directly at location $j$, with an indirect effect due to $\rho {\mathbf{M\xi }}_{t}$ affecting all pairs of locations with non-zero cells in ${\mathbf{M}}$. Shocks extend beyond these local impacts as a result of transmission via neighbours of neighbours, ultimately affecting all $N$ locations. Working via neighbours of neighbours of neighbours, shock effects rebound to $j$, so that the full impact of a shock at $j$ is the initial shock plus the shock effects feeding back from other locations.
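The convergence of the expansion in Eq. (5) can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a small circular 'one ahead and one behind' connectivity matrix and $\rho = 0.5$; the helper name `ring_weights` and all numerical values are illustrative choices, not taken from the paper.

```python
import numpy as np

def ring_weights(n: int, r: int) -> np.ndarray:
    """Row-normalised circular 'r ahead and r behind' matrix (illustrative choice)."""
    M = np.zeros((n, n))
    for i in range(n):
        for d in range(1, r + 1):
            M[i, (i + d) % n] = 1.0
            M[i, (i - d) % n] = 1.0
    return M / M.sum(axis=1, keepdims=True)

n, rho = 12, 0.5                      # hypothetical size and SAR parameter
M = ring_weights(n, r=1)
H1 = np.linalg.inv(np.eye(n) - rho * M)

# Accumulate the partial sums I + rho*M + rho^2*M^2 + ... from Eq. (5)
partial = np.zeros((n, n))
term = np.eye(n)                      # rho^0 * M^0 = I_N
for _ in range(60):
    partial += term
    term = rho * (term @ M)

max_gap = np.abs(partial - H1).max()  # distance between truncated series and the inverse
```

With these values the truncated series and the direct inverse agree to numerical precision, and every element of the first column of `H1` is positive, which is the sense in which a shock at one location reaches all $N$ locations.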

In contrast, an SMA error process is given by

$$\begin{gathered} {{\varvec{\upvarepsilon}}}_{t} = \left( {{\mathbf{I}}_{N} - \lambda {\mathbf{M}}} \right){{\varvec{\upxi}}}_{t} \hfill \\ {{\varvec{\upvarepsilon}}}_{t} = {\mathbf{H}}_{2} ({{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t} ) \hfill \\ \end{gathered}$$

(6)

which implies that a shock at $j$ will only affect locations that are directly connected by non-zero elements of ${\mathbf{M}}$, so shock effects are local rather than global. The consequence of spatially dependent errors is that the parameter standard errors differ from those obtained assuming independent errors.
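The contrast between the global reach of the SAR process and the local reach of the SMA process can be illustrated with a unit shock at a single location. This is only a sketch: the line-shaped neighbourhood structure and the values $\rho = 0.5$ and $\lambda = -0.5$ are assumptions made for the illustration.

```python
import numpy as np

n, rho, lam = 10, 0.5, -0.5           # hypothetical values

# Hypothetical layout: locations on a line, each linked to its immediate neighbours
M = np.zeros((n, n))
for i in range(n - 1):
    M[i, i + 1] = M[i + 1, i] = 1.0
M /= M.sum(axis=1, keepdims=True)     # row-normalise

H1 = np.linalg.inv(np.eye(n) - rho * M)   # SAR transformation, Eq. (4)
H2 = np.eye(n) - lam * M                  # SMA transformation, Eq. (6)

shock = np.zeros(n)
shock[0] = 1.0                            # unit shock at the first location

sar_response = H1 @ shock
sma_response = H2 @ shock

sar_reached = np.flatnonzero(np.abs(sar_response) > 1e-12)
sma_reached = np.flatnonzero(sma_response)
```

Under the SMA process only the shocked location and its single direct neighbour respond, whereas under the SAR process all ten locations respond, with magnitudes that decay with distance.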

2 GMM estimation

Estimation of linear GMM panel data regressions involves first differences to avoid dynamic panel bias, eliminating the individual effects $\mu_{i}$ which would otherwise be correlated with the time lag of the dependent variable. First differencing Eq. (3) gives

$$\Delta {\mathbf{y}}_{t} = \gamma \Delta {\mathbf{y}}_{t - 1} + \Delta {\tilde{\mathbf{x}}}_{t} {{\varvec{\upbeta}}} + \Delta {{\varvec{\upvarepsilon}}}_{t} = \gamma \Delta {\mathbf{y}}_{t - 1} + \Delta {\tilde{\mathbf{x}}}_{t} {{\varvec{\upbeta}}} + {\mathbf{S}}\Delta {{\varvec{\upnu}}}_{t}$$

(7)

and moments equations thus

$$\sum\limits_{i} {y_{il} \Delta \nu_{it} = 0} ,\;\;\forall i,\;l = 0,\;1, \ldots ,t - 2,t = 2,3, \ldots ,T$$

(8)

Additional GMM-style moments equations for the $j = 1,...,k$ exogenous regressors ${\tilde{\mathbf{x}}}_{t}$ are given by

$$\sum\limits_{i} {\tilde{x}_{j,im} \Delta \nu_{it} } = 0,\;\forall i,j,\;m = 1, \ldots ,T,\;t = 2, \ldots ,T$$

(9)

In order to reduce the number of instruments, the ${\tilde{\mathbf{x}}}_{t}$ can also be treated as $k$ instruments in the classic one column for each instrumenting variable design for the matrix of instruments ${\mathbf{Z}}$. The initial step is to obtain consistent estimates of the $k + 1$ by 1 parameter vector ${{\varvec{\upbeta}}}_{0}$, which stacks $\gamma$ and ${{\varvec{\upbeta}}}$, using an IV or GM estimator, so as to obtain consistent estimates of residuals.

For example, let

$${\tilde{\mathbf{W}}}_{0} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } ({{\varvec{\Lambda}}} \otimes {\mathbf{I}}_{N} ){\mathbf{Z}}} \right)$$

(10a)

and

$${\hat{\mathbf{W}}}_{0} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } ({\hat{\mathbf{\Lambda }}}_{1} ){\mathbf{Z}}} \right)$$

(10b)

Also

$${\mathbf{\hat{\tilde{\beta }}}}_{0} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\tilde{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\tilde{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$

(11a)

and

$${\hat{\mathbf{\beta }}}_{0} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$

(11b)

In Eqs. (11a) and (11b), $\Delta {\mathbf{x}} = [\Delta {\mathbf{y}}_{ - 1} \; \Delta {\tilde{\mathbf{x}}}]$ is an $N(T - 2)$ by $k + 1$ matrix comprising the lagged differenced dependent variable and the differenced exogenous regressors of Eq. (7), ${\mathbf{Z}}$ is an $N(T - 2)$ by $h \ge k + 1$ matrix of $h$ instruments, and the $(T - 2)$ by $(T - 2)$ matrix ${{\varvec{\Lambda}}}$ has 2’s on the main diagonal, − 1’s on the first off-diagonals and zeros elsewhere.
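As a rough illustration of Eqs. (10a) and (11a), the sketch below builds ${{\varvec{\Lambda}}}$, the weight matrix ${\tilde{\mathbf{W}}}_{0}$ and the corresponding linear GMM estimate. It assumes the differenced data are stacked period by period (all $N$ rows for one period, then the next), which is what the Kronecker product with ${\mathbf{I}}_{N}$ implies. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def first_difference_cov(T: int) -> np.ndarray:
    """(T-2) x (T-2) matrix: 2 on the diagonal, -1 on the first off-diagonals."""
    m = T - 2
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

def initial_weight(Z: np.ndarray, N: int, T: int) -> np.ndarray:
    """W0 = Z'(Lambda kron I_N)Z / (N(T-2)), as in Eq. (10a)."""
    Lam = first_difference_cov(T)
    return Z.T @ np.kron(Lam, np.eye(N)) @ Z / (N * (T - 2))

def linear_gmm(dx, dy, Z, W):
    """(dx'Z W^+ Z'dx)^+ dx'Z W^+ Z'dy, using Moore-Penrose inverses."""
    A = dx.T @ Z @ np.linalg.pinv(W) @ Z.T
    return np.linalg.pinv(A @ dx) @ (A @ dy)

# Synthetic, noise-free check: the estimator should return the coefficients used
rng = np.random.default_rng(0)
N, T, h = 20, 6, 5
rows = N * (T - 2)
Z = rng.normal(size=(rows, h))
dx = Z @ rng.normal(size=(h, 2)) + 0.1 * rng.normal(size=(rows, 2))
beta_true = np.array([0.2, 1.0])
dy = dx @ beta_true                   # no error term, for illustration only

W0 = initial_weight(Z, N, T)
beta_hat = linear_gmm(dx, dy, Z, W0)
```

Because the synthetic `dy` contains no error term, `beta_hat` reproduces `beta_true`; with real data the same function would be called again with the weight matrices of Eqs. (10b), (12) and (15).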

The efficiency of ${\tilde{\mathbf{W}}}_{0}$ depends on $\nu_{it}$ being i.i.d. (Windmeijer 2005 p. 32), but the resulting $N(T - 2)$ by 1 vector of residuals $\Delta {\mathbf{\hat{\tilde{\varepsilon }}}}_{0} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\tilde{\beta }}}}_{0}$ allows more robust estimates given by Eqs. (10b) and (11b) involving the $N(T - 2)$ by $N(T - 2)$ matrix

$${\hat{\mathbf{\Lambda }}}_{1} = \left( {\Delta {\mathbf{\hat{\tilde{\varepsilon }}}}_{0} \Delta {\mathbf{\hat{\tilde{\varepsilon }}}}_{0}^{\prime } } \right) \odot \left( {{\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N} } \right)$$

where ${\mathbf{J}}_{T - 2} = \iota_{T - 2} \iota^{\prime}_{T - 2}$ and $\iota_{T - 2}$ is a $(T - 2)$ by 1 vector of ones.

Using (10b) in Eq. (11b) gives estimates of residuals $\Delta {\hat{\mathbf{\varepsilon }}}_{0} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{0}$ that embody spatial dependence, so on this basis a GM estimator of $\lambda$ is the solution of sample moments using nonlinear least squares, as shown by Baltagi et al. (2019). Likewise, the residuals are used to obtain consistent estimates of the autoregressive parameter $\rho$ based on the Kapoor et al. (2007) approach, as in Baltagi et al. (2014). Given ${\mathbf{S}}$, one obtains an ‘initial’ weight matrix as an estimate of the covariance matrix of the moment conditions:

$${\hat{\mathbf{W}}}_{1} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } \left( {{{\varvec{\Lambda}}} \otimes {\hat{\mathbf{S}}}{\hat{\mathbf{S}}}^{\prime } } \right){\mathbf{Z}}} \right)$$

(12)

Then the ‘first-step’ parameter estimates are given by

$${\hat{\mathbf{\beta }}}_{1} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{1}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{1}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$

(13)

Using the Moore–Penrose pseudo-inverse throughout maintains the symmetry and positive definiteness of the weight matrix estimates.

Equation (13) gives the first-step residuals

$$\Delta {\hat{\mathbf{\varepsilon }}}_{1} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{1} = \Delta {\hat{\mathbf{\nu }}}$$

(14)

In the second step, ${\hat{\mathbf{W}}}_{1}$ is replaced by its robust version,

$${\hat{\mathbf{W}}} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } {\mathbf{\hat{\Omega }Z}}} \right)$$

(15)

and

$${{\varvec{\Omega}}} = \left( {{\mathbf{I}}_{T - 2} \otimes {\mathbf{S}}} \right){{\varvec{\Phi}}}\left( {{\mathbf{I}}_{T - 2} \otimes {\mathbf{S}}^{\prime } } \right)$$

(16)

where ${{\varvec{\Omega}}}$ is an $N(T - 2)$ by $N(T - 2)$ matrix and ${\mathbf{I}}_{T - 2}$ is an identity matrix of dimension $T - 2$. Also ${{\varvec{\Omega}}}$ depends on the $N(T - 2)$ by $N(T - 2)$ matrix

$${{\varvec{\Phi}}} = \left[ {\left( {\Delta {{\varvec{\upnu}}}} \right)\left( {\Delta {{\varvec{\upnu}}}} \right)^{\prime } } \right] \odot \left( {{\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N} } \right)$$

(17)
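Equations (15) to (17) can be sketched as follows. The Hadamard product with ${\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N}$ keeps only products of residuals belonging to the same individual, and the result is then pre- and post-multiplied by the block-diagonal spatial transformation. The sketch assumes residuals stacked period by period; the function name `robust_weight`, the small ring-shaped ${\mathbf{M}}$ and the value 0.4 for the spatial parameter are hypothetical.

```python
import numpy as np

def robust_weight(Z: np.ndarray, dnu: np.ndarray, S: np.ndarray):
    """Return (Omega, W) following Eqs. (15)-(17) for period-by-period stacking."""
    N = S.shape[0]
    periods = dnu.size // N                          # plays the role of T - 2
    mask = np.kron(np.ones((periods, periods)), np.eye(N))   # J kron I_N
    Phi = np.outer(dnu, dnu) * mask                  # Hadamard product, Eq. (17)
    IS = np.kron(np.eye(periods), S)                 # I kron S
    Omega = IS @ Phi @ IS.T                          # Eq. (16)
    W = Z.T @ Omega @ Z / (N * periods)              # Eq. (15)
    return Omega, W

rng = np.random.default_rng(1)
N, periods, h = 4, 3, 5
dnu = rng.normal(size=N * periods)                   # stand-in for differenced residuals
Z = rng.normal(size=(N * periods, h))                # stand-in instrument matrix
M = (np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)) / 2.0
S = np.linalg.inv(np.eye(N) - 0.4 * M)               # SAR case with a hypothetical rho

Omega, W = robust_weight(Z, dnu, S)
Omega_iid, W_iid = robust_weight(Z, dnu, np.eye(N))  # S = I removes spatial dependence
```

Setting `S` to the identity matrix reproduces the non-spatial robust weight matrix, which is the case corresponding to $\rho = 0$ or $\lambda = 0$.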

Also (see Arellano and Bond 1991; Doornik et al. 2001; Roodman 2009; Hwang et al. 2022)

$${\text{var}} ({\hat{\mathbf{\beta }}}_{1} ) = {\hat{\mathbf{V}}}_{01} = N\left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{1}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{1}^{ - 1} {\hat{\mathbf{W}}}{\hat{\mathbf{W}}}_{1}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}\left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{1}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1}$$

(18)

The $h$ by $h$ matrix ${\hat{\mathbf{W}}}$ is the optimal weight matrix which is used in the second step of linear two-step GMM.

The vector of two-step parameter estimates is

$${\hat{\mathbf{\beta }}}_{2} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$

(19)

with two-step residuals

$$\Delta {\hat{\mathbf{\varepsilon }}}_{2} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{2}$$

(20)

Accordingly,

$${\text{var}} ({\hat{\mathbf{\beta }}}_{2} ) = {\hat{\mathbf{V}}}_{0} = N\left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1}$$

(21)

The standard errors of the parameters resulting from the two-step spatial GMM estimator, or conventional standard errors, are given by the $k$ + 1 by 1 vector

$${\text{conventional }}s.e.\left( {{\hat{\mathbf{\beta }}}_{2} } \right) = \sqrt {diag({\hat{\mathbf{V}}}_{0} )}$$

(22)
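A compact sketch of Eqs. (19) to (22) is given below. It takes the differenced data, the instruments and an already estimated weight matrix as inputs; the function name `two_step` and the synthetic data, including the simple stand-in weight matrix ${\mathbf{Z}}^{\prime }{\mathbf{Z}}$ divided by the number of rows, are assumptions made purely for illustration.

```python
import numpy as np

def two_step(dx, dy, Z, W_hat, N):
    """Two-step estimates, residuals, V0 and conventional standard errors."""
    A = dx.T @ Z @ np.linalg.pinv(W_hat) @ Z.T
    bread = np.linalg.pinv(A @ dx)        # (dx'Z W^+ Z'dx)^+
    beta2 = bread @ (A @ dy)              # Eq. (19)
    resid2 = dy - dx @ beta2              # Eq. (20)
    V0 = N * bread                        # Eq. (21)
    se = np.sqrt(np.diag(V0))             # Eq. (22)
    return beta2, resid2, V0, se

# Synthetic data, period-by-period stacking with N individuals
rng = np.random.default_rng(2)
N, periods, h = 30, 4, 6
rows = N * periods
Z = rng.normal(size=(rows, h))
dx = Z @ rng.normal(size=(h, 3)) + 0.5 * rng.normal(size=(rows, 3))
dy = dx @ np.array([0.2, 1.0, 0.5]) + rng.normal(size=rows)

W_hat = Z.T @ Z / rows                    # stand-in for the weight matrix of Eq. (15)
beta2, resid2, V0, se = two_step(dx, dy, Z, W_hat, N)

# First-order condition of the GMM objective evaluated at beta2
foc = dx.T @ Z @ np.linalg.pinv(W_hat) @ Z.T @ resid2
```

The vector `foc` is zero up to rounding error, which confirms that `beta2` minimises the quadratic objective for the supplied weight matrix.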

3 The Windmeijer correction corrected for spatial dependence

Given that the estimated asymptotic standard errors of the efficient, two-step, GMM estimator can be downward biased in small samples, Windmeijer (2005) corrects for the bias, which is due to the presence of estimated parameters in the efficient weight matrix, by applying a Taylor series expansion leading to the expression

$${\hat{\mathbf{V}}}_{W} = {\hat{\mathbf{V}}}_{0} + {\mathbf{\hat{D}\hat{V}}}_{0} + {\hat{\mathbf{V}}}_{0} {\hat{\mathbf{D}}}^{\prime } + {\mathbf{\hat{D}\hat{V}}}_{01} {\hat{\mathbf{D}}}^{\prime }$$

(23)

in which

$${\hat{\mathbf{D}}} = - \frac{{{\hat{\mathbf{V}}}_{0} }}{N}\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} \left. {\frac{{\partial {\mathbf{W}}}}{{\partial \hat{\beta }}}} \right|_{{\hat{\beta } = \hat{\beta }_{1} }} {\hat{\mathbf{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}}$$

(24)

and

$${\text{Windmeijer}}\;s.e.\left( {{\hat{\mathbf{\beta }}}_{2} } \right) = \sqrt {diag\left( {{\hat{\mathbf{V}}}_{{\mathbf{W}}} } \right)}$$

(25)

A summary focussing on the computation of ${\mathbf{D}}$ is provided in the Appendix.^{Footnote 2} Integral to this is ${\mathbf{W}}$ as estimated by Eq. (15) which embodies spatial dependence, but assuming $\rho = 0$ or $\lambda = 0$ in ${\mathbf{S}}$ eliminates spatial error dependence. Alternatively, instead of ${\mathbf{V}}_{W}$ one might opt for ${\mathbf{V}}_{{\mathbf{0}}}$ as in Eq. (22), thus ignoring the Windmeijer correction. Overall therefore we have four alternative standard errors. First, what we might term the naïve conventional standard error based on ${\mathbf{V}}_{{\mathbf{0}}}$ and an assumption that $\rho = 0$ or $\lambda = 0$. Secondly the spatially corrected conventional standard error also applies ${\mathbf{V}}_{{\mathbf{0}}}$ given by Eq. (21) but with ${\hat{\mathbf{W}}}$ and ${\hat{\mathbf{S}}}$ based on $\hat{\rho },\hat{\lambda } \ne 0$. Thirdly we have the classic Windmeijer correction given by Eq. (23) but with $\hat{\rho },\hat{\lambda } = 0$. Finally the spatially corrected Windmeijer correction also applies Eq. (23) but ${\mathbf{V}}_{{\mathbf{0}}}$, ${\mathbf{V}}_{{{\mathbf{01}}}}$ and ${\mathbf{D}}$ are estimated using ${\hat{\mathbf{W}}}$ and ${\hat{\mathbf{S}}}$ as determined by $\hat{\rho },\hat{\lambda } \ne 0$. To save space we focus on the first, third and fourth of these in the Monte Carlo simulation, but all four are reported in the empirical examples.
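Once ${\hat{\mathbf{V}}}_{0}$, ${\hat{\mathbf{V}}}_{01}$ and ${\hat{\mathbf{D}}}$ are available, assembling Eq. (23) is a matter of matrix algebra. The sketch below treats ${\hat{\mathbf{D}}}$ as a given input, since its computation is described in the Appendix and is not reproduced here. The matrices are random stand-ins, and `V01` is constructed to exceed `V0` only so that the illustrative result is guaranteed to have a positive diagonal.

```python
import numpy as np

def windmeijer_variance(V0, V01, D):
    """V_W = V0 + D V0 + V0 D' + D V01 D', as in Eq. (23)."""
    return V0 + D @ V0 + V0 @ D.T + D @ V01 @ D.T

rng = np.random.default_rng(3)
p = 3                                          # number of parameters, k + 1
B = rng.normal(size=(p, p))
C = rng.normal(size=(p, p))
V0 = B @ B.T / 100.0 + 0.01 * np.eye(p)        # stand-in two-step variance
V01 = V0 + C @ C.T / 100.0                     # stand-in first-step variance
D = 0.1 * rng.normal(size=(p, p))              # stand-in for the matrix of Eq. (24)

VW = windmeijer_variance(V0, V01, D)
se_conventional = np.sqrt(np.diag(V0))         # Eq. (22)
se_windmeijer = np.sqrt(np.diag(VW))           # Eq. (25)
```

The four alternative standard errors described above would follow from calling the same function with inputs estimated either with ${\mathbf{S}}$ set to the identity matrix or with the estimated ${\hat{\mathbf{S}}}$.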

4 Numerical illustration

To show the impact of different assumptions regarding the error process on the conventional and Windmeijer parameter standard error estimates, a Monte Carlo approach is adopted with mean estimates based on 100 replications for each combination of assumptions. Throughout the matrix inverse is obtained using the Moore–Penrose pseudo-inverse to allow for asymmetric non-positive definite weight matrices, so the replications are an attempt to moderate any resulting inaccuracy. The simulations are based on four exogenous regressors and the lagged dependent variable, thus

$$\begin{aligned} y_{it} & = \gamma y_{it - 1} + \beta_{1} x_{1it} + \beta_{2} x_{2it} + \beta_{3} x_{3it} + \beta_{4} x_{4it} + \varepsilon_{it} ;\;i = 1, \ldots ,N,t = 1, \ldots ,T \\ \xi_{it} & = \mu_{i} + \nu_{it} \\ {{\varvec{\upvarepsilon}}}_{t} & = {\mathbf{S\xi }}_{t} \\ \end{aligned}$$

(26)

With $\mu_{i} \sim iid.N(0,\sigma_{\mu }^{2} )$ and $\nu_{it} \sim iid.N(0,\sigma_{\nu }^{2} )$ and $N$ = 100 and $T$ = 10.

The exogenous variables are generated using

$$x_{kit} = \delta x_{kit - 1} + \upsilon_{it} ,\;k = 1, \ldots ,4$$

(27)

in which $\upsilon_{it} \sim iid.N(0,\sigma_{\upsilon }^{2} )$.

${\mathbf{M}}$ is an $N$ by $N$ matrix of non-stochastic weights defining the error interdependence across $N$ locations. The tabulated outcomes use an ‘$r$ ahead and $r$ behind’ connectivity matrix popularised by Kelejian and Prucha (1998), in which it is assumed that $r = 5$. This is subsequently row normalised so that each row sums to 1. Thus each row of the spatial matrix ${\mathbf{M}}$ (i.e. $m_{ij}$, with $i = 1,...,N$, $j = 1,...,N$) has up to 10 non-zero elements (5 ahead and 5 behind, each with equal weights), with zeros on the main diagonal and elsewhere.
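A connectivity matrix of this kind might be constructed as in the sketch below. Whether the neighbourhood wraps around at the ends of the ordering is not stated in the text, so the function offers both variants through a hypothetical `circular` argument; the function name is likewise an illustrative choice.

```python
import numpy as np

def r_ahead_r_behind(n: int, r: int, circular: bool = True) -> np.ndarray:
    """Row-normalised 'r ahead and r behind' weights with a zero main diagonal."""
    M = np.zeros((n, n))
    for i in range(n):
        for d in range(1, r + 1):
            for j in (i - d, i + d):
                if circular:
                    M[i, j % n] = 1.0      # wrap around the ends (assumption)
                elif 0 <= j < n:
                    M[i, j] = 1.0          # truncate at the ends (assumption)
    return M / M.sum(axis=1, keepdims=True)

M_circ = r_ahead_r_behind(100, 5, circular=True)
M_line = r_ahead_r_behind(100, 5, circular=False)

neighbours_circ = (M_circ != 0).sum(axis=1)    # neighbours per row, wrapped version
neighbours_line = (M_line != 0).sum(axis=1)    # neighbours per row, truncated version
```

In the wrapped version every row has exactly 10 neighbours with weight 0.1, while in the truncated version the rows near either end have between 5 and 9 neighbours, consistent with the phrase 'up to 10'.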

In practice, for the purposes of simulation, various alternative true parameter values have been considered, but the results presented subsequently are based on $\sigma_{\mu }^{2} = 0.2,0.8$, $\sigma_{\nu }^{2} = 0.8,0.2$, $\rho = 0.25,0.75,\lambda = - 0.25, - 0.75$ with $\delta = 0.8,\sigma_{\upsilon }^{2} = 0.9$,$\gamma = 0.2,\beta_{1} = 1,\beta_{2} = 0.5,\beta_{3} = 0.75$ and $\beta_{4} = 1.0$. Given these true values of the various parameters, and drawing in each replication from the normal distributions defined by $\sigma_{\mu }^{2} ,\sigma_{\nu }^{2}$, Eq. (26) is repeatedly calculated to give realisations of $y_{it} ,i = 1,...,N;t = 1,...,T$ and the conventional and Windmeijer parameter standard error estimates. Each of the 100 replications for each parameter combination discards an initial 51 simulation outcomes in order to minimise the effect of initial values at $t = - 50$ of zero (i.e. simulation outcomes for $t = - 50, - 49,...,0$ are discarded).
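The data generating process of Eqs. (26) and (27) with a burn-in period can be sketched as below, using one of the parameter combinations listed above. Several details are assumptions of the sketch rather than statements from the paper: the regressor shocks are drawn independently for each of the four regressors, the process starts from zeros and the first 51 generated periods are dropped, and a simple 'one ahead and one behind' matrix stands in for ${\mathbf{M}}$. The function name `simulate_panel` is hypothetical.

```python
import numpy as np

def simulate_panel(N, T, S, gamma, beta, delta, s2_mu, s2_nu, s2_ups,
                   burn=51, seed=0):
    """Simulate y (T x N) and x (T x N x k) from Eqs. (26)-(27) after a burn-in."""
    rng = np.random.default_rng(seed)
    k = len(beta)
    mu = rng.normal(0.0, np.sqrt(s2_mu), N)          # time-invariant individual effects
    y = np.zeros(N)                                  # zero initial values
    x = np.zeros((N, k))
    ys, xs = [], []
    for step in range(burn + T):
        x = delta * x + rng.normal(0.0, np.sqrt(s2_ups), (N, k))   # Eq. (27)
        eps = S @ (mu + rng.normal(0.0, np.sqrt(s2_nu), N))        # eps_t = S xi_t
        y = gamma * y + x @ beta + eps                             # Eq. (26)
        if step >= burn:                             # keep only post-burn-in periods
            ys.append(y)
            xs.append(x)
    return np.array(ys), np.array(xs)

N, T, rho = 100, 10, 0.25
# Placeholder connectivity matrix: one neighbour ahead and one behind, circular
M = (np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)) / 2.0
S = np.linalg.inv(np.eye(N) - rho * M)               # SAR error dependence

Y, X = simulate_panel(N, T, S, gamma=0.2, beta=np.array([1.0, 0.5, 0.75, 1.0]),
                      delta=0.8, s2_mu=0.2, s2_nu=0.8, s2_ups=0.9)
```

Replacing `S` with `np.eye(N) - lam * M` gives the SMA case, and replacing it with the identity matrix gives the case with no spatial error dependence.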

For each replication, estimates of $\gamma ,\rho ,\lambda ,{{\varvec{\upbeta}}}_{1} ,{{\varvec{\upbeta}}}_{2}$ and hence ${\mathbf{W}}$ and the parameter standard error estimates were obtained by one-step and two-step GMM estimation collapsing the HENR instruments for $y_{it}$ to give $T - 2$ instruments plus 4 instruments in the classic one column for each instrumenting variable design for the 4 exogenous variables, giving 12 instruments in total. Note that replications producing parameter estimates indicating non-stationarity were rejected. Hence we require that $- 1 < \hat{\gamma } < 1$ and $e_{\min }^{ - 1} < \left[ {\hat{\rho }{\text{ or }}\hat{\lambda }} \right] < e_{\max }^{ - 1}$ where $e_{\min }$ and $e_{\max }$ are the minimum and maximum real characteristic roots of ${\mathbf{M}}$.
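The screening rule for replications can be expressed as a short check. This is a sketch under the assumption that the bounds are computed from the real characteristic roots of ${\mathbf{M}}$ exactly as stated above; the function name `admissible` and the small example matrix are hypothetical.

```python
import numpy as np

def admissible(gamma_hat: float, spatial_hat: float, M: np.ndarray,
               tol: float = 1e-10) -> bool:
    """True if -1 < gamma_hat < 1 and 1/e_min < spatial_hat < 1/e_max."""
    eig = np.linalg.eigvals(M)
    real_roots = eig.real[np.abs(eig.imag) < tol]    # keep the real characteristic roots
    lower = 1.0 / real_roots.min()                   # negative when e_min < 0
    upper = 1.0 / real_roots.max()
    return bool(-1.0 < gamma_hat < 1.0 and lower < spatial_hat < upper)

# Small circular 'one ahead and one behind' matrix, used only as an example
n = 10
M = (np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)) / 2.0

keep_a = admissible(0.2, 0.75, M)     # inside both intervals
keep_b = admissible(0.2, 1.20, M)     # spatial parameter outside its interval
keep_c = admissible(1.05, 0.25, M)    # lag parameter outside (-1, 1)
```

For a row-normalised matrix the largest root is 1, so the upper bound on the spatial parameter is 1; the lower bound depends on the smallest root of the particular matrix used.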

Tables 1, 2 and 3 illustrate how the standard errors increase as we transition from zero error dependence in the DGP and the corresponding adjusted standard error estimators to positive error dependence. The spatially corrected Windmeijer standard errors are always larger than the conventional standard errors, reflecting the correction for bias in estimating the efficient weight matrix embodied in the Windmeijer correction. These higher standard error estimates are maintained as the variance of the error components increases.

Table 1 Mean standard errors assuming DGP with no spatial error dependence

Table 2 Mean standard errors assuming DGP with spatial moving average error dependence

Table 3 Mean standard errors assuming DGP with spatial autoregressive error dependence

5 Application to real data

5.1 Demand for cigarettes

Baltagi and Levin (1986) and Baltagi (2021) consider the problem of the determinants of demand for cigarettes across $n$ = 46 US States over $T$ = 30 years. In a panel analysis and using data measured in real terms and in logarithmic form, they regress per capita sales of cigarettes to people aged 14 and above on the average retail price and per capita disposable income inter alia. The analysis here uses the same data (starting at year 2, so $T$ = 29) and is based on a very simple dynamic panel data model in which the key elements are the relationship between consumer demand ($D$), prices ($P$) and income ($Y$). A priori theory suggests that demand will fall with higher prices and increase with higher income.

In this application there are three additions to this basic theory. One is that we allow the impact of income to be non-linear in logs, anticipating a possibly quadratic relationship between log income and log demand. As income increases, demand will rise before falling as one reaches higher income levels. The idea is that at the micro-level higher income and possibly better educated consumers will be more aware of, or sensitive to, health issues relating to cigarette consumption. This might be reflected in the aggregate State level data analysed here by a negative parameter on $Y^{2}$. A second addition to the basic demand model is the possibility that consumption in a given State may be affected by the minimum real price of cigarettes in other nearby States. At the micro-level, this could reflect cross-border travel whereby demand is transferred to contiguous States with lower prices (reflecting maybe taxation regime differences across States). This possibility is represented in the model by a spatially lagged variable, $\ln P_{{it}}^{L} = \sum\nolimits_{{j = 1}}^{n} {m_{{ij}} \log \left( {P_{{jt}} } \right)}$, where ${\mathbf{M}}$ is an $n$ by $n$ matrix of weights applied to contiguous States according to total population of each State, subsequently standardised by dividing each cell by its row total. States with larger populations have higher weights, on the assumption that cross-border travel will be correspondingly greater. Thus $\ln P_{it}^{L}$ is an $n$ by 1 vector of weighted averages of log prices in States contiguous to each of the $n$ States. It acts as a substitute price attracting consumers from high-tax States to nearby low-tax States (Baltagi 2021). The third and critical amendment to the basic demand model is the possibility of spatial error dependence.
The assumption is that, notwithstanding controlling for individual State heterogeneity via differencing in the estimator, a host of omitted spatially correlated effects may affect the level of demand.
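The construction of the spatially lagged price variable can be illustrated with a toy example. The four 'States', their contiguity pattern, populations and prices below are invented for the illustration and have no connection with the cigarette data.

```python
import numpy as np

# Hypothetical contiguity pattern for four States arranged in a line
contig = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
pop = np.array([1.0, 3.0, 2.0, 4.0])      # hypothetical populations
price = np.array([1.0, 2.0, 4.0, 8.0])    # hypothetical real prices in one year

# Weight on neighbour j is proportional to the population of j
M = contig * pop                          # broadcasting scales column j by pop[j]
M = M / M.sum(axis=1, keepdims=True)      # divide each cell by its row total

ln_PL = M @ np.log(price)                 # weighted average of neighbours' log prices
```

For the second State, whose neighbours have populations 1 and 2, the weights are 1/3 and 2/3, so its lagged log price is one third of the log price of the first State plus two thirds of that of the third State.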

We attempt to capture the influence of these by invoking a SAR error dependence process as defined by Eq. (4). To summarise, the model specification is

$$\ln D_{it} = \beta_{0} + \gamma \ln D_{it - 1} + \beta_{1} \ln P_{it} + \beta_{2} \ln P_{it}^{L} + \beta_{3} \ln Y_{it} + \beta_{4} \ln Y_{it}^{2} + \varepsilon_{it}$$

(28)

Estimation proceeds by two-step difference GMM as outlined above. Rather than the moments in Eq. (9), the four exogenous variables are treated as 4 instruments in the classic one column for each instrumenting variable design. The other instruments based on the endogenous variable are the outcome of collapsing the standard set of GMM instruments (Holtz-Eakin et al. 1988), so that there is one instrument for each lag distance, rather than one for each time period and lag distance, giving 27 additional moments equations and 31 instruments in total.
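One way to build collapsed instruments of this kind is sketched below. It assumes levels of the dependent variable are held in an $N$ by $T$ array, that the differenced equations run over the last $T - 2$ periods, and that rows are stacked period by period; unavailable lags are filled with zeros. The function name `collapsed_instruments` is hypothetical.

```python
import numpy as np

def collapsed_instruments(y: np.ndarray) -> np.ndarray:
    """One instrument column per lag distance (2, 3, ..., T-1), zeros where unavailable.

    y has shape (N, T); the result has shape (N*(T-2), T-2).
    """
    N, T = y.shape
    blocks = []
    for t in range(2, T):                 # differenced equations, 0-based periods
        Zt = np.zeros((N, T - 2))
        for lag in range(2, t + 1):       # levels dated t-2 and earlier
            Zt[:, lag - 2] = y[:, t - lag]
        blocks.append(Zt)
    return np.vstack(blocks)              # period-by-period stacking

# Tiny example: 2 individuals, 5 periods
y = np.arange(10.0).reshape(2, 5)         # rows [0,1,2,3,4] and [5,6,7,8,9]
Z = collapsed_instruments(y)

# Dimensions for a panel the size of the cigarette data (46 States, 29 years)
Z_shape_cig = collapsed_instruments(np.zeros((46, 29))).shape
```

With 29 periods the function returns 27 columns, which matches the number of collapsed instruments reported above; the four exogenous variables would then be appended as additional columns.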

The estimates in Table 4 indicate that each variable is statistically significant and correctly signed on the basis of no spatial correction to the conventional two-step standard errors. Evidently demand falls with rising prices locally, and lower prices in contiguous States reduce demand locally. Increasing income increases demand, but the negative coefficient on income-squared indicates a quadratic relation, with demand rising then falling as income increases. The conventional two-step standard errors with spatial correction reaffirm these interpretations. Note that the corrected standard errors are larger, but not sufficiently large to lead to failure to reject the null hypothesis of zero effect. The Windmeijer corrected standard errors are larger again, but with no modification due to spatial error correlation one would again formally conclude that each of the variables has a significant impact on demand. However, introducing the spatial modification, one would fail to reject the null hypothesis of no effect of prices in contiguous States on the basis of a two-tailed test, using a 5% level of risk and referring to the N(0,1) distribution. Nevertheless, it seems irrational to consider a two-sided alternative hypothesis, since theory states that higher prices in contiguous States boost local demand, and that lower prices cause local demand to fall as a result of cross-border travel. So there is a sound basis for a one-sided test of this specific line of theoretical reasoning. In that case, z = 1.60 equates to an upper tail p-value of 0.055 in the N(0,1) reference distribution. This is somewhat borderline in terms of significance, but since it is conditional on the definition of $\ln P_{it}^{L}$, one surely cannot rule out demand being transferred to nearby States with lower prices.
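The one-sided and two-sided p-values for z = 1.60 can be reproduced with the standard normal distribution, here using only the Python standard library. The function name `upper_tail_p` is an illustrative choice.

```python
import math

def upper_tail_p(z: float) -> float:
    """Upper tail probability of the N(0,1) distribution at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z = 1.60
p_one_sided = upper_tail_p(z)        # approximately 0.055
p_two_sided = 2.0 * p_one_sided      # approximately 0.110
```

The two-sided p-value exceeds 0.05, which is why the effect is not significant in a two-tailed test at the 5% level, while the one-sided value lies just above that threshold.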

Table 4 Parameter estimates: demand for cigarettes

5.2 EU regional productivity

A simple model of productivity levels across EU NUTS2 regions also illustrates the effect of spatial dependence in the errors on estimated parameter standard errors. The theoretical motivation for the model specification is the so-called Verdoorn law (Verdoorn 1949), which traditionally is a relationship between the growth of labour productivity and the growth of output in the manufacturing sector (Fingleton and McCombie 1998), but which has been applied across sectors where increasing returns to scale may also exist. One would also envisage spatial effects for various reasons. For example, causal effects may transgress regional boundaries, and there could be interdependence between the levels of productivity across regions through supply-chain effects, spillovers of technology, etc. We attempt to capture these effects via an SAR error dependence process in the model specification. Also, one might assume that a region’s productivity is affected by its labour productivity in the previous period, perhaps as a manifestation of localised variation in technical knowledge which is transmitted to the next period. This is captured by the presence of a lag term in the specification given by Eq. (29).

In this example data are available for 255 regions over the period 2001 to 2010, thus spanning the economic crisis of 2008.^{Footnote 3} The data^{Footnote 4} comprise employment levels, output as measured by gross value added (GVA) and gross fixed capital formation (GFCF) for each region and each year, with GVA and GFCF denominated in €2005 m. The model treats GVA and capital stock per worker as causal variables, with capital stock derived as a nonlinear function of GFCF, following the approach of Fingleton (2020). Productivity is GVA per worker. Given severe global economic instability through the period of analysis, year-specific dummy variables from 2003 to 2010 are also included as additional causes of regional fluctuations in productivity levels. Earlier years are omitted to avoid collinearity.

$$\ln p_{it} = \beta_{0} + \gamma \ln p_{it - 1} + \beta_{1} \ln GVA_{it} + \beta_{2} \ln cap_{it} + \beta_{3} D2003_{it} + \cdots + \beta_{10} D2010_{it} + \varepsilon_{it}$$

(29)

Again the assumed SAR error dependence process is defined by Eq. (4).

We assume that $\ln p$, $\ln GVA$ and $\ln cap$ are endogenous, since the levels of output and the capital stock per worker could respond to variations in productivity levels as well as being causes of them. This endogenous interaction is in the spirit of Kaldor (1957, 1981), who integrated the Verdoorn Law into a recursive causal chain of regional export-driven productivity. Estimation follows the standard GMM approach of differencing, thus eliminating the individual-specific effects $\mu_{i}, i = 1,\ldots,255$ from the compound error process. Differencing log levels means that the estimator is in terms of exponential growth rates, which is the traditional Verdoorn Law specification. Lagging the endogenous right-hand-side variables sufficiently creates instrumental variables that satisfy the moment equations so that, following Eq. (8), $\sum\nolimits_{i} \ln p_{il} \Delta \nu_{it} = 0$, $\sum\nolimits_{i} \ln GVA_{il} \Delta \nu_{it} = 0$ and $\sum\nolimits_{i} \ln cap_{il} \Delta \nu_{it} = 0$. Collapsing the lagged instruments yields 24 instruments, and adding the 8 exogenous year dummies, each entering as a single instrument column, gives 32 instruments in total.
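The instrument count can be illustrated with a minimal sketch of a collapsed instrument block for one region. It assumes that levels dated $t-2$ back to the first period instrument the differenced equation at time $t$, which is one reading consistent with 8 columns per endogenous variable when there are 10 years of data. The function name `collapsed_instruments`, the random data and the identity block standing in for the year dummies are illustrative assumptions, not the author's code.

```python
import numpy as np

def collapsed_instruments(x):
    """Collapsed instrument block for one individual (illustrative sketch).

    x holds the levels x_1..x_T. Rows are the differenced equations
    t = 3..T, columns are lag distances 2..T-1, and lags that reach
    before the first period are set to zero.
    """
    T = len(x)
    Z = np.zeros((T - 2, T - 2))
    for row, t in enumerate(range(3, T + 1)):      # period of the equation
        for col, lag in enumerate(range(2, T)):    # lag distance
            if t - lag >= 1:
                Z[row, col] = x[t - lag - 1]       # x is 0-indexed
    return Z

T = 10                                             # 2001 to 2010
rng = np.random.default_rng(0)
# One block each for ln p, ln GVA and ln cap (random stand-in data)
blocks = [collapsed_instruments(rng.normal(size=T)) for _ in range(3)]
dummies = np.eye(T - 2)                            # assumed form: one column per year dummy
Z_i = np.hstack(blocks + [dummies])
print(Z_i.shape)                                   # 8 rows, 24 + 8 = 32 columns
```

Without collapsing, each period would receive its own set of lag columns, so the count would grow with the square of the number of periods rather than linearly.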

Table 5 shows the resulting parameter estimates, standard errors and z-ratios. With neither the spatial correction nor the Windmeijer correction, the conventional two-step estimates indicate a highly significant lag parameter $\gamma$, so that productivity depends on productivity in the previous year, which is suggestive of inter-temporal spillovers of technical knowledge as proposed above. Productivity also depends on output, with the elasticity indicating that a 1% increase in output causes productivity to increase by about 0.6%, which is close to the elasticity of 0.5 typical of the Verdoorn law. The estimated coefficient of capital stock per worker is significantly negative, which is counter-intuitive in that one would expect productivity to increase as capital stock per worker increases. There are also some significant year dummies, particularly close to the economic crisis of 2008/9, when global shocks caused a significant reduction in productivity across all regions. In addition there is significant positive spatial error correlation, as indicated by $\hat{\rho }$ and its z-ratio, which has implications for the standard error estimates.

Table 5 Parameter estimates: EU regional productivity

With the spatial correction, the conventional two-step standard error estimates increase and the z-ratios diminish, although most parameter estimates remain significantly different from zero. Applying the Windmeijer correction has an even larger impact on the estimated standard errors, but not enough to eliminate the causal effects of lagged productivity, GVA and the year dummies for 2008/9, which remain significant. The effect of capital stock per worker is only significant if one accepts a lower tail p-value of about 0.055 as sufficiently small to indicate significance. However, modification of the Windmeijer correction to also allow for spatial error dependence further increases the standard error estimates and does render insignificant the counter-intuitive negative impact of capital stock per worker. GVA retains its significance, providing further evidence in support of the Verdoorn Law, and the effect of the global shock of 2009 is also significant, but the effect of lagged productivity is now rather borderline, with an upper tail probability of about 0.03 when referred to the N(0,1) distribution.
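The tail probabilities quoted above come from referring a z-ratio to the N(0,1) distribution. A minimal sketch of that arithmetic follows. The z values used are illustrative choices that give tail areas of roughly the size mentioned in the text; they are not values read from Table 5.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * erfc(z / sqrt(2.0))

def lower_tail(z):
    """P(Z < z) for a standard normal variate."""
    return upper_tail(-z)

# Illustrative z-ratios only (assumed, not taken from Table 5)
print(round(upper_tail(1.88), 3))    # upper tail area of about 0.03
print(round(lower_tail(-1.60), 3))   # lower tail area of about 0.055
```

Because a larger standard error shrinks the z-ratio in absolute value, each successive correction moves these tail areas upwards, which is why a borderline coefficient can lose significance.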

6 Conclusion

Data are very often analysed by dynamic panel data methods in settings where the individuals are located in space and there is inherent spatial dependence in the data. One approach to capturing spatial dependence is to introduce it as part of the error term in the model, though there are more elaborate alternatives, such as also introducing contemporaneous and lagged spatial lags of regressors (including the dependent variable), as illustrated in Baltagi et al. (2019). Failure to capture these spatial effects can lead to bias both in point estimates and in estimated standard errors. In this paper, for simplicity and clarity, we focus solely on spatial dependence in the error process. The contribution of this paper to the literature is the development of a modified Windmeijer (2005) correction that allows both for the bias due to estimated parameters being used to calculate the efficient weight matrix and for the presence of spatial error dependence. The empirical examples demonstrate that unacknowledged positive spatial dependence can lead to downward bias in estimated parameter standard errors and hence to incorrect inference. Given the pervasiveness of spatial error dependence, one should expect similar bias across a range of GMM estimators.

Notes

Non-parametric heteroscedasticity and autocorrelation consistent estimation in a spatial framework (or SHAC estimation) was introduced by Kelejian and Prucha (2007), with an extension to panel data by Schmidt and Tran (2014). A natural and direct introduction to parametric estimation is given by Baltagi et al. (2019).
While this is novel material, its position in the Appendix means that there is no interruption to the flow for the more general reader.
Accessible data with the same geography are not available over the more recent period.
Provided, with thanks, by the Cambridge Econometrics European Regional Economic database.

References

Arellano M, Bond S (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to employment. Rev Econ Stud 58:277–297
Baltagi BH (2021) Econometric analysis of panel data, 6th edn. Springer, New York
Baltagi BH, Levin D (1986) Estimating dynamic demand for cigarettes using panel data: the effects of bootlegging, taxation and advertising reconsidered. Rev Econ Stat 68:148–155
Baltagi BH, Fingleton B, Pirotte A (2014) Estimating and forecasting with a dynamic spatial panel data model. Oxford Bull Econ Stat 76:112–136
Baltagi BH, Fingleton B, Pirotte A (2019) A time-space dynamic panel data model with spatial moving average errors. Reg Sci Urban Econ 76:13–31
Doornik JA, Arellano M, Bond S (2001) Panel data estimation using DPD for Ox, http://www.nuff.ox.ac.uk/Users/Doornik/
Fingleton B (2020) Exploring Brexit with dynamic spatial panel models: some possible outcomes for employment across the EU regions. Ann Region Sci 64:455–491
Fingleton B, Le Gallo J (2008) Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: finite sample properties. Pap Reg Sci 87:319–339
Fingleton B, McCombie JS (1998) Increasing returns and economic growth: some evidence for manufacturing from the European Union regions. Oxf Econ Pap 50:89–105
Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50:1029–1054
Holtz-Eakin D, Newey W, Rosen HS (1988) Estimating vector autoregressions with panel data. Econometrica 56:1371–1395
Hwang J, Kang B, Lee S (2022) A doubly corrected robust variance estimator for linear GMM. J Econom 229(2):276–298
Kaldor N (1957) A model of economic growth. Econ J 67:591–624
Kaldor N (1981) The role of increasing returns, technical progress and cumulative causation in the theory of international trade and economic growth. Econ Appl XXXIV:593–617
Kapoor M, Kelejian HH, Prucha IR (2007) Panel data models with spatially correlated error components. J Econom 140:97–130
Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Financ Econ 17:99–121
Kelejian HH, Prucha IR (2007) HAC estimation in a spatial framework. J Econom 140:131–154
Newey W, West K (1987) A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55:703–708
Roodman D (2009) How to do xtabond2: an introduction to difference and system GMM in Stata. Stata J 9(1):86–136
Schmidt JR, Tran HPD (2014) The SHAC estimator in panel data with group-specific spatial lags. Lett Spat Resour Sci 7:61–71
Verdoorn PJ (1949) Fattori che regolano lo sviluppo della produttività del lavoro. L’Industria 1:3–10. Reproduced as: On the factors determining the growth of labour productivity. In: McCombie JS, Pugno M, Soro B (eds) Productivity growth and economic performance: essays on Verdoorn’s law. Palgrave Macmillan, UK, pp 28–36
Windmeijer F (2005) A finite sample correction for the variance of linear efficient two-step GMM estimators. J Econom 126:25–51

Author information

Authors and Affiliations

Department of Land Economy, 19 Silver Street, Cambridge, CB3 9EP, UK
Bernard Fingleton

Corresponding author

Correspondence to Bernard Fingleton.

Ethics declarations

Conflict of interest

The corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Starting from Eq. (19) and building on Windmeijer (2005) in particular and Roodman (2009),

$${\hat{\mathbf{\beta }}}_{2} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$

and we obtain a similar expression for the (expected) true value

$${{\varvec{\upbeta}}} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x\beta }}$$

Writing ${\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{{\hat{\beta }_{1} }} {\mathbf{Z}} = {\hat{\mathbf{W}}}$ and subtracting, we obtain this expression which has the same variance as ${\hat{\mathbf{\beta }}}_{2}$,

$$g(\Delta {\mathbf{y,\hat{\Phi }}}_{{\hat{\beta }_{1} }} ) = {\hat{\mathbf{\beta }}}_{2} - {{\varvec{\upbeta}}} = - \left( {\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}\left( {{\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{{\hat{\beta }_{1} }} {\mathbf{Z}}} \right)^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}\left( {{\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{{\hat{\beta }_{1} }} {\mathbf{Z}}} \right)^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {{\varvec{\upvarepsilon}}}$$

(30)

in which $\Delta {{\varvec{\upvarepsilon}}}$ is unobserved. Let $g(\Delta {\mathbf{y}},{\hat{\mathbf{\Phi }}}_{\beta })$ denote the analogous expression for an estimator based on ${\hat{\mathbf{\Phi }}}_{\beta }$. Thus

$$g(\Delta {\mathbf{y,\hat{\Phi }}}_{{\beta_{{}} }} ) = {\tilde{\mathbf{\beta }}}_{2} - {{\varvec{\upbeta}}} = - \left( {\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {{\varvec{\upvarepsilon}}}$$

(31)

Also we can write a similar expression for the one-step GMM estimator as

$${\hat{\mathbf{\beta }}}_{1} - {{\varvec{\upbeta}}} = - \left( {\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{ZW}}_{{\mathbf{1}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{ZW}}_{{\mathbf{1}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{{^{\prime } }} \Delta {{\varvec{\upvarepsilon}}}$$

From Eq. (30), a first-order Taylor expansion of $({\mathbf{Z^{\prime}\hat{\Phi }}}_{{\hat{\beta }_{1} }} {\mathbf{Z}})$ around the true unobserved ${{\varvec{\upbeta}}}$, which holds for linear moment conditions, gives

$$g(\Delta y,{\hat{\mathbf{\Phi }}}_{{\hat{\beta }_{1} }} ) \approx g(\Delta y,{\hat{\mathbf{\Phi }}}_{\beta } ) + \left. {\frac{\partial }{{\partial \hat{\beta }}}g(\Delta y,{\hat{\mathbf{\Phi }}}_{{\hat{\beta }}} )} \right|_{{\hat{\beta } = \beta }} \left( {{\hat{\mathbf{\beta }}}_{1} - {{\varvec{\upbeta}}}} \right)$$

(32)

Writing ${\mathbf{D}} = \left. {\frac{\partial }{{\partial \hat{\beta }}}g(\Delta y,{\hat{\mathbf{\Phi }}}_{{\hat{\beta }}} )} \right|_{{\hat{\beta } = \beta }}$ and using the product rule applied to $g(\Delta {\mathbf{y}},{\hat{\mathbf{\Phi }}}_{\beta })$ one obtains

$$\begin{aligned} {\mathbf{D}} & = - \partial (\left( {\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{{\beta_{{}} }} {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {{\varvec{\upvarepsilon}}})/\partial \beta \\ {\mathbf{D}} & = - \partial \left( {\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x}}} \right)^{ - 1} /\partial \beta (\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{{\beta_{{}} }} {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {{\varvec{\upvarepsilon}}}) \\ & \quad - \left( {\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {\mathbf{x}}} \right)^{ - 1} \partial (\Delta {\mathbf{x}}^{{^{\prime } }} {\mathbf{Z}}({\mathbf{Z}}^{{^{\prime } }} {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{{^{\prime } }} \Delta {{\varvec{\upvarepsilon}}})/\partial \beta \\ \end{aligned}$$

But applying the first-order conditions for minimizing the GMM criterion function, $\Delta {\mathbf{x}}^{\prime } {\mathbf{Z}}{\hat{\mathbf{W}}}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{2} = {\mathbf{0}}$, the first term of the product-rule expansion is set to zero and one can write

$$\begin{aligned} {\mathbf{D}} & = - \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z}}({\mathbf{Z}}^{\prime } {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \partial (\Delta {\mathbf{x}}^{\prime } {\mathbf{Z}}({\mathbf{Z}}^{\prime } {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{\prime } \Delta {{\varvec{\upvarepsilon}}})/\partial \beta \\ {\mathbf{D}} & = - \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z}}({\mathbf{Z}}^{\prime } {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} (\Delta {\mathbf{x}}^{\prime } {\mathbf{Z}}({\mathbf{Z}}^{\prime } {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} \frac{{\partial ({\mathbf{Z}}^{\prime } {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})}}{\partial \beta }({\mathbf{Z}}^{\prime } {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}})^{ - 1} {\mathbf{Z}}^{\prime } \Delta {{\varvec{\upvarepsilon}}}) \\ \end{aligned}$$

Given ${\text{var}} ({\hat{\mathbf{\beta }}}_{2} ) = {\hat{\mathbf{V}}}_{0} = N\left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1}$ and substituting infeasible terms ${\mathbf{Z}}^{\prime } {\hat{\mathbf{\Phi }}}_{\beta } {\mathbf{Z}}$ and $\Delta {{\varvec{\upvarepsilon}}}$ with feasible approximations ${\hat{\mathbf{W}}}$ and $\Delta {\hat{\mathbf{\varepsilon }}}_{2} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{2}$, one obtains

$${\hat{\mathbf{D}}} = - \frac{{{\hat{\mathbf{V}}}_{0} }}{N}\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} \left. {\frac{{\partial {\mathbf{W}}}}{{\partial \hat{\beta }}}} \right|_{{\hat{\beta } = \hat{\beta }_{1} }} {\hat{\mathbf{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}}$$

(33)

With regard to calculating $\frac{{\partial {\hat{\mathbf{W}}}}}{{\partial \hat{\beta }}}$, to recap and elaborate on what is in the main text,

$${\mathbf{W}} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } {\mathbf{\Omega Z}}} \right)$$

whose inverse, as the inverse of the covariance matrix ${{\varvec{\Pi}}}$ (spectral density) of the sample moments (Hansen 1982), is the weight matrix that produces optimal estimates of $\beta$. As noted by many, including Hansen (1982) and Newey and West (1987), ${{\varvec{\Pi}}}$ can be estimated in different ways depending on the presence of heteroscedasticity or serial correlation. Likewise, spatial dependence calls for an appropriate estimator. Accordingly,

$${{\varvec{\Omega}}} = \left( {{\mathbf{I}}_{T - 2} \otimes {\mathbf{S}}} \right){{\varvec{\Phi}}}\left( {{\mathbf{I}}_{T - 2} \otimes {\mathbf{S}}^{\prime } } \right)$$

where ${{\varvec{\Omega}}}$ is an $N(T - 2)$ by $N(T - 2)$ matrix, ${\mathbf{S}} \in \left\{ {{\mathbf{H}}_{1} ,{\mathbf{H}}_{2} } \right\}$ is an $N$ by $N$ matrix, with ${\mathbf{H}}_{1} = \left( {{\mathbf{I}}_{N} - \rho {\mathbf{M}}} \right)^{ - 1}$ for SAR errors or ${\mathbf{H}}_{2} = \left( {{\mathbf{I}}_{N} - \lambda {\mathbf{M}}} \right)$ for SMA errors, and ${\mathbf{I}}_{T - 2}$ is an identity matrix of dimension $T - 2$.
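As a minimal sketch of how the spatial transformation might be assembled, the following builds $\mathbf{H}_1$ or $\mathbf{H}_2$ from a neighbour matrix and expands it with the Kronecker product. The ring-shaped, row-normalised matrix `M`, the coefficient value 0.4, the helper name `spatial_filter` and the identity placeholder for ${{\varvec{\Phi}}}$ are all assumptions made for illustration.

```python
import numpy as np

def spatial_filter(M, coef, kind="SAR"):
    """Return H1 = (I - rho*M)^(-1) for SAR errors or H2 = (I - lambda*M) for SMA errors."""
    I = np.eye(M.shape[0])
    if kind == "SAR":
        return np.linalg.inv(I - coef * M)
    if kind == "SMA":
        return I - coef * M
    raise ValueError("kind must be 'SAR' or 'SMA'")

# Hypothetical neighbour matrix: each of N regions has two neighbours on a ring
N, T = 5, 6
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = 0.5
    M[i, (i + 1) % N] = 0.5

S = spatial_filter(M, 0.4, "SAR")
big_S = np.kron(np.eye(T - 2), S)          # I_{T-2} (x) S, block diagonal by period

Phi = np.eye(N * (T - 2))                  # placeholder for Phi, for shape checking only
Omega = big_S @ Phi @ big_S.T              # N(T-2) by N(T-2)
print(Omega.shape)
```

The Kronecker structure applies the same $N$ by $N$ filter within every period, which presumes the stacked data are ordered period by period.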

Also ${{\varvec{\Omega}}}$ depends on the $N(T - 2)$ by $N(T - 2)$ matrix

$${{\varvec{\Phi}}} = \left[ {\left( {\Delta {{\varvec{\upnu}}}} \right)\left( {\Delta {{\varvec{\upnu}}}} \right)^{\prime } } \right] \odot \left( {{\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N} } \right)$$
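The Hadamard product in this expression acts as a mask on the outer product of the stacked errors. A small sketch, assuming that $\mathbf{J}_{T-2}$ is the $(T-2)$ by $(T-2)$ matrix of ones and that the errors are stacked period by period, shows which cross-products survive. The random vector `dnu` is a stand-in for the unobserved differenced errors.

```python
import numpy as np

N, T = 3, 5
P = T - 2                                  # number of differenced periods
rng = np.random.default_rng(0)
dnu = rng.normal(size=N * P)               # stand-in for the stacked differenced errors

J = np.ones((P, P))                        # assumed meaning of J_{T-2}
mask = np.kron(J, np.eye(N))               # 1 where both entries refer to the same individual
Phi = np.outer(dnu, dnu) * mask            # element-wise (Hadamard) product

# Individual 0 in periods 1 and 2 is retained; individuals 0 and 1 are zeroed out.
same_individual = Phi[0, N] == dnu[0] * dnu[N]
different_individuals = Phi[0, 1] == 0.0
print(same_individual, different_individuals)
```

Under these assumptions the mask keeps products across time for the same individual and discards products between different individuals.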

Estimation of ${{\varvec{\Phi}}}$ is made possible by replacing $\Delta {{\varvec{\upnu}}}$ by the differenced residuals $\Delta {\hat{\mathbf{\varepsilon }}}_{1}$ obtained from the preliminary one-step consistent estimation, giving

$${\hat{\mathbf{W}}} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } \left( {{\mathbf{I}}_{T - 2} \otimes {\hat{\mathbf{S}}}} \right){\hat{\mathbf{\Phi }}}\left( {{\mathbf{I}}_{T - 2} \otimes {\hat{\mathbf{S}}}^{\prime } } \right){\mathbf{Z}}} \right) = \frac{1}{N(T - 2)}{\hat{\mathbf{Z}}}_{S}^{\prime } {\hat{\mathbf{\Phi }}}{\hat{\mathbf{Z}}}_{S}$$

(34)

where ${\hat{\mathbf{Z}}}_{S} = \left( {{\mathbf{I}}_{T - 2} \otimes {\hat{\mathbf{S}}}} \right)^{\prime } {\mathbf{Z}}$. The derivatives of ${\hat{\mathbf{W}}}$ with respect to the $k + 1$ vector ${\hat{\mathbf{\beta }}}_{1}$, which comprises the coefficients $\hat{\gamma },\hat{\beta }_{1} , \ldots ,\hat{\beta }_{k}$, are obtained by applying the product rule to ${\hat{\mathbf{\Phi }}}$ in which

$$\left[ {\left( {\Delta {\hat{\mathbf{\nu }}}} \right)\left( {\Delta {\hat{\mathbf{\nu }}}} \right)^{\prime } } \right] = \left( {\Delta {\hat{\mathbf{\varepsilon }}}_{1} \Delta {\hat{\mathbf{\varepsilon }}}_{1}^{\prime } } \right) = \left( {\Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{1} } \right)\left( {\Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{1} } \right)^{\prime }$$

so the derivatives for individual $i$ are

$$\frac{\partial }{{\partial \beta_{j} }} = - {\hat{\mathbf{Z}}}_{{S{\mathbf{i}}}}^{\prime } \left( {{\mathbf{\Delta x}}_{{j{\mathbf{i}}}} \Delta {\hat{\mathbf{\varepsilon }}}_{1}^{\prime } + \Delta {\hat{\mathbf{\varepsilon }}}_{1} {\mathbf{\Delta x}}_{{j{\mathbf{i}}}}^{\prime } } \right){\hat{\mathbf{Z}}}_{{S{\mathbf{i}}}} \quad j = 1, \ldots ,k + 1$$

(35)

Here ${\hat{\mathbf{Z}}}_{Si}$ denotes the portion of the $N(T - 2)$ by $h$ matrix ${\hat{\mathbf{Z}}}_{S}$ for individual $i$, so is a $T - 2$ by $h$ matrix. Likewise, ${\mathbf{\Delta x}}_{{j{\mathbf{i}}}}$ is the set of observations for regressor $j$ observed for individual $i$ over time, so given $j$ it comprises a $T - 2$ by 1 vector. $\Delta {\hat{\mathbf{\varepsilon }}}_{1}$ is a $T - 2$ by 1 vector of first-step residuals estimated for individual $i$. From Eq. (33), each of the $k + 1$ $h$ by $h$ derivative matrices given by Eq. (35) is post-multiplied by the $h$ by 1 matrix ${\hat{\mathbf{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}}$, in which ${\hat{\mathbf{W}}}$ is $h$ by $h$, ${\mathbf{Z}}$ is $N(T - 2)$ by $h$ and $\Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}}$ is the $N(T - 2)$ by 1 vector of second-stage residuals. Summing the outcome across $N$ individuals and dividing by $N$ gives an $h$ by $k + 1$ matrix ${\hat{\mathbf{G}}}$, with columns corresponding to the $k + 1$ regressors, thus

$${\hat{\mathbf{G}}}_{j} = - \frac{1}{N}\sum\limits_{i = 1}^{N} {{\hat{\mathbf{Z}}}_{{S{\mathbf{i}}}}^{\prime } } \left( {{\mathbf{\Delta x}}_{{j{\mathbf{i}}}} \Delta {\hat{\mathbf{\varepsilon }}}_{1i}^{\prime } + \Delta {\hat{\mathbf{\varepsilon }}}_{1i} {\mathbf{\Delta x}}_{{j{\mathbf{i}}}}^{\prime } } \right){\hat{\mathbf{Z}}}_{{S{\mathbf{i}}}} ({\hat{\mathbf{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}} );\;j = 1, \ldots ,k + 1$$

(36)

So ${\hat{\mathbf{G}}} = \left. {\frac{{\partial {\mathbf{W}}}}{{\partial \hat{\beta }}}} \right|_{{\hat{\beta } = \hat{\beta }_{1} }} {\hat{\mathbf{W}}}^{{{\mathbf{ - 1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}}$, the minus sign in Eq. (36) being that of the derivative in Eq. (35), and again from the arguments leading to Eq. (33), pre-multiplying ${\hat{\mathbf{G}}}$ by the $k + 1$ by $h$ matrix $- \frac{{{\hat{\mathbf{V}}}_{0} }}{N}\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}}$ gives the $k + 1$ by $k + 1$ matrix

$${\hat{\mathbf{D}}} = - \frac{{{\hat{\mathbf{V}}}_{0} }}{N}\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} {\hat{\mathbf{G}}}$$

Accordingly

$${\hat{\mathbf{D}}} = - \frac{{{\hat{\mathbf{V}}}_{0} }}{N}\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} \left. {\frac{{\partial {\mathbf{W}}}}{{\partial \hat{\beta }}}} \right|_{{\hat{\beta } = \hat{\beta }_{1} }} {\hat{\mathbf{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}}$$

(37)
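The dimension bookkeeping that leads to Eq. (37) can be checked with a short sketch. All inputs are random placeholders: `W_hat` is an arbitrary positive definite matrix and `G_hat` an arbitrary $h$ by $k+1$ matrix, so the numbers carry no meaning and only the shapes and the algebra are being illustrated. The names `K` (for $k+1$) and `n_inst` (for $h$) are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 30, 4                    # individuals and differenced periods (T - 2)
K, n_inst = 3, 6                # K = k + 1 regressors, n_inst = h instruments

dx = rng.normal(size=(N * P, K))            # stacked differenced regressors
Z = rng.normal(size=(N * P, n_inst))        # stacked instrument matrix
A = rng.normal(size=(n_inst, n_inst))
W_hat = A @ A.T + n_inst * np.eye(n_inst)   # placeholder positive definite W-hat
G_hat = rng.normal(size=(n_inst, K))        # placeholder for the h by (k+1) matrix G-hat

XZ = dx.T @ Z                               # (k+1) by h
W_inv = np.linalg.inv(W_hat)
V0 = N * np.linalg.inv(XZ @ W_inv @ XZ.T)   # conventional two-step variance V0-hat
D_hat = -(V0 / N) @ XZ @ W_inv @ G_hat      # (k+1) by (k+1), as in Eq. (37)
print(D_hat.shape)
```

Since $\hat{\mathbf{V}}_0 / N$ is the inverse of $\Delta\mathbf{x}'\mathbf{Z}\hat{\mathbf{W}}^{-1}\mathbf{Z}'\Delta\mathbf{x}$, the factor $N$ cancels and $\hat{\mathbf{D}}$ could equally be obtained by solving a linear system rather than forming the inverse.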

As an adjunct to the above, numerical differentiation was also carried out. The first step was to calculate differenced residuals $\Delta {\hat{\mathbf{\varepsilon }}}_{1}^{*}$, which differ from $\Delta {\hat{\mathbf{\varepsilon }}}_{1}$ because they were obtained using $\Delta {\hat{\mathbf{\varepsilon }}}_{1}^{*} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{1}^{*}$ where, for variable $j$, ${\hat{\mathbf{\beta }}}_{1j}^{*} = {\hat{\mathbf{\beta }}}_{1j} + h$, and $h$ here denotes a very small increment such as 0.001 (not the number of instruments).

Then

$${\hat{\mathbf{\Phi }}}^{*} = \left[ {\left( {\Delta {\hat{\mathbf{\varepsilon }}}_{1}^{*} } \right)\left( {\Delta {\hat{\mathbf{\varepsilon }}}_{1}^{*} } \right)^{\prime } } \right] \odot \left( {{\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N} } \right)$$

and

$${\hat{\mathbf{W}}}^{*} = \frac{1}{N(T - 2)}{\hat{\mathbf{Z}}}_{S}^{\prime } {\hat{\mathbf{\Phi }}}^{*} {\hat{\mathbf{Z}}}_{S}$$

The negative of the forward-difference numerical derivative of ${\hat{\mathbf{W}}}$ with respect to the coefficient of variable $j$ is $\Gamma_{j}^{*} = \left( {{\hat{\mathbf{W}}} - {\hat{\mathbf{W}}}^{*} } \right)/h$, and ${\hat{\mathbf{G}}}_{j}^{*} = - \Gamma_{j}^{*} ({\hat{\mathbf{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}} )$ closely approximates ${\hat{\mathbf{G}}}_{j}$, with the accuracy depending on $h$.
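The agreement between the analytic and numerical derivatives can be illustrated with a minimal sketch on simulated data. It assumes period-by-period stacking, takes $\mathbf{J}_{T-2}$ to be a matrix of ones, treats `Z_S` as an already spatially transformed instrument matrix, and scales the analytic derivative by the same $1/(N(T-2))$ factor as Eq. (34) so that the two are directly comparable. The function names, the data and the parameter values are all hypothetical.

```python
import numpy as np

def weight_matrix(beta, dy, dx, Z_S, N, P):
    """W-hat in the form of Eq. (34), with P = T - 2 differenced periods."""
    e = dy - dx @ beta                               # differenced residuals
    mask = np.kron(np.ones((P, P)), np.eye(N))       # J_{T-2} (x) I_N
    return Z_S.T @ (np.outer(e, e) * mask) @ Z_S / (N * P)

def analytic_derivative(beta, j, dy, dx, Z_S, N, P):
    """Derivative of W-hat with respect to beta_j, summed over individuals."""
    e = dy - dx @ beta
    out = np.zeros((Z_S.shape[1], Z_S.shape[1]))
    for i in range(N):
        rows = np.arange(P) * N + i                  # rows belonging to individual i
        Zi, ei, xi = Z_S[rows], e[rows], dx[rows, j]
        out -= Zi.T @ (np.outer(xi, ei) + np.outer(ei, xi)) @ Zi
    return out / (N * P)

def forward_difference(beta, j, step, dy, dx, Z_S, N, P):
    """(W-hat* - W-hat) / step, i.e. minus Gamma*_j in the notation above."""
    bumped = beta.copy()
    bumped[j] += step
    return (weight_matrix(bumped, dy, dx, Z_S, N, P)
            - weight_matrix(beta, dy, dx, Z_S, N, P)) / step

rng = np.random.default_rng(1)
N, P, K, n_inst = 20, 4, 2, 3
dx = rng.normal(size=(N * P, K))
Z_S = rng.normal(size=(N * P, n_inst))
beta1 = np.array([0.5, -0.2])                        # stand-in for the one-step estimate
dy = dx @ beta1 + rng.normal(size=N * P)

gap = max(
    np.abs(analytic_derivative(beta1, j, dy, dx, Z_S, N, P)
           - forward_difference(beta1, j, 1e-3, dy, dx, Z_S, N, P)).max()
    for j in range(K)
)
print(gap)                                           # discrepancy shrinks with the step
```

Because $\hat{\mathbf{W}}$ is quadratic in the coefficients, the forward-difference error is proportional to the increment, which is consistent with the remark that the approximation depends on $h$.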

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Fingleton, B. Modifying the linear two-step Windmeijer correction for the presence of spatial error dependence. J Spat Econometrics 3, 10 (2022). https://doi.org/10.1007/s43071-022-00029-4

Received: 20 August 2022
Accepted: 18 September 2022
Published: 02 October 2022
DOI: https://doi.org/10.1007/s43071-022-00029-4
