1 Introducing spatial error dependence into a dynamic panel data model

The generalized method of moments (GMM) estimator is a very general approach to estimation with widespread application; for example, two-stage least squares (2SLS) is one member of the class of GMM estimators of the linear model. In this paper, without loss of generality, we focus on a specific version of GMM applicable to dynamic panel data modelling. More generally, GMM can be applied to estimate parameters of linear or nonlinear models, with parameters chosen so as to satisfy moments conditions as closely as possible, that is, to give the best fit to a set of equations, each of which sets a sample moment to zero. Invariably with over-identification these moments conditions cannot be satisfied simultaneously, so the approach involves minimising a quadratic objective function to achieve the best fit. While GMM can be applied to non-linear moments conditions, when all the moments conditions are linear in parameters, one typically obtains an improvement in the variance estimate (Windmeijer 2005, p. 29). This paper adopts a linear in parameters approach.

GMM can be applied to linear cross-sectional models by making the sample covariance of the regression error and the exogenous variables as small as possible; in other words, one minimises a quadratic function of the regression residuals, the exogenous variables and an appropriate weight matrix. In the case of dynamic panel data models in which there is a compound error process which captures both individual heterogeneity and a remainder idiosyncratic error component, a commonly used approach is the GMM estimator in the model in first differences (see Arellano and Bond 1991). In this case the moments conditions are in terms of differences in residuals (to avoid correlation between regressors and residuals per se), exogenous variables and the weight matrix, which define the quadratic objective function. This naturally extends to the dynamic panel data model under consideration here, with the additional extension to accommodate spatial error dependence. One-step and two-step variants, in which the second step utilises residuals from the first step, are also options.

The focus of the paper is on a particular two-step GMM estimator, on the Windmeijer correction, which seeks to avoid the severe downward bias in the estimated asymptotic standard errors of the efficient two-step GMM estimator, and on the modification proposed in this paper. It should be emphasised, however, that spatial error dependence is undoubtedly present in data across the range of GMM estimators. For example in the cross-sectional regression context, Kelejian and Prucha (1998) give a generalized spatial 2SLS procedure for estimating a spatial autoregressive model with autoregressive disturbances. Fingleton and Le Gallo (2008) extend their feasible spatial 2SLS estimator to allow for several endogenous regressors, and also introduce spatial moving average (SMA) error dependence.

Consider first a simple dynamic panel data model

$$y_{it} = \gamma y_{it - 1} + \beta x_{it} + \varepsilon_{it} ;\;i = 1, \ldots ,N,t = 1, \ldots ,T$$
(1)

in which there are \(N\) individuals/regions/locations and \(T\) time periods, \(x\) is an exogenous variable, and \(\gamma ,\beta\) are scalar parameters to be estimated. The error term is compound, thus

$$\varepsilon_{it} = \mu_{i} + \nu_{it}$$
(2)

where \(\mu_{i}\) is a set of individual effects, one for each of the \(N\) individuals, controlling for unobserved time-invariant heterogeneity across individuals. The term \(\nu_{it}\) varies both by region and by time, and represents other, unpredictable, random effects. The assumption is that each \(\mu_{i}\) and each \(\nu_{it}\) is an independent and identically distributed random draw, thus \(\mu_{i} \sim iid(0,\sigma_{\mu }^{2} )\) and \(\nu_{it} \sim iid(0,\sigma_{\nu }^{2} )\), with \(\mu_{i}\) and \(\nu_{it}\) independent of each other and among themselves. Given \(\sigma_{\mu }^{2} > 0\) there is individual heterogeneity, with \(\mu_{i}\) capturing individual effects and also regional variation in unobserved effects.

A more general specification written in matrix terms is

$${\mathbf{y}}_{t} = \gamma {\mathbf{y}}_{t - 1} + {\tilde{\mathbf{x}}}_{t} {{\varvec{\upbeta}}} + {{\varvec{\upvarepsilon}}}_{t}$$
(3)

in which \({\mathbf{y}}_{t}\) is an \(N\) by 1 vector, \({\tilde{\mathbf{x}}}_{t}\) is an \(N\) by \(k\) matrix of exogenous regressors, \({{\varvec{\upbeta}}}\) is a \(k\) by 1 vector of coefficients, and \({{\varvec{\upvarepsilon}}}_{t}\) is an \(N\) by 1 vector of errors at time \(t\).

Spatial dependence could be introduced into the errors by a spatial autoregressive (SAR) error process, which is the most widely adopted in the spatial econometrics literature, or by a spatial moving average (SMA) error process. Given an \(N\) by \(N\) connectivity matrix \({\mathbf{M}}\), SAR error dependence is represented by \({\mathbf{H}}_{1} = \left( {{\mathbf{I}}_{N} - \hat{\rho }{\mathbf{M}}} \right)^{ - 1}\), where \({\mathbf{I}}_{N}\) is an \(N\) by \(N\) identity matrix, and SMA error dependence by \({\mathbf{H}}_{2} = \left( {{\mathbf{I}}_{N} - \hat{\lambda }{\mathbf{M}}} \right)\), as in Baltagi et al. (2019). Below we use \({\mathbf{S}} \in \left\{ {{\mathbf{H}}_{1} ,{\mathbf{H}}_{2} } \right\}\) to refer to either version of \({\mathbf{H}}\).

Accordingly, given \(N\) locations, for SAR error dependence it is assumed that in each period,

$$\begin{aligned} {{\varvec{\upvarepsilon}}}_{t} & = \rho {\mathbf{M\varepsilon }}_{t} + {{\varvec{\upxi}}}_{t} \\ {{\varvec{\upvarepsilon}}}_{t} & = \left( {{\mathbf{I}}_{N} - \rho {\mathbf{M}}} \right)^{ - 1} {{\varvec{\upxi}}}_{t} \\ {{\varvec{\upvarepsilon}}}_{t} & = {\mathbf{H}}_{1} ({{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t} ) \\ \end{aligned}$$
(4)

The diagonal elements of \({\mathbf{M}}\) are zeros since a location cannot be a neighbour of itself, \(\left( {{\mathbf{I}}_{N} - \rho {\mathbf{M}}} \right)\) is non-singular and \({\mathbf{M}}\) is uniformly bounded in absolute value. This error process implies that a shock at location \(j\) is transmitted to all \(N\) locations, as shown, assuming \(\left| \rho \right| < 1\) and the rows of \({\mathbf{M}}\) sum to 1, by the expansion

$$\left( {{\mathbf{I}}_{N} - \rho {\mathbf{M}}} \right)^{ - 1} {{\varvec{\upxi}}}_{t} = \left( {\sum\limits_{i = 0}^{\infty } {{\mathbf{M}}^{i} } \rho^{i} } \right){{\varvec{\upxi}}}_{t} = {{\varvec{\upxi}}}_{t} + \rho {\mathbf{M\xi }}_{t} + \rho^{2} {\mathbf{M}}^{2} {{\varvec{\upxi}}}_{t} + \rho^{3} {\mathbf{M}}^{3} {{\varvec{\upxi}}}_{t} + \cdots$$
(5)

In Eq. (5), \({\mathbf{M}}^{0} = {\mathbf{I}}_{N}\), \({\mathbf{M}}^{2}\) is the matrix product of \({\mathbf{M}}\) and \({\mathbf{M}}\), and \({\mathbf{M}}^{i}\) is the matrix product of \({\mathbf{M}}\) and \({\mathbf{M}}^{i - 1}\). A shock at \(j\) is felt directly at location \(j\), with an indirect effect due to \(\rho {\mathbf{M\xi }}_{t}\) affecting all pairs of locations with non-zero cells in \({\mathbf{M}}\). Shocks extend beyond these local impacts as a result of transmission via neighbours of neighbours, ultimately affecting all \(N\) locations. Working via neighbours of neighbours of neighbours, shock effects rebound to \(j\), so that the full impact of a shock at \(j\) is the initial shock plus the shock effects feeding back from other locations.
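The expansion in Eq. (5) can be checked numerically. The following is a minimal sketch, assuming a small hypothetical row-normalised ring matrix (\(N\) = 6, one neighbour on each side) and \(\rho = 0.5\); these values are purely illustrative and are not taken from the paper.

```python
import numpy as np

N, rho = 6, 0.5  # illustrative values, not taken from the paper

# Hypothetical row-normalised ring: one neighbour ahead and one behind
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = 0.5
    M[i, (i + 1) % N] = 0.5

# Exact SAR transformation H1 = (I - rho M)^{-1}
H1 = np.linalg.inv(np.eye(N) - rho * M)

# Truncated power series I + rho M + rho^2 M^2 + ...
series = np.zeros((N, N))
term = np.eye(N)
for _ in range(60):
    series += term
    term = rho * (M @ term)
max_gap = np.abs(H1 - series).max()

# Unit shock at location j = 0 and its total effect on every location
shock = np.zeros(N)
shock[0] = 1.0
response = H1 @ shock
```

In this sketch every element of `response` is positive, so the shock reaches all locations, and `response[0]` exceeds 1 because of the feedback from other locations described above.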

In contrast, an SMA error process is given by

$$\begin{gathered} {{\varvec{\upvarepsilon}}}_{t} = \left( {{\mathbf{I}}_{N} - \lambda {\mathbf{M}}} \right){{\varvec{\upxi}}}_{t} \hfill \\ {{\varvec{\upvarepsilon}}}_{t} = {\mathbf{H}}_{2} ({{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t} ) \hfill \\ \end{gathered}$$
(6)

which implies that a shock at \(j\) will only affect locations that are directly connected by non-zero elements of \({\mathbf{M}}\), so shock effects are local rather than global. The consequence of spatially dependent errors is that the parameter standard errors differ from those obtained assuming independent errors.
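The local nature of the SMA process in Eq. (6) can be illustrated in the same way. This is a sketch under assumed values (a hypothetical six-location ring and \(\lambda = - 0.5\)), not a reproduction of anything in the paper.

```python
import numpy as np

N, lam = 6, -0.5  # illustrative values, not taken from the paper

# Hypothetical row-normalised ring: one neighbour ahead and one behind
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = 0.5
    M[i, (i + 1) % N] = 0.5

# SMA transformation H2 = I - lambda M (no matrix inverse involved)
H2 = np.eye(N) - lam * M

# Unit shock at location j = 0
shock = np.zeros(N)
shock[0] = 1.0
response = H2 @ shock

# Indices of locations with a non-zero response
affected = np.flatnonzero(response).tolist()
```

Here `affected` contains only location 0 and its two direct neighbours, whereas the SAR transformation would spread the same shock to every location.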

2 GMM estimation

Estimation of linear GMM panel data regressions involves first differences to avoid dynamic panel bias, eliminating the individual effects \(\mu_{i}\) which would otherwise be correlated with the time lag of the dependent variable. First differencing Eq. (3) gives

$$\Delta {\mathbf{y}}_{t} = \gamma \Delta {\mathbf{y}}_{t - 1} + \Delta {\tilde{\mathbf{x}}}_{t} {{\varvec{\upbeta}}} + \Delta {{\varvec{\upvarepsilon}}}_{t} = \gamma \Delta {\mathbf{y}}_{t - 1} + \Delta {\tilde{\mathbf{x}}}_{t} {{\varvec{\upbeta}}} + {\mathbf{S}}\Delta {{\varvec{\upnu}}}_{t}$$
(7)

and moments equations thus

$$\sum\limits_{i} {y_{il} \Delta \nu_{it} = 0} ,\;\;\forall i,\;l = 0,\;1, \ldots ,t - 2,t = 2,3, \ldots ,T$$
(8)
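The final equality in Eq. (7) follows because the time-invariant individual effects cancel on differencing. As a worked step, using \({{\varvec{\upvarepsilon}}}_{t} = {\mathbf{S}}({{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t} )\) from Eqs. (4) and (6), and assuming \({\mathbf{S}}\) is constant over time:

$$\begin{aligned} \Delta {{\varvec{\upvarepsilon}}}_{t} & = {{\varvec{\upvarepsilon}}}_{t} - {{\varvec{\upvarepsilon}}}_{t - 1} \\ & = {\mathbf{S}}({{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t} ) - {\mathbf{S}}({{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t - 1} ) \\ & = {\mathbf{S}}({{\varvec{\upnu}}}_{t} - {{\varvec{\upnu}}}_{t - 1} ) = {\mathbf{S}}\Delta {{\varvec{\upnu}}}_{t} \\ \end{aligned}$$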

Additional GMM-style moments equations for the \(j = 1,...,k\) exogenous regressors \({\tilde{\mathbf{x}}}_{t}\) are given by

$$\sum\limits_{i} {\tilde{x}_{j,im} \Delta \nu_{it} } = 0,\;\forall i,j,\;m = 1, \ldots ,T,\;t = 2, \ldots ,T$$
(9)

In order to reduce the number of instruments the \({\tilde{\mathbf{x}}}_{t}\) can also be treated as \(k\) instruments in the classic one column for each instrumenting variable design for the matrix of instruments \({\mathbf{Z}}\). The initial step is to obtain consistent estimates of the \(k\) + 1 by 1 vector \({{\varvec{\upbeta}}}_{0}\) [\(\gamma ,{{\varvec{\upbeta}}}\)] using an IV or GM estimator, so as to obtain consistent estimates of residuals.

For example, let

$${\tilde{\mathbf{W}}}_{0} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } ({{\varvec{\Lambda}}} \otimes {\mathbf{I}}_{N} ){\mathbf{Z}}} \right)$$
(10a)

and

$${\hat{\mathbf{W}}}_{0} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } ({\hat{\mathbf{\Lambda }}}_{1} ){\mathbf{Z}}} \right)$$
(10b)

Also

$${\mathbf{\hat{\tilde{\beta }}}}_{0} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\tilde{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\tilde{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$
(11a)

and

$${\hat{\mathbf{\beta }}}_{0} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{0}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$
(11b)

In Eqs. (11a) and (11b), \(\Delta {\mathbf{x}} = [\Delta {\mathbf{y}}_{ - 1} \, \Delta {\tilde{\mathbf{x}}}]\) is an \(N(T - 2)\) by \(k\) + 1 matrix, where \(\Delta {\mathbf{y}}_{ - 1}\) denotes the stacked lagged differences \(\Delta {\mathbf{y}}_{t - 1}\) appearing in Eq. (7), \({\mathbf{Z}}\) is an \(N(T - 2)\) by \(h \ge k + 1\) matrix of \(h\) instruments, and the \((T - 2)\) by \((T - 2)\) matrix \({{\varvec{\Lambda}}}\) has 2’s on the main diagonal, − 1’s on the first off-diagonals and zeros elsewhere.
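The structure of \({{\varvec{\Lambda}}}\) reflects the covariance of first-differenced i.i.d. errors, up to the scale factor \(\sigma_{\nu }^{2}\). The following minimal sketch builds it and verifies this; the function name `first_difference_lambda` is hypothetical and \(T\) = 5 is chosen only to keep the output small.

```python
import numpy as np

def first_difference_lambda(T):
    """(T-2) x (T-2) matrix with 2 on the diagonal and -1 on the first off-diagonals."""
    m = T - 2
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

T = 5  # illustrative number of periods
Lam = first_difference_lambda(T)

# Differencing operator: each row maps consecutive levels to one first difference
m = T - 2
D = np.eye(m, m + 1, k=1) - np.eye(m, m + 1)

# With unit-variance i.i.d. errors, the covariance of the differences is D D'
same = np.allclose(D @ D.T, Lam)
```

The Kronecker product \({{\varvec{\Lambda}}} \otimes {\mathbf{I}}_{N}\) in Eq. (10a) then applies this pattern to data stacked period by period, with the \(N\) individuals ordered identically within each period.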

The efficiency of \({\tilde{\mathbf{W}}}_{0}\) depends on \(\nu_{it}\) being i.i.d. (Windmeijer 2005, p. 32), but the resulting \(N(T - 2)\) by 1 vector of residuals \(\Delta {\mathbf{\hat{\tilde{\varepsilon }}}}_{0} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\tilde{\beta }}}}_{0}\) allows more robust estimates given by Eqs. (10b) and (11b) involving the \(N(T - 2)\) by \(N(T - 2)\) matrix

$${\hat{\mathbf{\Lambda }}}_{1} = \left( {\Delta {\mathbf{\hat{\tilde{\varepsilon }}}}_{0} \Delta {\mathbf{\hat{\tilde{\varepsilon }}}}_{0}^{\prime } } \right) \odot \left( {{\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N} } \right)$$

where \({\mathbf{J}}_{T - 2} = \iota_{T - 2} \iota^{\prime}_{T - 2}\) and \(\iota_{T - 2}\) is a \((T - 2)\) by 1 vector of ones.
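The Hadamard product with \({\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N}\) keeps only those residual cross-products that belong to the same individual, across all pairs of periods. A minimal sketch, assuming residuals stacked period by period and using a hypothetical helper `cluster_outer_product` with simulated residuals:

```python
import numpy as np

def cluster_outer_product(resid, N, n_periods):
    """Return (e e') masked by (J kron I_N) for residuals stacked period by period."""
    mask = np.kron(np.ones((n_periods, n_periods)), np.eye(N))
    return np.outer(resid, resid) * mask

rng = np.random.default_rng(0)
N, n_periods = 3, 4          # illustrative sizes; n_periods plays the role of T - 2
e = rng.normal(size=N * n_periods)  # simulated stand-in for the differenced residuals
Lam1 = cluster_outer_product(e, N, n_periods)
```

In this layout `Lam1[0, 3]` pairs individual 0 in the first and second periods and is retained, while `Lam1[0, 1]` pairs two different individuals and is set to zero.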

Using (10b) in Eq. (11b) gives estimates of residuals \(\Delta {\hat{\mathbf{\varepsilon }}}_{0} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{0}\) that embody spatial dependence, so on this basis a GM estimator of \(\lambda\) is the solution of sample moments using nonlinear least squares, as shown by Baltagi et al. (2019). Likewise, the residuals are used to obtain consistent estimates of the autoregressive parameter \(\rho\) based on the Kapoor et al. (2007) approach, as in Baltagi et al. (2014). Given \({\mathbf{S}}\), one obtains an ‘initial’ weight matrix as an estimate of the covariance matrix of the moment conditions.

$${\hat{\mathbf{W}}}_{1} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } \left( {{{\varvec{\Lambda}}} \otimes {\hat{\mathbf{S}}}{\hat{\mathbf{S}}}^{\prime } } \right){\mathbf{Z}}} \right)$$
(12)

Then the ‘first-step’ parameter estimates are given by

$${\hat{\mathbf{\beta }}}_{1} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{1}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}_{1}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$
(13)
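Equations (11a), (11b), (13) and (19) share one functional form and differ only in the weight matrix. The sketch below implements that form with the Moore–Penrose pseudo-inverse; the function name `linear_gmm` and the synthetic data are assumptions for illustration, not the authors' code.

```python
import numpy as np

def linear_gmm(dX, dy, Z, W):
    """Linear GMM: (X'Z W^+ Z'X)^+ X'Z W^+ Z'y, using pseudo-inverses throughout."""
    W_inv = np.linalg.pinv(W)
    XZ = dX.T @ Z
    A = XZ @ W_inv @ XZ.T
    return np.linalg.pinv(A) @ (XZ @ W_inv @ (Z.T @ dy))

# Synthetic over-identified example: 4 instruments, 2 regressors, no noise in dy
rng = np.random.default_rng(1)
n = 200
Z = rng.normal(size=(n, 4))
dX = Z[:, :2] + 0.1 * rng.normal(size=(n, 2))
b_true = np.array([0.2, 1.0])
dy = dX @ b_true
W = Z.T @ Z / n  # a simple weight matrix, standing in for Eq. (10a) or Eq. (12)
b_hat = linear_gmm(dX, dy, Z, W)
```

Because `dy` is generated without noise, `b_hat` recovers `b_true` for any admissible weight matrix, which is a convenient check on the algebra before substituting the spatial weight matrices.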

Using the Moore–Penrose pseudo-inverse throughout maintains the symmetry and positive definiteness of the weight matrix estimates.

Equation (13) gives the first-step residuals

$$\Delta {\hat{\mathbf{\varepsilon }}}_{1} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{1} = \Delta {\hat{\mathbf{\nu }}}$$
(14)

In the second step, \({\hat{\mathbf{W}}}_{1}\) is replaced by its robust version,

$${\hat{\mathbf{W}}} = \frac{1}{N(T - 2)}\left( {{\mathbf{Z}}^{\prime } {\mathbf{\hat{\Omega }Z}}} \right)$$
(15)

and

$${{\varvec{\Omega}}} = \left( {{\mathbf{I}}_{T - 2} \otimes {\mathbf{S}}} \right){{\varvec{\Phi}}}\left( {{\mathbf{I}}_{T - 2} \otimes {\mathbf{S}}^{\prime } } \right)$$
(16)

where \({{\varvec{\Omega}}}\) is an \(N(T - 2)\) by \(N(T - 2)\) matrix and \({\mathbf{I}}_{T - 2}\) is an identity matrix of dimension \(T - 2\). Also \({{\varvec{\Omega}}}\) depends on the \(N(T - 2)\) by \(N(T - 2)\) matrix

$${{\varvec{\Phi}}} = \left[ {\left( {\Delta {{\varvec{\upnu}}}} \right)\left( {\Delta {{\varvec{\upnu}}}} \right)^{\prime } } \right] \odot \left( {{\mathbf{J}}_{T - 2} \otimes {\mathbf{I}}_{N} } \right)$$
(17)
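Equations (16) and (17) can be assembled directly from Kronecker and Hadamard products. The following is a sketch under assumed inputs: a hypothetical four-location ring, an illustrative \(\rho = 0.4\), and simulated values standing in for the first-step residuals \(\Delta {\hat{\mathbf{\nu }}}\). The helper name `robust_omega` is hypothetical.

```python
import numpy as np

def robust_omega(S, dnu, N, n_periods):
    """Omega = (I kron S) Phi (I kron S'), with Phi = (dnu dnu') masked by (J kron I_N)."""
    mask = np.kron(np.ones((n_periods, n_periods)), np.eye(N))
    Phi = np.outer(dnu, dnu) * mask
    IS = np.kron(np.eye(n_periods), S)
    return IS @ Phi @ IS.T

rng = np.random.default_rng(2)
N, n_periods, rho = 4, 3, 0.4  # illustrative values; n_periods plays the role of T - 2

# Hypothetical row-normalised ring connectivity matrix
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = 0.5
    M[i, (i + 1) % N] = 0.5

S = np.linalg.inv(np.eye(N) - rho * M)       # SAR case, S = H1
dnu = rng.normal(size=N * n_periods)         # simulated stand-in for residuals
Omega = robust_omega(S, dnu, N, n_periods)
Omega_no_spatial = robust_omega(np.eye(N), dnu, N, n_periods)
```

Setting \({\mathbf{S}} = {\mathbf{I}}_{N}\) reduces `Omega` to \({{\varvec{\Phi}}}\) itself, with zero covariance between different individuals, whereas the SAR case introduces non-zero cross-individual terms.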

Also (see Arellano and Bond 1991; Doornik et al. 2001; Roodman 2009; Hwang et al. 2022)

$${\text{var}} ({\hat{\mathbf{\beta }}}_{1} ) = {\hat{\mathbf{V}}}_{01} = N\left( {\Delta {\tilde{\mathbf{x}}}^{\prime } {\mathbf{Z\hat{W}}}_{{\mathbf{1}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\tilde{\mathbf{x}}}} \right)^{ - 1} \Delta {\tilde{\mathbf{x}}}^{\prime } {\mathbf{Z\hat{W}}}_{{\mathbf{1}}}^{{ - {\mathbf{1}}}} {\mathbf{\hat{W}\hat{W}}}_{{\mathbf{1}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{{^{\prime } }} \Delta {\tilde{\mathbf{x}}}\left( {\Delta {\tilde{\mathbf{x}}}^{{^{\prime } }} {\mathbf{Z\hat{W}}}_{{\mathbf{1}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\tilde{\mathbf{x}}}} \right)^{ - 1}$$
(18)

The \(h\) by \(h\) matrix \({\hat{\mathbf{W}}}\) is the optimal weight matrix which is used in the second step of linear two-step GMM.

The vector of two-step parameter estimates is

$${\hat{\mathbf{\beta }}}_{2} = \left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1} \Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{ - 1} {\mathbf{Z}}^{\prime } \Delta {\mathbf{y}}$$
(19)

with two-step residuals

$$\Delta {\hat{\mathbf{\varepsilon }}}_{2} = \Delta {\mathbf{y}} - \Delta {\mathbf{x\hat{\beta }}}_{2}$$
(20)

Accordingly,

$${\text{var}} ({\hat{\mathbf{\beta }}}_{2} ) = {\hat{\mathbf{V}}}_{0} = N\left( {\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\mathbf{x}}} \right)^{ - 1}$$
(21)

The standard errors of the parameters resulting from the two-step spatial GMM estimator, or conventional standard errors, are given by the \(k\) + 1 by 1 vector

$${\text{conventional }}s.e.\left( {{\hat{\mathbf{\beta }}}_{2} } \right) = \sqrt {diag({\hat{\mathbf{V}}}_{0} )}$$
(22)

3 The Windmeijer correction corrected for spatial dependence

Given that the estimated asymptotic standard errors of the efficient two-step GMM estimator can be downward biased in small samples, Windmeijer (2005) corrects for the bias, which is due to the presence of estimated parameters in the efficient weight matrix, by applying a Taylor series expansion leading to the expression

$${\hat{\mathbf{V}}}_{W} = {\hat{\mathbf{V}}}_{0} + {\mathbf{\hat{D}\hat{V}}}_{0} + {\hat{\mathbf{V}}}_{0} {\hat{\mathbf{D}}}^{\prime } + {\mathbf{\hat{D}\hat{V}}}_{01} {\hat{\mathbf{D}}}^{\prime }$$
(23)

in which

$${\hat{\mathbf{D}}} = - \frac{{{\hat{\mathbf{V}}}_{0} }}{N}\Delta {\mathbf{x}}^{\prime } {\mathbf{Z\hat{W}}}^{{ - {\mathbf{1}}}} \left. {\frac{{\partial {\mathbf{W}}}}{{\partial \hat{\beta }}}} \right|_{{\hat{\beta } = \hat{\beta }_{1} }} {\hat{\mathbf{W}}}^{{ - {\mathbf{1}}}} {\mathbf{Z}}^{\prime } \Delta {\hat{\mathbf{\varepsilon }}}_{{\mathbf{2}}}$$
(24)

and

$${\text{Windmeijer}}\;s.e.\left( {{\hat{\mathbf{\beta }}}_{2} } \right) = \sqrt {diag\left( {{\hat{\mathbf{V}}}_{{\mathbf{W}}} } \right)}$$
(25)
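Once \({\hat{\mathbf{V}}}_{0}\), \({\hat{\mathbf{V}}}_{01}\) and \({\hat{\mathbf{D}}}\) are available, Eqs. (22), (23) and (25) are simple matrix operations. The sketch below uses small made-up matrices purely to show the assembly; the function names and all numbers are hypothetical, and the computation of \({\hat{\mathbf{D}}}\) itself in Eq. (24) is not attempted here.

```python
import numpy as np

def windmeijer_variance(V0, V01, D):
    """Corrected variance: V0 + D V0 + V0 D' + D V01 D'."""
    return V0 + D @ V0 + V0 @ D.T + D @ V01 @ D.T

def standard_errors(V):
    """Square roots of the diagonal of a variance matrix."""
    return np.sqrt(np.diag(V))

# Hypothetical inputs for a two-parameter model
V0 = np.array([[0.04, 0.01], [0.01, 0.09]])
V01 = np.array([[0.05, 0.01], [0.01, 0.10]])
D = np.array([[0.10, 0.00], [0.02, 0.05]])

V_W = windmeijer_variance(V0, V01, D)
se_conventional = standard_errors(V0)
se_windmeijer = standard_errors(V_W)
```

With \({\hat{\mathbf{D}}}\) equal to a zero matrix the corrected variance collapses to \({\hat{\mathbf{V}}}_{0}\), which makes explicit that the correction operates entirely through \({\hat{\mathbf{D}}}\).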

A summary focussing on the computation of \({\mathbf{D}}\) is provided in the Appendix. Integral to this is \({\mathbf{W}}\) as estimated by Eq. (15) which embodies spatial dependence, but assuming \(\rho = 0\) or \(\lambda = 0\) in \({\mathbf{S}}\) eliminates spatial error dependence. Alternatively, instead of \({\mathbf{V}}_{W}\) one might opt for \({\mathbf{V}}_{{\mathbf{0}}}\) as in Eq. (22), thus ignoring the Windmeijer correction. Overall, therefore, we have four alternative standard errors. First, there is what we might term the naïve conventional standard error, based on \({\mathbf{V}}_{{\mathbf{0}}}\) and an assumption that \(\rho = 0\) or \(\lambda = 0\). Secondly, the spatially corrected conventional standard error also applies \({\mathbf{V}}_{{\mathbf{0}}}\) given by Eq. (21) but with \({\hat{\mathbf{W}}}\) and \({\hat{\mathbf{S}}}\) based on \(\hat{\rho },\hat{\lambda } \ne 0\). Thirdly, we have the classic Windmeijer correction given by Eq. (23) but with \(\hat{\rho },\hat{\lambda } = 0\). Finally, the spatially corrected Windmeijer correction also applies Eq. (23) but \({\mathbf{V}}_{{\mathbf{0}}}\), \({\mathbf{V}}_{{{\mathbf{01}}}}\) and \({\mathbf{D}}\) are estimated using \({\hat{\mathbf{W}}}\) and \({\hat{\mathbf{S}}}\) as determined by \(\hat{\rho },\hat{\lambda } \ne 0\). To save space we focus on the first, third and fourth of these in the Monte Carlo simulation, but all four are reported in the empirical examples.

4 Numerical illustration

To show the impact of different assumptions regarding the error process on the conventional and Windmeijer parameter standard error estimates, a Monte Carlo approach is adopted with mean estimates based on 100 replications for each combination of assumptions. Throughout, the matrix inverse is obtained using the Moore–Penrose pseudo-inverse to allow for asymmetric non-positive definite weight matrices, so the replications are an attempt to moderate any resulting inaccuracy. The simulations are based on four exogenous regressors and the lagged dependent variable, thus

$$\begin{aligned} y_{it} & = \gamma y_{it - 1} + \beta_{1} x_{1it} + \beta_{2} x_{2it} + \beta_{3} x_{3it} + \beta_{4} x_{4it} + \varepsilon_{it} ;\;i = 1, \ldots ,N,t = 1, \ldots ,T \\ \xi_{it} & = \mu_{i} + \nu_{it} \\ {{\varvec{\upvarepsilon}}}_{t} & = {\mathbf{S\xi }}_{t} \\ \end{aligned}$$
(26)

with \(\mu_{i} \sim iid.N(0,\sigma_{\mu }^{2} )\) and \(\nu_{it} \sim iid.N(0,\sigma_{\nu }^{2} )\) and \(N\) = 100 and \(T\) = 10.

The exogenous variables are generated using

$$x_{kit} = \delta x_{kit - 1} + \upsilon_{it} ,\;k = 1, \ldots ,4$$
(27)

in which \(\upsilon_{it} \sim iid.N(0,\sigma_{\upsilon }^{2} )\).

\({\mathbf{M}}\) is an \(N\) by \(N\) matrix of non-stochastic weights defining the error interdependence across \(N\) locations. The tabulated outcomes use an ‘\(r\) ahead and \(r\) behind’ connectivity matrix popularised by Kelejian and Prucha (1998), in which it is assumed that \(r = 5\). This is subsequently row normalised so that each row sums to 1. Thus each row of the spatial matrix \({\mathbf{M}}\) (i.e. \(m_{ij}\), with \(i = 1,...,N\), \(j = 1,...,N\)) has up to 10 non-zero elements (5 ahead and 5 behind, each with equal weights), with zeros on the main diagonal and elsewhere.
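A connectivity matrix of this kind can be constructed as follows. This is a minimal sketch with a hypothetical function `r_ahead_r_behind`; whether the ordering wraps around from the last location to the first is not stated in the text, so both variants are offered via an assumed `wrap` argument.

```python
import numpy as np

def r_ahead_r_behind(N, r, wrap=True):
    """Row-normalised matrix linking each location to the r ahead and r behind it.

    wrap=True treats the locations as a circle (an assumption); wrap=False
    truncates the neighbour set at the two ends of the ordering.
    """
    M = np.zeros((N, N))
    for i in range(N):
        for d in range(1, r + 1):
            for j in (i - d, i + d):
                if wrap:
                    M[i, j % N] = 1.0
                elif 0 <= j < N:
                    M[i, j] = 1.0
    return M / M.sum(axis=1, keepdims=True)

M = r_ahead_r_behind(100, 5)                      # circular variant
M_no_wrap = r_ahead_r_behind(100, 5, wrap=False)  # truncated variant
```

In the circular variant every row has exactly 10 non-zero weights of 0.1. In the truncated variant the first and last rows have only 5 neighbours, each with weight 0.2, which is one way of reading "up to 10 non-zero elements".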

In practice, for the purposes of simulation, various alternative true parameter values have been considered, but the results presented subsequently are based on \(\sigma_{\mu }^{2} = 0.2,0.8\), \(\sigma_{\nu }^{2} = 0.8,0.2\), \(\rho = 0.25,0.75,\lambda = - 0.25, - 0.75\) with \(\delta = 0.8,\sigma_{\upsilon }^{2} = 0.9\),\(\gamma = 0.2,\beta_{1} = 1,\beta_{2} = 0.5,\beta_{3} = 0.75\) and \(\beta_{4} = 1.0\). Given these true values of the various parameters, and drawing in each replication from the normal distributions defined by \(\sigma_{\mu }^{2} ,\sigma_{\nu }^{2}\), Eq. (26) is repeatedly calculated to give realisations of \(y_{it} ,i = 1,...,N;t = 1,...,T\) and the conventional and Windmeijer parameter standard error estimates. Each of the 100 replications for each parameter combination discards an initial 51 simulation outcomes in order to minimise the effect of initial values at \(t = - 50\) of zero (i.e. simulation outcomes for \(t = - 50, - 49,...,0\) are discarded).
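One replication of the data generating process in Eqs. (26) and (27), including the burn-in, might be sketched as below. The function `simulate_panel` is hypothetical. It assumes independent draws of \(\upsilon\) for each of the four regressors (identical draws would make them perfectly collinear), zero initial values, and, to keep the example small, a 20-location ring in place of the paper's 100-location matrix.

```python
import numpy as np

def simulate_panel(S, T=10, burn=51, gamma=0.2, beta=(1.0, 0.5, 0.75, 1.0),
                   delta=0.8, s2_mu=0.2, s2_nu=0.8, s2_ups=0.9, seed=0):
    """Simulate y (T x N) and x (T x N x 4) after discarding `burn` initial periods."""
    rng = np.random.default_rng(seed)
    N = S.shape[0]
    beta = np.asarray(beta)
    mu = rng.normal(0.0, np.sqrt(s2_mu), size=N)   # time-invariant individual effects
    x = np.zeros((N, beta.size))                   # zero initial values
    y = np.zeros(N)
    ys, xs = [], []
    for t in range(burn + T):
        # AR(1) regressors; separate innovations per regressor are an assumption
        x = delta * x + rng.normal(0.0, np.sqrt(s2_ups), size=x.shape)
        xi = mu + rng.normal(0.0, np.sqrt(s2_nu), size=N)
        eps = S @ xi                               # spatially dependent errors
        y = gamma * y + x @ beta + eps
        if t >= burn:
            ys.append(y)
            xs.append(x)
    return np.array(ys), np.array(xs)

# Small hypothetical ring (N = 20) with SAR errors and rho = 0.25
N, rho = 20, 0.25
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = 0.5
    M[i, (i + 1) % N] = 0.5
S = np.linalg.inv(np.eye(N) - rho * M)
y, x = simulate_panel(S)
```

The default parameter values follow one of the combinations listed above; passing `S = np.eye(N) - lam * M` instead would give the SMA case.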

For each replication, estimates of \(\gamma ,\rho ,\lambda ,{{\varvec{\upbeta}}}_{1} ,{{\varvec{\upbeta}}}_{2}\) and hence \({\mathbf{W}}\) and the parameter standard error estimates were obtained by one-step and two-step GMM estimation collapsing the HENR instruments for \(y_{it}\) to give \(T - 2\) instruments plus 4 instruments in the classic one column for each instrumenting variable design for the 4 exogenous variables, giving 12 instruments in total. Note that replications producing parameter estimates indicating non-stationarity were rejected. Hence we require that \(- 1 < \hat{\gamma } < 1\) and \(e_{\min }^{ - 1} < \left[ {\hat{\rho }{\text{ or }}\hat{\lambda }} \right] < e_{\max }^{ - 1}\) where \(e_{\min }\) and \(e_{\max }\) are the minimum and maximum real characteristic roots of \({\mathbf{M}}\).
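The admissibility check on each replication can be expressed in a few lines. This sketch uses hypothetical function names and a small ring matrix whose real characteristic roots run from −1 to 1; it assumes the smallest real root is negative and the largest positive, as is the case for a row-normalised matrix of this type.

```python
import numpy as np

def spatial_parameter_bounds(M):
    """Return (1/e_min, 1/e_max) from the real characteristic roots of M."""
    eig = np.linalg.eigvals(M)
    real_roots = eig[np.abs(eig.imag) < 1e-10].real
    return 1.0 / real_roots.min(), 1.0 / real_roots.max()

def is_admissible(gamma_hat, spatial_hat, M):
    """True if |gamma_hat| < 1 and the spatial parameter lies inside the bounds."""
    lower, upper = spatial_parameter_bounds(M)
    return bool(abs(gamma_hat) < 1.0 and lower < spatial_hat < upper)

# Hypothetical six-location row-normalised ring
N = 6
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = 0.5
    M[i, (i + 1) % N] = 0.5

lower, upper = spatial_parameter_bounds(M)
keep = is_admissible(0.2, 0.75, M)       # retained replication
reject = is_admissible(0.2, 1.2, M)      # spatial parameter outside the bounds
```

Replications for which the check returns `False` would be discarded before the mean standard errors are computed.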

Tables 1, 2 and 3 illustrate how the standard errors increase as we transition from zero error dependence in the DGP and the corresponding adjusted standard error estimators to positive error dependence. The spatially corrected Windmeijer standard errors are always larger than the conventional standard errors, reflecting the correction for bias in estimating the efficient weight matrix embodied in Windmeijer. These higher standard error estimates are maintained as the variance of the error components increases.

Table 1 Mean standard errors assuming DGP with no spatial error dependence
Table 2 Mean standard errors assuming DGP with spatial moving average error dependence
Table 3 Mean standard errors assuming DGP with spatial autoregressive error dependence

5 Application to real data

5.1 Demand for cigarettes

Baltagi and Levin (1986) and Baltagi (2021) consider the problem of the determinants of demand for cigarettes across \(n\) = 46 US States over \(T\) = 30 years. In a panel analysis and using data measured in real terms and in logarithmic form, they regress per capita sales of cigarettes to people aged 14 and above on the average retail price and per capita disposable income inter alia. The analysis here uses the same data (starting at year 2, so \(T\) = 29) and is based on a very simple dynamic panel data model in which the key elements are the relationship between consumer demand (\(D\)), prices (\(P\)) and income (\(Y\)). A priori theory suggests that demand will fall with higher prices and increase with higher income.

In this application there are three additions to this basic theory. One is that we allow the impact of income to be non-linear in logs, anticipating a possibly quadratic relationship between log income and log demand. As income increases, demand will rise before falling as one reaches higher income levels. The idea is that at the micro-level higher income and possibly better educated consumers will be more aware of, or sensitive to, health issues relating to cigarette consumption. This might be reflected in the aggregate State level data analysed here by a negative parameter on \(Y^{2}\). A second addition to the basic demand model is the possibility that consumption in a given State may be affected by the minimum real price of cigarettes in other nearby States. At the micro-level, this could reflect cross-border travel whereby demand is transferred to contiguous States with lower prices (reflecting maybe taxation regime differences across States). This possibility is represented in the model by a spatially lagged variable, \(\ln P_{it}^{L} = \sum\nolimits_{j = 1}^{n} {m_{ij} \ln \left( {P_{jt} } \right)}\), where \({\mathbf{M}}\) is an \(n\) by \(n\) matrix of weights applied to contiguous States according to total population of each State, subsequently standardised by dividing each cell by its row total. States with larger populations have higher weights, on the assumption that cross-border travel will be correspondingly greater. Thus \(\ln P_{it}^{L}\) is an \(n\) by 1 vector of weighted averages of log prices in States contiguous to each of the \(n\) States. It acts as a substitute price attracting consumers from high-tax States to nearby low-tax States (Baltagi 2021). The third and critical amendment to the basic demand model is the possibility of spatial error dependence. The assumption is that, notwithstanding controlling for individual State heterogeneity via differencing in the estimator, a host of omitted spatially correlated effects may affect the level of demand.
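The construction of the spatially lagged log price can be illustrated with a toy example. The three-State contiguity pattern, populations and prices below are invented, and weighting each neighbour by that neighbour's population is an assumed reading of the description above.

```python
import numpy as np

# Hypothetical contiguity: State 0 borders States 1 and 2, which do not border each other
contiguity = np.array([[0.0, 1.0, 1.0],
                       [1.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0]])
population = np.array([2.0, 6.0, 2.0])   # invented populations

# Weight each contiguous neighbour by its population, then row-standardise
weights = contiguity * population[None, :]
M = weights / weights.sum(axis=1, keepdims=True)

# Invented prices for a single year; the spatial lag is a weighted average of log prices
log_price = np.log(np.array([1.0, 2.0, 4.0]))
log_price_lag = M @ log_price
```

For State 0 the lag is 0.75 ln 2 + 0.25 ln 4, so the more populous neighbour dominates the average.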

We attempt to capture the influence of these by invoking a SAR error dependence process as defined by Eq. (4). To summarise, the model specification is

$$\ln D_{it} = \beta_{0} + \gamma \ln D_{it - 1} + \beta_{1} \ln P_{it} + \beta_{2} \ln P_{it}^{L} + \beta_{3} \ln Y_{it} + \beta_{4} \ln Y_{it}^{2} + \varepsilon_{it}$$
(28)

Estimation proceeds by two-step difference GMM as outlined above. Rather than the moments in Eq. (9), the four exogenous variables are treated as 4 instruments in the classic one column for each instrumenting variable design. The other instruments based on the endogenous variable are the outcome of collapsing the standard set of GMM instruments (Holtz-Eakin et al. 1988), so that there is one instrument for each lag distance, rather than one for each time period and lag distance, giving 27 additional moments equations and 31 instruments in total.
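The collapsed instrument block can be sketched as follows: one column per lag distance, with zeros where a lag is unavailable. The function `collapsed_lag_instruments` is hypothetical, and rows are assumed to be stacked period by period to match the Kronecker structure used earlier.

```python
import numpy as np

def collapsed_lag_instruments(y):
    """Collapsed instruments for the first-differenced equations.

    y is a (T, N) array of levels. The result has N*(T-2) rows (equations for
    the third period onwards, stacked period by period) and T-2 columns,
    one per lag distance 2, 3, ..., T-1.
    """
    T, N = y.shape
    n_eq = T - 2
    Z = np.zeros((n_eq * N, n_eq))
    for e in range(n_eq):
        t0 = e + 2                       # zero-based index of the equation's period
        for lag in range(2, t0 + 1):     # lags reaching back to the first observation
            Z[e * N:(e + 1) * N, lag - 2] = y[t0 - lag]
    return Z

# Tiny check with T = 4 and a single individual
Z_small = collapsed_lag_instruments(np.array([[1.0], [2.0], [3.0], [4.0]]))

# Dimensions matching the cigarette application (T = 29), with placeholder data
Z_cig = collapsed_lag_instruments(np.zeros((29, 46)))
n_instruments = Z_cig.shape[1] + 4       # plus the four exogenous regressors
```

With \(T\) = 29 this gives 27 columns, and adding the four exogenous variables gives the 31 instruments reported above.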

The estimates in Table 4 indicate that each variable is statistically significant and correctly signed on the basis of no spatial correction to the conventional two-step standard errors. Evidently demand falls with rising prices locally, and lower prices in contiguous States reduce demand locally. Increasing income increases demand, but the negative coefficient on income-squared indicates a quadratic relation, with demand rising then falling as income increases. The conventional two-step standard errors with spatial correction reaffirm these interpretations. Note that the corrected standard errors are larger, but not sufficiently large to lead to failure to reject the null hypothesis of zero effect. The Windmeijer corrected standard errors are larger again, but with no modification due to spatial error correlation one would again formally accept that each of the variables has a significant impact on demand. However, introducing the spatial modification, one would fail to reject the null hypothesis of zero effect of prices in contiguous States on the basis of a two-tailed test, using a 5% level of risk and referring to the N(0,1) distribution. Nevertheless, it seems irrational to consider a two-sided alternative hypothesis, since theory states that higher prices in contiguous States boost local demand, and that lower prices cause local demand to fall as a result of cross-border travel. So there is a sound basis for a one-sided test of this specific line of theoretical reasoning. Here z = 1.60 equates to an upper tail p-value of 0.055 in the N(0,1) reference distribution. This is somewhat borderline in terms of significance, but since it is conditional on the definition of \(\ln P_{it}^{L}\), one surely cannot rule out demand being transferred to nearby States with lower prices.
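The one-sided and two-sided p-values quoted for z = 1.60 can be reproduced from the standard normal distribution using only the Python standard library. The function name `upper_tail_p` is hypothetical.

```python
import math

def upper_tail_p(z):
    """Upper tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_one_sided = upper_tail_p(1.60)
p_two_sided = 2.0 * p_one_sided
```

The one-sided value rounds to 0.055, while the two-sided value is roughly twice that, which is why the conclusion at the 5% level depends on the choice of alternative hypothesis.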

Table 4 Parameter estimates: demand for cigarettes

5.2 EU regional productivity

A simple model of productivity levels across EU NUTS2 regions also illustrates the effect of spatial dependence in the errors on estimated parameter standard errors. The theoretical motivation for the model specification is the so-called Verdoorn law (Verdoorn 1949), which traditionally is a relationship between the growth of labour productivity and the growth of output in the manufacturing sector (Fingleton and McCombie 1998), but which has been applied across sectors where increasing returns to scale may also exist. One would also envisage spatial effects for various reasons. For example, causal effects may transgress regional boundaries, and there could be interdependence between the levels of productivity across regions through supply-chain effects, spillovers of technology, etc. We attempt to capture these effects via an SAR error dependence process in the model specification. Also, one might assume that a region’s productivity is affected by its labour productivity in the previous period, perhaps as a manifestation of localised variation in technical knowledge which is transmitted to the next period. This is captured by the presence of a lag term in the specification given by Eq. (29).

In this example data are available for 255 regions over the period 2001 to 2010, thus spanning the economic crisis of 2008. The data comprise employment levels, output as measured by gross value added (GVA) and gross fixed capital formation (GFCF) for each region and each year, with GVA and GFCF denominated in €2005 m. The model treats GVA and capital stock per worker as causal variables, with capital stock derived as a nonlinear function of GFCF, following the approach of Fingleton (2020). Productivity is GVA per worker. Given severe global economic instability through the period of analysis, year-specific dummy variables from 2003 to 2010 are also included as additional causes of regional fluctuations in productivity levels. Earlier years are omitted to avoid collinearity.

$$\ln p_{it} = \beta_{0} + \gamma \ln p_{it - 1} + \beta_{1} \ln GVA_{it} + \beta_{2} \ln cap_{it} + \beta_{3} D2003_{it} + \cdots + \beta_{10} D2010_{it} + \varepsilon_{it}$$
(29)

Again the assumed SAR error dependence process is defined by Eq. (4).

We assume that \(\ln p,\ln GVA\) and \(\ln cap\) are endogenous, since the levels of output and the capital stock per worker could respond to variations in productivity levels as well as being causes. This endogenous interaction is in the spirit of Kaldor (1957, 1981), who integrated the Verdoorn Law as part of a recursive causal chain of regional export-driven productivity. Estimation follows the standard GMM approach of differencing, thus eliminating the individual-specific effects \(\mu_{i}\), \(i = 1,...,255\), from the compound error process. Differencing log levels means that the estimator is in terms of exponential growth rates, which is the traditional Verdoorn Law specification. Lagging the endogenous right-hand-side variables sufficiently creates instrumental variables that satisfy the moments equations so that, following Eq. (8), \(\sum\nolimits_{i} {\ln p_{il} \Delta \nu_{it} } = 0\), \(\sum\nolimits_{i} {\ln GVA_{il} \Delta \nu_{it} } = 0\) and \(\sum\nolimits_{i} {\ln cap_{il} \Delta \nu_{it} } = 0\). Collapsing the lagged instruments amounts to 24 instruments, and introducing the 8 exogenous year dummies as one column for each instrumenting variable gives 32 instruments in total.

Table 5 shows the resulting parameter estimates and the standard errors and z-ratios. With neither the spatial correction nor the Windmeijer correction, the conventional two-step estimates indicate that there is a highly significant lag parameter \(\gamma\), so that productivity is dependent on productivity in the previous year, which is suggestive of inter-temporal spillovers of technical knowledge as proposed above. Also productivity depends on output, with the elasticity indicating that a 1 percentage point increase in output growth causes productivity growth to increase by about 0.6 of a percentage point, which is close to the elasticity of 0.5 typical of the Verdoorn law. The estimated coefficient of capital stock per worker is evidently significantly negative, which is counter-intuitive in that one would expect productivity to increase as capital stock per worker increases. There are also some significant year dummies, particularly close to the economic crisis of 2008/9, where global shocks caused a significant reduction in productivity across all regions. There is in addition significant positive error correlation, as indicated by \(\hat{\rho }\) and the z-ratio, which has implications for the standard error estimates.

Table 5 Parameter estimates: EU regional productivity

With the spatial correction, the conventional two-step standard error estimates increase and the z-ratios diminish, although most parameter estimates remain significantly different from zero. Applying the Windmeijer correction has an even larger impact on estimated standard errors, but not enough to eliminate the causal effects of lagged productivity, GVA and year dummies for 2008/9, which remain significant. The effect of capital stock per worker is only significant if one accepts a lower tail p-value of about 0.055 as sufficiently small to indicate significance. However, modification of the Windmeijer correction to also allow for spatial error dependence further increases standard error estimates and does render insignificant the counter-intuitive negative impact of capital stock per worker. GVA retains its significance, providing further evidence in support of the Verdoorn Law, and the effect of the global shock of 2009 is also significant, but the effect of lagged productivity is now rather borderline, with an upper tail probability of about 0.03 when referred to the N(0,1) distribution.

6 Conclusion

Very often data are analysed by dynamic panel data methods in which the individuals are located in space and there is inherent spatial dependence in the data. One approach to capturing spatial dependence is to introduce it as part of the error term in the model, though there are more elaborate alternatives, such as also introducing contemporaneous and lagged spatial lags of regressors (including the dependent variable), as illustrated in Baltagi et al. (2019). Failure to capture these spatial effects can lead to bias both in point estimates and in estimated standard errors. In this paper for purposes of simplicity and clarity we focus solely on spatial dependence in the error process. The contribution to the literature of this paper is the development of a modified Windmeijer (2005) correction that is corrected both for bias due to estimated parameters being used to calculate the efficient weight matrix, and additionally for the presence of spatial error dependence. The empirical examples demonstrate that unacknowledged positive spatial dependence can lead to downward bias in estimated parameter standard errors and incorrect inference. Given the pervasiveness of spatial error dependence, one should see similar bias across a range of GMM estimators.