1 Introduction

Recently, Fingleton et al. (2016) introduced a generalization of the Kapoor et al. (2007) (hereafter KKP) generalized moments (GM) procedure to multidimensional panel data models in which the disturbances follow a first-order spatial autoregressive (SAR) process with a nested random effects structure (SAR-NRE). They refer to this specification as a panel data model with spatially nested random effects disturbances. They derive a spatial feasible generalized least squares (S-FGLS) estimator of the model’s regression parameters, namely GM-S-FGLS, which uses the GM estimates of the SAR parameter and of the variance components of the disturbance process. This estimator is based on a spatial counterpart of the Cochrane–Orcutt transformation, as well as on transformations used in the estimation of classical error component models.

In this paper, we consider a more general multidimensional panel data model which includes a spatial lag and where the disturbances are assumed to follow a spatial moving average (SMA) process (local spatial spillover effects) in the spirit of Fingleton (2008). This structure constitutes an alternative to incorporating spatial lags on the explanatory variables. In the cross-sectional case, when the model contains a spatially lagged dependent variable, Kelejian and Prucha (1998, 1999) suggest a 2SLS procedure. They propose that the instrument set should be kept to a low order to avoid linear dependence and retain full column rank for the matrix of instruments, and thus recommend that \((X, WX)\) should be used if the number of regressors is large. Inclusion of spatial lags of the explanatory variables could have a major impact on the performance of the estimation procedures if one were to keep to this recommendation. Pace et al. (2012) show that instrumental variable estimation suffers greatly in situations where spatial lags of the explanatory variables (WX) are included in the model specification. The reason is that this requires the use of (\(W^2X,W^3X,\ldots \)) as instruments, in place of the conventional instruments that rely on WX, and this appears to result in a weak instrument problem. Our motivation for the adoption of an SMA specification of the error process, which has been largely neglected in spatial econometrics, is that it mitigates the problem for instrumental variable estimation identified by Pace et al. (2012). Naturally, the choice of this specification should be predicated on the applied researcher examining, at a preliminary stage, the nature of local spillovers in order to establish its appropriateness for the empirical application at hand.

We propose a three-stage procedure to estimate the parameters. In a first stage, the spatial lag panel data model is estimated using an instrumental variable (IV) estimator. In a second stage, a GM approach is developed to estimate the SMA parameter and the variance components of the disturbance process using IV residuals from the first stage. In a third stage, to purge the equation of the specific structure of the disturbances, a Cochrane–Orcutt-type transformation combined with the IV principle is applied. This leads to the GM spatial IV estimator and the regression parameter estimates of the spatial lag model. Monte Carlo simulations show that our estimates are not very different in terms of root mean square error compared to those produced by maximum likelihood (ML).

The outline of the paper is as follows: Sect. 2 presents the spatial lag panel data model with spatial moving average nested random effects errors, and Sect. 3 focuses on estimation methods. This section introduces a spatial GM instrumental variable approach to estimate the parameters of the model. Section 4 presents the Monte Carlo design and describes the Monte Carlo results. Section 5 illustrates our approach using an application to EU regional employment data for regions nested within countries. The last section concludes.

2 The spatial model

Our point of departure is a three-dimensional model that combines two different types of spatial interaction effects, i.e. endogenous interaction effects via a spatial lag on the dependent variable and interaction effects among the disturbances via a spatial moving average (SMA) process on the error term. The notation is as follows: the dependent variable \(y_{ijt}\) is observed along three indices, with \(i=1,\ldots ,N\), \(j=1,\ldots ,M_{i}\) and \(t=1,\ldots ,T\). N denotes the number of groups. \(M_{i}\) denotes the number of individuals in group i, so in total there are \(S=\sum _{i=1}^{N}M_{i}\) individuals. Since the model allows for an unequal number of individuals across the N groups, it is unbalanced in the spatial dimension, although it is balanced in the time dimension. Hence, the model describes a hierarchical structure with the index j pertaining to individuals that are nested within the N groups. Assuming that spatial autocorrelation only takes place at the individual level and that the slope coefficients are homogeneous, the model can be written as:

$$\begin{aligned} y_{ijt}=\rho \sum _{g=1}^{N}\sum _{h=1}^{M_{g}}w_{ij,gh}y_{ght} +x_{ijt}\beta +\varepsilon _{ijt}, \end{aligned}$$
(1)

where \(y_{ijt}\) is the dependent variable; \(x_{ijt}\) is a (\(1\times K)\) vector of explanatory exogenous variables; \(\beta \) represents a (\(K\times 1\)) vector of parameters to be estimated; and \(\varepsilon _{ijt}\) is the disturbance, the properties of which will be discussed below.

The weight \(w_{ij,gh}=w_{k,l}\) is the (\(k=ij;l=gh\)) element of the spatial matrix \(W_{S}\) with ij denoting individual j within group i, and similarly for gh. Thus, \(k,l=1,\ldots ,S\) and \(W_{S}\) is a (\(S\times S\)) matrix of known spatial weights which has zero on the leading diagonal and is usually row-normalized so that for row k, \(\sum _{g=1}^{N}\sum _{h=1}^{M_{g}}w_{k,gh}=1\), although as we will illustrate in the empirical example other normalizations are permissible. We maintain the standard assumption concerning the weight matrix, i.e. \(W_{S}\) is assumed non-stochastic, and its row and column sums are required to be uniformly bounded in absolute value. \(\rho \) is the spatial lag parameter to be estimated. This coefficient is bounded numerically to ensure spatial stationarity, i.e. \(e_{\mathrm{min}}^{-1}<\rho <1\) where \(e_{\mathrm{min}}\) is the minimum real characteristic root of \(W_{S}\).

In this paper, we consider the case of disturbances \(\varepsilon _{ijt}\) that are contemporaneously correlated through a moving average process at the individual level:

$$\begin{aligned} \varepsilon _{ijt}=u_{ijt} - \lambda \sum _{g=1}^{N}\sum _{h=1}^{M_{g}}m_{ij,gh}u_{ght}. \end{aligned}$$
(2)

The weight \(m_{ij,gh}\) is an element of the spatial matrix \(M_{S}\) which satisfies the same assumptions as \(W_{S}\). For simplicity, in the following we assume that \(M_{S}=W_{S}\). \(\lambda \) is the spatial moving average parameter to be estimated. \(u_{ijt}\) has mean zero and variance \(\sigma _{u}^{2}\). Spatial heterogeneity is captured through a random effects structure for the errors \(u_{ijt}\): they contain an unobserved permanent group-specific error component \(\alpha _{i}\), a nested permanent unit-specific error component \(\mu _{ij}\) together with a remainder error component \(v_{ijt}\). Hence, we envisage a time-invariant group effect applying equally to all individuals nested within a group, time-invariant individual group-specific effects and transient effects that vary at random across groups, individuals and time. More formally:

$$\begin{aligned} u_{ijt}=\alpha _{i}+\mu _{ij}+v_{ijt}, \end{aligned}$$
(3)

in which \(\alpha _{i}\) is the unobservable group-specific time-invariant effect which is assumed to be i.i.d.\(N\left( 0,\sigma _{\alpha }^{2}\right) \); \(\mu _{ij}\) is the nested effect of individual j within the ith group which is assumed to be i.i.d.\(N\left( 0,\sigma _{\mu }^{2}\right) \) and \(v_{ijt}\) is the remainder term which is also assumed to be i.i.d.\(N\left( 0,\sigma _{v}^{2}\right) \). The \(\alpha _{i}\)’s, \(\mu _{ij}\)’s and \(v_{ijt}\)’s are independent of each other and among themselves.

In contrast to the classical panel data literature, it is more convenient to group the data by periods rather than by units when spatial autocorrelation is considered. For a cross section t, Eqs. (1), (2) and (3) can be written as:

$$\begin{aligned} y_{t}=\rho W_{S}y_{t}+x_{t}\beta +\varepsilon _{t}, \end{aligned}$$
(4)

where \(y_{t}\) is of dimension \(\left( S\times 1\right) \) and \(x_{t}\) is an \(\left( S\times K\right) \) matrix of explanatory variables that are assumed to be exogenous and non-stochastic and have elements uniformly bounded in absolute value. The first-order moving average error process for \(\varepsilon _{t}\) is given by

$$\begin{aligned} \varepsilon _{t} = u_{t} - \lambda W_{S}u_{t}, \end{aligned}$$
(5)

with

$$\begin{aligned} u_{t}=\hbox {diag}\left( \iota _{M_{i}}\right) \alpha +\mu +v_{t}, \end{aligned}$$
(6)

where \(u_{t}\) is \(\left( S\times 1\right) \), \(\alpha \) is the vector of group effects of dimension \(\left( N\times 1\right) \), \(\mu ^{\top }=\left( \mu _{1}^{\top },\ldots ,\mu _{N}^{\top }\right) \), a vector of dimension \((1\times S)\), \(\mu _{i}^{\top }=\left( \mu _{i1},\ldots ,\mu _{iM_{i}}\right) \), a vector of dimension \((1\times M_{i})\), \(\iota _{M_{i}}\) is a vector of ones of dimension \(\left( M_{i}\times 1\right) \). By \(\hbox {diag}\left( \iota _{M_{i}}\right) \), we mean \(\hbox {diag}\left( \iota _{M_{1}},\ldots ,\iota _{M_{N}}\right) \). Finally \(v_{t}\) is of dimension \(\left( S\times 1\right) \).

Stacking the T cross sections gives

$$\begin{aligned} y=Z\delta +\varepsilon , \end{aligned}$$
(7)

and

$$\begin{aligned} \varepsilon =u - \lambda Wu, \end{aligned}$$
(8)

with \(Z=\left[ Wy,X\right] \) and \(\delta =\left[ \rho ,\beta ^{\top }\right] ^{\top }\). y and X are the vector and matrix of the dependent and explanatory variables (covariates), respectively, of size \((TS\times 1)\) and \((TS\times K)\); \(\beta \) is the vector of the slope parameters of size \((K\times 1)\); and \(\varepsilon \) is the vector of the disturbance terms of dimension (\(TS\times 1\)). Given that \(I_{T}\) is an identity matrix of dimension \((T\times T)\), then \(W=(I_{T}\otimes W_{S})\) is of size \((TS\times TS)\). Finally, for the full \(\left( TS\times 1\right) \) vector u, we have:

$$\begin{aligned} u=\left( \iota _{T}\otimes \hbox {diag}\left( \iota _{M_{i}}\right) \right) \alpha +\left( \iota _{T}\otimes I_{S}\right) \mu +v. \end{aligned}$$
(9)
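To make the stacked error structure in (9) concrete, the following minimal Python sketch simulates u for hypothetical sizes (here \(N=3\) groups of unequal sizes \(M_i\) and \(T=4\) periods); all numerical values are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 4
M = np.array([2, 3, 5])                  # group sizes M_i, unbalanced in space
S = M.sum()                              # S = sum_i M_i individuals
sigma_a, sigma_mu, sigma_v = 1.0, 0.8, 0.5

# diag(iota_{M_i}): an (S x N) block matrix mapping group effects to individuals
Z_groups = np.zeros((S, N))
start = 0
for i, Mi in enumerate(M):
    Z_groups[start:start + Mi, i] = 1.0
    start += Mi

alpha = rng.normal(0.0, sigma_a, N)      # group effects alpha_i
mu = rng.normal(0.0, sigma_mu, S)        # nested individual effects mu_ij
v = rng.normal(0.0, sigma_v, (T, S))     # remainder v_ijt, varies over t

# u = (iota_T kron diag(iota_{M_i})) alpha + (iota_T kron I_S) mu + v, Eq. (9)
u = np.concatenate([Z_groups @ alpha + mu + v[t] for t in range(T)])  # (T*S,)
```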

In order to compute the GM-S-IV estimator of \(\delta \), which is described in Sect. 3.2, we need to obtain the inverse of the covariance matrix of u, which is \(\Omega _{u}^{-1}\). This is achieved by means of the spectral decomposition. Following Baltagi et al. (2014), the covariance matrix of \(u_{t}\) is:

$$\begin{aligned} E\left[ u_{t}u_{t}^{\top }\right] =\sigma _{\alpha }^{2}\hbox {diag} \left( J_{M_{i}}\right) +\left( \sigma _{\mu }^{2}+\sigma _{v}^{2}\right) I_{S} , \end{aligned}$$
(10)

where \(I_{S}\left( =\hbox {diag}\left( I_{M_{i}}\right) \right) \) is an identity matrix of dimension S. \(J_{M_{i}}=\left( \iota _{M_{i}}\iota _{M_{i}}^{\top }\right) \) is a matrix of ones of dimension \(\left( M_{i}\times M_{i}\right) \). The covariance matrix of u corresponds to:

$$\begin{aligned} \Omega _{u}= & {} \sigma _{\alpha }^{2}\left( Z_{\alpha }Z_{\alpha }^{\top }\right) +\sigma _{\mu }^{2}\left( Z_{\mu }Z_{\mu }^{\top }\right) +\sigma _{v}^{2}\left( I_{T}\otimes I_{S}\right) \nonumber \\= & {} \sigma _{\alpha }^{2}\left( J_{T}\otimes \hbox {diag}\left( J_{M_{i}}\right) \right) +\left( \sigma _{\mu }^{2}J_{T} +\sigma _{v}^{2}I_{T}\right) \otimes I_{S}, \end{aligned}$$
(11)

where \(Z_{\alpha }=\iota _{T}\otimes \hbox {diag}\left( \iota _{M_{i}}\right) \), \(Z_{\mu }=\iota _{T}\otimes I_{S}\) and \(J_{T}=\left( \iota _{T}\iota _{T}^{\top }\right) \) is a matrix of ones of dimension \(\left( T\times T\right) \). Replace \(J_{T}\) by its idempotent counterpart \(T\overline{J}_{T}\), \(J_{M_{i}}\) by \(M_{i}\overline{J}_{M_{i}}\) with \(\overline{J}_{T}=J_{T}/T\) and \(\overline{J}_{M_{i}}=J_{M_{i}}/M_{i}\). Also, define \(E_{T}=I_{T}-\overline{J}_{T},\) and \(E_{M_{i}}=I_{M_{i}}-\overline{J}_{M_{i}},\) and replace \(I_{T}\) by \(\left( E_{T}+\overline{J}_{T}\right) \), \(I_{M_{i}}\) by \(\left( E_{M_{i}}+\overline{J}_{M_{i}}\right) \). Collecting terms with the same matrices, one gets the spectral decomposition of \(\Omega _{u}\):

$$\begin{aligned} \Omega _{u}=\theta _{1}Q_{1}+\theta _{2}Q_{2}+\left( I_{T}\otimes \hbox {diag}\left( \theta _{3i}I_{M_{i}}\right) \right) Q_{3}, \end{aligned}$$
(12)

with

$$\begin{aligned} \theta _{1}=\sigma _{v}^{2}\text {, }\theta _{2}=T\sigma _{\mu }^{2}+\sigma _{v}^{2}\text {, }\theta _{3i}=M_{i}T\sigma _{\alpha }^{2}+T\sigma _{\mu }^{2} +\sigma _{v}^{2}. \end{aligned}$$
(13)

These equalities follow from the definitions of \(Q_{1},Q_{2}\) and \(Q_{3}\). It turns out that \(Q_{1}\) relates to the within transformation, while \(Q_{2}\) and \(Q_{3}\) relate, respectively, to the between and mean transformation matrices. More formally,

$$\begin{aligned} Q_{1}= & {} E_{T}\otimes I_{S}\text {, }Q_{2}=\overline{J}_{T}\otimes \hbox {diag}\left( E_{M_{i}}\right) , \end{aligned}$$
(14)
$$\begin{aligned} Q_{3}= & {} \overline{J}_{T}\otimes \hbox {diag}\left( \overline{J}_{M_{i}}\right) . \end{aligned}$$
(15)

The operators \(Q_{1}\), \(Q_{2}\) and \(Q_{3}\) are symmetric and idempotent, with their rank equal to their trace. Moreover, they are pairwise orthogonal and sum to the identity matrix. From (12), we can easily obtain \(\Omega _{u}^{-1}\) as:

$$\begin{aligned} \Omega _{u}^{-1}=\theta _{1}^{-1}Q_{1}+\theta _{2}^{-1}Q_{2}+\left( I_{T}\otimes \hbox {diag}\left( \theta _{3i}^{-1}I_{M_{i}}\right) \right) Q_{3}. \end{aligned}$$
(16)
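As a numerical check of (12)–(16), the following minimal sketch (reusing hypothetical sizes, as above) builds \(Q_1\), \(Q_2\) and \(Q_3\), verifies that they are idempotent, pairwise orthogonal and sum to the identity, and confirms that (16) indeed inverts (12).

```python
import numpy as np
from scipy.linalg import block_diag

N, T = 3, 4
M = np.array([2, 3, 5]); S = M.sum()
sigma_a2, sigma_mu2, sigma_v2 = 1.0, 0.64, 0.25    # illustrative variances

I_T, I_S = np.eye(T), np.eye(S)
Jbar_T = np.ones((T, T)) / T
E_T = I_T - Jbar_T
Jbar_M = block_diag(*[np.ones((Mi, Mi)) / Mi for Mi in M])  # diag(Jbar_{M_i})
E_M = I_S - Jbar_M                                          # diag(E_{M_i})

Q1 = np.kron(E_T, I_S)          # within,   Eq. (14)
Q2 = np.kron(Jbar_T, E_M)       # between,  Eq. (14)
Q3 = np.kron(Jbar_T, Jbar_M)    # mean,     Eq. (15)

assert np.allclose(Q1 @ Q1, Q1) and np.allclose(Q1 @ Q2, 0.0)  # idempotent, orthogonal
assert np.allclose(Q1 + Q2 + Q3, np.eye(T * S))                # sum to identity

theta1 = sigma_v2
theta2 = T * sigma_mu2 + sigma_v2
# theta_{3i}, repeated M_i times so it sits on the right diagonal positions
theta3 = np.repeat(M * T * sigma_a2 + T * sigma_mu2 + sigma_v2, M)

Omega_u = theta1 * Q1 + theta2 * Q2 \
    + np.kron(I_T, np.diag(theta3)) @ Q3                       # Eq. (12)
Omega_u_inv = Q1 / theta1 + Q2 / theta2 \
    + np.kron(I_T, np.diag(1.0 / theta3)) @ Q3                 # Eq. (16)
assert np.allclose(Omega_u @ Omega_u_inv, np.eye(T * S))
```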

For the full \(\left( TS\times 1\right) \) vector \(\varepsilon \), we then have:

$$\begin{aligned} \varepsilon =u - \lambda (I_{T}\otimes W_{S})u, \end{aligned}$$
(17)

or

$$\begin{aligned} \varepsilon =[I_{TS} - \lambda (I_{T}\otimes W_{S})]u=(I_{T}\otimes G_{S})u, \end{aligned}$$
(18)

where \(G_{S}=I_{S} - \lambda W_{S}\). The corresponding \(\left( TS\times TS\right) \) covariance matrix is given by:

$$\begin{aligned} \Omega _{\varepsilon }=A\Omega _{u}A^{\top }, \end{aligned}$$
(19)

where A is a block diagonal matrix equal to (\(I_{T}\otimes G_{S}\)). Given the properties of the matrices \(\Omega _{u}\) and A, the inverse covariance matrix of \(\varepsilon \) is defined as:

$$\begin{aligned} \Omega _{\varepsilon }^{-1}=\left( A^{\top }\right) ^{-1}\Omega _{u}^{-1}A^{-1} . \end{aligned}$$
(20)

3 Estimation methods

The estimation methods for multidimensional spatial panel models are direct extensions of those developed for standard spatial panel data econometrics. Two main approaches are used to estimate these models: one based on the ML principle and the other linked to method of moments procedures.

3.1 Maximum likelihood estimation

Upton and Fingleton (1985), Anselin (1988), LeSage and Pace (2009) and Elhorst (2014) provide the general framework for ML estimation of spatial models. Under normality of the disturbances, the log-likelihood function is

$$\begin{aligned} \ln L= & {} -\, \frac{TS}{2} \ln {(2\pi )} -\frac{1}{2}\ln \left| {\Omega }_{\varepsilon }\right| +T\ln \left| {D}_{S}\right| \nonumber \\&-\,\frac{1}{2}\left( D{y}-{X\beta } \right) ' {\Omega }_{\varepsilon }^{-1}\left( D {y}-{X\beta } \right) , \end{aligned}$$
(21)

where \(D_{S}=(I_{S}-\rho W_{S})\) and \(D=(I_{T}\otimes D_{S})\). For a SMA process for the disturbances \(\varepsilon \), and after some mathematical manipulations, we obtain

$$\begin{aligned} \ln L= & {} - \,\frac{TS}{2} \ln {(2\pi )} -\frac{1}{2}\ln \left| {\Omega }_{u} \right| -T\ln \left| {G}_{S}\right| +T\ln \left| {D}_{S} \right| \nonumber \\&-\,\frac{1}{2}\left( D{y}-{X\beta } \right) ' {\Omega }_{\varepsilon }^{-1}\left( D{y}-{X\beta }\right) . \end{aligned}$$
(22)

Let \(\gamma _{1}=\sigma _{\alpha }^{2}/\sigma _{v}^{2}\), \(\gamma _{2}=\sigma _{\mu }^{2}/\sigma _{v}^{2}\) and \({\Omega }_{\varepsilon }=\sigma _{v}^{2}{\Sigma }\), then the log-likelihood function (22) can be written as

$$\begin{aligned} \ln L= & {} -\frac{TS}{2}\ln {(2\pi )}-\frac{TS}{2}\ln \sigma _{v}^{2} -\frac{1}{2}\overset{N}{\underset{i=1}{\sum }}\ln \left( T\left( M_{i}\gamma _{1} +\gamma _{2}\right) +1\right) \nonumber \\&-\,\frac{1}{2}\underset{i=1}{\overset{N}{\sum }}\left( M_{i} -1\right) \ln \left( T\gamma _{2}+1\right) \nonumber \\&-\,T\underset{i=1}{\overset{N}{\sum }}\underset{j=1}{\overset{M_{i}}{\sum }} \ln \left( 1-\omega _{ij}\lambda \right) +T\underset{i=1}{\overset{N}{\sum }} \underset{j=1}{\overset{M_{i}}{\sum }}\ln \left( 1-\eta _{ij}\rho \right) \nonumber \\&-\,\frac{1}{2\sigma _{v}^{2}}\left( D{y}-{X\beta } \right) ' {\Sigma }^{-1}\left( D{y}-{X\beta } \right) . \end{aligned}$$
(23)

The first-order conditions for the parameters in (22) and (23) are nonlinear and interdependent, i.e. the equations cannot be solved analytically. Therefore, a numerical solution by means of an iterative procedure is needed, in the spirit of Anselin (1988).
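As an illustration of what such a numerical approach involves, the following sketch evaluates the log-likelihood (22) at given parameter values using dense linear algebra; maximization would then be delegated to a generic optimizer such as scipy.optimize.minimize. The arguments y, X, beta, W_S, the vector of group sizes M and T are assumed inputs, and the code is a didactic sketch rather than an optimized implementation.

```python
import numpy as np
from scipy.linalg import block_diag

def loglik(rho, lam, s_a2, s_mu2, s_v2, y, X, beta, W_S, M, T):
    """Evaluate Eq. (22); M is the numpy array of group sizes M_i."""
    S = W_S.shape[0]
    I_T, I_S = np.eye(T), np.eye(S)
    Jbar_T = np.ones((T, T)) / T
    Jbar_M = block_diag(*[np.ones((Mi, Mi)) / Mi for Mi in M])
    Q1 = np.kron(I_T - Jbar_T, I_S)
    Q2 = np.kron(Jbar_T, I_S - Jbar_M)
    Q3 = np.kron(Jbar_T, Jbar_M)
    theta3 = np.repeat(M * T * s_a2 + T * s_mu2 + s_v2, M)
    Omega_u_inv = Q1 / s_v2 + Q2 / (T * s_mu2 + s_v2) \
        + np.kron(I_T, np.diag(1.0 / theta3)) @ Q3             # Eq. (16)
    # log|Omega_u| from the eigenvalues in the spectral decomposition (12)
    logdet_Ou = S * (T - 1) * np.log(s_v2) \
        + (S - len(M)) * np.log(T * s_mu2 + s_v2) \
        + np.log(M * T * s_a2 + T * s_mu2 + s_v2).sum()
    G_S = I_S - lam * W_S
    D_S = I_S - rho * W_S
    A_inv = np.kron(I_T, np.linalg.inv(G_S))
    Omega_e_inv = A_inv.T @ Omega_u_inv @ A_inv                # Eq. (20)
    r = np.kron(I_T, D_S) @ y - X @ beta
    _, logdet_G = np.linalg.slogdet(G_S)
    _, logdet_D = np.linalg.slogdet(D_S)
    return (-T * S / 2 * np.log(2 * np.pi) - 0.5 * logdet_Ou
            - T * logdet_G + T * logdet_D - 0.5 * r @ Omega_e_inv @ r)

# example call with hypothetical data:
# ll = loglik(0.3, -0.5, 0.4, 0.4, 1.2, y, X, beta, W_S, M, T)
```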

3.2 GM and instrumental variables

There are several issues with ML procedures. First, they call for explicit distributional assumptions, which may be difficult to satisfy, although quasi-ML (QML) approaches may to some extent allay this problem. Second, specifying and maximizing likelihood functions appropriate to more complex models may be problematic, especially if there are endogenous variables other than the spatial lag, as ML estimation is not possible when endogeneity is in implicit form. Finally, they are very computationally intensive. In view of the desirability of estimation approaches that avoid some of these challenges posed by ML, Kelejian and Prucha (1998, 1999) suggested an alternative instrumental variable estimation procedure for the cross-sectional spatial lag model, also allowing for a SAR process in the disturbances. This approach is based on a GM estimator of the parameter of the SAR process. The procedures suggested in Kelejian and Prucha (1998, 1999) are computationally feasible even for very large sample sizes. In a panel data context with a spatial autoregressive error process, KKP (2007) derive a GM estimator which is computationally feasible even for large sample sizes, while Fingleton et al. (2016) extend this procedure to capture spatially autoregressive nested random effects errors. We follow this approach and adapt the moment conditions to accommodate SMA nested random effects errors.

3.2.1 Moment conditions

Following Fingleton et al. (2016), we develop a GM approach leading to estimators of \(\lambda \), \(\sigma _{\alpha }^{2}\), \(\sigma _{\mu }^{2}\), \(\sigma _{v}^{2}\), or equivalently of \(\lambda \), \(\sigma _{\alpha }^{2}\), \(\theta _{2}\left( =T\sigma _{\mu }^{2}+\sigma _{v}^{2}\right) \) and \(\sigma _{v}^{2}\). This approach relies on moment conditions related to \(E[u^{\top }Q_{i}u]\), \(E[\overline{u}^{\top }Q_{i}u]\), \(E[\overline{\overline{u}}^{\top }Q_{i}u]\), \(E[\overline{u}^{\top }Q_{i}\overline{u}]\), \(E[\overline{\overline{u}}^{\top }Q_{i}\overline{u}]\), \(E[\overline{\overline{u}}^{\top }Q_{i}\overline{\overline{u}}]\), \(i=1,2,3\). For notational convenience, we define

$$\begin{aligned} \overline{{\varepsilon }}= & {} \left( {I}_{T}\otimes {W}_{S}\right) {\varepsilon }, \end{aligned}$$
(24)
$$\begin{aligned} \overline{\overline{{\varepsilon }}}= & {} \left( {I}_{T}\otimes {W}_{S}\right) \overline{{\varepsilon }},\end{aligned}$$
(25)
$$\begin{aligned} \overline{{u}}= & {} \left( {I}_{T}\otimes {W}_{S}\right) {u},\end{aligned}$$
(26)
$$\begin{aligned} \overline{\overline{{u}}}= & {} \left( {I}_{T}\otimes {W}_{S}\right) \overline{{u}}. \end{aligned}$$
(27)

Following (17), we have

$$\begin{aligned} \varepsilon= & {} u - \lambda \overline{u},\end{aligned}$$
(28)
$$\begin{aligned} \overline{\varepsilon }= & {} \overline{u} - \lambda \overline{\overline{u}}. \end{aligned}$$
(29)

First, we compute the quadratic moments with respect to \(Q_1\):

$$\begin{aligned} \varepsilon ^{\top }Q_{1}\varepsilon= & {} (u - \lambda \overline{u})^{\top }Q_1(u - \lambda \overline{u}) \nonumber \\= & {} u^{\top }Q_1u+\lambda ^2\overline{u}^{\top }Q_1\overline{u} -2\lambda \overline{u}^{\top }Q_1u , \end{aligned}$$
(30)
$$\begin{aligned} \overline{\varepsilon }^{\top }Q_{1}\varepsilon= & {} (\overline{u} - \lambda \overline{\overline{u}})^{\top }Q_1(u - \lambda \overline{u}) \nonumber \\= & {} \overline{u}^{\top }Q_1u + \lambda ^2\overline{\overline{u}}^{\top }Q_1\overline{u} - \lambda \left[ \overline{u}^{\top }Q_1\overline{u} + \overline{\overline{u}}^{\top }Q_1u\right] , \end{aligned}$$
(31)
$$\begin{aligned} \overline{\varepsilon }^{\top }Q_{1}\overline{\varepsilon }= & {} (\overline{u} - \lambda \overline{\overline{u}})^{\top }Q_1(\overline{u} - \lambda \overline{\overline{u}}) \nonumber \\= & {} \overline{u}^{\top }Q_1\overline{u}+ \lambda ^2\overline{\overline{u}}^{\top }Q_1\overline{\overline{u}} - 2\lambda \overline{\overline{u}}^{\top }Q_1\overline{u}. \end{aligned}$$
(32)

Then, the expectations of the quadratic moments (30), (31), (32) depend on the moments \(E[u^{\top }Q_{1}u]\), \(E[\overline{u}^{\top }Q_{1}u]\), \(E[\overline{u}^{\top }Q_{1}\overline{u}]\), \(E[\overline{\overline{u}}^{\top }Q_{1}\overline{u}]\), \(E[\overline{\overline{u}}^{\top }Q_{1}\overline{\overline{u}}]\), \(E[\overline{\overline{u}}^{\top }Q_{1}u]\). After some computations, these expectations are given by:

$$\begin{aligned} E[u^{\top }Q_{1}u]= & {} \sigma _{v}^{2}S\left( T-1\right) , \end{aligned}$$
(33)
$$\begin{aligned} E[\overline{u}^{\top }Q_{1}u]= & {} 0, \end{aligned}$$
(34)
$$\begin{aligned} E[\overline{u}^{\top }Q_{1}\overline{u}]= & {} \sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top }W_{S}\right) , \end{aligned}$$
(35)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{1}\overline{u}]= & {} \sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top }W_{S}^{\top }W_{S}\right) , \end{aligned}$$
(36)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{1}\overline{\overline{u}}]= & {} \sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top }W_{S}^{\top }W_{S}W_{S}\right) , \end{aligned}$$
(37)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{1}u]= & {} \sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top }W_{S}^{\top }\right) . \end{aligned}$$
(38)

Substituting (33) to (38) into (30), (31) and (32) gives:

$$\begin{aligned} E[\varepsilon ^{\top }Q_{1}\varepsilon ]= & {} \sigma _{v}^{2}S\left( T-1\right) \nonumber \\&+\, \lambda ^2\sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top }W_{S}\right) , \end{aligned}$$
(39)
$$\begin{aligned} E[\overline{\varepsilon }^{\top }Q_{1}\varepsilon ]= & {} \lambda ^2\sigma _{v}^{2}\left( T-1\right) \text {tr} \left( W_{S}^{\top }W_{S}^{\top }W_{S}\right) \nonumber \\&-\, \lambda \sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top }W_{S} + W_{S}^{\top }W_{S}^{\top }\right) , \end{aligned}$$
(40)
$$\begin{aligned} E[\overline{\varepsilon }^{\top }Q_{1}\overline{\varepsilon }]= & {} \sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top }W_{S}\right) \nonumber \\&+\, \lambda ^2\sigma _{v}^{2}\left( T-1\right) \text {tr}\left( W_{S}^{\top } W_{S}^{\top }W_{S}W_{S}\right) \nonumber \\&-\, \lambda \sigma _{v}^{2}2\left( T-1\right) \text {tr} \left( W_{S}^{\top }W_{S}^{\top }W_{S}\right) . \end{aligned}$$
(41)

We proceed in a similar fashion, replacing \(Q_1\) in (30), (31) and (32) by \(Q_2\) and by \(Q_3\). The moments \(E[u^{\top }Q_{i}u]\), \(E[\overline{u}^{\top }Q_{i}u]\), \(E[\overline{u}^{\top }Q_{i}\overline{u}]\), \(E[\overline{\overline{u}}^{\top }Q_{i}\overline{u}]\), \(E[\overline{\overline{u}}^{\top }Q_{i}\overline{\overline{u}}]\), \(E[\overline{\overline{u}}^{\top }Q_{i}u]\), \(i=2,3\), are:

$$\begin{aligned} E[u^{\top }Q_{2}u]= & {} \theta _{2} \left( S-N\right) , \end{aligned}$$
(42)
$$\begin{aligned} E[\overline{u}^{\top }Q_{2}u]= & {} \theta _{2} \text {tr}\left( W_{S}^{\bullet \top }\right) , \end{aligned}$$
(43)
$$\begin{aligned} E[\overline{u}^{\top }Q_{2}\overline{u}]= & {} \theta _{2}\text {tr} \left( W_{S}^{\bullet \top }W_{S}^{\bullet }\right) + T \sigma _{\alpha }^{2} \text {tr}\left( \Gamma W_{S}^{\bullet \top }W_{S}^{\bullet }\right) , \end{aligned}$$
(44)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{2}\overline{u}]= & {} \theta _{2}\text {tr} \left( W_{S}^{\bullet \bullet \top }W_{S}^{\bullet }\right) + T \sigma _{\alpha }^{2} \text {tr}\left( \Gamma W_{S}^{\bullet \bullet \top }W_{S}^{\bullet }\right) , \end{aligned}$$
(45)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{2}\overline{\overline{u}}]= & {} \theta _{2}\text {tr}\left( W_{S}^{\bullet \bullet \top }W_{S}^{\bullet \bullet }\right) + T \sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{\bullet \bullet \top } W_{S}^{\bullet \bullet }\right) , \end{aligned}$$
(46)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{2}u]= & {} \theta _{2}\text {tr}\left( W_{S}^{\bullet \bullet \top }\right) , \end{aligned}$$
(47)

and

$$\begin{aligned} E[u^{\top }Q_{3}u]= & {} N \theta _{2} + ST \sigma _{\alpha }^{2} , \end{aligned}$$
(48)
$$\begin{aligned} E[\overline{u}^{\top }Q_{3}u]= & {} \theta _{2}\text {tr}\left( W_{S}^{*\top }\right) + T \sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{\top }\right) , \end{aligned}$$
(49)
$$\begin{aligned} E[\overline{u}^{\top }Q_{3}\overline{u}]= & {} \theta _{2}\text {tr}\left( W_{S}^{*\top } W_{S}^{*}\right) + T \sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{*\top }W_{S}^{*}\right) , \end{aligned}$$
(50)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{3}\overline{u}]= & {} \theta _{2}\text {tr}\left( W_{S}^{**\top } W_{S}^{*}\right) + T \sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{**\top }W_{S}^{*}\right) , \end{aligned}$$
(51)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{3}\overline{\overline{u}}]= & {} \theta _{2}\text {tr}\left( W_{S}^{**\top } W_{S}^{**}\right) + T \sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{**\top }W_{S}^{**}\right) , \end{aligned}$$
(52)
$$\begin{aligned} E[\overline{\overline{u}}^{\top }Q_{3}u]= & {} \theta _{2}\text {tr}\left( W_{S}^{**\top }\right) . \end{aligned}$$
(53)

Then, the use of these moments leads to:

$$\begin{aligned} E[\varepsilon ^{\top }Q_{2}\varepsilon ]= & {} \theta _{2}\left( S-N\right) \nonumber \\&+\, \lambda ^2\left[ \theta _{2}\text {tr}\left( W_{S}^{\bullet \top } W_{S}^{\bullet }\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{\bullet \top }W_{S}^{\bullet }\right) \right] \nonumber \\&-\, \lambda \theta _{2}2\text {tr}\left( W_{S}^{\bullet \top }\right) , \end{aligned}$$
(54)
$$\begin{aligned} E[\overline{\varepsilon }^{\top }Q_{2}\varepsilon ]= & {} \theta _{2}\text {tr}\left( W_{S}^{\bullet \top }\right) \nonumber \\&+\, \lambda ^2\left[ \theta _{2}\text {tr}\left( W_{S}^{\bullet \bullet \top }W_{S}^{\bullet }\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{\bullet \bullet \top }W_{S}^{\bullet }\right) \right] \nonumber \\&-\, \lambda \left[ \theta _{2}\text {tr}\left( W_{S}^{\bullet \top }W_{S}^{\bullet } + W_{S}^{\bullet \bullet \top }\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{\bullet \top }W_{S}^{\bullet }\right) \right] , \end{aligned}$$
(55)
$$\begin{aligned} E[\overline{\varepsilon }^{\top }Q_{2}\overline{\varepsilon }]= & {} \theta _{2}\text {tr}\left( W_{S}^{\bullet \top }W_{S}^{\bullet }\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{\bullet \top }W_{S}^{\bullet }\right) \nonumber \\&+\, \lambda ^2\left[ \theta _{2}\text {tr}\left( W_{S}^{\bullet \bullet \top } W_{S}^{\bullet \bullet }\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{\bullet \bullet \top }W_{S}^{\bullet \bullet }\right) \right] \nonumber \\&-\, \lambda \left[ \theta _{2}2\text {tr}\left( W_{S}^{\bullet \bullet \top } W_{S}^{\bullet }\right) +T\sigma _{\alpha }^{2}2\text {tr}\left( \Gamma W_{S}^{\bullet \bullet \top }W_{S}^{\bullet }\right) \right] , \end{aligned}$$
(56)

where \(\Gamma = \hbox {diag}(J_{M_i})\), \(W_{S}^{\bullet }=\hbox {diag}(E_{M_i}) W_S\), \(W_{S}^{\bullet \bullet }=\hbox {diag}(E_{M_i}) W_S W_S\) and

$$\begin{aligned} E\left[ \varepsilon ^{\top }Q_{3}\varepsilon \right]= & {} \theta _{2} N + \sigma _{\alpha }^{2} ST\nonumber \\&+\, \lambda ^2\left[ \theta _{2}\text {tr}\left( W_{S}^{*\top }W_{S}^{*}\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{*\top }W_{S}^{*}\right) \right] \nonumber \\&-\, \lambda \left[ \theta _{2}2\text {tr}\left( W_{S}^{*\top }\right) +T\sigma _{\alpha }^{2}2\text {tr}\left( \Gamma W_{S}^{\top }\right) \right] , \end{aligned}$$
(57)
$$\begin{aligned} E[\overline{\varepsilon }^{\top }Q_{3}\varepsilon ]= & {} \theta _{2}\text {tr}\left( W_{S}^{*\top }\right) +T\sigma _{\alpha }^{2} \text {tr}\left( \Gamma W_{S}^{\top }\right) \nonumber \\&+\, \lambda ^2\left[ \theta _{2}\text {tr}\left( W_{S}^{**\top }W_{S}^{*} \right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{**\top }W_{S}^{*}\right) \right] \nonumber \\&-\, \lambda \left[ \theta _{2}\text {tr}\left( W_{S}^{*\top }W_{S}^{*} + W_{S}^{**\top }\right) \right. \end{aligned}$$
(58)
$$\begin{aligned}&+\, \left. T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{*\top }W_{S}^{*}+ \Gamma W_{S}^{\top }W_{S}^{\top }\right) \right] , \end{aligned}$$
(59)
$$\begin{aligned} E[\overline{\varepsilon }^{\top }Q_{3}\overline{\varepsilon }]= & {} \theta _{2}\text {tr}\left( W_{S}^{*\top }W_{S}^{*}\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{*\top }W_{S}^{*}\right) \nonumber \\&+\, \lambda ^2\left[ \theta _{2}\text {tr}\left( W_{S}^{**\top } W_{S}^{**}\right) +T\sigma _{\alpha }^{2}\text {tr}\left( \Gamma W_{S}^{**\top }W_{S}^{**}\right) \right] \nonumber \\&-\, \lambda \left[ \theta _{2}2\text {tr}\left( W_{S}^{**\top } W_{S}^{*}\right) +T\sigma _{\alpha }^{2}2\text {tr}\left( \Gamma W_{S}^{**\top }W_{S}^{*}\right) \right] , \end{aligned}$$
(60)

where \(W_{S}^{*}=\hbox {diag}(\overline{J}_{M_i}) W_S\) and \(W_{S}^{**}=\hbox {diag}(\overline{J}_{M_i}) W_S W_S\). Overall, we obtain a system of nine equations involving the second moments of \(\varepsilon \), \(\overline{\varepsilon }\):

$$\begin{aligned} \Lambda \Upsilon - \gamma = 0, \end{aligned}$$
(61)

where

$$\begin{aligned} \Lambda = \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} \kappa _1 \kappa _2 &{} \kappa _2 t_1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} \kappa _2 t_2 &{} \kappa _2 t_3 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ \kappa _2 t_1 &{} \kappa _2 t_4 &{} 2\kappa _2 t_2 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{}\kappa _3 &{} t_1^\bullet &{} 2 t_6^\bullet &{} 0 &{} \kappa _4 t_5^\bullet &{} 0\\ 0 &{} 0 &{} 0 &{} t_6^\bullet &{} t_7 &{} t_8^\bullet &{} 0 &{} \kappa _4 t_9 &{} \kappa _4 t_5^\bullet \\ 0 &{} 0 &{} 0 &{} t_1^\bullet &{} t_1^{\bullet \bullet } &{} 2 t_7 &{} \kappa _4 t_5^\bullet &{} \kappa _4 t_5^{\bullet \bullet } &{} 2\kappa _4 t_9^\bullet \\ 0 &{} 0 &{} 0 &{} \kappa _5 &{} t_1^*&{} 2 t_6^*&{} \kappa _1 \kappa _4 &{} \kappa _4 t_5^*&{} 2\kappa _4 t_{10} \\ 0 &{} 0 &{} 0 &{} t_6^*&{} t_{11} &{} t_8^*&{} \kappa _4 t_{10} &{} \kappa _4 t_9^*&{} \kappa _4 t_{12}\\ 0 &{} 0 &{} 0 &{} t_1^*&{} t_1^{**} &{} 2 t_{11} &{} \kappa _4 t_5^*&{} \kappa _4 t_5^{**} &{} 2 \kappa _4 t_9^{*}\\ \end{array}\right) , \end{aligned}$$
(62)

and

$$\begin{aligned} \Upsilon = \left( \begin{array}{c} \sigma ^2_v \\ {\lambda ^2 \sigma ^2_v} \\ -\lambda \sigma ^2_v \\ \theta _2 \\ \lambda ^2 \theta _2 \\ - \lambda \theta _2 \\ \sigma ^2_\alpha \\ \lambda ^2 \sigma ^2_\alpha \\ -\lambda \sigma ^2_\alpha \\ \end{array}\right) \text {, } \gamma = \left( \begin{array}{c} E\left[ \varepsilon ^{\top } Q_1 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_1 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_1 \overline{\varepsilon }\right] \\ E\left[ \varepsilon ^{\top } Q_2 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_2 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_2 \overline{\varepsilon }\right] \\ E\left[ \varepsilon ^{\top } Q_3 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_3 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_3 \overline{\varepsilon }\right] \\ \end{array}\right) , \end{aligned}$$
(63)

with \(\kappa _1=S\), \(\kappa _2=T-1\), \(\kappa _3=S-N\), \(\kappa _4=T\), \(\kappa _5=N\), \(t_1 = \text {tr}\left( W_{S}^{\top }W_{S}\right) \),\(t_2 = \text {tr}\left( W_{S}^{\top }W_{S}^{\top } W_{S}\right) \), \(t_3 = \text {tr}\left( W_{S}^{\top } W_{S} + W_{S}^{\top }W_{S}^{\top }\right) \), \(t_4 = \text {tr}\left( W_{S}^{\top }W_{S}^{\top } W_{S} W_{S}\right) \), \(t_1^\bullet = \text {tr}\left( W_{S}^{\bullet \top }W_{S}^\bullet \right) \), \(t_1^*= \text {tr}\left( W_{S}^{*\top }W_{S}^*\right) \), \(t_1^{\bullet \bullet } = \text {tr}\left( W_{S}^{\bullet \bullet \top }W_{S}^{\bullet \bullet }\right) \), \(t_1^{**} = \text {tr}\left( W_{S}^{**\top }W_{S}^{**}\right) \), \(t_5^\bullet = \text {tr}\left( \Gamma W_{S}^{\bullet \top }W_{S}^\bullet \right) \), \(t_5^*= \text {tr}\left( \Gamma W_{S}^{*\top }W_{S}^*\right) \), \(t_5^{\bullet \bullet } = \text {tr}\left( \Gamma W_{S}^{\bullet \bullet \top }W_{S}^{\bullet \bullet }\right) \), \(t_5^{**} = \text {tr}\left( \Gamma W_{S}^{**\top }W_{S}^{**}\right) \), \(t_6^\bullet = \text {tr}\left( W_{S}^{\bullet \top }\right) \), \(t_6^*= \text {tr}\left( W_{S}^{*\top }\right) \), \(t_7 = \text {tr}\left( W_{S}^{\bullet \bullet \top }W_{S}^\bullet \right) \), \(t_8^\bullet = \text {tr}\left( W_{S}^{\bullet \top }W_{S}^\bullet + W_{S}^{\bullet \bullet \top } \right) \), \(t_8^*= \text {tr}\left( W_{S}^{*\top }W_{S}^*+ W_{S}^{**\top } \right) \), \(t_9^\bullet = \text {tr}\left( \Gamma W_{S}^{\bullet \bullet \top }W_{S}^\bullet \right) \), \(t_9^*= \text {tr}\left( \Gamma W_{S}^{**\top }W_{S}^*\right) \), \(t_{10} = \text {tr}\left( \Gamma W_{S}^\top \right) \), \(t_{11} = \text {tr}\left( W_{S}^{**\top }W_{S}^*\right) \) and \(t_{12} = t_5^{*} + \text {tr}\left( \Gamma W_{S}^\top W_{S}^\top \right) \).

In practice, to obtain the GM estimators of \(\lambda \), \(\sigma _{\alpha }^{2}\), \(\theta _{2}\) and \(\sigma _{v}^{2}\), we have to use the sample counterparts of the terms in Eq. (61), i.e. \(\widetilde{\Lambda }\) and \(\widetilde{\gamma }\). Nevertheless, to estimate \(\lambda \) and \(\sigma _{v}^{2}\), it is possible to use only the moments from (33) to (38). Then, the estimates of \(\theta _{2}\) and \(\sigma _{\alpha }^{2}\) follow from the moments (42) and (48). This estimator is called the unweighted GM estimator. In other words, the GM estimators of \(\lambda \) and \(\sigma _{v}^{2}\) are obtained from the reduced system

$$\begin{aligned} \Lambda ^\bullet \Upsilon ^\bullet - \gamma ^\bullet = 0, \end{aligned}$$
(64)

where

$$\begin{aligned} \Lambda ^\bullet = \left( \begin{array}{c@{\quad }c@{\quad }c} \kappa _1 \kappa _2 &{} \kappa _2 t_1 &{} 0\\ 0 &{} \kappa _2 t_2 &{} \kappa _2 t_3 \\ \kappa _2 t_1 &{} \kappa _2 t_4 &{} 2\kappa _2 t_2 \\ \end{array}\right) , \end{aligned}$$
(65)

and

$$\begin{aligned} \Upsilon ^\bullet = \left( \begin{array}{c} \sigma ^2_v \\ {\lambda ^2 \sigma ^2_v} \\ -\lambda \sigma ^2_v \\ \end{array}\right) \text {, } \gamma ^\bullet = \left( \begin{array}{c} E\left[ \varepsilon ^{\top } Q_1 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_1 \varepsilon \right] \\ E\left[ \overline{\varepsilon }^{\top } Q_1 \overline{\varepsilon }\right] \\ \end{array}\right) . \end{aligned}$$
(66)

Then, from Eqs. (42) and (48) we obtain, respectively:

$$\begin{aligned} \hat{\theta }_{2} = \frac{1}{\left( S-N\right) } \left( \hat{G}^{-1} \hat{\varepsilon }\right) ^{\top } Q_2 \hat{G}^{-1}\hat{\varepsilon } , \end{aligned}$$
(67)

and

$$\begin{aligned} \hat{\sigma }_{\alpha }^{2} = \frac{1}{ST} \left( \hat{G}^{-1}\hat{\varepsilon }\right) ^{\top } Q_3 \hat{G}^{-1}\hat{\varepsilon } - {N \over ST} \hat{\theta }_2, \end{aligned}$$
(68)

where \(\hat{G}^{-1} = \left[ {{I}}_{T}{\otimes } \hat{G}_S^{-1} \right] \) with \(\hat{G}_S={{I}}_{S}{-}\widehat{\lambda } W_S\).
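The following minimal sketch assembles this unweighted GM step, Eqs. (64)–(68): given stacked residuals from a first-stage fit, it solves the reduced three-equation system for \((\lambda ,\sigma _{v}^{2})\) by nonlinear least squares and then backs out \(\theta _{2}\), \(\sigma _{\alpha }^{2}\) and \(\sigma _{\mu }^{2}\). The residuals eps_hat, the matrix W_S and the array of group sizes M are assumed inputs; the dense-matrix code is an illustration, not an optimized implementation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.linalg import block_diag

def unweighted_gm(eps_hat, W_S, M, T):
    S = W_S.shape[0]; N = len(M)
    I_T, I_S = np.eye(T), np.eye(S)
    e = eps_hat
    e1 = np.kron(I_T, W_S) @ eps_hat                 # eps-bar, Eq. (24)
    Jbar_T = np.ones((T, T)) / T
    Jbar_M = block_diag(*[np.ones((Mi, Mi)) / Mi for Mi in M])
    Q1 = np.kron(I_T - Jbar_T, I_S)
    Q2 = np.kron(Jbar_T, I_S - Jbar_M)
    Q3 = np.kron(Jbar_T, Jbar_M)

    # sample counterpart of gamma-bullet in Eq. (66)
    g = np.array([e @ Q1 @ e, e1 @ Q1 @ e, e1 @ Q1 @ e1])

    # Lambda-bullet, Eq. (65), with the traces t_1..t_4 defined in the text
    k1, k2 = S, T - 1
    t1 = np.trace(W_S.T @ W_S)
    t2 = np.trace(W_S.T @ W_S.T @ W_S)
    t3 = np.trace(W_S.T @ W_S + W_S.T @ W_S.T)
    t4 = np.trace(W_S.T @ W_S.T @ W_S @ W_S)
    Lam = np.array([[k1 * k2, k2 * t1, 0.0],
                    [0.0,     k2 * t2, k2 * t3],
                    [k2 * t1, k2 * t4, 2 * k2 * t2]])

    def xi(p):                                       # residual vector, Eq. (70)
        lam, s_v2 = p
        ups = np.array([s_v2, lam**2 * s_v2, -lam * s_v2])
        return Lam @ ups - g

    lam_hat, s_v2_hat = least_squares(xi, x0=np.array([0.0, 1.0])).x  # Eq. (71)

    # Eqs. (67)-(68): back out theta_2 and sigma_alpha^2
    G_inv = np.kron(I_T, np.linalg.inv(I_S - lam_hat * W_S))
    u_hat = G_inv @ eps_hat
    theta2_hat = (u_hat @ Q2 @ u_hat) / (S - N)
    s_a2_hat = (u_hat @ Q3 @ u_hat) / (S * T) - N * theta2_hat / (S * T)
    s_mu2_hat = (theta2_hat - s_v2_hat) / T
    return lam_hat, s_v2_hat, theta2_hat, s_a2_hat, s_mu2_hat
```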

The above approach relates to unweighted GM. Nevertheless, the literature on generalized method of moments estimators indicates that it is optimal to use the inverse of the variance–covariance matrix of the sample moments at the true parameter values as a weighting matrix in order to obtain asymptotic efficiency. In the following, our Monte Carlo simulations show that our results are not very different from those produced by ML, especially when our three-stage procedure is iterated. We therefore leave the weighted GM method for further research.

3.2.2 The GM spatial IV estimator

To obtain the GM-S-IV estimator of \(\delta \left( =\left[ \rho ,\beta ^{\top }\right] ^{\top }\right) \), one first calculates the unweighted GM estimates of \(\lambda ,\sigma _{v}^{2},\theta _{2}\) and \( \sigma _{\alpha }^{2}\), following a three-stage procedure (a schematic sketch of the third stage is given after the list):

  • in the first stage, the model (7) is estimated using an IV approach based on the matrix of instruments H which is given by \(\left( X,WX,W^2X, \dots \right) \). Thus, the IV estimator of \(\delta \) is defined as:

    $$\begin{aligned} \hat{\delta }_{\mathrm{IV}}=\left( Z^\top P_H Z\right) ^{-1}Z^\top P_H y, \end{aligned}$$
    (69)

    where \(P_H = H \left( H^\top H\right) ^{-1} H^\top \);

  • in the second stage, the parameters \(\lambda \), \(\sigma _v^{2}\), \(\theta _{2}\) and \(\sigma _\alpha ^{2}\) are estimated using the GM approach of Sect. 3.2.1 based on the IV residuals, i.e. \(\hat{\varepsilon } = y - Z\hat{\delta }_{\mathrm{IV}}\). The GM estimates are obtained from the sample counterpart of the reduced system (64) which is:

    $$\begin{aligned} \widetilde{\Lambda }^{\bullet }\Upsilon ^\bullet -\widetilde{\gamma }^{\bullet }=\xi \left( \lambda ,\sigma _{v}^{2}\right) , \end{aligned}$$
    (70)

    where \(\xi \left( \lambda ,\sigma _{v}^{2}\right) \) is a vector of residuals. The unweighted GM estimators of \(\lambda \) and \(\sigma _{v}^{2}\) are the nonlinear least squares estimators based on (70):

    $$\begin{aligned} \left( \widehat{\lambda },\widehat{\sigma }_{v}^{2}\right) =\arg \min \left\{ \xi \left( \lambda ,\sigma _{v}^{2}\right) ^{\top }\xi \left( \lambda ,\sigma _{v}^{2}\right) \right\} . \end{aligned}$$
    (71)

    Then, the estimated parameters of \(\theta _2\) and \(\sigma ^2_\alpha \) are obtained using, respectively (67) and (68);

  • in the third stage, we need the estimated variance–covariance matrix \( \widehat{\Omega }_{u}\) obtained using the second-stage estimates \(\widehat{\sigma }_{v}^{2}\), \(\widehat{\sigma }_{\mu }^{2}\left( =(\widehat{\theta }_{2}-\widehat{\sigma }_{v}^{2})/T\right) \) and \( \widehat{\sigma }_{\alpha }^{2}\). In order to obtain an equation in terms of u, from which spatial autocorrelation is absent, rather than in terms of \(\varepsilon \) in which it is present, we can purge the equation of spatial dependence by pre-multiplication by \(\hat{G}^{-1}\). This can be seen as a type of Cochrane–Orcutt transformation appropriate to spatially dependent data. Hence, pre-multiplication of the model (7) by \(\hat{G}^{-1}\) yields:

    $$\begin{aligned} y^{*} =Z^{*} \delta + u, \end{aligned}$$
    (72)

    where \(Z^{*} =\hat{G}^{-1} Z\), \(y^{*} = \hat{G}^{-1} y\). If we are guided by the classical panel data random effects literature (see Baltagi 2013), and transform the model in (72) by pre-multiplying it by \(\widehat{ \Omega }_{u}^{-1/2}\), then applying the IV principle gives the GM-S-IV estimator \(\hat{\delta }_{GM-S-IV}\) which corresponds to:

    $$\begin{aligned} \widehat{\delta }_{GM-S-IV}= & {} \left( {{Z} ^{**\top }}P_{H^{**}}Z^{**}\right) ^{-1}{{Z}^{**\top }} P_{H^{**}} y^{**}, \end{aligned}$$
    (73)

    where \(Z^{**}= \widehat{\Omega }_{u}^{-1/2} Z^*\), \(y^{**}=\widehat{\Omega }_{u}^{-1/2} y^*\), \(H^{**}=\widehat{\Omega }_{u}^{-1/2} H^*\), \(H^{*} =\hat{G}^{-1} H\), \(P_{H^{**}}=H^{**}\left( H^{**\top } H^{**}\right) ^{-1} H^{**\top }\).
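The schematic sketch below corresponds to the third stage, Eqs. (72)–(73). It assumes the second-stage estimates (e.g. from the unweighted_gm sketch of Sect. 3.2.1) and the matrices y, Z, H and W_S are available, and obtains \(\widehat{\Omega }_{u}^{-1/2}\) from the spectral decomposition (12), whose orthogonal projectors make the inverse square root available in closed form.

```python
import numpy as np
from scipy.linalg import block_diag

def gm_s_iv(y, Z, H, W_S, M, T, lam_hat, s_v2_hat, theta2_hat, s_a2_hat):
    """Third-stage GM-S-IV estimator, Eq. (73); M is the array of M_i."""
    S = W_S.shape[0]
    I_T, I_S = np.eye(T), np.eye(S)
    Jbar_T = np.ones((T, T)) / T
    Jbar_M = block_diag(*[np.ones((Mi, Mi)) / Mi for Mi in M])
    Q1 = np.kron(I_T - Jbar_T, I_S)
    Q2 = np.kron(Jbar_T, I_S - Jbar_M)
    Q3 = np.kron(Jbar_T, Jbar_M)
    theta3 = np.repeat(M * T * s_a2_hat + theta2_hat, M)   # theta_{3i}
    # Omega_u^{-1/2} = theta1^{-1/2} Q1 + theta2^{-1/2} Q2 + diag(theta3i^{-1/2}) Q3
    Om_inv_half = Q1 / np.sqrt(s_v2_hat) + Q2 / np.sqrt(theta2_hat) \
        + np.kron(I_T, np.diag(1.0 / np.sqrt(theta3))) @ Q3

    # spatial Cochrane-Orcutt transform, Eq. (72), then the **-transform
    G_inv = np.kron(I_T, np.linalg.inv(I_S - lam_hat * W_S))
    y2, Z2, H2 = (Om_inv_half @ G_inv @ a for a in (y, Z, H))
    P = H2 @ np.linalg.solve(H2.T @ H2, H2.T)              # projection on H**
    return np.linalg.solve(Z2.T @ P @ Z2, Z2.T @ P @ y2)   # Eq. (73)
```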

This three-stage procedure can be iterated. After the first iteration, i.e. the application of the procedure described above, the GM-S-IV residuals are computed. They are then used to compute a new set of unweighted GM estimates, which in turn are used to obtain new GM-S-IV parameter estimates of the multidimensional spatial lag model, and so on.

4 A Monte Carlo study

The idea here is to demonstrate the comparative performance of the various estimators described thus far, namely ML and the GM-S-IV approach. For this purpose, we generate data using a model with known parameters and see how accurately the different estimators recover the true parameter values. Our data generating process is the spatial lag regression model:

$$\begin{aligned} y_{t}= D_S^{-1}\left[ \beta _{0} \iota _t+\beta _{1} x_{t}+\varepsilon _{t}\right] , \quad t=1,\ldots ,T , \end{aligned}$$
(74)

where \(y_t\) is of dimension \(\left( S \times 1 \right) \) as is the exogenous variable \(x_t\). Likewise, \(\iota _t\) is a vector of ones of dimension \(\left( S \times 1 \right) \). \(D_S = I_S - \rho W_S\) where \(W_{S}\) is the spatial matrix of size \(\left( S \times S\right) \). We retain the spatial structure proposed by Kelejian and Prucha (1999), referred to as “J ahead and J behind”, with the nonzero elements equal to \(1/(2J)\) (a minimal sketch of this construction follows below). Note that, as J increases, the value \(1/(2J)\) of the nonzero elements decreases, and this in turn may reduce the amount of spatial correlation. Here, we consider \(J=2, 6\) and 10. The error term \(\varepsilon _{t}\) has a SMA structure

$$\begin{aligned} \varepsilon _{t}= u_t - \lambda W_S u_t, \end{aligned}$$
(75)

and \(u_{t}\) has a nested random components structure given by

$$\begin{aligned} u_{t}= \hbox {diag}\left( \iota _{M_i}\right) \alpha + \mu + v_{t}, \end{aligned}$$
(76)

where \(u_{t}\) is \(\left( S\times 1\right) \), \(\alpha \) is the vector of group effects of dimension \(\left( N\times 1\right) \), \(\mu ^{\top }=\left( \mu _{1}^{\top },\ldots ,\mu _{N}^{\top }\right) \), a vector of dimension \((1\times S)\), \(\mu _{i}^{\top }=\left( \mu _{i1},\ldots ,\mu _{iM_{i}}\right) \), a vector of dimension \((1\times M_{i})\), \(\iota _{M_{i}}\) is a vector of ones of dimension \(\left( M_{i}\times 1\right) \). \(v_{t}\) is of dimension \(\left( S\times 1\right) \).
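A minimal sketch of the “J ahead and J behind” structure referred to above: each unit is linked to the J units ahead and the J behind in a circular ordering, with every nonzero weight equal to \(1/(2J)\), so the matrix is row-normalized by construction. The sizes below are the ones used in the experiments.

```python
import numpy as np

def w_ahead_behind(S, J):
    """Kelejian-Prucha circular 'J ahead and J behind' weight matrix."""
    W = np.zeros((S, S))
    for k in range(S):
        for offset in range(1, J + 1):
            W[k, (k + offset) % S] = 1.0 / (2 * J)   # J ahead
            W[k, (k - offset) % S] = 1.0 / (2 * J)   # J behind
    return W

W_S = w_ahead_behind(S=100, J=2)
assert np.allclose(W_S.sum(axis=1), 1.0)             # rows sum to one
```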

Throughout the experiment, the parameters of (74) and (75) were set at \(\beta _{0}=5\), \(\beta _{1}=2\), \(\rho =0.3, 0.6\) and \(\lambda =-\,0.2, -\,0.5, -\,0.9\), i.e. positive spatial dependence in the errors, since the SMA process (75) enters with a negative sign. The explanatory variable \(x_{ijt}\) is generated by a method similar to that of Nerlove (1971), Antweiler (2001) and Baltagi et al. (2001). More precisely, we have:

$$\begin{aligned} x_{ijt}=0.3t+0.8x_{ijt-1}+\omega _{ijt}, \end{aligned}$$
(77)

where \(i=1,\dots ,N\), \(j=1,\dots ,M_i\), and \(\omega _{ijt}\) is a random variable uniformly distributed on the interval \([ -\,0.5,0.5] \) and \(x_{ij0}=60+30\omega _{ij0}\). Observations over the first 10 periods are discarded to minimize the effect of initial values. For the data generating process for the errors, we assume \(\alpha _{i}\sim iid.N\left( 0,\sigma _{\alpha }^{2}\right) \), \(\mu _{ij}\sim iid.N\left( 0,\sigma _{\mu }^{2}\right) \) and \(v_{ijt}\sim iid.N\left( 0,\sigma _{v}^{2}\right) \). We fix \(\sigma _{u}^{2}=\sigma _{\alpha }^{2}+\sigma _{\mu }^{2}+\sigma _{v}^{2}=2\) and define \(\gamma _{1}=\sigma _{\alpha }^{2}/\sigma _{u}^{2}\) and \(\gamma _{2}=\sigma _{\mu }^{2}/\sigma _{u}^{2}\). These two ratios vary over the set (0.2, 0.4, 0.6) such that \(\left( 1-\gamma _{1}-\gamma _{2}\right) \) is always positive. For all experiments, we have 20 groups observed over 5 periods, hence \(\left( N,T\right) =\left( 20,5\right) \), and we have \(S=100\) individuals, so the sample size \(\left( \text {i.e. }TS\right) \) is fixed at 500. We consider the three patterns proposed by Fingleton et al. (2016), denoted \(P_{1},P_{2}\) and \( P_{3}\), with individuals nested within the N groups with differing frequencies \((M_{1},\ldots ,M_{20})\). More precisely, with \(N=20\), \(P_1\) is the balanced pattern with \(M_i=5\), \(i=1,\dots ,20\). For \(P_2\), \(M_i=3\), \(i=1,\dots ,12\), \(M_i=4\), \(i=13,\dots ,16\) and \(M_i=12\), \(i=17,\dots ,20\). For \(P_3\), we have \(M_i=2\), \(i=1,\dots ,8\), \(M_i=3\), \(i=9,\dots ,12\), \(M_i=4\), \(i=13,\dots ,18\) and \(M_i=24\), \(i=19,20\).
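A minimal sketch of this part of the design: the regressor recursion (77) with its 10-period burn-in, and the three group-size patterns (all values as stated above).

```python
import numpy as np

rng = np.random.default_rng(1)

# group-size patterns P1-P3, each with N = 20 groups and S = 100 individuals
P1 = np.full(20, 5)
P2 = np.array([3] * 12 + [4] * 4 + [12] * 4)
P3 = np.array([2] * 8 + [3] * 4 + [4] * 6 + [24] * 2)
assert P1.sum() == P2.sum() == P3.sum() == 100

def gen_x(S, T, burn=10):
    """x_ijt = 0.3 t + 0.8 x_ij,t-1 + omega_ijt, Eq. (77), with burn-in."""
    omega = rng.uniform(-0.5, 0.5, (burn + T, S))
    x = np.empty((burn + T, S))
    x[0] = 60 + 30 * omega[0]                        # x_{ij0}
    for t in range(1, burn + T):
        x[t] = 0.3 * t + 0.8 * x[t - 1] + omega[t]
    return x[burn:]                                  # discard first 10 periods

x = gen_x(S=100, T=5)
```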

For each experiment, we focus on the estimates of the parameters \(\rho \), \(\beta _{0}\) , \(\beta _{1}\), \(\lambda \), \(\sigma _{\alpha }^{2}\), \(\sigma _{\mu }^{2}\) and \( \sigma _{v}^{2}\). Following KKP (2007), we adopt a measure of dispersion which is closely related to the standard measure of RMSE defined as follows:

$$\begin{aligned} \text {RMSE}=\left[ \hbox {bias}^{2}+\left( \frac{IQ}{1.35}\right) ^{2}\right] ^{1/2} , \end{aligned}$$
(78)

where bias corresponds to the difference between the median and the true value of the parameter, while IQ is the interquartile range defined as \(q_{1}-q_{2}\) where \(q_{1}\) is the 0.75 quantile and \(q_{2}\) is the 0.25 quantile.
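In code, the measure (78) is a direct transcription:

```python
import numpy as np

def rmse_kkp(estimates, true_value):
    """Dispersion measure of Eq. (78), following KKP (2007)."""
    bias = np.median(estimates) - true_value
    iq = np.quantile(estimates, 0.75) - np.quantile(estimates, 0.25)
    return np.sqrt(bias**2 + (iq / 1.35)**2)
```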

In the tables below, three sets of RMSEs are reported. They are the outcomes of the ML, unweighted GM-S-IV and iterated unweighted GM-S-IV estimators. More precisely, (\(\hat{\lambda }, \hat{\sigma }_\alpha ^{2}, \hat{\sigma }_\mu ^{2}, \hat{\sigma }_v^{2}\)) and (\(\hat{\lambda }^{(1)}, \hat{\sigma }_\alpha ^{2(1)}, \hat{\sigma }_\mu ^{2(1)}, \hat{\sigma }_v^{2(1)}\)) denote the unweighted and iterated unweighted GM estimates, respectively, whereas (\(\hat{\lambda }^{\mathrm{ML}}, \hat{\sigma }_\alpha ^{2,ML}, \hat{\sigma }_\mu ^{2,ML}, \hat{\sigma }_v^{2,ML}\)) denote the ML estimates. The GM estimates are based on IV residuals. Subsequently, the GM-S-IV estimates of \(\rho \), \(\beta _0\) and \(\beta _1\) are computed, i.e. (\(\hat{\rho }, \hat{\beta }_0, \hat{\beta }_1\)) and (\(\hat{\rho }^{(1)}, \hat{\beta }_0^{(1)}, \hat{\beta }_1^{(1)}\)), respectively. The ML estimates are denoted (\(\hat{\rho }^{\mathrm{ML}}, \hat{\beta }_0^{\mathrm{ML}}, \hat{\beta }_1^{\mathrm{ML}}\)). The results of 1,000 replications for \(P_{1}\) (balanced subgroups pattern, \(M_{i}=5, \forall i=1,\ldots ,N\)) and \(\rho = 0.3\) are given in Table 1, whereas Tables 2 and 3 give results for the unbalanced patterns \(P_{2}\) and \(P_{3}\).

From Table 1, it is apparent that while ML is the most efficient for all parameters, the iterated GM-S-IV is almost as good for almost all parameters. For example, on average, the RMSE of both GM-S-IV estimators of the spatial autoregressive parameter \(\rho \) is approximately only \(2\%\) larger than that of the ML estimate \(\hat{\rho }^{\mathrm{ML}}\). The differences between ML and iterated GM-S-IV for \(\beta _{0}\), \(\beta _{1}\), \(\sigma _\alpha ^{2}\), \(\sigma _\mu ^{2}\) are also very small and never larger than \(4\%\). This is important because the parameters \(\beta _{0}\), \(\beta _{1}\) are of particular interest in applied economics. It also means that the computational benefits associated with the use of the GM approach do not seem to come at much cost in terms of efficiency. For \(\beta _{0}\), \(\beta _{1}\), \(\sigma _\alpha ^{2}\), \(\sigma _\mu ^{2}\), the differences between ML and the simple (i.e. non-iterated) GM-S-IV are somewhat larger (up to \(5\%\) for \(\beta _{1}\), \(\sigma _\alpha ^{2}\), \(\sigma _\mu ^{2}\) and \(28\%\) for \(\beta _{0}\)). While iterating the GM-S-IV estimator is likely to achieve marginally more efficient estimates, this is definitely not the case for \(\lambda \), especially when \(\lambda \) is near the upper end of its range in absolute value. Indeed, the RMSE of the iterated GM-S-IV estimator \(\hat{\lambda }^{(1)}\) is \(32\%\) larger on average than the RMSE of \(\hat{\lambda }^{\mathrm{ML}}\). Looking in more detail, the difference is especially high for \(\lambda =-\,0.9\) (up to \(100\%\)), while it remains acceptable for smaller absolute values of \(\lambda \) (for instance, \(17\%\) for \(\lambda =-\,0.2\)). Hence, caution is in order as the absolute value of \(\lambda \) tends to unity.

Table 1 RMSEs of the estimators of \(\rho \), \(\beta _0 \), \(\beta _1 \), \(\lambda \), \(\sigma _{\alpha }^2 \), \(\sigma _{\mu }^2 \) and \(\sigma _v^2 \) for pattern \(P_1 \) considering \(\left( N,T \right) =\left( 20,5 \right) \), \(\rho =0.3\), 1,000 replications
Table 2 RMSEs of the estimators of \(\rho \), \(\beta _0 \), \(\beta _1 \), \(\lambda \), \(\sigma _{\alpha }^2 \), \(\sigma _{\mu }^2 \) and \(\sigma _v^2 \) for pattern \(P_2 \) considering \(\left( N,T \right) =\left( 20,5 \right) \), \(\rho =0.3\), 1,000 replications
Table 3 RMSEs of the estimators of \(\rho \), \(\beta _0 \), \(\beta _1 \), \(\lambda \), \(\sigma _{\alpha }^2 \), \(\sigma _{\mu }^2 \) and \(\sigma _v^2 \) for pattern \(P_3 \) considering \(\left( N,T \right) =\left( 20,5 \right) \), \(\rho =0.3\), 1,000 replications

Tables 2 and 3 concern the two unbalanced patterns \(P_{2}\) and \(P_{3}\). More precisely, the distribution of individuals over the twenty subgroups changes but the sample size remains fixed at \(TS=5\times 100=500\). In Table 2, based on the less unbalanced of the two, the results are qualitatively similar to those of Table 1. In terms of averages, the RMSE of both the simple and iterated GM-S-IV estimators for the spatial autoregressive parameter \(\rho \) is approximately \(0.73\%\) larger than that produced by the ML estimator. The differences for the regression parameters \(\beta _{0}\), \(\beta _{1}\) are very small (\(1\%\) for \(\beta _{0}\) and even \(-\,0.58\%\) for \(\beta _{1}\)). Conversely, the differences between ML and iterated GM-S-IV for the variances \(\sigma _\alpha ^{2}\), \(\sigma _\mu ^{2}\) are somewhat larger than in the balanced case, up to \(35\%\) for \(\sigma _\alpha ^{2}\). These values are even higher for the simple GM-S-IV estimator, highlighting the less efficient estimates of GM compared with ML estimation, and the need to weigh this against the advantages provided by GM. With respect to \(\lambda \), we find again that the RMSE of the iterated GM-S-IV estimator is larger than under ML, with a \(35\%\) difference when the true value of \(\lambda \) is \(-\,0.9\). With smaller absolute values of \(\lambda \), the difference is less stark; for example, when \(\lambda = -0.2\), it is approximately \(16\%\).

In Table 3, the RMSEs are affected differently because of the way unbalancedness is treated in \(P_{3}\). Focusing especially on the differences between ML and iterated GM-S-IV, we note that the regression parameters are estimated efficiently in both cases, as is \(\rho \). As for the variances, the differences between ML and iterated GM-S-IV are higher for \(P_{3}\) than was the case for \(P_{2}\) when one considers \(\sigma _\alpha ^{2}\) (\(28\%\) for \(P_{3}\) compared to \(12\%\) for \(P_{2}\)). Conversely, the differences are smaller for \(\sigma _\mu ^{2}\) (\(0.74\%\) for \(P_{3}\) compared to \(2.86\%\) for \(P_{2}\)) and \(\sigma _v^{2}\) (\(4.38\%\) for \(P_{3}\) compared to \(8.49\%\) for \(P_{2}\)). The estimates of the spatial error parameter are affected in a similar way under \(P_3\) as under \(P_2\): the RMSE of \(\hat{\lambda }^{(1)}\) is \(33\%\) higher than the RMSE of \(\hat{\lambda }^{\mathrm{ML}}\), with especially high differences for \(\lambda =-\,0.9\).

We have also performed the simulations with \(\rho =0.6\) considering the same patterns \(P_1\), \(P_2\) and \(P_3\). The results are provided in an online appendix and the conclusions remain identical to those described above.

5 Empirical application

In this section, we consider the relationship between log employment (\(\ln E\)), log output (\(\ln Q\)) and an indicator of (log) capital investment (\(\ln K\)) across \(S=\sum _{i=1}^{N}M_{i}=255\) NUTS2 regions nested within \(N=25\) countries of the EU. \(\ln Q\) is measured by gross value added, or GVA, and \(\ln K\) is gross fixed capital formation, or GFCF. These annual regional data series are based on Cambridge Econometrics’ European Regional Economic Data Base. As an illustration, Fig. 1 shows the distribution of log employment in the year 2010 across the 255 regions. Similar maps but with varying regional employment levels covering the period 1999–2010 constitute the dependent variable. Our model endeavours to explain the spatio-temporal variation in \(\ln E\) as a function of \(\ln Q\) and \(\ln K\) organized on the same basis as Fig. 1, and also as an outcome of unobservable region-specific random effects nested within country-specific random effects. Accordingly, the model specification is

$$\begin{aligned} \ln E_{t}=\rho W_{S}\ln E_{t}+\beta _{0}\iota _{t}+\beta _{1}\ln Q_{t}+\beta _{2}\ln K_{t}+\varepsilon _{t}, \end{aligned}$$
(79)

in which \(\ln E_{t}\) is an (\(S \times 1\)) vector of levels of (log) employment at time t, with exogenous variables \(\ln Q_{t}\) and \(\ln K_{t}\), and \( \iota _{t}\) is a vector of ones of dimension (\(S \times 1\)). The compound errors \(\varepsilon _{t}\) are an (\(S \times 1\)) vector of spatially dependent unobservables comprising time-invariant national effects, one for each of N countries and denoted by \(\alpha _{i},i=1,\dots ,N\), together with time-invariant regional effects with the region j effect, where j is nested within country i, denoted by \(\mu _{ij}=\mu _{k},k=1,\dots ,S\). In addition, there are remainders of dimension S which vary across regions and time, and the remainder effect for region j within country i at time t is denoted by \(\nu _{ijt}\). Thus, repeating for convenience Eqs. (5) and (6) in vector and matrix notation, we have

$$\begin{aligned} \varepsilon _{t}=u_{t}-\lambda M_{S}u_{t}, \end{aligned}$$
(80)

and

$$\begin{aligned} u_{t}=\hbox {diag}\left( \iota _{M_{i}}\right) \alpha +\mu +v_{t}. \end{aligned}$$
(81)

So, the estimation procedure takes into account two different spatial interaction processes: one for the endogenous spatial lag and the other for the errors. For the spatial lag at time t, \(W_{S}\ln E_{t}\), the matrix \(W_{S}\) is based on interregional trade flows between the 255 EU regions in the year 2000. The method of estimating these trade flows has been discussed elsewhere, for example by Polasek et al. (2010), Vidoli and Mazziotta (2010) and Fingleton et al. (2015), so here we simply note that the method employed bases interregional trade on data for international trade using a spatial version of the method for the construction of quarterly time series from annual series introduced by Chow and Lin (1971). The resulting matrix of bilateral interregional trade flows \(W_{S}^{*}\) is scaled following the approach of Ord (1975), so that

$$\begin{aligned} W_{S}=\widetilde{D}^{-0.5}W_{S}^{*}\widetilde{D}^{-0.5} , \end{aligned}$$
(82)

in which \(\widetilde{D}\) is a diagonal matrix with each cell on the leading diagonal containing the corresponding row total from \(W_{S}^{*}\). This normalization means that the most positive real eigenvalue of \(W_{S}\) is equal to \(\max (\hbox {eig})=1.0\), and the continuous range for which \( (I_{S}-\rho W_{S})\) is non-singular is \(1/\min (\hbox {eig})<\rho <1\). Thus, we require estimated \(\rho \) to occur within this range to ensure stationarity.
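A minimal sketch of this scaling, applied to a hypothetical nonnegative flow matrix with zero diagonal (the actual trade-flow matrix is not reproduced here). The final check reflects that \(\widetilde{D}^{-0.5}W_{S}^{*}\widetilde{D}^{-0.5}\) is similar to the row-normalized matrix \(\widetilde{D}^{-1}W_{S}^{*}\), whose largest eigenvalue is 1.

```python
import numpy as np

rng = np.random.default_rng(2)
W_star = rng.uniform(0.0, 1.0, (255, 255))   # placeholder for the flow matrix
np.fill_diagonal(W_star, 0.0)

d = W_star.sum(axis=1)                       # row totals: diagonal of D-tilde
W_S = W_star / np.sqrt(np.outer(d, d))       # D^{-1/2} W* D^{-1/2}, Eq. (82)

max_eig = np.linalg.eigvals(W_S).real.max()
assert np.isclose(max_eig, 1.0)              # most positive real eigenvalue is 1
```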

Fig. 1 Distribution of log employment in the year 2010

For GM-S-IV, the assumption is that the compound errors are interrelated according to the SMA process (80). In this case, the spatial matrix \(M_{S}\) is based on a contiguity matrix of dimension (\( S \times S\)) with 1 in cell (m, n) indicating that regions m and n share a common border and 0 indicating otherwise, although for nine isolated regions it has been necessary to create artificial, contiguous neighbours. The resulting contiguity matrix has subsequently been standardized to give \(M_{S}\), in which the rows sum to 1. This means that the stationary region for \(\lambda \) is given by \(1/\min (\hbox {eig})<\lambda <1/\max (\hbox {eig})=1\). The key feature of an SMA process is that shocks to the unobservables have local rather than global effects. Note that all components of the compound errors are assumed to be subject to this same spatial error dependence process.

Table 4 gives the resulting non-iterated estimates. The ratios of \(\widehat{\beta }_{1}\) and \(\widehat{\beta }_{2}\) to their respective standard errors indicate that \(\ln Q\) and \(\ln K\) are significantly positively related to \(\ln E\), and there is also a significant positive effect due to the endogenous spatial lag, since \(\widehat{\rho }>0\) with a t ratio equal to 34.2374. The significant positive effect due to the endogenous spatial lag means that we should interpret the effects of these variables via the true derivatives, following LeSage and Pace (2009) and Elhorst (2014). These show that, allowing for both the direct and indirect effects of spatial interaction across regions, the total effect of a 1% change in Q is a 0.3795% change in employment, while the total effect of a 1% change in K is a 0.0459% change in employment.

We find that the null hypothesis that \(\lambda =0\) is rejected in favour of positive residual spatial dependence (which, given the minus sign in (80), is indicated by a negative estimate of \(\lambda \)). The distribution of \(\lambda _{\mathrm{null}}\), which is \( \lambda \) under a null hypothesis of no spatial dependence among the errors, is based on the residuals from the nested error model assuming no spatial error dependence, but which includes a spatial lag (as described in Baltagi et al. (2014)). We refer to this by the acronym NRE-IV. The residuals on which the null distribution is based have the same moments as the NRE-IV residuals and are assumed to be normally distributed, but they are randomly assigned to regions in order to eliminate spatial dependence. Given randomly assigned residuals, the same GM estimation method used to obtain \(\widehat{\lambda }\) is applied to obtain \(\lambda _{\mathrm{null}}\), and this estimation is repeated 100 times to obtain 100 estimates of \( \lambda _{\mathrm{null}}\). We find that the estimate \(\widehat{\lambda }\) is not a typical member of this \(\lambda _{\mathrm{null}}\) distribution, since

$$\begin{aligned} t=\frac{\widehat{\lambda }-\overline{\lambda }_{\mathrm{null}}}{\sqrt{\hbox {var}(\lambda _{\mathrm{null}})}}=\frac{-0.8641-(-0.0021)}{0.0252}=-\,34.24, \end{aligned}$$
(83)

in which \(\overline{\lambda }_{\mathrm{null}}\) is the mean of the empirical null distribution, and \(\hbox {var}(\lambda _{\mathrm{null}})\) is the variance.
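
The construction of this empirical null distribution can be sketched as follows; `gm_lambda` is a hypothetical stand-in for the GM estimation step applied to a given residual vector, and the same draws can be reused for the null distributions of the variance components discussed below.

```python
import numpy as np

def lambda_null_distribution(residuals, gm_lambda, reps=100, seed=0):
    """Empirical null distribution of the GM estimate of lambda.

    Draws normal pseudo-residuals with the same mean and standard
    deviation as the NRE-IV residuals, assigns them to regions at
    random (so spatial dependence is absent by construction), and
    re-applies the GM estimator to each draw."""
    rng = np.random.default_rng(seed)
    m, s = residuals.mean(), residuals.std()
    return np.array([gm_lambda(rng.normal(m, s, size=residuals.shape))
                     for _ in range(reps)])

# t-ratio of the observed estimate against the null, as in Eq. (83):
#   t = (lambda_hat - draws.mean()) / draws.std()
```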

The estimated variance \(\widehat{\sigma }_{\alpha }^{2}=0.0577\) of the unobserved country effects is larger than the estimated regional effects variance, \(\widehat{\sigma }_{\mu }^{2}=0.0483\), and both are large relative to the remainder variance \(\widehat{\sigma }_{\nu }^{2}=0.0008\). In generating the \(\lambda _{\mathrm{null}}\) distribution, we also generate null distributions of \(\sigma _{\alpha ,\mathrm{null}}^{2}\), \(\sigma _{\mu ,\mathrm{null}}^{2}\) and \(\sigma _{\nu ,\mathrm{null}}^{2}\). Because the null distributions are based on a random pattern of errors, this also breaks up any effects due to country or region. This is evident from the means of the resulting distributions, namely \(\overline{\sigma }_{\alpha ,\mathrm{null}}^{2}=0.0131\) and \(\overline{\sigma }_{\mu ,\mathrm{null}}^{2}=0.0118\). In contrast, the remainder null variance is comparatively large, with \(\overline{\sigma }_{\nu ,\mathrm{null}}^{2}=0.1719\). Using also the standard deviations of the null distributions, the resulting t-ratios given in Table 4 indicate that \( \widehat{\sigma }_{\alpha }^{2}\) and \(\widehat{\sigma }_{\mu }^{2}\) are significantly greater than expected under the null hypothesis, suggesting significant effects due to countries and regions. In contrast, \(\widehat{\sigma }_{\nu }^{2}\) is significantly below the null mean, indicating that unexplained remainder effects varying across regions and across time are much smaller than one would expect were the errors distributed at random. Note that these interpretations are informal because the null distributions may in general be asymmetric. The t-ratios give the distance from the mean of the null distributions in units of standard error, and the ratios given in Table 4 are far outside the range of outcomes observed in the null distributions.

For comparison, we also estimate the parameters of three different models which likewise assume a nested error structure, but which differ in the way spatial dependence is treated. The first estimator is the above-mentioned NRE-IV, which assumes that there is a spatial lag but no spatial error dependence, hence \(\lambda =0\). The second estimator, referred to by the acronym GM-S-FGLS in line with the published literature (Fingleton et al. 2016), does assume spatial error dependence, but the dependence is an autoregressive rather than a moving average process. It also assumes that there is no spatial lag effect, hence \(\rho =0\). The third comparator introduces a SAR structure for the disturbances instead of a SMA one, but is otherwise identical to GM-S-IV. We refer to this as GM-S-IV*.

The resulting estimates are given in Table 4. It is evident that on the whole the outcomes with regard to the effects of \(\ln Q\) and \(\ln K\) are similar to those produced by GM-S-IV. One difference, however, is that the effect of \(\ln Q\) under GM-S-FGLS is larger than the effects reported under the other estimators (note that with \(\rho \) constrained to zero the estimate \(\widehat{\beta }_{1}\) is directly comparable to the total effects under the other estimators). The introduction of a SAR process under estimator GM-S-IV* moderates the apparent effect of \(\ln Q\), which is now quite similar to the outcomes from the other estimators including a spatial lag. Elimination of spatial dependence among the errors, due to the restriction \(\lambda =0\), gives the NRE-IV estimates. The effect of this is to increase \(\widehat{\sigma }_{\alpha }^{2}\), suggesting that there is more intercountry heterogeneity than under the other estimators. Using the same null reference distribution as for GM-S-IV, \(\widehat{\sigma }_{\alpha }^{2}\) is 50.84 standard errors above the mean of the null distribution, suggesting a significant country effect. In contrast, \(\widehat{\sigma } _{\mu }^{2}\) is very close to the mean of the null reference distribution for region effects, suggesting that employment is not subject to a region effect. However, these interpretations do not take into account the presence of error dependence, due to either a spatial moving average or a spatial autoregressive process.

Spatial error dependence is shown by the other estimators to be highly significant and should be taken into account in interpreting the nested effects embodied in the errors. It is noteworthy that the smallest remainder variance estimate \(\widehat{\sigma }_{\nu }^{2}\) occurs under the GM-S-IV estimator, indicating that this model explains more of the overall variance using \(\ln Q,\) \(\ln K\), the spatial lag, the spatial moving average error process and the national and regional effects in the errors. The other specifications leave more of the overall variance as an unexplained remainder component. It is apparent that controlling for localized spillovers via the spatial moving average process for the errors produces superior outcomes to assuming spatially autoregressive errors, as under GM-S-FGLS and GM-S-IV*, and provides a more appropriate interpretation of the magnitude of country and regional effects than is given by the NRE-IV estimator.

Table 4 Estimates for spatial panel models with nested random effects

6 Conclusion

In this paper, we focus on estimation methods for a multidimensional spatial lag panel data model with SMA nested random effects errors. The introduction of spatial effects via the spatial lag and the errors is an extension of previously published work by Fingleton et al. (2016) in which the spatial effects are only due to the SAR nested random effects errors. The SMA structure constitutes an alternative to incorporating spatial lags of the exogenous variables (X), and potentially avoids the weak instrument problem related to the use of spatial lags of X, given that it embodies local spillovers that one would otherwise control via the spatial lags.

We derived GM estimators for the SMA error coefficient and the variance components of the error process. Using a spatial counterpart to the Cochrane–Orcutt transformation, the regression parameters are estimated through a spatial IV estimator. Compared to the ML estimators, the GM-S-IV estimators are computationally feasible even for large sample sizes and are robust to deviations from the distributional (normality) assumptions typifying ML estimation. The Monte Carlo simulations show that the RMSE magnitudes of the ML and GM-S-IV estimators of the regression coefficients are globally similar. This means that the benefits associated with the use of the GM-S-IV approaches do not seem to come at much cost in terms of efficiency, although the less efficient outcomes for \(\lambda \) and \(\sigma _\alpha ^2\) may be due to our reliance on unweighted GM. It is also evident that there are benefits in terms of efficiency as a result of iterated estimation. The results of the empirical example indicate that, in the context of EU regions nested within countries, the assumption of a SMA error process with spatial dependence is preferable to assuming no spatial error dependence or SAR error dependence. This may reflect the fact that exogenous spatial lags are comparatively unrepresented in the latter, whereas the SMA error process picks up local spillovers explicitly. For future research, possibilities include an investigation of the performance of the weighted GM approach, a formal study of the large-sample properties of the estimators, and the inclusion of dynamic effects in the model.