A multidimensional spatial lag panel data model with spatial moving average nested random effects errors

This paper focuses on a three-dimensional model that combines two different types of spatial interaction effects, i.e. endogenous interaction effects via a spatial lag on the dependent variable and interaction effects among the disturbances via a spatial moving average (SMA) nested random effects errors. A three-stage procedure is proposed to estimate the parameters. In a first stage, the spatial lag panel data model is estimated using an instrumental variable (IV) estimator. In a second stage, a generalized moments (GM) approach is developed to estimate the SMA parameter and the variance components of the disturbance process using IV residuals from the first stage. In a third stage, to purge the equation of the specific structure of the disturbances a Cochrane–Orcutt-type transformation is applied combined with the IV principle. This leads to the GM spatial IV estimator and the regression parameter estimates. Monte Carlo simulations show that our estimators are not very different in terms of root mean square error from those produced by maximum likelihood. The approach is applied to European Union regional employment data for regions nested within countries.


Introduction
Recently, Fingleton et al. (2016) introduced a generalization of the Kapoor et al. (2007) (hereafter KKP) generalized moments (GM) procedure to multidimensional panel data models assuming that the disturbances follow a first-order spatial autoregressive (SAR) process, which includes a nested random effects structure, namely SAR-NRE. They refer to this specification as a panel data model with spatially nested random effects disturbances. They derive a spatial feasible generalized least squares (S-FGLS) estimator for the model's regression parameters which uses the GM parameter estimates of the SAR parameter and the variance components of the disturbance process, namely GM-S-FGLS. This estimator is based on a spatial counterpart to the Cochrane-Orcutt transformation, as well as on transformations which are used in the estimation of classical error component models.
In this paper, we consider a more general multidimensional panel data model which includes a spatial lag and where the disturbances are assumed to follow a spatial moving average (SMA) process (local spatial spillover effects) in the spirit of Fingleton (2008). This structure constitutes an alternative to incorporating spatial lags on the explanatory variables. In the cross-sectional case, when the model contains a spatial lag dependent variable, Prucha (1998, 1999) suggest a 2SLS procedure. They propose that the instrument set should be kept to a low order to avoid linear dependence and retain full column rank for the matrix of instruments, and thus recommend that (X, W X) should be used, if the number of regressors is large. Inclusion of spatial lags of the explanatory variables could have a major impact on the performance of the estimation procedures if one were to keep to this recommendation. Pace et al. (2012) show that instrumental variable estimation suffers greatly in situations where spatial lags of the explanatory variables (W X) are included in the model specification. The reason is that this requires the use of (W 2 X, W 3 X, . . .) as instruments, in place of the conventional instruments that rely on W X, and this appears to result in a weak instrument problem. Our motivation for the adoption of a SMA specification of the error process, which has been largely neglected in spatial econometrics, is that it mitigates against the problem for instrumental variable estimation identified by Pace et al. (2012). Naturally the choice of this specification should be predicated on the applied researcher, at a preliminary stage, examining the nature of local spillovers in order to establish its appropriateness for the empirical application at hand.
We propose a three-stage procedure to estimate the parameters. In a first stage, the spatial lag panel data model is estimated using an instrumental variable (IV) estimator. In a second stage, a GM approach is developed to estimate the SMA parameter and the variance components of the disturbance process using IV residuals from the first stage. In a third stage, to purge the equation of the specific structure of the disturbances, a Cochrane-Orcutt-type transformation combined with the IV principle is applied. This leads to the GM spatial IV estimator and the regression parameter estimates of the spatial lag model. Monte Carlo simulations show that our estimates are not very different in terms of root mean square error compared to those produced by maximum likelihood (ML).
The outline of the paper is as follows: Sect. 2 presents the spatial lag panel data model with spatial moving average nested random effects errors, and Sect. 3 focuses on estimation methods. This section introduces a spatial GM instrumental variable approach to estimate the parameters of the model. Section 4 presents the Monte Carlo design and describes the Monte Carlo results. Section 5 illustrates our approach using an application to EU regional employment data for regions nested within countries. The last section concludes.

The spatial model
Our point of departure is a three-dimensional model that combines two different types of spatial interaction effects, i.e. endogenous interaction effects via a spatial lag on the dependent variable and interaction effects among the disturbances via a spatial moving average (SMA) process on the error term. The notation is as follows: the dependent variable y i jt is observed along three indices, with i = 1, . . . , N , j = 1, . . . , M i and t = 1, . . . , T . N denotes the number of groups. M i denotes the number of individuals in group i, so in total there are S = N i=1 M i individuals. Since this model allows for an unequal number of individuals across the N groups, it is therefore unbalanced in the spatial dimension, although it is balanced in the time dimension. Hence, the model describes a hierarchical structure with the index j pertaining to individuals that are nested within the N groups. Assuming that spatial autocorrelation only takes place at the individual level and that the slope coefficients are homogenous, the model can be written as: where y i jt is the dependent variable; x i jt is a (1 × K ) vector of explanatory exogenous variables; β represents a (K × 1) vector of parameters to be estimated; and ε i jt is the disturbance, the properties of which will be discussed below. The weight w i j,gh = w k,l is the (k = i j; l = gh) element of the spatial matrix W S with i j denoting individual j within group i, and similarly for gh. Thus, k, l = 1, . . . , S and W S is a (S × S) matrix of known spatial weights which has zero on the leading diagonal and is usually row-normalized so that for row k, N g=1 M g h=1 w k,gh = 1, although as we will illustrate in the empirical example other normalizations are permissible. We maintain the standard assumption concerning the weight matrix, i.e. W S is assumed non-stochastic, and its row and column sums are required to be uniformly bounded in absolute value. ρ is the spatial lag parameter to be estimated. This coefficient is bounded numerically to ensure spatial stationarity, i.e. e −1 min < ρ < 1 where e min is the minimum real characteristic root of W S .
In this paper, we consider the case of disturbances ε i jt that are contemporaneously correlated through a moving average process at the individual level: (2) The weight m i j,gh is an element of the spatial matrix M S which satisfies the same assumptions as for W S . For simplicity in the following, we assume that M S = W S . λ is the spatial moving average parameter to be estimated. u i jt is assumed to be i.i.d.(0, σ 2 u ). Spatial heterogeneity is captured through a random effects structure for the errors u i jt : it contains an unobserved permanent unit-specific error component α i , a nested permanent unit-specific error component μ i j together with a remainder error component v i jt . Hence, we envisage a time-invariant group effect applying equally to all individuals nested within a group, time-invariant individual group-specific effects and transient effects that vary at random across groups, individuals and time. More formally: in which α i is the unobservable group-specific time-invariant effect which is assumed to be i.i.d.N 0, σ 2 α ; μ i j is the nested effect of individual j within the ith group which is assumed to be i.i.d.N 0, σ 2 μ and v i jt is the remainder term which is also assumed to be i.i.d.N 0, σ 2 v . The α i 's, μ i j 's and v i jt 's are independent of each other and among themselves.
In contrast to the classical literature on panel data, grouping the data by periods rather than units is more convenient when we consider spatial autocorrelation. For a cross section t, Eqs. (1), (2) and (3) can be written as: where y t is of dimension (S × 1) and x t is an (S × K ) matrix of explanatory variables that are assumed to be exogenous and non-stochastic and have elements uniformly bounded in absolute value. The first-order moving average error process t is given by where u t is (S × 1), α is the vector of group effects of dimension (N × 1), Finally v t is of dimension (S × 1). Stacking the T cross sections gives and with Z = [W y, X ] and δ = [ρ, β] . y and X are the vector and matrix of the dependent and explanatory variables (covariates), respectively, of size (T S × 1) and (T S × K ); β is the vector of the slope parameters of size (K × 1); and finally ε is the vector of the disturbance terms of dimension (T S × 1). Given that I T is an identity Finally, for the full (T S × 1) vector u, we have: In order to compute the GM-S-IV estimator of δ, which is described in Sect. 3.2, we need to obtain the inverse of the covariance matrix of u, which is −1 u . This is achieved by means of the spectral decomposition. Following Baltagi et al. (2014), the covariance matrix of u t is: where The covariance matrix of u corresponds to: where Collecting terms with the same matrices, one gets the spectral decomposition of u : with These equalities occur because of the definitions 1 of Q 1 , Q 2 and Q 3 . It turns out that Q 1 relates to within transformation. Q 2 and Q 3 relate, respectively, to between and mean transformation matrices. More formally, The operators Q 1 , Q 2 and Q 3 are symmetric and idempotent, with their rank equal to their trace. Moreover, they are pairwise orthogonal and sum to the identity matrix. From (12), we can easily obtain −1 u as: For the full (T S × 1) vector ε, we then have: where G S = I S − λW S . The corresponding (T S × T S) covariance matrix is given by: where A is a block diagonal matrix equal to (I T ⊗ G S ). Following the properties of the matrices u and A, we obtain the inverse covariance matrix of ε defined as:

Estimation methods
The estimation methods of multidimensional spatial panel models are direct extensions of the ones that have been created for the standard spatial panel data econometrics. This means that two main approaches are used to estimate these models: one based on ML principle and the other one linked to method of moments procedures. Upton and Fingleton (1985), Anselin (1988), LeSage and Pace (2009) and Elhorst (2014) provide the general framework for ML estimation of spatial models. Under normality of the disturbances, the log-likelihood function is

Maximum likelihood estimation
where D S = (I S −ρW S ) and D = (I T ⊗ D S ). For a SMA process for the disturbances ε, and after some mathematical manipulations, we obtain Let γ 1 = σ 2 α /σ 2 v , γ 2 = σ 2 μ /σ 2 v and ε = σ 2 v , then the log-likelihood function (22) can be written as 2 The first-order conditions for the parameters in (22) and (23) are intertwined which means that they are nonlinear, i.e. the equations cannot be solved analytically. Therefore, a numerical solution by means of an iterative procedure is needed in the spirit of Anselin (1988).

GM and instrumental variables
There are several issues with ML procedures. First, they call for explicit distributional assumptions, which may be difficult to satisfy, although quasi-ML (QML) approaches may to some extent allay this problem. Second, specifying and maximizing likelihood functions appropriate to extensions to more complex models may be problematic, especially if there are endogenous variables other than the spatial lag, as ML estimation is not possible when endogeneity is in implicit form. Finally, there are very computationally intensive. In view of the desirability of estimation approaches that avoid some of these challenges posed by ML, Prucha (1998, 1999) suggested an alternative instrumental variable estimation procedure for the cross-sectional spatial lag model also including a SAR process for the disturbances. This approach is based on a GM estimator of the parameter in the SAR process. The procedures suggested in Prucha (1998, 1999) are computationally feasible even for very large sample sizes. In a panel data context with a spatial error autoregressive process, KKP (2007) derive a GM estimator, which is computationally feasible even for large sample size, while Fingleton et al. (2016) extend this procedure to capture spatial autoregres-sive nested random effects errors. We follow this and adapt the moments conditions in order to consider SMA nested random effects errors.

Moments conditions
We follow Fingleton et al. (2016) to develop a GM approach leading to estimators of λ, σ 2 α , σ 2 μ , σ 2 v , or equivalently of λ, σ 2 α , θ 2 = T σ 2 μ + σ 2 v and σ 2 v , relies on moments conditions related to Following (17), we have First, we compute the quadratic moments with respect to Q 1 : Then, the expectations of the quadratic moments (30), (31), (32) depend on the moments After some computations, 3 these latter expectations are given by: Substituting (33) to (38) into (30), (31) and (32) gives: We proceed in a similar fashion as a result of replacing Q 1 in (30) and Then, the use of these moments leads to: where Overall, we obtain a system of nine equations involving the second moments of ε, ε: and with W * S and t 12 = t * 5 + tr W S W S . In practice, to obtain the GM estimators of λ, σ 2 α , θ 2 and σ 2 v , we have to use the sample counterparts of the terms in Eq. (61), i.e. and γ . Nevertheless, to estimate λ and σ 2 v , it is possible to use only the moments from (33) to (38). Then, the estimates of θ 2 and σ 2 α follow from the moments (42) and (48). This estimator is called the unweighted GM estimator. In other words, the GM estimators of λ and σ 2 v are obtained from the reduced system where and Then, from Eqs. (42) and (48) we obtain, respectively: whereĜ −1 = I T ⊗Ĝ −1 S withĜ S = I S − λW S . The above approach relates to unweighted GM. Nevertheless, the literature on generalized method of moments estimators indicates that it is optimal to use the inverse of the variance-covariance matrix of the sample moments at the true parameter values as a weighting matrix in order to obtain asymptotic efficiency. In the following, our Monte Carlo simulations show that our results are not very different from those produced by ML, especially when our three-stage procedure is iterated. We therefore leave the weighted GM method for further research.

The GM spatial IV estimator
To obtain the GM-S-IV estimator of δ = [ρ, β] , one first calculates the unweighted GM estimates of λ, σ 2 v , θ 2 and σ 2 α , following a three-stage procedure: • in the first stage, the model (7) is estimated using an IV approach based on the matrix of instruments H which is given by X, W X, W 2 X, . . . . Thus, the IV estimator of δ is defined as: where P H = H H H −1 H ; • in the second stage, the parameters λ, σ 2 v , θ 2 and σ 2 α are estimated using the GM approach from Sect. 3.2 based on IV residuals, i.e.ε = y−Zδ IV . The GM estimates are obtained from the sample counterpart of the reduced system (64) which is: where ξ λ, σ 2 v is a vector of residuals. The unweighted GM estimators of λ and σ 2 v are the nonlinear least squares estimators based on (70): Then, the estimated parameters of θ 2 and σ 2 α are obtained using, respectively (67) and (68); • in the third stage, we need the estimated variance-covariance matrix u obtained using the first stage estimates of σ 2 v , σ 2 μ , σ 2 α . In order to obtain an equation in terms of u, from which spatial autocorrelation is absent, rather than in terms of ε in which it is present, we can purge the equation of spatial dependence by pre-multiplication byĜ −1 . This can be seen to be a type of Cochrane-Orcutt transformation appropriate to spatially dependent data. Hence, pre-multiplication of the model (7) byĜ −1 yields: where Z * =Ĝ −1 Z , y * =Ĝ −1 y. If we are guided by the classical panel data random effects literature (see Baltagi 2013), and transform the model in (72) by pre-multiplying it by −1/2 u , then applying the IV principle gives the GM-S-IV estimatorδ G M−S−I V which corresponds to: where This three-stage procedure can be iterated. After the first iteration, i.e. the application of the procedure describes above, the GM-S-IV residuals are computed. Then, they are used to compute new sets of unweighted GM estimates. Last, these latter are used to obtain new GM-S-IV parameter estimates of the multidimensional spatial lag model and so on.

A Monte Carlo study
The idea here is to demonstrate the comparative performance of the various estimators described thus far, namely ML and the GM-S-IV approach. For this purpose, we generate data using a model with known parameters and see how accurately the different estimators recover the true parameter values. Our data generating process is the spatial lag regression model: where y t is of dimension (S × 1) as is the exogenous variable x t . Likewise, ι t is a vector of ones of dimension (S × 1). D S = I S − ρW S where W S is the spatial matrix of size (S × S). We retain the spatial structure proposed by Kelejian and Prucha (1999), which are referred to as "J ahead and J behind", with the nonzero elements equal to 1/2J . Note that, as J increase, the value of nonzero elements 1/2J decreases and this is turn may reduce the amount of spatial correlation. Here, we consider J = 2, 6 and 10. The error term ε t has a SMA structure and u t has a nested random components structure given by Throughout the experiment, the parameters of (74) and (75) were set at β 0 = 5, β 1 = 2 and ρ = 0.3, 0.6 and λ = − 0.2, − 0.5, − 0.9, i.e. positive dependence. The explanatory variable x i jt is generated by a similar method to that of Nerlove (1971), Antweiler (2001) and Baltagi et al. (2001). More precisely, we have: where i = 1, . . . , N , j = 1, . . . , M i , and ω i jt is a random variable uniformly distributed on the interval [− 0.5, 0.5] and x i j0 = 60 + 30ω i j0 . Observations over the first 10 periods are discarded to minimize the effect of initial values. For the data generating process for the errors, we assume For each experiment, we focus on the estimates of the parameters ρ, β 0 , β 1 , λ, σ 2 α , σ 2 μ and σ 2 v . Following KKP (2007), we adopt a measure of dispersion which is closely related to the standard measure of RMSE defined as follows: where bias corresponds to the difference between the median and the true value of the parameter, while IQ is the interquantile range defined as q 1 − q 2 where q 1 is the 0.75 quantile and q 2 is the 0.25 quantile.
From Table 1, it is apparent that while ML is the most efficient for all parameters, the iterated GM-S-IV is almost equally as good for almost all parameters. For example, on average, the RMSE of both GM-S-IV estimators of the spatial autoregressive parameter ρ is approximately only 2% larger than the ML estimateρ ML . The differences for β 0 , β 1 , σ 2 α , σ 2 μ are also very small between ML and iterated GM-S-IV and never larger than 4%. This is important because the parameters β 0 , β 1 are of particular interest in applied economics. It also means that the computational benefits associated with the use of the GM approach do not seem to have much cost in terms of efficiency. For β 0 , β 1 , σ 2 α , σ 2 μ , the differences between ML and the simple (i.e. not iterated GM-S-IV) are a bit larger (up to 5% for β 1 , σ 2 α , σ 2 μ and 28% for β 0 ). While iterating the GM-S-IV estimator is likely to achieve marginally more efficient estimates, this is definitely not the case for λ, especially when λ is near the upper end of its range. Indeed, it appears that the RMSE of the iterated GM-S-IV estimatorλ (1) is 32% larger on average than the RMSE ofλ ML . Looking in more details, the difference is especially high for λ = − 0.9 (up to 100%), while it remains acceptable for smaller values (in absolute value) of λ (for instance: 17% for λ = − 0.2). Hence, caution is in order as the absolute value of λ tends to unity.
Tables 2 and 3 concern two unbalanced patterns P 2 and P 3 . More precisely, the distribution of individuals over the twenty subgroups changes but the sample size remain fixed at T S = 5 × 100 = 500. In Table 2, based on the least unbalanced of the two, the results are qualitatively similar to those of Table 1. In terms of averages, the RMSE of both the simple and iterated GM-S-IV estimator for the spatial autoregressive parameter ρ is approximately 0.73% larger than that produced by ML estimator. The differences for the regression parameters β 0 , β 1 are very small (1% for β 0 and even − 0.58% for β 1 ). Conversely, the differences between ML and iterated GM-S-IV for the variances σ 2 α , σ 2 μ are a bit larger than in the balanced case and up to 35% for σ 2 α . These values are even higher for the simple GM-S-IV estimator, highlighting the less efficient estimates of GM compared with ML estimation, and the need to consider this in relation to the advantages provided by GM. With respect to λ, we find again that the RMSE for the iterated GM-S-IV estimator is larger than under ML, with an average difference of 35% produced by an assumption that the true value of λ is − 0.9. With Table 1 RMSEs of the estimators of  Table 2 continued Parameter values  Table 2 continued Parameter values  Table 3 continued Parameter values  Table 3 continued Parameter values  Table 3 continued Parameter values smaller λ, the difference is less stark. For example, when λ = −0.2, the difference is equal to approximately 16%. In Table 3, the RMSEs are affected differently because of the way we have treated unbalancedness in P 3 . Focusing especially on the differences between ML and iterated GM-S-IV, we note that the regression parameters are estimated efficiently in both cases as is ρ. As for the variances, the differences between ML and iterated GM-S-IV are higher for P 3 than was the case P 2 when one consider σ 2 α (28% for P 3 compared to 12% for P 2 ). Conversely, the differences are smaller for σ 2 μ (0.74% for P 3 compared to 2.86% for P 2 ) and σ 2 v (4.38% for P 3 compared to 8.49% for P 2 ). The estimates of the spatial error parameter are affected in a similar way under P 3 as was the case for P 2 : the RMSE ofλ (1) is 33% higher than the RMSE ofλ ML with especially high differences for λ = 0.9.
We have also performed the simulations with ρ = 0.6 considering the same patterns P 1 , P 2 and P 3 . The results are provided in an online appendix and the conclusions remain identical to those described above.

Empirical application
In this section, we consider the relationship between log employment (ln E ), log output (ln Q) and an indicator of (log) capital investment (ln K ) across S = N i=1 M i = 255 NUTS2 regions nested within N = 25 countries of the EU. ln Q is measured by gross value added, or GVA, and ln K is gross fixed capital formation, or GFCF. These annual regional data series are based on Cambridge Econometrics' European Regional Economic Data Base. As an illustration, Fig. 1 shows the distribution of log employment in the year 2010 across the 255 regions. Similar maps but with varying regional employment levels covering the period 1999-2010 constitute the dependent variable. Our model endeavours to explain the spatio-temporal variation in ln E as a function of ln Q and ln K organized on the same basis as Fig. 1, and also as an outcome of unobservable region-specific random effects nested within country-specific random effects. Accordingly, the model specification is in which ln E t is an (S × 1) vector of levels of (log) employment at time t, with exogenous variables ln Q t and ln K t , and ι t is a vector of ones of dimension (S × 1). The compound errors ε t are an (S × 1) vector of spatially dependent unobservables comprising time-invariant national effects, one for each of N countries and denoted by α i , i = 1, . . . , N , together with time-invariant regional effects with the region j effect, where j is nested within country i, denoted by μ i j = μ k , k = 1, . . . , S. In addition, there are remainders of dimension S which vary across regions and time, and the remainder effect for region j within country i at time t is denoted by ν i jt . Thus, repeating for convenience Eqs. (5) and (6) in vector and matrix notation, we have and So, the estimation procedure takes into account two different spatial interaction processes: one for the endogenous spatial lag and the other for the errors. For the spatial lag at time t, W S ln E t , the matrix W S is based on interregional trade flows between the 255 EU regions in the year 2000. The method of estimating these trade flows has been discussed elsewhere, for example by Polasek et al. (2010), Vidoli and Mazziotta (2010) and Fingleton et al. (2015), so here we simply note that the method employed bases interregional trade on data for international trade using a spatial version of the method for the construction of quarterly time series from annual series introduced by Chow and Lin (1971). The resulting matrix of bilateral interregional trade flows W * S is scaled following the approach of Ord (1975), so that in which D is a diagonal matrix with each cell on the leading diagonal containing the corresponding row total from W * S . This normalization means that the most positive real eigenvalue of W S is equal to max(eig) = 1.0, and the continuous range for which (I S − ρW S ) is non-singular is 1/ min(eig) < ρ < 1. Thus, we require estimated ρ to occur within this range to ensure stationarity.
For GM-S-IV, the assumption is that the compound errors are interrelated according to an SMA process (80). In this case, the spatial matrix M S is based on a contiguity matrix of dimension (S × S) with 1 in cell (m, n) indicating that regions m and n share a common border and 0 indicating otherwise, although for nine isolated regions it has been necessary to create artificial, contiguous neighbours. The resulting contiguity matrix has been subsequently standardized to give M S in which the rows sum to 1. This means that stationary region for λ is also given by 1/ min(eig) < λ < 1/ max(eig) = 1. The key feature of an SMA process is that shocks to the unobservables have local rather than global effects. Note that all components of the compound errors are assumed to be subject to this same spatial error dependence processes. Table 4 gives the resulting non-iterated estimates. The ratios of β 1 and β 2 to their respective standard errors indicate that ln Q and ln K are significantly positively related to ln E, and there is also a significant positive effect due to the endogenous spatial lag, since ρ > 0 with t ratio equal to 34.2374. The significant positive effect due to the endogenous spatial lag means that we should interpret the effects of these variables via the true derivatives, following LeSage and Pace (2009) and Elhorst (2014). These show that, allowing for both the direct and indirect effects of spatial interaction across regions, the total effect of a 1% change in Q is associated with a 0.3795% change in employment. The total effect of 1% change in K leads to 0.0459% change in employment.
We find that the null hypothesis that λ = 0 is rejected in favour of positive residual spatial dependence (which is indicated by a negative estimate of λ). The distribution of λ null , which is λ under a null hypothesis of no spatial dependence among the errors, is based on the residuals from the nested error model assuming no spatial error dependence, but which includes a spatial lag (as described in Baltagi et al. (2014)). We refer to this by the acronym NRE-IV. The residuals on which the null distribution is based have the same moments as the NRE-IV residuals, and are assumed to be normally distributed, but they are randomly assigned to regions in order to eliminate spatial dependence. Given randomly assigned residuals, the same GM estimation method used to obtain λ is applied to obtain λ null , and this estimation is repeated 100 times to obtain 100 estimates of λ null . We find that the estimate λ is not a typical member of this λ null distribution, since in which λ null is the mean of the empirical null distribution, and var(λ null ) is the variance. The estimated variance σ 2 α = 0.0577 of unobserved country effects is larger than estimated regional effects variance, which is σ 2 μ = 0.0483, and both of these are large relative to the remainder variance σ 2 ν = 0.0008. In the generation of the λ null distribution, we also generate null distributions of σ 2 α,null , σ 2 μ,null and σ 2 ν,null . Because the null distributions are based on a random pattern of errors, this also breaks up any effects due to country or region. This is evident from the means of the resulting distribution, hence σ 2 α,null = 0.0131 , and σ 2 μ,null = 0.0118. In contrast, the remainder null variance is comparatively large, hence σ 2 ν,null = 0.1719. Using also the standard deviations of the null distributions, the resulting t-ratios given in Table 4 indicate that σ 2 α and σ 2 μ are significantly greater than expected under the null hypothesis, suggesting significant effects due to countries and regions. In contrast is σ 2 ν significantly below the null mean indicating that unexplained remainder effects varying across regions and across time are much less than one would expect were the errors distributed at random. Note that these interpretations are informal because generally the null distributions may be asymmetrical. The t-ratios give the distance from the mean of the null distributions is units of standard error, and the ratios given in Table 4 are far outside the range of outcomes observed in the null distributions.
For comparison, we also estimate the parameters of three different models which also assume a nested error structure, but which differ in the way spatial dependence is treated. The first estimator is the above-mentioned NRE-IV, which assumes that there is a spatial lag but no spatial error dependence, hence λ = 0. The second estimator, referred to by the acronym GM-S-FGLS, in line with the published literature (Fingleton et al. 2016), does assume spatial error dependence, but the dependence is an autoregressive process not moving average. Also, it assumes there is no spatial lag effect, hence ρ = 0. The third comparator introduces a SAR structure for the disturbances instead of a SMA one, but is otherwise identical to GM-S-IV. We refer to this as GM-S-IV*.
The resulting estimates are given in Table 4. It is evident that on the whole the outcomes with regard to the effects of ln Q and ln K are similar to those produced by GM-S-IV. One difference, however, is that the effect of ln Q under GM-S-FGLS is larger than the effects reported under the other estimators (note that with ρ constrained to zero the estimate β 1 is directly comparable to the total effects under the other estimators). The introduction of a SAR process under estimator GM-S-IV* moderates the apparent effect of ln Q, which is now quite similar to the outcomes from the other estimators including a spatial lag. Elimination of spatial dependence among the errors, due to the restriction λ = 0, gives the NRE-IV estimates. The effect of this is to increase σ 2 α . This suggest that there is more intercountry heterogeneity than under the other estimators. Using the same null reference distribution as for GM-S-IV, σ 2 α is 50.84 standard errors above the mean of the null distribution, suggesting a significant country effect. In contrast, σ 2 μ is very close to the mean of the null reference distribution for region effects, suggesting that employment is not subject to a region effect. However, these interpretations are not taking into account the presence of error dependence, due to either a spatial moving average or a spatial autoregressive process.
Spatial error dependence is shown by the other estimators to be highly significant and should be taken into account in interpreting the nested effects embodied in the errors. It is noteworthy that smallest estimate occurs under the GM-S-IV estimator, indicating that this model explains more of the overall variance using ln Q, ln K , the spatial lag, the spatial moving average error process and the national and regional effects in the errors. The other specifications leave more of overall the variance as an unexplained remainder component. It is apparent that controlling for localized spillovers via the spatial moving average process for the errors produces superior outcomes to assuming spatially autoregressive errors, as under GM-S-FGLS and GM-S-IV*, and provides a more appropriate interpretation of the magnitude of country and regional effects than is given by the NRE-IV estimator.

Conclusion
In this paper, we focus on estimation methods for a multidimensional spatial lag panel data model with SMA nested random effects errors. The introduction of spatial effects via the spatial lag and the errors is an extension of previously published work by Fingleton et al. (2016) in which the spatial effects are only due to the SAR nested random effects errors. The SMA structure constitutes an alternative to incorporating spatial lags of the exogenous variables (X ), and potentially avoids the weak instrument problem related to the use of spatial lags of X , given that it embodies local spillovers that one would otherwise control via the spatial lags. We derived GM estimators for the SMA error coefficient and the variance components of the error process. Using a spatial counterpart to the Cochrane-Orcutt transformation, the regression parameters are estimated through a spatial IV esti-mator. Compared to the ML estimators, the GM-S-IV estimators are computationally feasible even for large sample sizes, and are robust to deviation from distributional assumptions (normality) typifying ML estimation. The Monte Carlo simulations show that RMSE magnitudes of the ML and GM-S-IV estimators of the regression coefficients are globally similar. This means that the benefits associated with the use of the GM-S-IV approaches do not seem to have much cost in terms of efficiency, although it seems that less efficient outcomes for λ and σ 2 α may be due to our reliance on unweighted GM. It is also evident that there are benefits in terms of efficiency as a result of iterated estimation. The results of the empirical example indicate that in the context of EU regions nested within countries, the assumption of a SMA error process with spatial dependence is preferable to assuming no spatial error dependence, or SAR error dependence. This may reflect the fact that exogenous spatial lags are comparatively unrepresented in the latter, whereas the SMA error process picks up local spillovers explicitly. For future research, possibilities include the investigation of the performance of the weighted GM approach, a study of formal large sample results of the estimators, and the inclusion of dynamic effects in the model.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.