1 Introduction

Panel data models usually incorporate individual effects in order to account for unobserved heterogeneity at the individual level. The main issue is to decide whether these effects are allowed to be correlated with the regressors (the “fixed effects” case, henceforth FE) or not (“random effects”, or RE). In the former case, they are usually estimated out; in the latter, they are treated as part of the error term, which becomes composite: the sum of a time-invariant random effect, representing individual heterogeneity, and an idiosyncratic error representing shocks that occur independently across time and individuals. This note is concerned with the RE case (as will be observed below, most of what follows becomes irrelevant in an FE setting).

In the case of spatial panels, the error term is likely to be the result of a spatial diffusion process. The question arises then, whether one of the two error components, or both, diffuse spatially. There are two mainstream solutions to this issue which have been extensively studied in the literature, but each of which represents just a special case of a more general specification. The first model dates back to Anselin (1998), and assumes that the idiosyncratic errors follow a first-order spatial autoregressive (SAR(1)) process, while the individual effects are independent of each other. The second, due to Kapoor et al. (2007), assumes that both the individual effects and the remainder errors diffuse in space according to the same SAR(1) process. Henceforth we will label them, respectively, ANS and KKP.

The generalized spatial random effects model (henceforth GSRE) encompasses the previous models by allowing for SAR(1) processes in both the individual effects and the idiosyncratic errors, with different parameters and without any restriction on them other than the standard requirements. It was first proposed in the literature by Baltagi (2007) and then discussed thoroughly in Baltagi et al. (2013). Nevertheless, practical applications of this model have been virtually nonexistent, and spatial econometricians have continued to rely on the two usual specifications. To our knowledge, Delbecq et al. (2012) long remained the only paper actually employing the GSRE.

The ANS model is the standard specification and, it can be argued, the most commonly employed in applied practice. A review of the empirical literature would be a vaste programme relative to the scope of this short note, but a good starting point is the Google Scholar citation hits for the relevant “software papers”, respectively Belotti et al. (2017) (Stata) and Millo and Piras (2012) (R), which reveal a large number of empirical studies. As of now, the ANS model needs to be estimated by ML.

The KKP model, which can also be estimated by ML (see Millo 2014), was instead born in the tradition of the generalized moments (GM) estimators of Kelejian and Prucha (1999), further popularized by Bell and Bockstael (2000) because of their many practical advantages. In fact, like all GM estimators, the KKP-GM estimates are much easier to obtain computationally and do not depend on distributional assumptions on the error term. Moreover, they are asymptotically equivalent to the ML ones and, unlike the ANS-ML counterpart, they also allow for endogenous regressors (Fingleton et al. 2008; Piras 2013). Despite these advantages, the KKP model is seldom employed in applied practice. Most of the literature building on it is methodological, and applied papers are scarce. In a review of the first 130 Google Scholar citation hits (in order of relevance) for the Kapoor et al. (2007) paper, only a handful actually apply the KKP estimator (GM or ML) empirically: Chakir et al. (2013), Fingleton et al. (2015), Baylis et al. (2012), Wheeler et al. (2013), Romão et al. (2017), Wan et al. (2015), Gomez et al. (2013), Kopczewska et al. (2017), Padovano and Petrarca (2014) and Jacquot et al. (2013). All this despite the fact that a user-friendly R implementation of the KKP-GM estimator has been available since before 2010 as function spgm in the ’splm’ package for R (Millo and Piras 2012), together with a more recent KKP-ML equivalent (see function spreml, Millo 2014).

The dearth of applications of the theoretically more appealing generalized model of Baltagi et al. (2013) can instead be due to the lack of user-friendly and well tested software for: estimating the GSRE tout court (until 2017); and, later on, for combining the generalized RE structure with SAR. In fact, the current trend in spatial econometrics—at least since the “sea change” described in Elhorst (2010)—has been to consider more than one source of spatial dependence at once; in this respect, in the current taxonomy of spatial models, the GSRE stands out for resting on an exclusion assumption on the SAR term. This note aims at describing a software routine that fills this gap.

2 The generalized spatial random effects model

Consider a general static panel model that includes a spatial lag of the dependent variable:

$$\begin{aligned} y = \lambda (I_T \otimes W_N)y + X \beta + u \end{aligned}$$

where y is an \(NT \times 1\) vector of observations on the dependent variable, X is a \(NT \times k\) matrix of observations on the non-stochastic exogenous regressors, \(I_T\) an identity matrix of dimension T, \(W_N\) is the \(N \times N\) spatial weights matrix of known constants whose diagonal elements are set to zero, and \(\lambda\) the corresponding spatial parameter.

As usual, individual unobserved heterogeneity is accounted for through individual effects: so that the disturbance vector is the sum of two terms

$$\begin{aligned} u=(\iota _T \otimes I_N) \mu +\varepsilon \end{aligned}$$

where \(\iota _T\) is a \(T \times 1\) vector of ones, \(I_N\) an \(N \times N\) identity matrix, \(\mu\) is a vector of time-invariant individual specific effects and \(\varepsilon\) a vector of idiosyncratic errors. The unobserved individual effects are assumed uncorrelated with the other explanatory variables in the model (the so-called “random effects” assumption), and can therefore be safely treated as components of the error term: see, e.g., Assumption RE.1.b in Wooldridge (2010, Sect. 10.4).

Moreover, the remainder error follows a SAR(1) process of the form

$$\begin{aligned} \varepsilon =\rho (I_T \otimes W_N) \varepsilon + e \end{aligned}$$

with \(\rho\) as the spatial autoregressive parameter, \(W_N\) the spatial weights matrix, \(e \sim IID(0, \sigma ^2_e)\) and \(I_N - \rho W_N\) assumed non-singular. Thus the model combines a spatial process in the dependent variable with one in the error term.

It remains to be decided how the individual effects correlate in space, if at all. Two spatial specifications have been proposed in the literature:

2.1 Independent random effects

Anselin (1998) considers a panel data regression model with spatial errors and uncorrelated individual heterogeneity (a special case of the model presented above with \(\lambda =0\), i.e., without a spatial lag). In this case, \(\mu \sim IID(0, \sigma ^2_\mu )\), and the remainder error term can be rewritten as:

$$\begin{aligned} \varepsilon = (I_T \otimes B_N^{-1})e \end{aligned}$$

where \(B_N = (I_N - \rho W_N)\). As a consequence, the composite error term becomes

$$\begin{aligned} u = (\iota _T \otimes I_N) \mu + (I_T \otimes B_N^{-1})e \end{aligned}$$

and its variance-covariance matrix, if \(J_T=\iota _T\iota ^\top _T\) is a \(T \times T\) matrix of ones, can be expressed as

$$\begin{aligned} \Omega _u = \sigma ^{2}_\mu (J_T \otimes I_N) + \sigma _{e}^{2} [I_T \otimes (B^\top _N B_N)^{-1}] \end{aligned}$$
(1)

which is computationally convenient as involving only inversions of matrices of size N instead of NT (Baltagi et al. 2003).
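
The form of (1) can be checked in a few lines. The sketch below assumes that \(\mu\) and \(e\) are mutually independent with zero mean, and uses only the Kronecker mixed-product rule \((A \otimes B)(C \otimes D) = AC \otimes BD\):

$$\begin{aligned} \Omega _u&= E[uu^\top ] = (\iota _T \otimes I_N)\, E[\mu \mu ^\top ]\, (\iota _T^\top \otimes I_N) + (I_T \otimes B_N^{-1})\, E[ee^\top ]\, (I_T \otimes (B_N^\top )^{-1}) \\&= \sigma ^2_\mu (\iota _T\iota _T^\top \otimes I_N) + \sigma ^2_e [I_T \otimes B_N^{-1}(B_N^\top )^{-1}] \\&= \sigma ^2_\mu (J_T \otimes I_N) + \sigma ^2_e [I_T \otimes (B_N^\top B_N)^{-1}] \end{aligned}$$

The cross terms vanish by independence, and the only matrix that has to be inverted is the \(N \times N\) matrix \(B_N^\top B_N\).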

2.2 Spatially correlated random effects

Kapoor et al. (2007) choose a different specification where spatial correlation applies to both the individual effects and the remainder error components in exactly the same way. In this case, commonly referred to as “KKP”, the composite disturbance term

$$\begin{aligned} u = (\iota _T \otimes I_{N})\mu + \varepsilon \end{aligned}$$

follows a first order spatial autoregressive process of the form:

$$\begin{aligned} u = \rho (I_{T} \otimes W_{N}) u + e \end{aligned}$$

which is equivalent to saying that both \(\mu\) and \(\varepsilon\) follow a SAR(1) with the same parameter \(\rho\). The variance-covariance matrix of u is:

$$\begin{aligned} \Omega _{u} = [I_T \otimes B_N^{-1}] \Omega _\varepsilon [I_{T} \otimes ({B_N}^{\top })^{-1}] \end{aligned}$$
(2)

where \(\Omega _\varepsilon = [\sigma _e^2 I_T + \sigma _{\mu }^2 J_T]\otimes I_N\) is the typical variance-covariance matrix of a one-way error component model. The variance matrix in (2) is simpler than the one in (1), and therefore its inverse is easier to calculate (Millo 2014 [4.3.2]).
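
To see why (2) is easier to invert, it may help to expand it with the mixed-product rule. In the sketch below, \({\bar{J}}_T = J_T/T\) and \(E_T = I_T - {\bar{J}}_T\) are the usual averaging and demeaning projectors, and \(\sigma ^2_1 = \sigma ^2_e + T\sigma ^2_\mu\) is shorthand introduced here for illustration only:

$$\begin{aligned} \Omega _u&= (\sigma ^2_e I_T + \sigma ^2_\mu J_T) \otimes (B_N^\top B_N)^{-1} = (\sigma ^2_1 {\bar{J}}_T + \sigma ^2_e E_T) \otimes (B_N^\top B_N)^{-1} \\ \Omega _u^{-1}&= (\sigma ^{-2}_1 {\bar{J}}_T + \sigma ^{-2}_e E_T) \otimes (B_N^\top B_N) \end{aligned}$$

Because the time and space dimensions separate into a single Kronecker product, the inverse needs nothing beyond forming \(B_N^\top B_N\). In (1), by contrast, the component attached to \({\bar{J}}_T\) mixes \(I_N\) with \((B_N^\top B_N)^{-1}\), so an \(N \times N\) inversion remains.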

The two data generating processes imply different spatial spillover mechanisms with a different economic meaning (Baltagi et al. 2013): in the first model only the time-varying components diffuse spatially, in the second spatial spillovers too have a permanent component. Lee and Yu (2012, 2.4) illustrate the difference between this latter specification and ANS through the likelihood of the between model.

2.3 The generalized spatial random effects model

Baltagi et al. (2013) (see also Baltagi et al. 2007) propose a generalized spatial random effects (GSRE) panel data model which relaxes both the hypothesis of no spatial correlation between random effects (ANS) and the somewhat unrealistic assumption that the individual effects be spatially correlated with the same structure as the remainder errors (KKP).

In this general, encompassing case, each component of the composite error follows a first order spatial autoregressive process of its own:

$$\begin{aligned}&\mu = \rho _1 W_N \mu + \eta \\&\varepsilon = \rho _2 (I_T \otimes W_N) \varepsilon + e \end{aligned}$$

with \(\eta \sim IID(0, \sigma ^2_\eta )\) and \(e \sim IID(0, \sigma ^2_e)\). The variance-covariance matrix of u is then:

$$\begin{aligned} \Omega _u&= {\bar{J}}_T \otimes \left( T\sigma ^2_{\eta }(B_{1N}'B_{1N})^{-1} + \sigma ^2_e(B_{2N}'B_{2N})^{-1}\right) \nonumber \\&\quad +\sigma ^2_e \left( E_T \otimes (B_{2N}'B_{2N})^{-1}\right) \end{aligned}$$
(3)

where \(B_{1N} = (I_N - \rho _1 W_N)\), \(B_{2N} = (I_N - \rho _2 W_N)\) and \({\bar{J}}_T=J_T/T\), \(E_T=I_T-{\bar{J}}_T\).

With respect to the GSRE, the ANS model can be obtained setting \(\rho _1=0\), the KKP setting \(\rho _1=\rho _2\): so that each imposes one parameter restriction that is, in general, not guaranteed to be true.
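
The two nesting restrictions can be verified numerically. The following is a minimal sketch, not the package's code: it builds (1), (2) and (3) for a tiny hypothetical weights matrix (three units on a line, row-normalised) and checks the reductions exactly with rational arithmetic. All function names, the matrix `W` and the parameter values are illustrative assumptions.

```python
from fractions import Fraction as F

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def lincomb(ca, a, cb, b):
    """Return ca * a + cb * b for two matrices of equal shape."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker product of a and b."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def inv(a):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    n = len(a)
    aug = [list(row) + unit for row, unit in zip(a, eye(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def bb_inv(rho, W):
    """(B'B)^{-1} with B = I_N - rho * W."""
    B = lincomb(1, eye(len(W)), -rho, W)
    return inv(matmul(transpose(B), B))

def omega_ans(rho, s2_mu, s2_e, W, T):
    """Eq. (1): independent individual effects, SAR(1) idiosyncratic errors."""
    J = [[F(1)] * T for _ in range(T)]
    return lincomb(s2_mu, kron(J, eye(len(W))), s2_e, kron(eye(T), bb_inv(rho, W)))

def omega_kkp(rho, s2_mu, s2_e, W, T):
    """Eq. (2): the whole composite error follows one SAR(1) process."""
    J = [[F(1)] * T for _ in range(T)]
    left = kron(eye(T), inv(lincomb(1, eye(len(W)), -rho, W)))
    core = kron(lincomb(s2_e, eye(T), s2_mu, J), eye(len(W)))
    return matmul(matmul(left, core), transpose(left))

def omega_gsre(rho1, rho2, s2_eta, s2_e, W, T):
    """Eq. (3): separate SAR(1) processes for the two error components."""
    Jbar = [[F(1, T)] * T for _ in range(T)]
    E = lincomb(1, eye(T), -1, Jbar)
    between = lincomb(T * s2_eta, bb_inv(rho1, W), s2_e, bb_inv(rho2, W))
    return lincomb(1, kron(Jbar, between), s2_e, kron(E, bb_inv(rho2, W)))

# Hypothetical example: N = 3 units on a line, T = 3 periods.
T = 3
W = [[F(0), F(1), F(0)],
     [F(1, 2), F(0), F(1, 2)],
     [F(0), F(1), F(0)]]
s2_eta, s2_e, rho = F(2), F(3), F(1, 2)

ans_nested = omega_gsre(F(0), rho, s2_eta, s2_e, W, T) == omega_ans(rho, s2_eta, s2_e, W, T)
kkp_nested = omega_gsre(rho, rho, s2_eta, s2_e, W, T) == omega_kkp(rho, s2_eta, s2_e, W, T)
general = omega_gsre(F(1, 4), rho, s2_eta, s2_e, W, T)
print(ans_nested, kkp_nested)
```

Exact fractions are used instead of floating point so that the comparisons are identities rather than approximate equalities; a practical implementation would of course work in floating point.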

3 Estimation of the extended model

In the following we outline the maximum likelihood (ML) estimator for the extended specification of the GSRE including a spatial lag (henceforth SAR+GSRE). This specification has been introduced by Baltagi and Liu (2016) in a GM framework. Here we propose ML estimation, in the framework of Millo (2014). As such, this can be seen either as an extension of Baltagi et al. (2013) to including a spatial lag, or as an extension of Baltagi and Liu (2016) to ML estimation.

ML routines within the function spreml in the ’splm’ package are constructed according to the general principle of combining a spatial lag with a composite error structure (Millo 2014 [4.3.1]). Any error structure is specified by the scaled error covariance \(\Sigma =[\sigma ^2_e]^{-1}\Omega\). The extension to a spatial lag is accomplished by including a spatial filter on y, using \(I_T \otimes A = I_T \otimes (I_N - \lambda W)\), and the determinant of the spatial filter matrix \(|I_T \otimes A| = |A|^T\) in the likelihood. The general likelihood for the spatial panel model is:

$$\begin{aligned} \ln L &= - \frac{NT}{2} \ln (2 \pi \sigma _e^2) - \frac{1}{2}\ln | \Sigma | + T\ln |A| \\ &\quad - \frac{1}{2 \sigma _e^2} [(I_T \otimes A)y- X \beta ]' \Sigma ^{-1} [(I_T \otimes A)y - X \beta ]. \end{aligned}$$
(4)

Concentrating the likelihood w.r.t. the parameters in A and \(\Sigma\) and optimizing in the usual two-step fashion, alternating between the concentrated likelihood and the (GLS) first order conditions:

$$\begin{aligned}&{\hat{\beta }}=(X'\Sigma ^{-1}X)^{-1}X'\Sigma ^{-1}(I_T \otimes A)y \\&{\hat{\sigma }}^2_e= [(I_T \otimes A)y-X{\hat{\beta }}]' \Sigma ^{-1}[(I_T \otimes A)y-X{\hat{\beta }}]/NT \end{aligned}$$

until convergence, one gets the ML estimates for all parameters. Given \(\Sigma\) specifying a particular error structure, the inverse and determinant of it can in principle be calculated by brute force; or, preferably, analytical expressions for them can be used when available, for the sake of speed and stability. This approach encompasses, i.a., the simple RE case as well as ANS and KKP (Millo 2014[4.3.2]).
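
For concreteness, the concentration step can be written out. Substituting \({\hat{\sigma }}^2_e\) into (4) makes the quadratic-form term equal to \(-NT/2\), so the function maximized over the parameters in A and \(\Sigma\) is, as a sketch (the symbol \(\ln L_c\) is introduced here for illustration only):

$$\begin{aligned} \ln L_c = - \frac{NT}{2} [\ln (2 \pi ) + 1] - \frac{NT}{2} \ln {\hat{\sigma }}^2_e - \frac{1}{2}\ln |\Sigma | + T \ln |A| \end{aligned}$$

where \({\hat{\sigma }}^2_e\) is evaluated at \({\hat{\beta }}\), and both are recomputed from the GLS first order conditions at the current values of the parameters in A and \(\Sigma\).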

In the GSRE case, from Baltagi et al. (2013), the scaled error covariance is:

$$\begin{aligned} \Sigma = {\bar{J}}_T \otimes (T \phi (B_{1N}'B_{1N})^{-1} + (B_{2N}'B_{2N})^{-1}) + E_T \otimes (B_{2N}'B_{2N})^{-1} \end{aligned}$$

with \(\phi =\frac{\sigma ^2_{\eta }}{\sigma ^2_e}\); and the expressions for \(\Sigma ^{-1}\) and \(|\Sigma |\) to be plugged into (4) are:

$$\begin{aligned}&\Sigma ^{-1} = {\bar{J}}_T \otimes \left( T \phi (B_{1N}'B_{1N})^{-1} + (B_{2N}'B_{2N})^{-1}\right) ^{-1} + E_T \otimes (B_{2N}'B_{2N})\\&|\Sigma | = \left| T \phi (B_{1N}'B_{1N})^{-1} + (B_{2N}'B_{2N})^{-1}\right| \cdot \left| (B_{2N}'B_{2N})^{-1}\right| ^{T-1} \end{aligned}$$

with \(B_{1N}, B_{2N}, J_T, I_T, I_N, {\bar{J}}_T, E_T\) as defined above.
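
The analytical inverse and determinant can be checked against brute force on a small case. The sketch below is illustrative only: it uses a hypothetical three-unit weights matrix, arbitrary parameter values and exact rational arithmetic, and confirms that the product of \(\Sigma\) and the closed-form \(\Sigma ^{-1}\) is the identity and that the determinant factorizes as stated. All names are assumptions made for this example.

```python
from fractions import Fraction as F

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def lincomb(ca, a, cb, b):
    """Return ca * a + cb * b for two matrices of equal shape."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker product of a and b."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def inv(a):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    n = len(a)
    aug = [list(row) + unit for row, unit in zip(a, eye(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def det(a):
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [list(row) for row in a]
    n, d = len(a), F(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return F(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d

# Hypothetical example: N = 3 units on a line, T = 3 periods.
T, phi, rho1, rho2 = 3, F(2, 3), F(1, 4), F(1, 2)
W = [[F(0), F(1), F(0)],
     [F(1, 2), F(0), F(1, 2)],
     [F(0), F(1), F(0)]]
N = len(W)

def btb(rho):
    """B'B with B = I_N - rho * W."""
    B = lincomb(1, eye(N), -rho, W)
    return matmul(transpose(B), B)

Jbar = [[F(1, T)] * T for _ in range(T)]
E = lincomb(1, eye(T), -1, Jbar)
between = lincomb(T * phi, inv(btb(rho1)), 1, inv(btb(rho2)))

Sigma = lincomb(1, kron(Jbar, between), 1, kron(E, inv(btb(rho2))))
Sigma_inv = lincomb(1, kron(Jbar, inv(between)), 1, kron(E, btb(rho2)))
det_formula = det(between) * det(inv(btb(rho2))) ** (T - 1)

inverse_ok = matmul(Sigma, Sigma_inv) == eye(N * T)   # closed-form inverse
det_ok = det(Sigma) == det_formula                    # closed-form determinant
print(inverse_ok, det_ok)
```

Note that the closed forms only ever invert \(N \times N\) matrices, whereas the brute-force check works on the full \(NT \times NT\) matrix; this is the source of the speed advantage mentioned above.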

4 Empirical examples

In this section we apply the (SAR+)GSRE estimator to some well-known examples – taken from Millo (2014), Croissant and Millo (2018) and Baltagi (2021), original sources reported below – where the restricted ANS and KKP specifications have been applied.

There are no substantial changes in the estimates \({\hat{\beta }}\) throughout all the examples, bar of course for the difference in magnitude and interpretation between the group of models excluding and those including a SAR term. Therefore we concentrate on the estimates of the variance of individual effects \({\hat{\phi }}\) and those of the spatial parameters (\({\hat{\lambda }}\),) \({\hat{\rho }}_1\) and \({\hat{\rho }}_2\). In the ANS and the KKP models, of course, we set \({\hat{\rho }}_1=0\) and \({\hat{\rho }}_1={\hat{\rho }}_2\) respectively.

4.1 Rice farming

The “Rice farming” example of Druska and Horrace (2004) concerns the estimation of a production frontier equation relating rice output to the following inputs: seed, urea, phosphate, labour hours and land size, all but phosphate in logs. The data cover 171 rice farms in Indonesia observed over six growing seasons, three wet and three dry, between 1975 and 1983. Dummy variables account for the use of high yield varieties of seed, for a mix of seed varieties, and for the use of pesticides. Dummy variables are also added for the six villages and for the season being a wet one. The proximity matrix is constructed considering all the farms of the same village as neighbours. The example is considered in Millo (2014), where a number of different spatial specifications are estimated.

Estimation results for the spatial and RE parameters are reported in Table 1. The unrestricted model yields a large negative estimate for \(\rho _1\) and an even larger standard error; its results are therefore not totally at odds with ANS, but inconsistent with KKP.

Table 1 Rice farming model

Including a SAR term gives rise to a slight compensation between \({\hat{\rho }}_2\) and \({\hat{\lambda }}\), the latter nevertheless being small in magnitude and on the verge of statistical significance (p-value: 0.062).

4.2 Italian insurance

The “Italian insurance” example of Millo and Carmeci (2011) is, again, considered in Millo (2014) for comparing many different combinations of spatial and random effects (GSRE excluded). Millo and Carmeci (2011) analyze the determinants of per-capita equilibrium consumption of non-life insurance in all 103 Italian provinces over five years, 1998 to 2002, based on socioeconomic characteristics of the territory: per-capita income and wealth as proxied by bank deposits, real lending rates, territorial density of population and of the distribution network; demographic characteristics such as average family size and schooling, and the prevailing level of trust; the share of agriculture in value added; and the level of inefficiency of civil justice. Estimation results are reported in Table 2.

Table 2 Italian insurance model

In this case the SEM term is deemed insignificant by all estimators; by contrast, using the GSRE lets the spatial process in the random effects emerge, which would be assumed out by the ANS or, alternatively, disappear in the KKP as the estimator is unable to tell between \(\rho _1\) and the insignificant \(\rho _2\).

The interpretation becomes less clear when including a SAR term: \(\lambda\) is marginally significant (p-value: 0.062) and of considerable magnitude; at the same time, \(\rho _1\) is not significant any more (it even changes sign) while \(\rho _2\) becomes significant and negative. The meaning is not clear; tentative interpretations can be based on considerations in the original paper: aggregation biases from concentration of salespoints and/or omitted variable effects from the impossibility of observing purchasing power differentials (Millo and Carmeci 2011 [5.1]). At least, here the complete specification of the spatial process provides us with an unbiased estimate of the \({\hat{\beta }}\)s, while the simple GSRE would yield biased coefficients.

4.3 Public capital productivity

The third and last example from Millo (2014) is the Munnell (1990) “Public capital productivity” model. It involves a social production function, estimated with the main goal of assessing the productivity of public capital (roads, water facilities, other infrastructure) in 48 US States observed over 17 years. It was originally popularized by the well-known panel data textbook of Baltagi (2021, see Example 3). The model is a Cobb-Douglas production function where the gross social product (gsp) of a given state depends on the inputs of public capital, private capital and labour, plus the state unemployment rate as a control for the business cycle. The relevant estimates are reported in Table 3.

Table 3 Munnell’s public capital productivity model

The Munnell example has issues of nonstationarity (see again Millo 2014), so a specification in differences might be in order. For our illustrative purposes here, let us observe that the GSRE yields another significant estimate \({\hat{\rho }}_2\), while \({\hat{\rho }}_1\) is about one half in magnitude and not far from significance (p-value: 0.12), so that the “truth”, at least within the GSRE world, lies somewhere in between ANS and KKP.

The SAR extension is surprisingly clear-cut, given that ex ante one might have expected some direct influence of one state’s product on its neighbours. The \({\hat{\lambda }}\) estimate is instead very small in magnitude, while the estimation of the standard error fails. This behaviour of the numerical Hessian, documented in Millo (2014, 5.1.5), tends to occur when the target parameter is so close to zero as to make an assessment of statistical significance redundant. In other words, economic priors notwithstanding, the statistical evidence very strongly upholds the error model.

4.4 Evapotranspiration

The next example, “Evapotranspiration”, is taken from Croissant and Millo (2018, Ch. 10). Obojes et al. (2015) explore the effect of vegetation composition and structure on water balance on some high elevation grasslands in the Alps. They repeatedly measure the water balance of soil monoliths in deep seepage collectors in four experimental sites over three study areas, two in the French Alps, one in Switzerland and one in Austria. The present example replicates the results from 5 repeated measurements over 86 observation units in the Austrian site. See the estimates of spatial and RE parameters in Table 4.

Table 4 Evapotranspiration model

The “Evapotranspiration” data show very strong evidence of a spatial process in idiosyncratic errors, as is to be expected from data collected at nearby locations and influenced by the weather. As for random effects, none are detected by the GSRE; the ANS is hence consistent with the GSRE, while the KKP, imposing on \(\rho _1\) the same very high estimate as \(\rho _2\), completely misses the mark. As can be seen, though, this bias has no consequences for the estimates \({\hat{\phi }}, {\hat{\rho }}_2\) and therefore seems quite harmless unless one is interested precisely in the diffusion process of the random effects.

The complete SAR+GSRE model upholds the above conclusions: the SAR term, although not negligibly small in magnitude, is statistically not significant while the significant error components’ estimates \({\hat{\phi }}\) and \({\hat{\rho }}_2\) (and their standard errors) are largely unchanged.

4.5 Cigarette

A spatial econometrics paper would not be complete without the ubiquitous “Cigarette” example, whose pervasiveness made it a standard which helps comparing different pieces of research. Featuring prominently in a number of textbooks (one for all, Baltagi 2021), the original application is in Baltagi and Levin (1992) and it has been reconsidered, i.a., by Baltagi and Griffin (2001).

The Cigarette dataset contains data for the years 1963–1992 and 46 American states on real per capita sales of cigarettes per adult person, average real retail price and real disposable income per capita. Originally, the minimum price in neighbouring states was included in order to proxy for cross-border smuggling; alternatively, this can be controlled for through spatial effects, which has made this dataset a good candidate for spatial examples. It must be kept in mind, though, that the original formulation was dynamic, as appropriate for models of persistent habits. The relevant parameter estimates are reported in Table 5.

Table 5 Cigarette model

Including a SAR term hardly changes the substantive conclusions; although numerically said term turns out weakly significant, its magnitude is very small (− 0.007). Once more, a tendency emerges for the spatial lag and error terms to compensate each other.

4.6 Summary of results

The results are in general not supportive of the KKP restriction: the spatial processes of errors and individual effects have different features, as could be expected. In some cases, they do instead uphold the ANS restriction: no spatial process in the random effects. Whether this last result is due to the spatial correlation of the individual heterogeneity actually being zero, or to the lack of precision of the estimator \({\hat{\rho }}_1\), is a question we are not in a position to answer.

The extension to a spatial lag generally upholds the analyses based on the spatial error specifications, with the partial exception of the Italian insurance example, where spatial effects emerge which are likely to be the artifact of an incorrect specification of the connectivity structure. In some cases, in fact (Rice farming, Evapotranspiration), the original authors had a strong prior in favour of the error model; but this might often not be the case. In the next Section we will address the consequences of misspecification on parameter estimation by placing ourselves in some clear-cut simulated situations.

5 Simulated examples

In the following Section we perform estimation on some simulated datasets, in order to highlight the effects of misspecification in a controlled environment. We address in this order the consequences of two kinds of misspecification: of the composite error structure (e.g., estimating a KKP model when the DGP is ANS) which can be addressed using the GSRE; and of spatial lag vs. error, which calls for the use of the most general SAR+GSRE. We will show how the second kind can produce the most seriously biased results in terms of the spatial structure. It goes without saying that the misspecification of the spatial lag is by far the worst even as regards the effects of the regressors: in fact, omitting a spatial lag will yield biased \({\hat{\beta }}\)s, while a misspecified error structure will only affect their efficiency.

A large scale Monte Carlo exercise considering many parameter combinations is out of the scope of this note. Our compact illustrative example considers just one panel size representative of commonly found datasets: \(N=48\), \(T=10\); the spatial ordering is taken from the US states. We only consider zero or nonzero values for the parameters, and for the sake of simplicity, we set every nonzero value to 0.6. We trust our examples to be representative of a much more general situation. 1000 simulation runs are performed for each scenario/combination of parameter values.
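
A single draw from such a design might be generated as follows. This is a minimal sketch under stated assumptions, not the code used for the reported simulations: the US-states weights matrix is replaced by a hypothetical row-normalised ring (each unit has two neighbours), there is a single standard-normal regressor with \(\beta = 1\), and both innovation variances are set to one. All function names are illustrative.

```python
import random

def ring_weights(n):
    """Hypothetical stand-in for the US-states matrix: row-normalised ring (n >= 3)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = 0.5
        W[i][(i + 1) % n] = 0.5
    return W

def spatial_filter(W, coef):
    """Return I_N - coef * W."""
    n = len(W)
    return [[(1.0 if i == j else 0.0) - coef * W[i][j] for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def simulate(N=48, T=10, lam=0.6, rho1=0.0, rho2=0.6, beta=1.0, seed=1):
    """One panel from the SAR+GSRE process; the defaults mimic a SAR+ANS scenario."""
    rng = random.Random(seed)
    W = ring_weights(N)
    A, B1, B2 = (spatial_filter(W, c) for c in (lam, rho1, rho2))
    # Individual effects: mu = B1^{-1} eta, drawn once and constant over time.
    mu = solve(B1, [rng.gauss(0, 1) for _ in range(N)])
    x, y, eps = [], [], []
    for _ in range(T):
        eps_t = solve(B2, [rng.gauss(0, 1) for _ in range(N)])  # eps_t = B2^{-1} e_t
        x_t = [rng.gauss(0, 1) for _ in range(N)]
        rhs = [beta * xi + m + e for xi, m, e in zip(x_t, mu, eps_t)]
        y.append(solve(A, rhs))  # y_t = A^{-1} (x_t beta + mu + eps_t)
        x.append(x_t)
        eps.append(eps_t)
    return {"W": W, "x": x, "y": y, "mu": mu, "eps": eps}

sim = simulate()
print(len(sim["y"]), len(sim["y"][0]))
```

Switching scenario amounts to changing the three spatial parameters: for instance, `lam=0.0, rho1=0.6, rho2=0.6` gives a KKP-type process without a spatial lag, and `rho1=0.6, rho2=0.0` the RHO1 case.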

5.1 Specification of the error structure

In this subsection we report estimation of the GSRE and SAR+GSRE models when the “true” DGP is, respectively, ANS, KKP or RHO1 (i.e., \(\rho _1\ne 0\) while \(\rho _2 = 0\)). All graphs (omitted) show that both the GSRE and the SAR+GSRE estimates concentrate around the “true” parameter values.

5.2 Misspecification of the spatial lag vs. error

In this subsection we illustrate the effect of omitting the SAR term from different DGPs containing spatial lags. As the comparison between the density of GSRE (red lines) and SAR+GSRE (blue lines) estimates shows, the former are severely biased if the DGP contains a SAR term (see Figs. 1, 2, 3).

Fig. 1 Distribution of estimates under the SAR+ANS scenario: GSRE is red, SAR+GSRE is blue; the vertical line corresponds to the “true” parameter value

Fig. 2 Distribution of estimates under the SAR+KKP scenario: GSRE is red, SAR+GSRE is blue; the vertical line corresponds to the “true” parameter value

Fig. 3 Distribution of estimates under the SAR+RHO1 scenario: GSRE is red, SAR+GSRE is blue; the vertical line corresponds to the “true” parameter value

The omitted spatial lag does clearly “discharge” on the included spatial error coefficients, yielding a spurious result. The random effects’ variance \({\hat{\phi }}\) is affected as well, but its magnitude is usually of lesser substantial interest.

5.3 Reliability of the optimization procedure

A last important aspect of estimation is the reliability of the optimization procedure. In Table 6 below, we report the success (i.e., convergence) rates for the two GSRE estimators, with and without SAR, under the seven different simulation scenarios.

Table 6 Rate of successful convergences out of 1000 simulation runs for the GSRE and SAR+GSRE maximum likelihood estimators under different scenarios; percent

The only failures have been recorded under the RHO1 scenario; unexpectedly, the failure rate is slightly higher for the simpler GSRE estimator than for the SAR+GSRE.

5.4 Summary of simulation results

Our limited exercise can only give a hint about the real-world properties of the proposed software procedures: a larger Monte Carlo project would be required for a formal assessment, mixing different values for all parameters involved over a dense multidimensional grid. Still, the results are encouraging. If correctly specified, the GSRE and SAR+GSRE look quite reliable in our limited simulation exercise, both in terms of convergence and of precision, under each of the different DGP scenarios. Only in the least interesting one, RHO1, which is usually not considered in the literature and was added for the sake of completeness, did a minority of the simulation runs fail to converge. The effects of misspecification are most serious when a relevant SAR term is omitted: the spatial effect then “discharges” on the estimates of the error parameters \(\rho _1\) and \(\rho _2\), inflating and biasing them; but of course, as observed at the beginning of this section, the most serious consequence would be the bias of \({\hat{\beta }}\). This further motivates the extension of the estimator discussed in the present paper.

The reader shall keep in mind that there are actually two sides to the modelling of a spatial process: the functional form (SAR or GSRE etc.) and the structure of spatial proximity. We address the misspecification of the spatial process in terms of effects, taking instead for granted the spatial structure: i.e., we employ the “true” W matrix in estimation. Addressing the effects of misspecification of W would be a very interesting but complex task which we leave to future research.

6 Conclusions

Random effects methods, as discussed above, are not always of interest in spatial applications. When the RE hypothesis cannot be safely assumed, FE methods are in order, and in this case the question of spatial correlation in the individual heterogeneity becomes moot because the individual effects are estimated out (although they can be subsequently recovered and their spatial correlation assessed ex post). Nevertheless, if individual heterogeneity of the RE type enters a spatial model, its correlation in space is a potentially interesting topic. The currently employed estimators impose one of two arbitrary restrictions: either there is no correlation (“Anselin” model) or it follows the very same process as the idiosyncratic errors (“KKP” model). An encompassing model relaxing these restrictions (generalized spatial random effects, or GSRE) and the relevant ML estimator have been proposed by Baltagi (2007) and Baltagi et al. (2013), and a production-quality software implementation has been available in Stata since Belotti et al. (2017). Still, to date there were no user-friendly routines for combining the GSRE with a spatial lag (SAR), as done by Baltagi and Liu (2016). This note describes the extension of the ML estimation framework to the SAR+GSRE and an R implementation of said estimator within the ’splm’ package for spatial panel econometrics, and presents some examples comparing the results from the generalized model to the restricted ones.

The gains from implementing the encompassing error covariance structure are not guaranteed to be substantial; after all, possible spatial lags apart, the models considered might be consistently estimated by OLS: any RE structure will only improve precision. In turn, by the very nature of the model (independence between individual effects and remainder errors), a misspecification of the random effects does little harm to the estimator of the spatial error parameter (here, \(\rho _2\)). But in spatial models the spatial process in the error components is often likely to be of interest in itself, and in this case it is essential to allow for an unrestricted structure of the GSRE type. In doing so it is also essential, unless one has a strong prior for excluding them, to control for spatial lags; otherwise the omitted SAR process will bias the \({\hat{\beta }}\)s and inflate the spatial error coefficients. The software presented in this note makes it possible to fill the gap between theoretical specifications of spatial random effects and empirical practice, removing the need to arbitrarily restrict the spatial process in the random effects for the sake of computational convenience.

7 Computational Details

All the computations in this paper have been performed within the R system for statistical computing (R Core Team 2021). In particular, the estimators illustrated in this paper are forthcoming in the splm package (Millo and Piras 2012) as option errors = "semgre" in the function spreml().