The generalized spatial random effects model in R

Millo, Giovanni

doi:10.1007/s43071-022-00024-9

Original Paper
Open access
Published: 24 June 2022

Volume 3, article number 7, (2022)

Journal of Spatial Econometrics

Giovanni Millo ORCID: orcid.org/0000-0002-0140-6681¹

Abstract

We describe a user-friendly, production-quality R implementation of the maximum likelihood estimator of the generalized spatial random effects (GSRE) model of Baltagi, Egger and Pfaffermayr within the well-known ’splm’ package for spatial panel econometrics. We extend the maximum likelihood estimator for the GSRE to include a spatial lag of the dependent variable (SAR), and we discuss the theoretical and computational approach. This is the first implementation of the SAR+GSRE, and the second of the original GSRE. Until recently, only estimators restricting the spatial structure of individual effects in an arbitrary way were available and widely employed in applied practice. We present results from the SAR+GSRE and the restricted estimators side by side, drawing on some well-known examples from the spatial econometrics literature. The potential biases from imposing inappropriate restrictions on the spatial error process and/or from omitting the SAR term are illustrated by simulation.

1 Introduction

Panel data models usually incorporate individual effects in order to account for the unobserved heterogeneity at the individual level. The main issue is to decide whether said effects are allowed to be correlated with the regressors (known as the “fixed effects” case, henceforth FE) or not (“random effects”, or RE). In the former case, they are usually estimated out; in the latter, they are treated as a part of the error term which becomes composite: the sum of a time-invariant random effect, representing individual heterogeneity, and an idiosyncratic error representing shocks that happen independently across time and individual. This note is concerned with the RE case (as will be observed below, most of the following becomes irrelevant in a FE setting).^{Footnote 1}

In the case of spatial panels, the error term is likely to be the result of a spatial diffusion process. The question arises then, whether one of the two error components, or both, diffuse spatially. There are two mainstream solutions to this issue which have been extensively studied in the literature, but each of which represents just a special case of a more general specification. The first model dates back to Anselin (1998), and assumes that the idiosyncratic errors follow a first-order spatial autoregressive (SAR(1)) process, while the individual effects are independent of each other. The second, due to Kapoor et al. (2007), assumes that both the individual effects and the remainder errors diffuse in space according to the same SAR(1) process. Henceforth we will label them, respectively, ANS and KKP.

The generalized^{Footnote 2} spatial random effects model (henceforth GSRE) encompasses the previous models by allowing for SAR(1) processes in both the individual effects and the idiosyncratic errors, with different parameters and without any restriction on them but for the standard requirements.^{Footnote 3} It was first proposed in the literature by Baltagi (2007) and then discussed thoroughly in Baltagi et al. (2013). Nevertheless, practical applications of this model have been virtually non-existent, and spatial econometricians have continued to rely on the two usual specifications. To our knowledge, the only paper actually employing the GSRE was, for a long time, Delbecq et al. (2012).^{Footnote 4}

The ANS model is the standard specification and, it can be argued, the most commonly employed in applied practice. A review of the empirical literature would be a vaste programme relative to the scope of this short note, but a good starting point could be the Google Scholar citation hits for the relevant “software papers” of, respectively, Belotti et al. (2017) (Stata) and Millo and Piras (2012) (R), which reveal a large number of empirical studies. As of now, the ANS model needs to be estimated by ML.

The KKP model—which can indeed be estimated by ML (see Millo 2014)—was instead born in the tradition of generalized moments (GM) estimators of Kelejian and Prucha (1999), further popularized by Bell and Bockstael (2000) because of their many practical advantages. In fact, like all GM, the KKP-GM estimates are much easier to obtain computationally and do not depend on distributional assumptions on the error term. Moreover, they are asymptotically equivalent to the ML ones, and, unlike the ANS-ML counterpart, it is also possible to allow for endogenous regressors (Fingleton et al. 2008; Piras 2013). Actually, despite this, the KKP model is seldom employed in applied practice. Most of the literature building on it is methodological, and the applied papers are scarce. In a review of the first 130 Google Scholar citation hits (in order of relevance) for the Kapoor et al. (2007) paper, only a handful actually apply the KKP estimator (GM or ML) empirically: Chakir et al. (2013), Fingleton et al. (2015), Baylis et al. (2012), Wheeler et al. (2013), Romão et al. (2017), Wan et al. (2015), Gomez et al. (2013), Kopczewska et al. (2017), Padovano and Petrarca (2014) and Jacquot et al. (2013).^{Footnote 5} All this despite the fact that a user-friendly R implementation of the KKP-GM estimator has been available since before 2010 as function spgm in the ’splm’ package for R (Millo and Piras 2012), together with a more recent KKP-ML equivalent (see function spreml, Millo 2014).

The dearth of applications of the theoretically more appealing generalized model of Baltagi et al. (2013) can instead be due to the lack of user-friendly and well tested software for: estimating the GSRE tout court (until 2017); and, later on, for combining the generalized RE structure with SAR. In fact, the current trend in spatial econometrics—at least since the “sea change” described in Elhorst (2010)—has been to consider more than one source of spatial dependence at once; in this respect, in the current taxonomy of spatial models, the GSRE stands out for resting on an exclusion assumption on the SAR term. This note aims at describing a software routine that fills this gap.

2 The generalized spatial random effects model

Consider a general static panel model that includes a spatial lag of the dependent variable:

$$\begin{aligned} y = \lambda (I_T \otimes W_N)y + X \beta + u \end{aligned}$$

where y is an $NT \times 1$ vector of observations on the dependent variable, X is an $NT \times k$ matrix of observations on the non-stochastic exogenous regressors, $I_T$ an identity matrix of dimension T, $W_N$ is the $N \times N$ spatial weights matrix of known constants whose diagonal elements are set to zero, and $\lambda$ the corresponding spatial parameter.

As usual, individual unobserved heterogeneity is accounted for through individual effects: so that the disturbance vector is the sum of two terms

$$\begin{aligned} u=(\iota _T \otimes I_N) \mu +\varepsilon \end{aligned}$$

where $\iota _T$ is a $T \times 1$ vector of ones, $I_N$ an $N \times N$ identity matrix, $\mu$ is a vector of time-invariant individual specific effects and $\varepsilon$ a vector of idiosyncratic errors. The unobserved individual effects are assumed uncorrelated with the other explanatory variables in the model (the so-called “random effects” assumption), and can therefore be safely treated as components of the error term: see, e.g., Assumption RE.1.b in Wooldridge (2010, 10.4).

Moreover, the remainder error follows a SAR(1) process of the form

$$\begin{aligned} \varepsilon =\rho (I_T \otimes W_N) \varepsilon + e \end{aligned}$$

with $\rho$ as the spatial autoregressive parameter, $W_N$ the spatial weights matrix,^{Footnote 6} $e \sim IID(0, \sigma ^2_e)$ and $I_N - \rho W_N$ assumed non-singular. Thus the model combines a spatial process in the dependent variable with one in the error term.^{Footnote 7}

It remains to be decided how the individual effects correlate in space, if at all. Two spatial specifications have been proposed in the literature:

2.1 Independent random effects

Anselin (1998) considers a panel data regression model with spatial errors and uncorrelated individual heterogeneity (a special case of the model presented above, setting $\lambda =0$, i.e. without a spatial lag). In this case, $\mu \sim IID(0, \sigma ^2_\mu )$, and the remainder error term can be rewritten as:

$$\begin{aligned} \varepsilon = (I_T \otimes B_N^{-1})e \end{aligned}$$

where $B_N = (I_N - \rho W_N)$. As a consequence, the composite error term becomes

$$\begin{aligned} u = (\iota _T \otimes I_N) \mu + (I_T \otimes B_N^{-1})e \end{aligned}$$

and its variance-covariance matrix, if $J_T=\iota _T\iota ^\top _T$ is a $T \times T$ matrix of ones, can be expressed as

$$\begin{aligned} \Omega _u = \sigma ^{2}_\mu (J_T \otimes I_N) + \sigma _{e}^{2} [I_T \otimes (B^\top _N B_N)^{-1}]. \end{aligned}$$

(1)

which is computationally convenient, as it involves only inversions of matrices of size N instead of NT (Baltagi et al. 2003).

2.2 Spatially correlated random effects

Kapoor et al. (2007) choose a different specification where spatial correlation applies to both the individual effects and the remainder error components in exactly the same way. In this case, commonly referred to as “KKP”, the composite disturbance term

$$\begin{aligned} u = (\iota _T \otimes I_{N})\mu + \varepsilon \end{aligned}$$

follows a first order spatial autoregressive process of the form:

$$\begin{aligned} u = \rho (I_{T} \otimes W_{N}) u + e. \end{aligned}$$

which is equivalent to saying that both $\mu$ and $\varepsilon$ follow a SAR(1) with the same parameter $\rho$.^{Footnote 8} The variance-covariance matrix of u is:

$$\begin{aligned} \Omega _{u} = [I_T \otimes B_N^{-1}] \Omega _\varepsilon [I_{T} \otimes ({B_N}^{\top })^{-1}] \end{aligned}$$

(2)

where $\Omega _\varepsilon = [\sigma _e^2 I_T + \sigma _{\mu }^2 J_T]\otimes I_N$ is the typical variance-covariance matrix of a one-way error component model. The variance matrix in (2) is simpler than the one in (1), and therefore its inverse is easier to calculate (Millo 2014 [4.3.2]).

The two data generating processes imply different spatial spillover mechanisms with a different economic meaning (Baltagi et al. 2013): in the first model only the time-varying components diffuse spatially; in the second, spatial spillovers also have a permanent component. Lee and Yu (2012, 2.4) illustrate the difference between this latter specification and ANS through the likelihood of the between model.

2.3 The generalized spatial random effects model

Baltagi et al. (2013) (see also Baltagi et al. 2007) propose a generalized spatial random effects (GSRE) panel data model which relaxes both the hypothesis of no spatial correlation between random effects (ANS) and the somewhat unrealistic assumption that the individual effects share exactly the same spatial structure as the remainder errors (KKP).

In this general, encompassing case, each component of the composite error follows a first order spatial autoregressive process of its own:

$$\begin{aligned}&\mu = \rho _1 W_N \mu + \eta \\&\varepsilon = \rho _2 (I_T \otimes W_N) \varepsilon + e \end{aligned}$$

with $\eta \sim IID(0, \sigma ^2_\eta )$ and $e \sim IID(0, \sigma ^2_e)$. The variance-covariance matrix of u is then:

$$\begin{aligned} \Omega _u&= {\bar{J}}_T \otimes \left( T\sigma ^2_{\eta }(B_{1N}'B_{1N})^{-1} + \sigma ^2_e(B_{2N}'B_{2N})^{-1}\right) \nonumber \\& \quad +\sigma ^2_e \left( E_T \otimes (B_{2N}'B_{2N})^{-1}\right) \end{aligned}$$

(3)

where $B_{1N} = (I_N - \rho _1 W_N)$, $B_{2N} = (I_N - \rho _2 W_N)$ and ${\bar{J}}_T=J_T/T$, $E_T=I_T-{\bar{J}}_T$.

With respect to the GSRE, the ANS model can be obtained setting $\rho _1=0$, the KKP setting $\rho _1=\rho _2$: so that each imposes one parameter restriction that is, in general, not guaranteed to be true.
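Both restrictions can be checked directly from the covariance in (3). The following is a worked sketch using only the definitions above; it reads $\sigma ^2_\mu$ in (1) and (2) as $\sigma ^2_\eta$ and uses ${\bar{J}}_T + E_T = I_T$ and $T{\bar{J}}_T = J_T$. Setting $\rho _1=0$ gives $B_{1N}=I_N$, so that

$$\begin{aligned} \Omega _u&= {\bar{J}}_T \otimes \left( T\sigma ^2_{\eta } I_N + \sigma ^2_e (B_{2N}'B_{2N})^{-1}\right) + \sigma ^2_e \left( E_T \otimes (B_{2N}'B_{2N})^{-1}\right) \\&= \sigma ^2_{\eta } (J_T \otimes I_N) + \sigma ^2_e \left[ I_T \otimes (B_{2N}'B_{2N})^{-1}\right] \end{aligned}$$

which has the ANS form (1). Setting instead $\rho _1=\rho _2=\rho$, so that $B_{1N}=B_{2N}=B_N$, gives

$$\begin{aligned} \Omega _u&= \left[ (T\sigma ^2_{\eta } + \sigma ^2_e) {\bar{J}}_T + \sigma ^2_e E_T\right] \otimes (B_N'B_N)^{-1} \\&= \left[ \sigma ^2_e I_T + \sigma ^2_{\eta } J_T\right] \otimes B_N^{-1}(B_N')^{-1} \end{aligned}$$

which is the KKP form (2) once the Kronecker product is factored as $[I_T \otimes B_N^{-1}]\,\Omega _\varepsilon \,[I_T \otimes (B_N')^{-1}]$.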

3 Estimation of the extended model

In the following we outline the maximum likelihood (ML) estimator for the extended specification of the GSRE including a spatial lag (henceforth SAR+GSRE). This specification has been introduced by Baltagi and Liu (2016) in a GM framework. Here we propose ML estimation, in the framework of Millo (2014). As such, this can be seen either as an extension of Baltagi et al. (2013) to include a spatial lag, or as an extension of Baltagi and Liu (2016) to ML estimation.

ML routines within the function spreml in the ’splm’ package are constructed according to the general principle of combining a spatial lag with a composite error structure (Millo 2014 [4.3.1]). Any error structure is specified by the scaled error covariance $\Sigma =[\sigma ^2_e]^{-1}\Omega$. The extension to a spatial lag is accomplished by including a spatial filter on y, using $I_T \otimes A = I_T \otimes (I_N - \lambda W)$, and the determinant of the spatial filter matrix $|I_T \otimes A| = |A|^T$ in the likelihood. The general likelihood for the spatial panel model is:

$$\begin{aligned} logL &= - \frac{NT}{2} ln (2 \pi \sigma _e^2) - \frac{1}{2}ln | \Sigma | + Tln|A| \\ &\quad - \frac{1}{2 \sigma _e^2} [(I_T \otimes A)y- X \beta ]' \Sigma ^{-1} [(I_T \otimes A)y - X \beta ]. \end{aligned}$$

(4)

Concentrating the likelihood w.r.t. the parameters in A and $\Sigma$ and optimizing in the usual two-step fashion, alternating between the concentrated likelihood and the (GLS) first order conditions:

$$\begin{aligned}&{\hat{\beta }}=(X'\Sigma ^{-1}X)^{-1}X'\Sigma ^{-1}(I_T \otimes A)y \\&{\hat{\sigma }}^2_e= [(I_T \otimes A)y-X{\hat{\beta }}]' \Sigma ^{-1}[(I_T \otimes A)y-X{\hat{\beta }}]/NT \end{aligned}$$

until convergence, one gets the ML estimates for all parameters. Given $\Sigma$ specifying a particular error structure, its inverse and determinant can in principle be calculated by brute force; or, preferably, analytical expressions for them can be used when available, for the sake of speed and stability. This approach encompasses, i.a., the simple RE case as well as ANS and KKP (Millo 2014 [4.3.2]).
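The GLS step of this iteration can be illustrated numerically. The following is a minimal, dependency-free Python sketch, not the R code in 'splm': the helper names (`gls_step`, `inverse`, `kron` and so on) are hypothetical, $\Sigma$ and $A$ are taken as given, the inverse is computed by brute force, and the data are assumed to be stacked by time period first, consistently with $I_T \otimes W_N$. The toy data are built without noise so that the step must return the true coefficients.

```python
def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def kron(P, Q):
    """Kronecker product of two list-of-lists matrices."""
    return [[p * q for p in rp for q in rq] for rp in P for rq in Q]

def inverse(M):
    """Brute-force Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [[float(v) for v in row] + unit for row, unit in zip(M, eye(n))]
    for c in range(n):
        pivot_row = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def gls_step(y, X, Sigma, A, T):
    """Return (beta_hat, sigma2_hat) for a given scaled covariance Sigma and filter A."""
    Sinv = inverse(Sigma)
    y_f = matmul(kron(eye(T), A), [[v] for v in y])      # (I_T (x) A) y
    Xt_Sinv = matmul(transpose(X), Sinv)
    beta = matmul(inverse(matmul(Xt_Sinv, X)), matmul(Xt_Sinv, y_f))
    fitted = matmul(X, beta)
    resid = [[a[0] - b[0]] for a, b in zip(y_f, fitted)]
    sigma2 = matmul(matmul(transpose(resid), Sinv), resid)[0][0] / len(y)
    return [b[0] for b in beta], sigma2

# Toy example (all values are illustrative assumptions): N = 2 units, T = 2 periods.
N, T, lam, phi = 2, 2, 0.2, 0.5
W = [[0.0, 1.0], [1.0, 0.0]]
A = [[eye(N)[i][j] - lam * W[i][j] for j in range(N)] for i in range(N)]
J_T = [[1.0] * T for _ in range(T)]
# Plain RE scaled covariance: Sigma = phi * (J_T (x) I_N) + I_NT
Sigma = [[phi * a + b for a, b in zip(ra, rb)]
         for ra, rb in zip(kron(J_T, eye(N)), eye(N * T))]
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.3]]
beta_true = [1.0, 2.0]
# Noise-free y solving (I_T (x) A) y = X beta
y = [row[0] for row in matmul(inverse(kron(eye(T), A)),
                              matmul(X, [[b] for b in beta_true]))]
beta_hat, sigma2_hat = gls_step(y, X, Sigma, A, T)
print(beta_hat, sigma2_hat)
```

In a full estimator this step would alternate with a numerical maximization of the concentrated likelihood over the parameters in $A$ and $\Sigma$; that outer loop is omitted here.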

In the GSRE case, from Baltagi et al. (2013), the scaled error covariance is:

$$\begin{aligned} \Sigma = {\bar{J}}_T \otimes (T \phi (B_{1N}'B_{1N})^{-1} + (B_{2N}'B_{2N})^{-1}) + E_T \otimes (B_{2N}'B_{2N})^{-1} \end{aligned}$$

with $\phi =\frac{\sigma ^2_{\eta }}{\sigma ^2_e}$; and the expressions for $\Sigma ^{-1}$ and $|\Sigma |$ to be plugged into (4):

$$\begin{aligned}&\Sigma ^{-1} = {\bar{J}}_T \otimes \left( T \phi (B_{1N}'B_{1N})^{-1} + (B_{2N}'B_{2N})^{-1}\right) ^{-1} + E_T \otimes (B_{2N}'B_{2N})\\&|\Sigma | = |T \phi (B_{1N}'B_{1N})^{-1} + (B_{2N}'B_{2N})^{-1}| \cdot |(B_{2N}'B_{2N})^{-1}|^{T-1} \end{aligned}$$

with $B_{1N}, B_{2N}, J_T, I_T, I_N, {\bar{J}}_T, E_T$ as defined above.
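The reason these closed forms are available can be sketched as follows; this is a standard argument for error component structures, added here for illustration, with $P$ and $Q$ as shorthand introduced only for this sketch. Write $\Sigma = {\bar{J}}_T \otimes P + E_T \otimes Q$, with $P = T \phi (B_{1N}'B_{1N})^{-1} + (B_{2N}'B_{2N})^{-1}$ and $Q = (B_{2N}'B_{2N})^{-1}$. Since ${\bar{J}}_T$ and $E_T$ are symmetric, idempotent, mutually orthogonal and sum to $I_T$,

$$\begin{aligned} ({\bar{J}}_T \otimes P + E_T \otimes Q)({\bar{J}}_T \otimes P^{-1} + E_T \otimes Q^{-1}) = {\bar{J}}_T \otimes I_N + E_T \otimes I_N = I_{NT} \end{aligned}$$

so that $\Sigma ^{-1} = {\bar{J}}_T \otimes P^{-1} + E_T \otimes Q^{-1}$ with $Q^{-1} = B_{2N}'B_{2N}$. Moreover ${\bar{J}}_T$ has rank 1 and $E_T$ has rank $T-1$, hence

$$\begin{aligned} |\Sigma | = |P| \cdot |Q|^{T-1}, \qquad ln|\Sigma | = ln|P| - (T-1)\, ln|B_{2N}'B_{2N}|. \end{aligned}$$

Only $N \times N$ matrices are ever inverted or factorized, never the full $NT \times NT$ covariance.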

4 Empirical examples

In this section we apply the (SAR+)GSRE estimator to some well-known examples – taken from Millo (2014), Croissant and Millo (2018) and Baltagi (2021), original sources reported below – where the restricted ANS and KKP specifications have been applied.

There are no substantial changes in the estimates ${\hat{\beta }}$ throughout all the examples, bar of course for the difference in magnitude and interpretation between the group of models excluding and those including a SAR term. Therefore we concentrate on the estimates of the variance of individual effects ${\hat{\phi }}$ and those of the spatial parameters (${\hat{\lambda }}$,) ${\hat{\rho }}_1$ and ${\hat{\rho }}_2$. In the ANS and the KKP models, of course, we set ${\hat{\rho }}_1=0$ and ${\hat{\rho }}_1={\hat{\rho }}_2$ respectively.

4.1 Rice farming

The “Rice farming” example of Druska and Horrace (2004) regards the estimation of a production frontier equation relating rice output to the following inputs: seed, urea, phosphate, labour hours and land size, all but phosphate in logs. Dummy variables account for the use of high yield varieties of seed, or for a mix of seed varieties and for the use of pesticides. Dummy variables are also added for the six villages and for the season being a wet one. The proximity matrix is constructed considering all the farms of the same village as neighbours. The example is considered in Millo (2014) where a number of different spatial specifications are estimated. 171 rice farms in Indonesia are observed over six growing seasons, three wet and three dry, between 1975 and 1983.^{Footnote 9}

Estimation results for the spatial and RE parameters are reported in Table 1. The unrestricted model yields a large negative estimate for $\rho _1$ and an even larger standard error; its results are therefore not totally at odds with ANS, but inconsistent with KKP.

Table 1 Rice farming model

Including a SAR term gives rise to a slight compensation between ${\hat{\rho }}_2$ and ${\hat{\lambda }}$, the latter nevertheless being small in magnitude and on the verge of statistical significance (p-value: 0.062).

4.2 Italian insurance

The “Italian insurance” example of Millo and Carmeci (2011) is, again, considered in Millo (2014) for comparing many different combinations of spatial and random effects (GSRE excluded). Millo and Carmeci (2011) analyze the determinants of per-capita equilibrium consumption of non-life insurance in all 103 Italian provinces over five years, 1998 to 2002, based on socioeconomic characteristics of territory: per-capita income and wealth as proxied by bank deposits, real lending rates, territorial density of population and of the distribution network; demographic characteristics such as average family size and schooling and the prevailing level of trust; the share of agriculture in value added; and the level of inefficiency of civil justice. Estimation results are reported in Table 2.

Table 2 Italian insurance model

In this case the SEM term is deemed insignificant by all estimators; by contrast, using the GSRE lets the spatial process in the random effects emerge, which would be assumed out by the ANS or, alternatively, disappear in the KKP as the estimator is unable to tell between $\rho _1$ and the insignificant $\rho _2$.

The interpretation becomes less clear when including a SAR term: $\lambda$ is marginally significant (p-value: 0.062) and of considerable magnitude; at the same time, $\rho _1$ is not significant any more (it even changes sign) while $\rho _2$ becomes significant and negative. The meaning is not clear; tentative interpretations can be based on considerations in the original paper: aggregation biases from concentration of salespoints and/or omitted variable effects from the impossibility of observing purchasing power differentials (Millo and Carmeci 2011 [5.1]). At least, here the complete specification of the spatial process provides us with an unbiased estimate of the ${\hat{\beta }}$s, while the simple GSRE would yield biased coefficients.

4.3 Public capital productivity

The third and last example from Millo (2014) is the Munnell (1990) “Public capital productivity” model. It involves a social production function, estimated with the main goal of assessing the productivity of public capital (roads, water facilities, other infrastructure) in 48 US States observed over 17 years. It was originally popularized by the famous panel data textbook of Baltagi (2021, see Example 3). The model is a Cobb-Douglas production function where the gross social product (gsp) of a given state depends on the inputs of public capital, private capital and labour, plus the state unemployment rate as a control for the business cycle. The relevant estimates are reported in Table 3.

Table 3 Munnell’s public capital productivity model

The Munnell example has issues of nonstationarity (see again Millo 2014), so a specification in differences might be in order. For our illustrative purposes here, let us observe that the GSRE yields another significant estimate ${\hat{\rho }}_2$, while ${\hat{\rho }}_1$ is about one half in magnitude and not far from significance ($p$-value: 0.12), so that the “truth”, at least within the GSRE world, lies somewhere in between ANS and KKP.

The SAR extension is surprisingly clear-cut, given that ex ante one might have expected some direct influence of one state’s product on the neighbours. The ${\hat{\lambda }}$ estimate is instead very small in magnitude, while the estimation of the standard error fails. This behaviour of the numerical Hessian, documented in Millo (2014, 5.1.5), tends to happen when the target parameter is so close to zero as to make an assessment of statistical significance redundant. In other words, economic priors notwithstanding, the statistical evidence very strongly upholds the error model.

4.4 Evapotranspiration

The next example, “Evapotranspiration”, is taken from Croissant and Millo (2018, Ch. 10). Obojes et al. (2015) explore the effect of vegetation composition and structure on water balance on some high elevation grasslands in the Alps. They repeatedly measure the water balance of soil monoliths in deep seepage collectors in four experimental sites over three study areas, two in the French Alps, one in Switzerland and one in Austria. The present example replicates the results from 5 repeated measurements over 86 observation units in the Austrian site. See the estimates of spatial and RE parameters in Table 4.

Table 4 Evapotranspiration model

The “Evapotranspiration” data show very strong evidence of a spatial process in idiosyncratic errors, as is to be expected from data collected at nearby locations and influenced by the weather. As for random effects, no spatial process is detected in them by the GSRE; the ANS is hence consistent with the GSRE, while the KKP, imposing on $\rho _1$ the same very high estimate as $\rho _2$, completely misses the mark. As can be seen, though, this bias does not have consequences for the estimates ${\hat{\phi }}, {\hat{\rho }}_2$ and seems therefore quite harmless unless one is interested precisely in the diffusion process of the random effects.

The complete SAR+GSRE model upholds the above conclusions: the SAR term, although not negligibly small in magnitude, is statistically not significant while the significant error components’ estimates ${\hat{\phi }}$ and ${\hat{\rho }}_2$ (and their standard errors) are largely unchanged.

4.5 Cigarette

A spatial econometrics paper would not be complete without the ubiquitous “Cigarette” example, whose pervasiveness made it a standard which helps comparing different pieces of research. Featuring prominently in a number of textbooks (one for all, Baltagi 2021), the original application is in Baltagi and Levin (1992) and it has been reconsidered, i.a., by Baltagi and Griffin (2001).

The Cigarette dataset contains data for the years 1963–1992 and 46 American states on real per capita sales of cigarettes per adult person, average real retail price and real disposable income per capita. Originally, the minimum price in neighbouring states was included in order to proxy for cross-border smuggling; alternatively, this can be controlled for through spatial effects, which has made this dataset a good candidate for spatial examples. It must be kept in mind, though, that the original formulation was dynamic, as appropriate for models of persistent habits. The relevant parameter estimates are reported in Table 5.

Table 5 Cigarette model

Including a SAR term hardly changes the substantive conclusions; although numerically said term turns out weakly significant, its magnitude is very small (− 0.007). Once more, a tendency emerges for the spatial lag and error terms to compensate each other.

4.6 Summary of results

The results are in general not supportive of the KKP restriction: the spatial processes of errors and individual effects have different features, as could be expected. In some cases, they do instead uphold the ANS restriction: no spatial process in the random effects. Whether this last result is due to the spatial correlation of the individual heterogeneity actually being zero, or to the lack of precision of the estimator ${\hat{\rho }}_1$, is a question we are not in a position to answer.

The extension to a spatial lag does generally uphold the analyses based on the spatial error specifications, with the partial exception of the Italian insurance example, where spatial effects emerge which are likely to be the artifact of an incorrect specification of the connectivity structure. In some cases, in fact (Rice farming, Evapotranspiration), the original authors had a strong prior in favour of the error model; but this might often not be the case. In the next Section we will address the consequences of misspecification on parameter estimation by placing ourselves in some clear-cut simulated situations.

5 Simulated examples

In the following Section we perform estimation on some simulated datasets, in order to highlight the effects of misspecification in a controlled environment. We address in this order the consequences of two kinds of misspecification: of the composite error structure (e.g., estimating a KKP model when the DGP is ANS) which can be addressed using the GSRE; and of spatial lag vs. error, which calls for the use of the most general SAR+GSRE. We will show how the second kind can produce the most seriously biased results in terms of the spatial structure. It goes without saying that the misspecification of the spatial lag is by far the worst even as regards the effects of the regressors: in fact, omitting a spatial lag will yield biased ${\hat{\beta }}$s, while a misspecified error structure will only affect their efficiency.

A large scale Monte Carlo exercise considering many parameter combinations is out of the scope of this note. Our compact illustrative example considers just one panel size representative of commonly found datasets: $N=48$, $T=10$; the spatial ordering is taken from the US states. We only consider zero or nonzero values for the parameters, and for the sake of simplicity, we set every nonzero value to 0.6. We trust our examples to be representative of a much more general situation. 1000 simulation runs are performed for each scenario/combination of parameter values.
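To make the design concrete, one simulated SAR+GSRE dataset can be drawn as in the following Python sketch. It is an illustration under stated assumptions rather than the code used for the paper: the US states weights matrix is replaced by a hypothetical row-standardised ring contiguity matrix, the innovations are drawn as Gaussian with unit variances, a single regressor with coefficient 1 is used, and all function names are made up for the example. The scenario labels ANS, KKP and RHO1 follow the text; nonzero parameters are set to 0.6.

```python
import random

def ring_weights(n):
    """Hypothetical stand-in for W: each unit has neighbours i-1 and i+1, rows sum to 1."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = 0.5
        W[i][(i + 1) % n] = 0.5
    return W

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot_row = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def spatial_filter_inverse(W, coef):
    """Return (I_N - coef * W)^{-1}."""
    n = len(W)
    return inverse([[(1.0 if i == j else 0.0) - coef * W[i][j] for j in range(n)]
                    for i in range(n)])

def simulate_sar_gsre(W, T, lam, rho1, rho2, rng, beta=1.0, sigma_eta=1.0, sigma_e=1.0):
    """Draw (y, x) stacked by period: y_t = A^{-1}(beta x_t + mu + eps_t)."""
    n = len(W)
    A_inv = spatial_filter_inverse(W, lam)
    B1_inv = spatial_filter_inverse(W, rho1)
    B2_inv = spatial_filter_inverse(W, rho2)
    mu = matvec(B1_inv, [rng.gauss(0.0, sigma_eta) for _ in range(n)])  # mu = B1^{-1} eta
    y, x = [], []
    for _ in range(T):
        x_t = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps_t = matvec(B2_inv, [rng.gauss(0.0, sigma_e) for _ in range(n)])
        rhs = [beta * xi + m + e for xi, m, e in zip(x_t, mu, eps_t)]
        y.extend(matvec(A_inv, rhs))
        x.extend(x_t)
    return y, x

# Error-structure scenarios named in the text; lam = 0.6 would add a SAR term.
scenarios = {
    "ANS": dict(lam=0.0, rho1=0.0, rho2=0.6),
    "KKP": dict(lam=0.0, rho1=0.6, rho2=0.6),
    "RHO1": dict(lam=0.0, rho1=0.6, rho2=0.0),
}
N, T = 48, 10
W = ring_weights(N)
rng = random.Random(42)
y, x = simulate_sar_gsre(W, T, rng=rng, **scenarios["ANS"])
print(len(y), len(x))
```

Repeating the draw 1000 times per scenario and estimating both the GSRE and the SAR+GSRE on each dataset would reproduce the structure of the exercise, though not necessarily its numbers.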

5.1 Specification of the error structure

In this subsection we report estimation of the GSRE and SAR+GSRE models when the “true” DGP is, respectively, ANS, KKP or RHO1 (i.e., $\rho _1\ne 0$ while $\rho _2 = 0$). All graphs (omitted) show that both the GSRE and the SAR+GSRE estimates concentrate around the “true” parameter values.

5.2 Misspecification of the spatial lag vs. error

In this subsection we illustrate the effect of omitting the SAR term from different DGPs containing spatial lags. As the comparison between the density of GSRE (red lines) and SAR+GSRE (blue lines) estimates shows, the former are severely biased if the DGP contains a SAR term (see Figs. 1, 2, 3).

The omitted spatial lag does clearly “discharge” on the included spatial error coefficients, yielding a spurious result. The random effects’ variance ${\hat{\phi }}$ is affected as well, but its magnitude is usually of lesser substantial interest.

5.3 Reliability of the optimization procedure

A last important aspect of estimation is the reliability of the optimization procedure. In Table 6 below, we report the success (i.e., convergence) rates for the two GSRE estimators, with and without SAR, under the seven different simulation scenarios.

Table 6 Rate of successful convergences out of 1000 simulation runs for the GSRE and SAR+GSRE maximum likelihood estimators under different scenarios; percent

The only failures have been recorded under the RHO1 scenario; unexpectedly, the failure rate is slightly higher for the simpler GSRE estimator than for the SAR+GSRE.

5.4 Summary of simulation results

Our limited exercise can only give a hint about the real-world properties of the proposed software procedures: a larger Monte Carlo project would be required for a formal assessment, mixing different values for all parameters involved over a dense multidimensional grid. Still, the results are encouraging. If correctly specified, the GSRE and SAR+GSRE look quite reliable in our limited simulation exercise, both in terms of convergence and of precision, under each of the different DGP scenarios. Only in the least interesting one, RHO1, which is usually not considered in the literature and was added for the sake of completeness, did a minority of the simulation runs fail to converge. The effects of misspecification are most serious when a relevant SAR term is omitted: the spatial effect then “discharges” on the estimates of the error parameters $\rho _1$ and $\rho _2$, inflating and biasing them; but of course, as observed at the beginning of this section, the most serious consequence would be the bias of ${\hat{\beta }}$. This further motivates the extension of the estimator discussed in the present paper.

The reader shall keep in mind that there are actually two sides to the modelling of a spatial process: the functional form (SAR or GSRE etc.) and the structure of spatial proximity. We address the misspecification of the spatial process in terms of effects, taking instead for granted the spatial structure: i.e., we employ the “true” W matrix in estimation. Addressing the effects of misspecification of W would be a very interesting but complex task which we leave to future research.

6 Conclusions

Random effects methods, as discussed above, are not always of interest in spatial applications. When the RE hypothesis cannot be safely assumed, FE methods are in order, and in this case the issue of spatial correlation of the individual heterogeneity becomes moot because the individual effects are estimated out (although they can be subsequently recovered and their spatial correlation assessed ex post). Nevertheless, if individual heterogeneity of the RE type enters a spatial model, its correlation in space is a potentially interesting topic. The currently employed estimators impose one of two arbitrary restrictions: either there is no correlation (“Anselin” model) or it follows the very same process as the idiosyncratic errors do (“KKP” model). An encompassing model relaxing these restrictions (generalized spatial random effects, or GSRE) and the relevant ML estimator have been proposed by Baltagi (2007) and Baltagi et al. (2013), and a production-quality software implementation has been available in Stata since Belotti et al. (2017). Still, until now no user-friendly routines have been available for considering the GSRE together with a spatial lag (SAR) (as done by Baltagi and Liu 2016). This note describes the extension of the ML estimation framework to the SAR+GSRE and an R implementation of said estimator within the ’splm’ package for spatial panel econometrics, and presents some examples comparing the results from the generalized model to the restricted ones.

The gains from implementing the encompassing error covariance structure are not guaranteed to be substantial; after all, possible spatial lags apart, the models considered might be consistently estimated by OLS: any RE structure will only improve precision. In turn, by the very nature of the model (independence between individual effects and remainder errors), a misspecification of the random effects does little harm to the estimator of the spatial error parameter (here, $\rho _2$). But in spatial models the spatial process in error components is often likely to be of interest in itself, and in this case it is essential to allow for an unrestricted structure of the GSRE type. In doing so it is in turn essential, unless one has a strong prior for excluding them, to control for spatial lags; otherwise the omitted SAR process will bias the ${\hat{\beta }}$s and inflate the spatial error coefficients. The software presented in this note makes it possible to fill the gap between theoretical specifications of spatial random effects and empirical practice, removing the need to arbitrarily restrict the spatial process in the random effects for the sake of computational convenience.

7 Computational Details

All the computations in this paper have been performed within the R system for statistical computing (R Core Team 2021). In particular, the estimators illustrated in this paper are forthcoming in the splm package (Millo and Piras 2012) as option errors = "semgre" in the function spreml().
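As a rough illustration of the call described above, the following sketch fits a SAR+GSRE specification by combining the errors = "semgre" option with a spatial lag. It assumes that splm (version 1.6 or later) and plm are installed; the Produc data from plm, the usaww weights matrix shipped with splm and the formula are illustrative choices made here, not necessarily the specification used in the examples of this paper.

```r
# Sketch only: SAR + GSRE estimated by maximum likelihood with splm::spreml().
# Assumptions (not from the paper): the formula below and the use of the
# Produc panel (plm) together with the usaww weights matrix (splm).
fm <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp

if (requireNamespace("splm", quietly = TRUE) &&
    requireNamespace("plm", quietly = TRUE) &&
    utils::packageVersion("splm") >= "1.6") {
  data("Produc", package = "plm")   # panel of 48 US states
  data("usaww", package = "splm")   # contiguity-based weights for the same states
  # lag = TRUE adds the spatial lag of the dependent variable (SAR);
  # errors = "semgre" requests the generalized spatial random effects structure.
  sar_gsre <- splm::spreml(fm, data = Produc, w = usaww,
                           lag = TRUE, errors = "semgre")
  print(summary(sar_gsre))
}
```

Setting lag = FALSE in the same call would give the GSRE without the spatial lag; the restricted specifications can be requested by changing the value of errors.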

Notes

The use of RE methods in spatial econometrics is debated; the issue goes beyond the scope of the present note, but for some references see Millo (2014, 3.1). In short, some authors (e.g., Elhorst 2014) argue that the random effects hypothesis is out of place when the individual units cannot be supposed to be drawn randomly from a larger population, being themselves “the population”: unique regions in a geographic space without an obvious asymptotic extension. Others, following Wooldridge (2010), contend that what ultimately counts are the properties of the individual error components, and in particular their correlation with the regressors, or lack thereof. More practically, one suggested advantage of random effects estimation is that it permits identification of the coefficients on time-constant covariates and can pick up long-run effects, whereas fixed effects estimation focuses on short-run variation (Fingleton and Palombi 2013). For a thorough comparison of FE versus RE methods in spatial panels, see Lee and Yu (2012); on the issue of FE versus RE in general, see Baltagi (2021, p. 28).
The GSRE does not explicitly consider time effects: in this, it does not live up to its name because it omits an essential feature (we thank an anonymous reviewer for this observation). Yet in the typical spatial panel with a moderate number of time periods, time dummies are an easy addition which readily solves the problem.
It is common in the spatial literature to require that the coefficient $\lambda$ for a SAR process be strictly contained in the interval $\left( -\frac{1}{\eta _{max}}, \frac{1}{\eta _{max}}\right)$, where $\eta _{max}$ is the largest eigenvalue of the W matrix.
This was true until recently, when the user-friendly Stata package XSMLE of Belotti et al. (2017) was published and became an immediate success, with over 300 Google Scholar citations to date (March 2022). Yet it remains true in substance: again to the best of our knowledge, the availability of XSMLE has produced a total of only two more published papers (Amidi and Fagheh 2020; Amidi et al. 2020) and three unpublished manuscripts actually employing the GSRE, most other users opting instead for SDM, SAR or SEM specifications.
Also, Baltagi et al. (2016) employ the KKP at the second stage of an improved specification, see their Footnote 6; and Baltagi et al. (2012) include it in a roundup of estimators of which they assess the forecasting performance.
In principle, the spatial weights matrices in the lag and the error term can differ, although here we dispense with this slight complication.
In the taxonomy of the ’splm’ package, the usual acronym SAR (as in Spatial AutoRegressive) is used to indicate the presence of a spatial lag; SEM (Spatial Error Model) for a spatially autoregressive process in the error; SAREM for the combined model – the one often named SAC or SARAR in the literature. A suffix (G/2)RE is added for the different kinds of random effects. In the text we will use the more evocative ANS (=SEMRE), KKP (=SEM2RE) and GSRE (=SEMGRE) throughout.
There are actually some differences between the original formulation in Kapoor et al. (2007, Eqns. 4 and 5) and the one in Baltagi et al. (2013). We thank Anna Gloria Billè for pointing this out. For consistency with the previous literature we refer to the Baltagi et al. (2013) formulation throughout.
It is reasonable to see the many small farms as random draws from a bigger population, in the spirit of the random effects specification (Table 1).
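
The admissible range for $\lambda$ mentioned in the note on the SAR coefficient above can be checked numerically. The sketch below uses a small hypothetical weights matrix (four units on a line, row-standardized), which is an assumption made here for illustration and does not come from the paper; it relies on base R only.

```r
# Hypothetical 4-unit example: units arranged on a line, adjacent units are neighbours.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1      # links 1-2, 2-3, 3-4
W <- W + t(W)                # make the contiguity relation symmetric
W <- W / rowSums(W)          # row-standardize: each row sums to one

# Largest eigenvalue of W and the implied interval for lambda
eta <- eigen(W, only.values = TRUE)$values
eta_max <- max(Re(eta))
lambda_bounds <- c(lower = -1 / eta_max, upper = 1 / eta_max)
print(lambda_bounds)
```

For a row-standardized matrix the largest eigenvalue equals one, so the interval reduces to values of $\lambda$ strictly between -1 and 1.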

References

Amidi S, Fagheh MA (2020) Geographic proximity, trade and economic growth: a spatial econometrics approach. Ann GIS 26(1):49–63
Amidi S, Fagheh MA, Javaheri B (2020) Growth spillover: a spatial dynamic panel data and spatial cross section data approaches in selected Asian countries. Fut Bus J 6(1):1–14
Anselin L (1988) Spatial econometrics: methods and models, vol 4. Springer
Baltagi BH (2021) Econometric analysis of panel data. Springer
Baltagi BH, Griffin JM (2001) The econometrics of rational addiction: the case of cigarettes. J Bus Econ Stat 19(4):449–454
Baltagi BH, Levin D (1992) Cigarette taxation: raising revenues and reducing consumption. Struct Change Econ Dyn 3(2):321–335
Baltagi BH, Liu L (2016) Random effects, fixed effects and Hausman’s test for the generalized mixed regressive spatial autoregressive panel data model. Econom Rev 35(4):638–658
Baltagi BH, Bresson G, Pirotte A (2003) Fixed effects, random effects or Hausman-Taylor?: A pretest estimator. Econ Lett 79(3):361–369
Baltagi BH, Egger P, Pfaffermayr M (2007) A Monte Carlo study for pure and pretest estimators of a panel data model with spatially autocorrelated disturbances. Annales d’Economie et de Statistique:11–38
Baltagi BH, Bresson G, Pirotte A (2012) Forecasting with spatial panel data. Comput Stat Data Anal 56(11):3381–3397
Baltagi BH, Egger P, Pfaffermayr M (2013) A generalized spatial panel data model with random effects. Econom Rev 32(5–6):650–685
Baltagi BH, Egger PH, Kesina M (2016) Firm-level productivity spillovers in China’s chemical industry: a spatial Hausman-Taylor approach. J Appl Econom 31(1):214–248
Baylis K, Garduño-Rivera R, Piras G (2012) The distributional effects of NAFTA in Mexico: evidence from a panel of municipalities. Reg Sci Urban Econ 42(1–2):286–302
Bell KP, Bockstael NE (2000) Applying the generalized-moments estimation approach to spatial problems involving micro-level data. Rev Econ Stat 82(1):72–82
Belotti F, Hughes G, Mortari AP (2017) Spatial panel-data models using Stata. Stata J 17(1):139–180
Chakir R, Le Gallo J (2013) Predicting land use allocation in France: a spatial panel data analysis. Ecol Econ 92:114–125
Croissant Y, Millo G (2018) Panel data econometrics with R. Wiley
Delbecq BA, Brown JP, Florax RJGM, Kladivko EJ, Nistor AP, Lowenberg-DeBoer JM (2012) The impact of drainage water management technology on corn yields. Agron J 104(4):1100–1109
Druska V, Horrace WC (2004) Generalized moments estimation for spatial panel data: Indonesian rice farming. Am J Agric Econ:185–198
Elhorst JP (2010) Applied spatial econometrics: raising the bar. Spat Econ Anal 5(1):9–28
Elhorst JP (2014) Spatial econometrics from cross-sectional data to spatial panels. Springer
Fingleton B, Le Gallo J (2008) Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: finite sample properties. Pap Reg Sci 87(3):319–339
Fingleton B, Palombi S (2013) Spatial panel data estimation, counterfactual predictions, and local economic resilience among British towns in the Victorian era. Reg Sci Urban Econ 43(4):649–660
Fingleton B, Garretsen H, Martin R (2015) Shocking aspects of monetary union: the vulnerability of regions in Euroland. J Econ Geogr 15(5):907–934
Gomez LM, Filippini M, Heimsch F (2013) Regional impact of changes in disposable income on Spanish electricity demand: a spatial econometric analysis. Energy Econ 40:S58–S66
Jacquot M, Coeurdassier M, Couval G, Renaude R, Pleydell D, Truchetet D, Raoul F, Giraudoux P (2013) Using long-term monitoring of red fox populations to assess changes in rodent control practices. J Appl Ecol 50(6):1406–1414
Kapoor M, Kelejian HH, Prucha IR (2007) Panel data models with spatially correlated error components. J Econom 140(1):97–130
Kelejian HH, Prucha IR (1999) A generalized moments estimator for the autoregressive parameter in a spatial model. Int Econ Rev 40(2):509–533
Kopczewska K, Kudła J, Walczyk K (2017) Strategy of spatial panel estimation: spatial spillovers between taxation and economic growth. Appl Spat Anal Policy 10(1):77–102
Lee L, Yu J (2012) Spatial panels: random components versus fixed effects. Int Econ Rev 53(4):1369–1412
Millo G (2014) Maximum likelihood estimation of spatially and serially correlated panels with random effects. Comput Stat Data Anal 71:914–933
Millo G, Carmeci G (2011) Non-life insurance consumption in Italy: a sub-regional panel data analysis. J Geogr Syst 13(3):273–298
Millo G, Piras G (2012) splm: spatial panel data models in R. J Stat Softw 47:1–38
Munnell AH et al. (1990) Why has productivity growth declined? Productivity and public investment. New England Econ Rev:3–22
Obojes N, Bahn M, Tasser E, Walde J, Inauen N, Hiltbrunner E, Saccone P, Lochet J, Clément J-C, Lavorel S et al (2015) Vegetation effects on the water balance of mountain grasslands depend on climatic conditions. Ecohydrology 8(4):552–569
Padovano F, Petrarca I (2014) Are the responsibility and yardstick competition hypotheses mutually consistent? Eur J Polit Econ 34:459–477
Piras G (2013) Efficient GMM estimation of a Cliff and Ord panel data model with random effects. Spat Econ Anal 8(3):370–388
R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Romão J, Guerreiro J, Rodrigues PMM (2017) Territory and sustainable tourism development: a space-time analysis on European regions. Region 4(3):1–17
Wan J, Baylis K, Mulder P (2015) Trade-facilitated technology spillovers in energy productivity convergence processes across EU countries. Energy Econ 48:253–264
Wheeler D, Hammer D, Kraft R, Dasgupta S, Blankespoor B (2013) Economic dynamics and forest clearing: a spatial econometric analysis for Indonesia. Ecol Econ 85:85–96
Wooldridge JM (2010) Econometric analysis of cross section and panel data. MIT Press

Funding

Open access funding provided by Università degli Studi di Trieste within the CRUI-CARE Agreement.

Author information

Authors and Affiliations

DEAMS, University of Trieste, Trieste, Italy
Giovanni Millo

Corresponding author

Correspondence to Giovanni Millo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Giovanni Millo: The software described will be available in the R package splm from Version 1.6.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Millo, G. The generalized spatial random effects model in R. J Spat Econometrics 3, 7 (2022). https://doi.org/10.1007/s43071-022-00024-9

Received: 31 December 2021
Accepted: 21 May 2022
Published: 24 June 2022
DOI: https://doi.org/10.1007/s43071-022-00024-9
