Abstract
This paper presents the progress made so far in the development of the R package hspm. The package hspm aims at implementing a variety of models and methods to control for heterogeneity in spatial models. Spatial heterogeneity can be specified in different ways, ranging from exogenous (or endogenous) spatial regimes models, to models with coefficients that potentially vary for each observation (i.e., continuous heterogeneity). We focus on a few R functions that allow for the estimation of a general spatial regimes model, as well as all of the nested specifications deriving from it. The models are estimated by instrumental variables and generalized method of moments techniques.
1 Introduction
Spatial effects are generally divided into two different categories: spatial dependence and spatial heterogeneity (Anselin 1988). While cross-sectional dependence has to do with correlation between spatial units, spatial heterogeneity consists of instabilities over space that are generally reflected by variations across individual units (Anselin 2010).
In practice there are various ways of tackling unobserved heterogeneity, such as controlling for spatial heteroscedasticity (Kelejian and Prucha 2007), spatial regimes models (Anselin 1988; Anselin and Rey 2014), geographically weighted regressions (Fotheringham et al. 1998, 2002), and multilevel (or hierarchical) models (Arcaya et al. 2012), among others.Footnote 1
An interesting distinction between discrete and continuous spatial heterogeneity has been made by Anselin and Amaral (2021). From a discrete perspective, they argue that spatial regimes models are the most common way of dealing with spatial heterogeneity. In a nutshell, spatial regimes models are a class of models whose coefficients may vary across space. The term regimes indicates that the observations are grouped according to some criterion that relates to space. Interestingly, Anselin and Amaral (2021) point out that, even if the estimation of spatial regimes regressions is well established, the identification of the regimes still remains a subject for investigation. Additionally, they acknowledge the existence of three approaches to identify the regimes. The first approach is based on exogenous regimes (e.g., determined through administrative boundaries); the second is when the regimes result from a data-driven procedure (e.g., observations are aggregated using some clustering method); and the last one corresponds to a situation where the coefficients and the regimes are jointly determined.
From an empirical perspective, attempts to consider spatial heterogeneity in model specification have mostly, but not exclusively, focused on economic geography and regional sciences. This is verified by the special attention that local labor markets (Huiban et al. 2004; Longhi and Nijkamp 2005; Melo et al. 2012), and regional economic convergence (Rey and Janikas 2005; Ramajo et al. 2008; Ertur et al. 2006) have received over the years. However, spatial heterogeneity has also gained increasing interest in other disciplines, such as quantitative geography (Song et al. 2020; Georganos et al. 2021; Shu et al. 2019), urban growth (Zhai et al. 2021), urban sprawl (Deng et al. 2020; Irwin and Bockstael 2007), geology (de Marsily et al. 2005), ecology and evolution (Vinatier et al. 2011), epidemiology (Thomas et al. 2020), physics and air pollution (He et al. 2022), among others.
From a software availability perspective, spatial models to control for spatial dependence are well established.Footnote 2 Code dealing with spatial heterogeneity is relatively sparse but also long-established.Footnote 3 In this scenario, hspm is an ambitious project that aims at developing and implementing various methodologies to control for heterogeneity in spatial models. This article presents the methodological innovations that have been made so far dealing with spatial (and non-spatial) regimes models. In particular, we present R functions that allow for the estimation of a general spatial regimes model, as well as all of the nested specifications deriving from it. The models are estimated by instrumental variables (IV) and generalized method of moments (GMM) techniques.
The rest of this paper describes the package functionality in order to familiarize readers with the functions it contains. In particular, Sect. 2 introduces the two data sets that we use throughout the paper: the first is based on a housing price model in the city of Baltimore; the second contains county level data for homicides and selected socio-economic characteristics for the continental United States. The difference between these two data sets is that the second one suffers from endogeneity and requires the instrumental variables methods implemented in hspm. In Sect. 3 we introduce the function regimes, which is the basic function to deal with (non-spatial) regimes models. Section 4 is devoted to the illustration of the function ivregimes, which allows for endogenous variables in a non-spatial context. The function spregimes is presented in Sect. 5. spregimes is a wrapper function that estimates a regimes model with a spatial lag of the dependent variable, the spatial lag of (part of) the regressors, a spatially lagged error term and additional (other than the spatial lag) endogenous variables. As we will see, spregimes can also estimate all of the nested specifications included in this general model. In Sect. 6, we explain why hspm does not calculate the impacts measures put forth by LeSage and Pace (2009), and we show a simple way to deal with those impacts in a special case (i.e., when the spatial weighting matrix is block diagonal). Section 7 draws some conclusions and gives indications for future developments of the package. Finally, “Appendix A” compares our implementation with code available from the PySAL library (Rey et al. 2022; Rey and Anselin 2007, 2010) developed in Python (van Rossum 1995).
2 Data sets
To illustrate the capabilities of hspm we make use of two data sets: baltimore and natreg.Footnote 4
2.1 Baltimore
The Baltimore data set on housing prices (Dubin 1992) contains many standard factors to explain the price of a dwelling (PRICE): the number of rooms (NROOM), the number of bathrooms (NBATH), the age of the construction (AGE), the size of the lot (LOTSZ), the number of car spaces in the garage (GAR), and the square footage of the house (SQFT). Additional dummy variables are included to check whether the house has a patio (PATIO), a fireplace (FIREPL), and air conditioning (AC). The variable employed to identify the regimes is a binary variable equal to one if the dwelling is situated in Baltimore County and zero otherwise (CITCOU).
The following code loads the data and creates the spatial weighting matrix (of class listw) using a binary contiguity criterion.
2.2 natreg
The data in natreg contain information on homicides and selected socio-economic characteristics for the (continental) counties in the U.S., for four decennial census years, the last of which is 1990 (Messner et al. 2000; Baller et al. 2001). Specifically, the dependent variable is the homicide rate in 1990 (HR90). Among the regressors we include median age (MA90), population structure (PS90), resource deprivation (RD90), and the potentially endogenous unemployment rate (UE90).Footnote 5 The instruments consist of three variables: percentage of female headed households (FH90), percentage of families below poverty (FP89), and the Gini index of family income inequality (GI89).Footnote 6 The regimes identifier is the variable REGIONS that divides the counties into three regions: south, west, and other (not south or west).
The following code loads the data and the spatial weighting matrix of class Matrix (Bates et al. 2022) based on the six nearest neighbors criterion:
3 The basic (non spatial) model and the function regimes
3.1 The basic (non spatial) model
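For convenience and without loss of generality, we assume the presence of only two regimes (i.e., \(j=1,2\)). The basic (non spatial) model can be written in a general way as:
\[ y = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + X\beta + \varepsilon \qquad (1) \]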
where \(y = [y_1^\prime ,y_2^\prime ]^\prime \), and the \(n_1 \times 1\) vector \(y_1\) contains the observations on the dependent variable for the first regime, and the \(n_2 \times 1\) vector \(y_2\) (with \(n_1 + n_2 = n\)) contains the observations on the dependent variable for the second regime. The \(n_1 \times k\) matrix \(X_1\) and the \(n_2 \times k\) matrix \(X_2\) are blocks of a block diagonal matrix of regressors, the vectors of parameters \(\beta _1\) and \(\beta _2\) both have dimension \(k \times 1\), X is the \(n \times p\) matrix of regressors that do not vary by regime, \(\beta \) is a \(p\times 1\) vector of parameters, and \(\varepsilon = [\varepsilon _1^\prime ,\varepsilon _2^\prime ]^\prime \) is the n-dimensional vector of regression disturbances.Footnote 7 Even though this is not a “traditional” spatial model, spatial heterogeneity is taken into account by considering a regimes variable that reveals some spatial aspects of the data. The model in Eq. (1) can be estimated by OLS after reorganizing the data according to Eq. (1).Footnote 8
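The reorganization of the data can be illustrated with a minimal base R sketch. It does not use hspm: the data are simulated, and the names rg, X_bd, fit_bd, fit_0 and fit_1 are hypothetical. The regressors are arranged in block-diagonal form and the stacked system is estimated by OLS; with no regressors held fixed, the point estimates coincide with those of two separate regressions.

```r
# Toy data: two regimes, one regressor whose effect differs by regime.
set.seed(1)
n  <- 200
rg <- rep(c(0, 1), each = n / 2)   # regime indicator (hypothetical)
x  <- rnorm(n)
y  <- ifelse(rg == 0, 1 + 2 * x, -1 + 0.5 * x) + rnorm(n, sd = 0.1)

# Block-diagonal design: each regime has its own intercept and slope columns,
# filled with zeros for observations that belong to the other regime.
X_bd <- cbind(int_0 = as.numeric(rg == 0), x_0 = x * (rg == 0),
              int_1 = as.numeric(rg == 1), x_1 = x * (rg == 1))

# OLS on the stacked system (no common intercept: each block has its own).
fit_bd <- lm(y ~ X_bd - 1)

# The same point estimates are obtained from two separate regressions.
fit_0 <- lm(y ~ x, subset = rg == 0)
fit_1 <- lm(y ~ x, subset = rg == 1)
print(cbind(stacked = coef(fit_bd), separate = c(coef(fit_0), coef(fit_1))))
```

When some regressors are held fixed across regimes, their columns enter the design once (not split by regime), and the equivalence with separate regressions no longer holds.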
3.2 The function regimes
The function regimes has four arguments: formula, data, rgv, and vc. The right hand side of the formula can be of different lengths. If the length is one, it is assumed that all coefficients are different by regimes. When the length of the formula is two, the variables in the first part are kept constant, while those in the second part are different by regimes.
The argument rgv is a formula that indicates the regimes variable. There are two options to estimate the variance-covariance matrix of the estimated coefficients: "groupwise" and "homoskedastic". If vc is set to "groupwise", the model is estimated according to a feasible generalized least squares procedure.Footnote 9
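The groupwise procedure can be sketched in base R under the assumption that it amounts to weighted least squares with one variance per regime, each computed as the sum of squared residuals over the degrees of freedom. This is an illustration on simulated data, not the code used inside regimes; all object names are hypothetical.

```r
set.seed(2)
n  <- 300
rg <- rep(c(0, 1), each = n / 2)
x  <- rnorm(n)
# The error standard deviation differs by regime (0.5 versus 2).
y  <- 1 + ifelse(rg == 0, 2, 0.5) * x + rnorm(n, sd = ifelse(rg == 0, 0.5, 2))
f  <- factor(rg)

# Step 1: OLS with regime-specific intercepts and slopes.
ols <- lm(y ~ 0 + f + f:x)

# Step 2: regime variances = sum of squared residuals / degrees of freedom.
k      <- 2                                   # coefficients per regime
ssr    <- tapply(residuals(ols)^2, f, sum)
sigma2 <- as.numeric(ssr) / (as.numeric(table(f)) - k)
names(sigma2) <- levels(f)

# Step 3: weighted least squares with weights 1 / sigma_j^2.
w    <- 1 / sigma2[as.character(f)]
fgls <- lm(y ~ 0 + f + f:x, weights = w)
print(sigma2)
print(summary(fgls)$coefficients)
```

In this fully interacted toy model the weights are constant within each block, so the point estimates equal the OLS ones and only the standard errors change.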
In the example below, all the regressors vary by regimes and the vc argument is set to "homoskedastic":
In the coefficients table printed by the summary method, the different regimes are indicated with numbers inherited from the regime variable.
Since this basic specification does not account explicitly for space, one can obtain the spatial LM tests for spatial dependence by estimating two separate equations, and then using the function lm.LMtests available from the package spdep:
Table 1 reports the results of the five tests implemented in lm.LMtests.Footnote 10 While none of the tests is statistically significant in the first equation, the equation for the second regime points at a spatial lag specification.
Interestingly, it is also possible to test various types of restrictions. As an example, we can consider restrictions on the coefficients for the same variable in different regimes. The code below shows how to implement those tests for the variable NBATH: \(H_0:\beta _{\texttt {NBATH\_0}} = \beta _{\texttt {NBATH\_1}}\). There are multiple ways to test linear hypotheses in R. We choose the implementation provided by the function linearHypothesis from the car package (Fox and Weisberg 2019).
The result shows that we can reject the null hypothesis that NBATH has the same effect on housing price regardless of whether the dwelling is in Baltimore County or not.
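The logic of such a test can be sketched by hand in base R. The example uses simulated data and a plain lm fit, not an hspm object, and it computes a chi-squared Wald statistic; linearHypothesis reports an F statistic by default for lm objects, so the numbers are not meant to match its output.

```r
set.seed(3)
n <- 200
f <- factor(rep(c(0, 1), each = n / 2))   # hypothetical regimes variable
x <- rnorm(n)
y <- 1 + ifelse(f == "0", 2, 0.5) * x + rnorm(n)

fit <- lm(y ~ 0 + f + f:x)
b   <- coef(fit)
V   <- vcov(fit)

# Restriction R b = 0: (slope in regime 0) - (slope in regime 1) = 0.
R <- matrix(0, nrow = 1, ncol = length(b), dimnames = list(NULL, names(b)))
R[1, "f0:x"] <-  1
R[1, "f1:x"] <- -1

# Wald statistic, chi-squared with one degree of freedom under H0.
wald <- as.numeric(t(R %*% b) %*% solve(R %*% V %*% t(R)) %*% (R %*% b))
pval <- pchisq(wald, df = 1, lower.tail = FALSE)
print(c(statistic = wald, p.value = pval))
```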
One can also perform a Wald test for the joint significance of the coefficients using the function wald.test from the library aods3 (Lesnoff and Lancelot 2022). In our example below, the null hypothesis is
In the example below, we show that it is possible to identify regimes using a (clustering) data-driven procedure. For illustrative purposes, we use the (scaled) geographical coordinates of the dwellings to identify two regimes using the kmeans function. The results from this model are different from the previous one.
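A minimal sketch of the clustering step, on simulated coordinates and not on the Baltimore data, is given below. The object names coords, km and toy are hypothetical.

```r
set.seed(4)
# Hypothetical coordinates: two well separated groups of points.
coords <- rbind(cbind(rnorm(50, 0), rnorm(50, 0)),
                cbind(rnorm(50, 10), rnorm(50, 10)))
colnames(coords) <- c("X", "Y")

# Cluster the scaled coordinates into two groups.
km <- kmeans(scale(coords), centers = 2, nstart = 10)

# The cluster labels play the role of the regimes variable.
toy <- data.frame(coords, cluster = factor(km$cluster))
print(table(toy$cluster))
```

The resulting factor could then be supplied as the regimes variable, for instance as rgv = ~ cluster, assuming the column is part of the data passed to the estimation function.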
4 Endogenous variables and the function ivregimes
4.1 Endogenous variables
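The basic (non spatial) model with endogenous variables can be written in a general way as:
\[ y = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + X\beta + \begin{bmatrix} Y_1 & 0 \\ 0 & Y_2 \end{bmatrix} \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix} + Y\pi + \varepsilon \qquad (2) \]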
where the difference with Equation (1) is given by the presence of the \(n_1 \times q\) matrix \(Y_1,\) the \(n_2 \times q\) matrix \(Y_2\) and the \(n \times r\) matrix Y, with the corresponding vectors of parameters \(\pi _1, \pi _2\) and \(\pi \). Since those three matrices contain endogenous variables, the model is estimated using IV techniques.
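The IV estimation of a regimes model can be sketched with a hand-written two stage least squares estimator in base R. The data are simulated and the names (yend, z, Z, H, b_iv) are hypothetical; this shows the technique in its simplest form and is not the internal code of hspm.

```r
set.seed(5)
n    <- 400
rg   <- rep(c(0, 1), each = n / 2)
z    <- rnorm(n)                        # external instrument
u    <- rnorm(n)                        # structural error
yend <- 0.8 * z + 0.5 * u + rnorm(n)    # endogenous regressor (correlated with u)
y    <- 1 + ifelse(rg == 0, 2, -1) * yend + u

# Regime-specific columns for the regressors (Z) and the instruments (H).
d0 <- as.numeric(rg == 0)
d1 <- as.numeric(rg == 1)
Z  <- cbind(int_0 = d0, yend_0 = yend * d0, int_1 = d1, yend_1 = yend * d1)
H  <- cbind(int_0 = d0, z_0 = z * d0, int_1 = d1, z_1 = z * d1)

# 2SLS: project Z on H, then use the fitted values as instruments.
Z_hat <- H %*% solve(crossprod(H), crossprod(H, Z))
b_iv  <- solve(crossprod(Z_hat, Z), crossprod(Z_hat, y))
b_ols <- solve(crossprod(Z), crossprod(Z, y))
print(cbind(iv = drop(b_iv), ols = drop(b_ols)))
```

In this design the OLS slopes are biased away from the true values (2 and -1), because yend is correlated with the error, while the 2SLS estimates are consistent.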
4.2 The function ivregimes
The function ivregimes has four arguments: formula, data, rgv and vc. The right-hand side of the formula has four parts. The first part must contain all the regressors (exogenous and endogenous) that do not vary by regimes. The second part has all the regressors (exogenous and endogenous) that vary by regimes. The third part includes all the exogenous regressors and external instruments that do not vary by regimes. The fourth part has all the exogenous regressors and external instruments that vary by regimes. Let H be the matrix of instruments (exogenous regressors and additional instruments for the endogenous variables) for the endogenous variables. Then the formula for ivregimes has the following structure:
The following formula states that none of the regressors (exogenous and endogenous) is fixed (note the 0), and they all vary by regime. The instrument matrix is made up of the exogenous variables MA90, PS90, and RD90, and the external instruments FH90, FP89, and GI89. The function ivregimes checks internally that the instruments are at least as many as the endogenous variables.
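The formula described above could be written as in the sketch below. It assumes that the parts of the right-hand side are separated by the | operator, and that HR90 is the dependent variable as stated in Sect. 2.2. Creating the formula object only needs base R; the estimation call is shown as a comment because it requires hspm and the natreg data, and its exact form is an assumption based on the arguments listed in this section.

```r
# Four-part right-hand side, with parts separated by "|" (assumed separator):
#   part 1: regressors that do not vary by regime (0 = none)
#   part 2: regressors that vary by regime (UE90 is the endogenous one)
#   part 3: instruments that do not vary by regime (0 = none)
#   part 4: instruments that vary by regime
form_nse <- HR90 ~ 0 | MA90 + PS90 + RD90 + UE90 |
                   0 | MA90 + PS90 + RD90 + FH90 + FP89 + GI89

# Variables referenced by the formula.
print(all.vars(form_nse))

# Hypothetical call, not run here (requires hspm and the natreg data):
# mod <- ivregimes(formula = form_nse, data = natreg,
#                  rgv = ~ REGIONS, vc = "robust")
```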
The argument vc determines how the variance-covariance matrix should be estimated. Specifically, it takes on three values: "homoskedastic", "robust" and "OGMM".Footnote 11
We use ivregimes to estimate the previous model, form_nse, and we set vc = "robust":
5 The spatial models and the function spregimes
5.1 The spatial model
A general spatial model is one that contains a spatial lag of the dependent variable, a spatial lag of the error term, and spatial lags of (some of) the regressors. This is combined with the fact that hspm allows for additional endogenous variables and regimes. For this reason, we decided to present each model separately. It is worth emphasizing again that our presentation of the function is not intended to guide users’ choice in terms of model specification, but rather to illustrate the arguments of the function. The general model is estimated following a series of steps that alternate IV with GMM techniques. These steps are an adaptation of the general cross-sectional model in Kelejian and Prucha (2010a) and Arraiz et al. (2010) to spatial regimes models.Footnote 12
5.2 The function spregimes
spregimes is used to estimate the general model as well as all of the nested specifications that derive from it. The function has eleven arguments. In this section we describe the formula, and we delay the discussion of the other arguments to the next sections. In spregimes, the right-hand side of formula must be specified with six parts. Specifically, the formula for spregimes has the following structure:
Since the specification of the formula is the trickiest part, we use three examples.
form_sp_b below is based on the Baltimore data. The variables AC, AGE and NROOM are the regressors that do not vary by regimes, while PATIO, FIREPL, and SQFT are those that vary. The third part is used to specify the spatially weighted regressors (in this case, AGE, NROOM and NBATH). It is important to stress that the spatial lag of a regressor varies only if the regressor itself varies. Conversely, if the regressor is fixed, so is its lag. For example, since AGE and NROOM vary by regimes, their lags also vary. On the other hand, since NBATH is fixed, the lag of NBATH will not vary either. The next three parts of the formula serve to specify the fixed instruments (part four), the instruments that vary (part five), and the spatial lag of the external instruments (part six). Since there are no endogenous variables in the Baltimore data, part four and part five of the formula are the same as part one and part two. The sixth part is set to 0, indicating that there are no external instruments to be lagged.
The second and third formulas are specified in terms of natreg data. The formula form_sp_n should be interpreted in the following way. The regressor MA90 is fixed. The intercept, PS90, RD90, and UE90 are the regressors that vary by regimes. The spatial lag of MA90 is also considered among the regressors. Since MA90 is fixed, also its spatial lag is fixed. Next, we have one instrument fixed (MA90), and five instruments that change by regimes, namely PS90, RD90, FH90, FP89, and GI89. None of the additional instruments is spatially lagged (the 0 in the last line below).
In form_sp_n2 all of the regressors vary by regime (since the first part of the formula is 0). The spatial lag of MA90 is also included. Since MA90 varies, so does its spatial lag. Furthermore, all of the instruments vary by regimes, and none of the external instruments is lagged.
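A possible rendering of form_sp_n2, following the description just given, is sketched below. The | separator between parts, the use of 0 for an empty part, and the order of the six parts (fixed regressors, varying regressors, regressors to be lagged, fixed instruments, varying instruments, instruments to be lagged) are assumptions taken from the text, not a verified transcription of the original listing. n_parts is a hypothetical helper value.

```r
# Six-part right-hand side (parts separated by "|", an assumed separator):
#   1 fixed regressors | 2 varying regressors | 3 regressors to be lagged |
#   4 fixed instruments | 5 varying instruments | 6 instruments to be lagged
form_sp_n2 <- HR90 ~ 0 | MA90 + PS90 + RD90 + UE90 | MA90 |
                     0 | MA90 + PS90 + RD90 + FH90 + FP89 + GI89 | 0

# Count the parts of the right-hand side as a basic sanity check.
rhs     <- paste(deparse(form_sp_n2[[3]]), collapse = "")
n_parts <- length(strsplit(rhs, "|", fixed = TRUE)[[1]])
print(n_parts)
```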
5.3 Linear model with regimes, spatially lagged regressors and potential endogeneity
The first case that we consider is that of a linear model with regimes that includes spatial lag of the regressors (both exogenous and endogenous):
Compared to Equation (2), Equation (3) includes the spatial lags of the exogenous and endogenous variables whether or not they change by regimes, and the relative vectors of parameters. Interestingly, there is no limitation on the specification of the spatial weighting matrix. This means that the spatial weighting matrix does not necessarily need to be block-diagonal, but it can have a structure where observations that are in different regimes are considered neighbours.Footnote 13
For models with endogenous variables, we use the specification reflected in form_sp_n based on the natreg data. The argument model is set to "ols" and the regimes variable is set to ~ REGIONS.Footnote 14 The spatial weighting matrix is listw = w_6. The argument listw, as in other spatial packages, can be of class listw, or matrix, or Matrix (Bates et al. 2022).
The summary method prints a description reflecting the fact that the model contains endogenous variables. In the bottom part of the output, a list of the endogenous variables and the instruments is given. The function spregimes checks internally that the instruments are at least as many as the endogenous variables.
5.4 Spatial Lag (and Durbin) regimes model
In this section we include both the spatial lag and the spatial Durbin model (with or without additional endogenous variables). Regimes models that include the spatial lag of the dependent variable can be specified in two different ways depending on whether the spatial lag coefficient is allowed to vary by regimes. When the coefficient of the spatial lag is not allowed to change, the model can be written in the following way:
where \(\lambda \) is a scalar parameter. On the contrary, when the coefficient of the spatial lag is allowed to vary, the model can be written asFootnote 15:
5.4.1 No endogenous variables
In the example below, we are assuming that the spatial process is different by regimes (wy_rg = TRUE), and that the model = "lag".Footnote 16
Note that since the wy_rg is TRUE and there are two regimes, the function estimates two coefficients for the spatially lagged dependent variable (W_PRICE_0 and W_PRICE_1).
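How two regime-specific spatial lags arise can be seen in a small base R sketch. It assumes a toy ring-shaped weights matrix that is not block-diagonal, so neighbours may belong to different regimes. All names and values are hypothetical.

```r
# Ring of six observations, each with two neighbours (row-standardised).
n <- 6
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, c(if (i == 1) n else i - 1, if (i == n) 1 else i + 1)] <- 0.5
}
rg <- c(0, 0, 0, 1, 1, 1)            # regime of each observation
y  <- c(10, 20, 30, 40, 50, 60)

# Columns of the block matrix [y_1 0; 0 y_2], then their spatial lags.
y_rg  <- cbind(y_0 = y * (rg == 0), y_1 = y * (rg == 1))
Wy_rg <- W %*% y_rg
colnames(Wy_rg) <- c("W_y_0", "W_y_1")
print(cbind(rg, y_rg, Wy_rg))
```

Observation 4 belongs to the second regime, yet its entry in W_y_0 is not zero because one of its neighbours is in the first regime. With a block-diagonal W that entry would be zero.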
5.4.2 With endogenous variables
The model specification below does not allow for a varying \(\lambda \) (wy_rg = FALSE) but allows for heteroskedasticity in the error term (het = TRUE). When het is set to TRUE, a robust estimator of the variance-covariance is calculated.
The last row in the coefficients table reports the lag of the homicide rate. The other endogenous variables are UE90_2, UE90_0, and UE90_1. As a consequence, the external instruments are also different by regimes.
5.5 Spatial error regimes model
The spatial error regimes model is slightly different from the previous specification. In fact, the spatial error coefficient can be different by regime if and only if all the explanatory variables in the model vary by regimes, that isFootnote 17:
where
where \(\rho _1\) and \(\rho _2\) are the spatial error parameters for the first and the second regime, respectively. Alternatively, the hybrid model can include a spatial error process that does not vary by regimes, such as:
and
where the spatial error coefficient \(\rho \) is a scalar parameter.
The spatial error regimes model is obtained from the natreg data by setting the argument model = "error". For the following example we use the formula formula = form_sp_n2, where all the exogenous and endogenous variables are different by regime. We also set weps_rg = TRUE, and we allow for heteroskedasticity by setting het = TRUE:
In this case we have the endogenous variable UE90 that varies by regimes, and the instruments matrix that includes all the exogenous variables in the model and the external instruments that are also different by regimes.
5.6 Spatial SARAR regimes model
The spatial SARAR regimes model is selected by the argument model = "sarar". One can then choose to set wy_rg and weps_rg in order to have four possible combinations. However, as for the error regimes model, weps_rg can be TRUE only if all variables are different by regimes. Since this model is just a combination of the arguments presented in the previous sections, to save space we do not include output for the SARAR model.
6 Impacts
As noted by LeSage and Pace (2009), models that contain a spatial lag of the dependent variable need to be interpreted correctly. This means that appropriate impacts measures have to be calculated. LeSage and Pace (2009) suggested the computation of three average spillover impacts for spatial models including a spatial lag of the dependent variable. These spillover impacts defined on the j-th variable of a spatial lag model (without regimes) are the average direct impact (ADI), the average total impact (ATI), and the average indirect impact (AII):
\[ ADI_j = n^{-1}\, \mathrm{tr}\left[ (I_n - \lambda W)^{-1} I_n \beta_j \right], \]
\[ ATI_j = n^{-1}\, e^\prime \left[ (I_n - \lambda W)^{-1} I_n \beta_j \right] e, \]
and
\[ AII_j = ATI_j - ADI_j, \]
where e is a vector of ones, \(I_n\) is the \(n \times n\) identity matrix, and tr indicates the trace operator.
For a variety of reasons, the impacts measures suggested by LeSage and Pace (2009) may not be straightforwardly extended to spatial regimes models. This is mainly, but not exclusively related to how the spatial weighting matrix can be defined in this context. Of course, a general treatment of impacts measures for spatial regimes models is outside of the scope of the present paper. However, in what follows, we show a few (simple) situations in which the impacts can be calculated using matrix algebra.
The first and easiest option relates to the case where all coefficients, including the spatial lag of the dependent variable, vary by regimes. Interestingly, this scenario corresponds to the estimation of two separate equations. In R this can be achieved using the function spreg from the sphet package (Bivand and Piras 2015; Bivand et al. 2021; Piras and Postiglione 2022). Basically, one can take advantage of the function subset that can be applied both to the data and the listw object wlis:
Since sphet has functions to calculate the effects, one can take advantage of the infrastructure already available in R. For reasons of space, the code below illustrates how to obtain impacts and inference only for the first equation. In particular, the inference for the impacts is obtained using an analytical formula derived in Kelejian and Piras (2020).
The summary method for the impacts reports the direct, indirect, and total impact for each variable, along with the standard error, a z-value and a p-value to determine the statistical significance of the impacts.
As an alternative, one can calculate the impacts manually taking advantage of sparse matrix representations from the package Matrix. In doing this we will use the model described by the formula = form which is written in such a way that all variables are different by regimes, and there are no spatially weighted regressors:
In spregimes we set the argument wy_rg to FALSE:
To calculate the impact, we follow a series of steps described in what follows. First of all, we create a sparse spatial weighting matrix with the function listw2dgCMatrix, we separate the betas from the spatial lag, and calculate the inverse term of \((I_n - \lambda W)\):
Since we have multiple betas, we use a simple loop to iterate over and obtain the three impacts.Footnote 18
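The steps just described can be sketched in base R with a small dense matrix. The weights matrix, the value of lambda and the betas below are made up for illustration, and a dense solve replaces the sparse Matrix representation used in the paper.

```r
# Hypothetical inputs: a ring weights matrix and made-up estimates.
n <- 10
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, c(if (i == 1) n else i - 1, if (i == n) 1 else i + 1)] <- 0.5
}
lambda <- 0.4
betas  <- c(x1 = 1.5, x2 = -0.8)

# S = (I_n - lambda W)^{-1}
S <- solve(diag(n) - lambda * W)

# One row of impacts per coefficient.
impacts <- t(sapply(betas, function(b) {
  direct <- sum(diag(S)) * b / n    # n^{-1} tr(S b)
  total  <- sum(S) * b / n          # n^{-1} e' S e b
  c(direct = direct, indirect = total - direct, total = total)
}))
print(round(impacts, 4))
```

With a row-standardised W the total impact reduces to the coefficient divided by one minus lambda, which gives a quick check on the computation.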
An alternative way to produce inference would be to generate Monte Carlo samples for the effects from a multivariate normal distribution. For this purpose, we use the package mvtnorm (Genz et al. 2021; Genz and Bretz 2009). The function rmvnorm generates S samples from a multivariate normal with means equal to the vector of the estimated coefficients, and variance-covariance matrix as the estimated variance-covariance matrix of the model. We apply to the generated samples the function eff which calculates the three impacts. Finally, we compute the standard deviation of the S samples of impacts.
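The simulation approach can be sketched as follows. To keep the example self-contained it draws from the multivariate normal with base R (rnorm and a Cholesky factor) instead of mvtnorm::rmvnorm. The estimates, their variance-covariance matrix V, and the weights matrix are made up, and eff is a hypothetical helper written for this sketch.

```r
set.seed(6)
n <- 10
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, c(if (i == 1) n else i - 1, if (i == n) 1 else i + 1)] <- 0.5
}
# Made-up estimates (one beta and the spatial lag coefficient) and their vcov.
est <- c(beta = 1.5, lambda = 0.4)
V   <- matrix(c(0.04, 0.001, 0.001, 0.0025), 2, 2)

# Impacts (direct, indirect, total) for one draw of (beta, lambda).
eff <- function(p) {
  S      <- solve(diag(n) - p[[2]] * W)
  direct <- sum(diag(S)) * p[[1]] / n
  total  <- sum(S) * p[[1]] / n
  c(direct = direct, indirect = total - direct, total = total)
}

# S_draws samples from N(est, V), built from the Cholesky factor of V.
S_draws <- 2000
draws   <- matrix(rnorm(S_draws * 2), S_draws, 2) %*% chol(V)
draws   <- sweep(draws, 2, est, "+")

# Apply eff to each draw and summarise the simulated impacts.
sims <- t(apply(draws, 1, eff))
print(rbind(estimate = eff(est), std.dev = apply(sims, 2, sd)))
```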
7 Conclusions
In this paper we described the main functionality of the newly developed package hspm. We started from a very simple regime model where space is accounted for by specifying a “spatial” regime variable, and then we presented various spatial specifications containing spatial lag of the dependent variable, spatial lag of the regressors, spatial lag of the error term, and any possible combinations of those lags. Finally, we showed how to manually calculate impacts when the spatial weighting matrix has a block-diagonal structure. Implicitly, we recommend not to use the impacts defined by LeSage and Pace (2009) when the structure of the spatial weighting matrix allows for interactions between observations that belong to different regimes.
However, this is only the beginning stage of the package. In the future, we would like to proceed mainly in two directions. On the one hand, we want to expand the package to include more ways to determine the regimes such as data-driven procedures or endogenous ways of determining the regimes. On the other hand, we intend to explore more the continuous approach where coefficients are allowed to vary smoothly over space.
Notes
There are various packages in R (R Development Core Team 2012), such as spatialreg (Bivand and Piras 2015; Bivand et al. 2021), sphet (Piras 2010; Piras and Postiglione 2022), spldv (Sarrias and Piras 2022) and splm (Millo and Piras 2012), among others, as well as in other software environment, such as the Spatial Econometrics toolbox in MATLAB (MATLAB 2011), the Python (van Rossum 1995) spatial analysis library PySAL (Anselin and Rey 2014), and Stata (StataCorp 2007).
The packages in R mostly deal with geographically weighted regression (GWR), such as gwrr (Wheeler 2022), spgwr (Bivand and Yu 2022), mgwrsar (Geniaux and Martinetti 2017), and GWmodel (Gollini et al. 2015). Furthermore, varycoef (Dambon et al. 2021a, b) and spBayes (Finley et al. 2007, 2015) provide implementations of spatially varying coefficients models, which may be preferred to GWR models because they have proper statistical foundations. In the PySAL library developed in Python there is code dealing with spatial regime models. In “Appendix A” we compare our implementation with the one in PySAL.
We include the data and the spatial weighting matrices in the package. The original data are available at https://geodacenter.github.io/data-and-lab//. Anselin and Rey (2014) use the same data in Chapters 12 and 13.
The variables RD90 and PS90 were created from other indicators in a principal component analysis.
Percentage of families below poverty and Gini index are based on 1989 figures.
The reorganization of the data is dealt internally by the function regimes that is introduced in Sect. 3.2.
The feasible generalized least squares corresponds to a weighted least squares estimator where the weights for each regime are calculated as the sum of the squared residuals divided by the corresponding degrees of freedom (see Anselin and Rey 2014, for further details).
This table was produced with the package xtable (Dahl et al. 2019).
When vc = "OGMM" a two step procedure is adopted. In the first step, the model is estimated by two stage least squares using the matrix of instruments which is made up of all the exogenous variables and the external instruments. In the second step, the optimal weighted GMM is obtained by using the residuals from the first step to estimate the weighting matrix for the moments conditions (see Anselin and Rey 2014, for additional details).
For implementation details in the context of cross-sectional models see also Bivand and Piras (2015). For additional information it is also possible to consult the details of the help function.
The specification of the spatial weighting matrix was one of the crucial aspects in the implementation of the package. It is quite reasonable that W is not block-diagonal when one is constructing the spatial lag for variables that are fixed. At the same time, a spatial weighting matrix that allows for interactions across regimes can be difficult to justify for variables that vary across regimes. However, the opposite is also true: forcing the spatial weighting matrix to have a block-diagonal structure, while a logical choice if the regressors differ by regimes, would be a limitation for variables that are fixed across regimes. Because of this, in the end we decided to leave the decision to the users. Additionally, this was one of the reasons for not implementing the impacts for the regimes models. We delay this discussion to Sect. 6.
It might seem counterintuitive that the model argument is set to "ols" when there are endogenous variables in the model. However, this is consistent with other spatial libraries, such as sphet.
Also in this case no restriction on W is imposed (i.e., W is not necessarily block diagonal). For the spatial model in Eq. (5), this means that observations that equal zero in \(y_1\) (because they belong to a different regime) may not necessarily be zero when one takes the spatial lag of the first column of \(\begin{bmatrix} y_1&{} 0 \\ 0 &{} y_2 \\ \end{bmatrix}\). Clearly, the same is also true if we consider \(y_2\) and the spatial lag of the second column of \(\begin{bmatrix} y_1&{} 0 \\ 0 &{} y_2 \\ \end{bmatrix}.\)
Note that we do not allow explicitly the argument model to be set to Durbin. This is due to the fact that hspm deals with spatially lagged regressors through the formula argument.
Note that the impacts are different from those in the previous case since now the spatial lag of price is unique.
See Table 11 and the discussion on page 22 in Bivand and Piras (2015).
For details on the arguments see the file replication.py available from the additional material associated with the paper.
This is because spreg.GM_Lag_Regimes transforms the spatial weighting matrix to a block-diagonal when all variables are different by regimes.
We investigated the issue and we found that, while the matrix of regressors is the same, the matrix of instruments is different. In particular, the spatial lag of the dependent variable is calculated using the queen contiguity matrix for both implementations. However, while spregimes calculates the lag of the regressors from the same matrix, spreg.GM_Lag_Regimes uses a block diagonal matrix.
References
Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic Publishers, Dordrecht
Anselin L (2010) Thirty years of spatial econometrics. Pap Reg Sci 89(1):3–25
Anselin L, Rey SJ (2014) Modern spatial econometrics in practice: a guide to GeoDa, GeoDaSpace and PySal. GeoDa Press LLC, Chicago
Anselin L, Amaral P (2021) Endogenous spatial regimes
Aquaro M, Bailey N, Pesaran MH (2020) Estimation and inference for spatial models with heterogeneous coefficients: an application to US house prices. J Appl Econom 36:18–44
Arcaya M, Brewster M, Zigler CM, Subramanian S (2012) Area variations in health: a spatial multilevel modeling approach. Health Place 18(4):824–831
Arraiz I, Drukker DM, Kelejian HH, Prucha IR (2010) A spatial Cliff-Ord-type model with heteroskedastic innovations: small and large sample results. J Reg Sci 50(2):592–614
Baller R, Anselin L, Messner S, Deane G, Hawkins D (2001) Structural covariates of US county homicide rates: incorporating spatial effects. Criminology 39:561–590
Bates D, Maechler M, Jagan M (2022) Matrix: sparse and dense matrix classes and methods. R package version 1.5-1
Bivand RS, Piras G (2015) Comparing implementations of estimation methods for spatial econometrics. J Stat Softw 63(1):1–36
Bivand R, Millo G, Piras G (2021) A review of software for spatial econometrics in R. Mathematics 9(11):1276
Bivand R, Yu D (2022) spgwr: geographically weighted regression. R package version 0.6-35
Chen J, Shin Y, Zheng C (2022) Estimation and inference in heterogeneous spatial panels with a multifactor error structure. J Econom 229(1):55–79
Dahl DB, Scott D, Roosen C, Magnusson A, Swinton J (2019) xtable: export tables to LaTeX or HTML. R package version 1.8-4
Dambon JA, Sigrist F, Furrer R (2021) Joint variable selection of both fixed and random effects for Gaussian process-based spatially varying coefficient models. Int J Geogr Inf Sci 36:2525–2548
Dambon JA, Sigrist F, Furrer R (2021) Maximum likelihood estimation of spatially varying coefficient models for large data with an application to real estate price prediction. Spat Stat 41:100470
de Marsily G, Delay F, Gonçalvès J, Renard P, Teles V, Violette S (2005) Dealing with spatial heterogeneity. Hydrogeol J 13:161–183
Deng Y, Qi W, Fu B, Wang K (2020) Geographical transformations of urban sprawl: exploring the spatial heterogeneity across cities in China 1992–2015. Cities 105:102415
Dubin RA (1992) Spatial autocorrelation and neighborhood quality. Reg Sci Urban Econ 22(3):433–452
Ertur C, Le Gallo J, Baumont C (2006) The European regional convergence process, 1980–1995: do spatial regimes and spatial dependence matter? Int Reg Sci Rev 29(1):3–34
Finley AO, Banerjee S, Carlin BP (2007) spBayes: an R package for univariate and multivariate hierarchical point-referenced spatial models. J Stat Softw 19(4):1–24
Finley AO, Banerjee S, Gelfand AE (2015) spBayes for large univariate and multivariate point-referenced spatio-temporal data models. J Stat Softw 63(13):1–28
Fotheringham AS, Brunsdon C, Charlton M (1998) Geographically weighted regression: a natural evolution of the expansion method for spatial data analysis. Environ Plan A 30:1905–1927
Fotheringham AS, Brunsdon C, Charlton M (2002) Geographically weighted regression. Wiley, Chichester
Fox J, Weisberg S (2019) An R companion to applied regression, 3rd edn. Sage, Thousand Oaks
Geniaux G, Martinetti D (2017) A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Reg Sci Urban Econ 72:74–85
Genz A, Bretz F (2009) Computation of multivariate normal and t probabilities. Lecture notes in statistics. Springer, Heidelberg
Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T (2021) mvtnorm: multivariate normal and t distributions. R package version 1.1-3
Georganos S, Grippa T, Niang Gadiaga A, Linard C, Lennert M, Vanhuysse S, Mboga N, Wolff E, Kalogirou S (2021) Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int 36(2):121–136
Gollini I, Lu B, Charlton M, Brunsdon C, Harris P (2015) GWmodel: an R package for exploring spatial heterogeneity using geographically weighted models. J Stat Softw
He H, Schäfer B, Beck C (2022) Spatial heterogeneity of air pollution statistics. arXiv preprint arXiv:2203.04296
Huiban J-P, Détang-Dessendre C, Aubert F (2004) Urban versus rural firms: is there a spatial heterogeneity of labour demand? Environ Plan A 36(11):2033–2045
Irwin EG, Bockstael NE (2007) The evolution of urban sprawl: evidence of spatial heterogeneity and increasing land fragmentation. Proc Natl Acad Sci 104(52):20672–20677
Kelejian HH, Piras G (2020) Spillover effects in spatial models: generalizations and extensions. J Reg Sci 60(2):425–442
Kelejian HH, Prucha IR (1999) A generalized moments estimator for the autoregressive parameter in a spatial model. Int Econ Rev 40:509–533
Kelejian HH, Prucha IR (2007) HAC estimation in a spatial framework. J Econom 140(1):131–154
Kelejian HH, Prucha IR (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J Econom 157(1):53–67
LeSage JP, Chih Y-Y (2018) A Bayesian spatial panel model with heterogeneous coefficients. Reg Sci Urban Econ 72:58–73 (New Advances in Spatial Econometrics: Interactions Matter)
LeSage J, Pace RK (2009) Introduction to spatial econometrics. Chapman and Hall/CRC, Boca Raton
Lesnoff M, Lancelot R (2022) aods3: analysis of overdispersed data using S3 methods. aods3 package version 0.4-1.2
Longhi S, Nijkamp P (2005) Forecasting regional labour market developments under spatial heterogeneity and spatial autocorrelation. Technical report, Tinbergen Institute Discussion Paper
MATLAB (2011) version 7.13 (R2011b). The MathWorks Inc., Natick, Massachusetts
Melo PC, Graham DJ, Noland RB (2012) The effect of labour market spatial structure on commuting in England and Wales. J Econ Geogr 12(3):717–737
Messner S, Anselin L, Hawkins D, Deane G, Tolnay S, Baller R (2000) An Atlas of the spatial patterning of county-level homicide, 1960–1990. National Consortium on Violence Research (NCOVR), Pittsburgh
Millo G, Piras G (2012) splm: spatial panel data models in R. J Stat Softw 47(1):1–38
Piras G (2010) sphet: spatial models with heteroskedastic innovations in R. J Stat Softw 35(1):1–21
Piras G, Postiglione P (2022) A deeper look at impacts in spatial Durbin model with sphet. Geogr Anal 54(3):664–684
R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0
Ramajo J, Marquez MA, Hewings GJ, Salinas MM (2008) Spatial heterogeneity and interregional spillovers in the European Union: do cohesion policies encourage convergence across regions? Eur Econ Rev 52(3):551–567
Rey SJ, Anselin L (2007) PySAL: a python library of spatial analytical methods. Rev Reg Stud 37:5–27
Rey SJ, Anselin L (2010) PySAL: a python library of spatial analytical methods. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin, pp 175–193
Rey SJ, Janikas MV (2005) Regional convergence, inequality, and space. J Econ Geogr 5(2):155–176
Rey S, Anselin L, Amaral P, Arribas-Bel D, Cortes R, Gaboardi J, Kang W, Knaap E, Li Z, Lumnitz S, Oshan T, Shao H, Wolf L (2022) PySAL ecosystem: philosophy and implementation. Geogr Anal 54(3):467–487
Sarrias M, Piras G (2022) spldv: spatial models for limited dependent variables. R package version 0.1.1
Shu H, Pei T, Song C, Ma T, Du Y, Fan Z, Guo S (2019) Quantifying the spatial heterogeneity of points. Int J Geogr Inf Sci 33(7):1355–1376
Song Y, Wang J, Ge Y, Xu C (2020) An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: Cases with different types of spatial data. GIScience Remote Sens 57(5):593–610
StataCorp (2007) Stata statistical software: release 10. StataCorp LP, College Station
Thomas LJ, Huang P, Yin F, Luo XI, Almquist ZW, Hipp JR, Butts CT (2020) Spatial heterogeneity can lead to substantial local variations in COVID-19 timing and severity. Proc Natl Acad Sci 117(39):24180–24187
van Rossum G (1995) Python reference manual. CWI report
Vinatier F, Tixier P, Duyck P-F, Lescourret F (2011) Factors and mechanisms explaining spatial heterogeneity: a review of methods for insect populations. Methods Ecol Evol 2(1):11–22
Wheeler D (2022) gwrr: fits geographically weighted regression models with diagnostic tools. R package version 0.2-2
Zhai S, Feng Y, Yan X, Wei Y, Wang R, Li P (2021) Using spatial heterogeneity to strengthen the neighbourhood effects of urban growth simulation models. J Spat Sci 1–19
Funding
Open access funding provided by Università degli Studi G. D’Annunzio Chieti Pescara within the CRUI-CARE Agreement.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
1.1 Checks with other available implementations
In this appendix we compare the results from hspm with those from other available implementations. To the best of our knowledge, the only other implementation of spatial regimes models is available in the Python package spreg (Anselin and Rey 2014), which is part of the PySAL library. We limit our attention to the spatial lag regimes model for two reasons. First, the spatial lag model should be fully comparable since it is not based on optimization routines (as the error and SARAR models are) that can influence the results. Second, the implementation of hspm relies heavily on code available from the sphet package, as is the case for the spreg package in PySAL. Bivand and Piras (2015) compared implementations of spatial cross-sectional models using, among others, results from sphet and spreg. They noticed that the results for the error model were slightly different.Footnote 19 Based on this, we expect the same differences to appear in the regimes specification.
The function to estimate a spatial lag regime model in spreg is spreg.GM_Lag_Regimes.Footnote 20
We consider two cases: in the first, all coefficients (including the spatial parameter) differ by regime; in the second, all coefficients differ by regime but the spatial process is unique. We use the data in baltim, and we consider two spatial weighting matrices. The first was introduced in Sect. 2 and is based on the queen contiguity criterion. The second is simply a block-diagonal version of the queen matrix, in which observations are neighbors only if they belong to the same regime. The results from spreg are reported in Table 2. The first two columns of the table are based on the queen contiguity matrix, with a spatial lag coefficient that varies by regime (column (1)) or is fixed (column (2)). The last two columns are based on the block-diagonal version of the matrix, with a spatial lag coefficient that varies by regime (column (3)) or is fixed (column (4)).
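As an illustration of how the second matrix relates to the first, the following is a minimal sketch in plain Python. The four-unit contiguity matrix, the regime labels, and the helper block_diagonal_by_regime are hypothetical and are not part of hspm or spreg; the sketch only shows the idea of dropping every link between observations that belong to different regimes.

```python
def block_diagonal_by_regime(w, regimes):
    """Zero out every link that joins observations from different regimes."""
    n = len(w)
    return [
        [w[i][j] if regimes[i] == regimes[j] else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: four units on a line, so 0-1, 1-2 and 2-3 are neighbors.
w_queen = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Hypothetical regime labels, standing in for an indicator such as CITCOU.
regimes = [0, 0, 1, 1]

w_block = block_diagonal_by_regime(w_queen, regimes)
for row in w_block:
    print(row)
```

In this toy case only the link between units 1 and 2 crosses the regime boundary, so it is the only one removed.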
Looking at Table 2, it is clear that columns (1) and (3) are the same. This means that, if the model is specified such that all variables differ by regime, spreg.GM_Lag_Regimes “forces” the spatial weighting matrix to be block-diagonal even if the original spatial matrix is not. The same results can be obtained in R in two different ways. The first way is to consider two separate equations and use the function spreg from the sphet package. These two equations were estimated in Sect. 6. We set het = TRUE to obtain a robust estimator of the variance-covariance matrix in order to match the standard errors in Table 2. Here we report only the summary of the two equations.
The second way is using the function spregimes after transforming W into a block-diagonal matrix.
The code below creates two objects of class listw, corresponding to the blocks of the spatial weighting matrix. The objects l0 and l1 are then converted to matrices and arranged to form a block-diagonal matrix.
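The construction just described can also be sketched outside R. The following is a plain-Python illustration under stated assumptions, not the R code used in the paper: the two small matrices stand in for the matrix versions of l0 and l1, the helpers block_diag and row_standardise are hypothetical, and row-standardising each block is an assumption made only for the example.

```python
def row_standardise(w):
    """Divide each row by its sum; rows without neighbors stay all-zero."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def block_diag(a, b):
    """Place square matrices a and b on the diagonal of a larger zero matrix."""
    na, nb = len(a), len(b)
    top = [list(row) + [0.0] * nb for row in a]
    bottom = [[0.0] * na + list(row) for row in b]
    return top + bottom


# Hypothetical within-regime contiguity blocks (stand-ins for l0 and l1).
l0 = [[0, 1, 1],
      [1, 0, 0],
      [1, 0, 0]]
l1 = [[0, 1],
      [1, 0]]

# Row-standardisation of each block is an assumption of this sketch.
w_block = block_diag(row_standardise(l0), row_standardise(l1))
for row in w_block:
    print(row)
```

Because the blocks are stacked in regime order, the resulting matrix lines up with the data only when the observations are sorted by the regime variable.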
The formula was introduced in Sect. 6, and it is written in such a way that all variables are different by regimes. In spregimes the data should be ordered according to the regime variable CITCOU and the argument wy_rg should be TRUE:
Note that if we do not change the spatial weighting matrix and keep wlis (i.e., the queen), we obtain results that are different from column (1) of Table 2.Footnote 21
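A small numerical sketch may help to see why the choice of matrix matters here. It uses plain Python with hypothetical toy values (four units, two regimes, row-standardised weights) and a hypothetical helper spatial_lag; it is not the estimation code of either package. With the full contiguity matrix, the spatial lag of the regime-specific column of the dependent variable is non-zero for a unit in the other regime, whereas with the block-diagonal matrix it is zero there.

```python
def spatial_lag(w, y):
    """Return the matrix-vector product W y as a list."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in w]


# Hypothetical row-standardised contiguity matrix for four units on a line.
w_queen = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
# Block-diagonal counterpart: the link between units 1 and 2 is removed.
w_block = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

y = [10.0, 20.0, 30.0, 40.0]  # hypothetical values of the dependent variable
regimes = [0, 0, 1, 1]        # hypothetical regime labels

# Regime-0 column of the regime-split dependent variable: y in regime 0, else 0.
y_first = [v if r == 0 else 0.0 for v, r in zip(y, regimes)]

lag_queen = spatial_lag(w_queen, y_first)
lag_block = spatial_lag(w_block, y_first)
print(lag_queen)  # [20.0, 5.0, 10.0, 0.0]
print(lag_block)  # [20.0, 10.0, 0.0, 0.0]
```

Unit 2 belongs to the second regime, yet its lag under the full matrix is 10.0 because it neighbors unit 1; under the block-diagonal matrix the same entry is zero.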
Column (4) of Table 2 can be matched by estimating the following model:
In this model wy_rg = FALSE, and the spatial weighting matrix is block diagonal.
The final comparison concerns a model where wy_rg is FALSE and the spatial weighting matrix is based on the queen criterion. This model was estimated in Sect. 6 and corresponds to column (2) of Table 2.
However, the results obtained from the function spregimes are (slightly) different from those in spreg.GM_Lag_Regimes.Footnote 22
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Piras, G., Sarrias, M. Heterogeneous spatial models in R: spatial regimes models. J Spat Econometrics 4, 4 (2023). https://doi.org/10.1007/s43071-023-00034-1
DOI: https://doi.org/10.1007/s43071-023-00034-1