1 Introduction

Spatial effects are generally divided into two different categories: spatial dependence and spatial heterogeneity (Anselin 1988). While spatial dependence has to do with correlation between spatial units, spatial heterogeneity consists of instabilities over space that are generally reflected by variations across individual units (Anselin 2010).

In practice there are various ways of tackling unobserved heterogeneity, such as controlling for spatial heteroscedasticity (Kelejian and Prucha 2007), spatial regimes models (Anselin 1988; Anselin and Rey 2014), geographically weighted regressions (Fotheringham et al. 1998, 2002), and multilevel (or hierarchical) models (Arcaya et al. 2012), among others.Footnote 1

An interesting distinction between discrete and continuous spatial heterogeneity has been made by Anselin and Amaral (2021). From a discrete perspective, they argue that spatial regimes models are the most common way of dealing with spatial heterogeneity. In a nutshell, spatial regimes models are a class of models whose coefficients may vary across space. The term regimes indicates that the observations are grouped according to some criterion that relates to space. Anselin and Amaral (2021) point out that, even if the estimation of spatial regimes regressions is well established, the identification of the regimes remains a subject for investigation. They distinguish three approaches to identifying the regimes. In the first, the regimes are exogenous (e.g., determined through administrative boundaries); in the second, the regimes result from a data-driven procedure (e.g., observations are aggregated using some clustering method); in the third, the coefficients and the regimes are jointly determined.

From an empirical perspective, attempts to consider spatial heterogeneity in model specification have mostly, but not exclusively, focused on economic geography and regional science. This is reflected in the special attention that local labor markets (Huiban et al. 2004; Longhi and Nijkamp 2005; Melo et al. 2012) and regional economic convergence (Rey and Janikas 2005; Ramajo et al. 2008; Ertur et al. 2006) have received over the years. However, spatial heterogeneity has also gained increasing interest in other disciplines, such as quantitative geography (Song et al. 2020; Georganos et al. 2021; Shu et al. 2019), urban growth (Zhai et al. 2021), urban sprawl (Deng et al. 2020; Irwin and Bockstael 2007), geology (de Marsily et al. 2005), ecology and evolution (Vinatier et al. 2011), epidemiology (Thomas et al. 2020), and physics and air pollution (He et al. 2022), among others.

From a software availability perspective, spatial models to control for spatial dependence are well established.Footnote 2 Code dealing with spatial heterogeneity is relatively sparse but also long-established.Footnote 3 In this scenario, hspm is an ambitious project that aims at developing and implementing various methodologies to control for heterogeneity in spatial models. This article presents the methodological innovations that have been made so far, which deal with spatial (and non-spatial) regimes models. In particular, we present R functions that allow for the estimation of a general spatial regimes model, as well as all of the nested specifications deriving from it. The models are estimated by instrumental variables (IV) and generalized method of moments (GMM) techniques.

The rest of this paper describes the package functionality in order to familiarize readers with the different functions contained in it. In particular, Sect. 2 introduces the two data sets that we use throughout the paper: the first is based on a housing price model in the city of Baltimore; the second contains county level data for homicides and selected socio-economic characteristics for the continental United States. The difference between these two data sets is that the second one suffers from endogeneity and requires the instrumental variables methods implemented in hspm. In Sect. 3 we introduce the function regimes, which is the basic function to deal with (non-spatial) regimes models. Section 4 is devoted to the illustration of the function ivregimes, which allows for endogenous variables in a non-spatial context. The function spregimes is presented in Sect. 5. spregimes is a wrapper function that estimates a regimes model with a spatial lag of the dependent variable, the spatial lag of (part of) the regressors, a spatially lagged error term and additional (other than the spatial lag) endogenous variables. As we will see, spregimes also estimates all of the nested specifications included in this general model. In Sect. 6, we explain why hspm does not calculate the impact measures put forth by LeSage and Pace (2009), and we show a simple way to deal with those impacts in a special case (i.e., when the spatial weighting matrix is block diagonal). Section 7 draws some conclusions and gives indications for future developments of the package. Finally, “Appendix A” compares our implementation with code available from the PySAL library (Rey et al. 2022; Rey and Anselin 2007, 2010) developed in Python (van Rossum 1995).

2 Data sets

To illustrate the capabilities of hspm we make use of two data sets: baltimore and natreg.Footnote 4

2.1 Baltimore

The Baltimore data set on housing prices (Dubin 1992) contains many standard factors to explain the price of a dwelling (PRICE): the number of rooms (NROOM), the number of bathrooms (NBATH), the age of the construction (AGE), the size of the lot (LOTSZ), the number of car spaces in the garage (GAR), and the square footage of the house (SQFT). Additional dummy variables indicate whether the house has a patio (PATIO), a fireplace (FIREPL), and air conditioning (AC). The variable employed to identify the regimes is a binary variable equal to one if the dwelling is situated in Baltimore County and zero otherwise (CITCOU).

The following code loads the data and creates the spatial weighting matrix (of class listw) using a binary contiguity criterion.

figure a

2.2 natreg

The data in natreg contain information on homicides and selected socio-economic characteristics for the (continental) counties in the U.S., for four decennial census years, the last of which is 1990 (Messner et al. 2000; Baller et al. 2001). Specifically, the dependent variable is the homicide rate in 1990 (HR90). Among the regressors we include median age (MA90), population structure (PS90), resource deprivation (RD90), and the potentially endogenous unemployment rate (UE90).Footnote 5 The instruments consist of three variables: percentage of female headed households (FH90), percentage of families below poverty (FP89), and the Gini index of family income inequality (GI89).Footnote 6 The regimes identifier is the variable REGIONS that divides the counties into three regions: south, west, and other (not south or west).

The following code loads the data and the spatial weighting matrix of class Matrix (Bates et al. 2022) based on the six nearest neighbors criterion:

figure b

3 The basic (non spatial) model and the function regimes

3.1 The basic (non spatial) model

For convenience and without loss of generality, we assume the presence of only two regimes (i.e., \(j=1,2\)). The basic (non spatial) model can be written in a general way as:

$$\begin{aligned} y = \begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \beta _1 \\ \beta _2 \\ \end{bmatrix} + X\beta + \varepsilon , \end{aligned}$$
(1)

where \(y = [y_1^\prime ,y_2^\prime ]^\prime \), and the \(n_1 \times 1\) vector \(y_1\) contains the observations on the dependent variable for the first regime, and the \(n_2 \times 1\) vector \(y_2\) (with \(n_1 + n_2 = n\)) contains the observations on the dependent variable for the second regime. The \(n_1 \times k\) matrix \(X_1\) and the \(n_2 \times k\) matrix \(X_2\) are blocks of a block diagonal matrix of regressors, the vectors of parameters \(\beta _1\) and \(\beta _2\) both have dimension \(k \times 1\), X is the \(n \times p\) matrix of regressors that do not vary by regime, \(\beta \) is a \(p\times 1\) vector of parameters, and \(\varepsilon = [\varepsilon _1^\prime ,\varepsilon _2^\prime ]^\prime \) is the n-dimensional vector of regression disturbances.Footnote 7 Even though this is not a “traditional” spatial model, spatial heterogeneity is taken into account by considering a regimes variable that reveals some spatial aspects of the data. The model in Eq. (1) can be estimated by OLS after reorganizing the data according to Eq. (1).Footnote 8
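The structure of Eq. (1) can be illustrated with a minimal base R sketch on simulated data. It is not the code used inside hspm; all object names (regime, x, z, X_rg, X_full) are hypothetical, and the block-diagonal part is built by interacting the regressors with regime dummies, which is equivalent to sorting the data by regime and stacking the blocks.

```r
# Simulated data: two regimes, one regime-varying regressor x, one fixed regressor z.
set.seed(1)
n1 <- 60; n2 <- 40; n <- n1 + n2
regime <- rep(c(1, 2), times = c(n1, n2))
x <- rnorm(n)   # coefficient differs by regime
z <- rnorm(n)   # coefficient is common to both regimes
y <- ifelse(regime == 1, 1 + 2 * x, -1 + 0.5 * x) + 0.7 * z + rnorm(n, sd = 0.1)

# Block-diagonal part [X_1 0; 0 X_2]: intercept and x interacted with regime dummies.
d1 <- as.numeric(regime == 1)
d2 <- as.numeric(regime == 2)
X_rg <- cbind(int_1 = d1, x_1 = d1 * x, int_2 = d2, x_2 = d2 * x)

# Full design matrix: regime-specific columns plus the fixed regressor.
X_full <- cbind(X_rg, z = z)

# OLS through the normal equations.
b_hat <- solve(crossprod(X_full), crossprod(X_full, y))
rownames(b_hat) <- colnames(X_full)
round(drop(b_hat), 2)
```

The same estimates are returned by lm(y ~ 0 + X_full); the explicit matrix form is used here only to mirror the notation of Eq. (1).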

3.2 The function regimes

The function regimes has four arguments: formula, data, rgv, and vc. The right hand side of the formula can be of different lengths. If the length is one, it is assumed that all coefficients are different by regimes. When the length of the formula is two, the variables in the first part are kept constant, while those in the second part are different by regimes.

The argument rgv is a formula that indicates the regimes variable. There are two options to estimate the variance-covariance matrix of the estimated coefficients: "groupwise" and "homoskedastic". If vc is set to "groupwise", the model is estimated according to a feasible generalized least squares procedure.Footnote 9
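To give an idea of what a groupwise feasible generalized least squares step involves, the following base R sketch estimates one error variance per regime from OLS residuals and then reweights the observations. This is a generic textbook version written under our own assumptions, not necessarily the exact procedure implemented in regimes; all names and the simulated data are hypothetical.

```r
set.seed(6)
n <- 300
regime <- rep(c(1, 2), each = n / 2)
d1 <- as.numeric(regime == 1); d2 <- as.numeric(regime == 2)
x <- rnorm(n)
z <- rnorm(n)
sigma <- ifelse(regime == 1, 0.5, 2)          # error s.d. differs by regime
y <- ifelse(regime == 1, 1 + 2 * x, -1 + 0.5 * x) + 0.7 * z + rnorm(n, sd = sigma)
X <- cbind(int_1 = d1, x_1 = d1 * x, int_2 = d2, x_2 = d2 * x, z = z)

# Step 1: OLS and one residual variance per regime.
ols <- lm.fit(X, y)
s2 <- tapply(ols$residuals^2, regime, mean)

# Step 2: weighted least squares with weights 1 / sigma_j^2.
w <- as.numeric(1 / s2[as.character(regime)])
fgls <- lm.wfit(X, y, w = w)
round(fgls$coefficients, 2)
```

A fixed regressor (z) is included on purpose: when every coefficient varies by regime, the weights are constant within each block and the point estimates coincide with OLS, so only the variance-covariance matrix changes.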

In the example below, all the regressors vary by regimes and the vc argument is set to "homoskedastic":

figure c

In the coefficients table printed by the summary method, the different regimes are indicated with numbers inherited from the regime variable.

Since this basic specification does not account explicitly for space, one can obtain the spatial LM tests for spatial dependence by estimating two separate equations, and then using the function lm.LMtests available from the package spdep:

figure d

Table 1 reports the results of the five tests implemented in lm.LMtests.Footnote 10 While none of the tests is statistically significant in the first equation, the equation for the second regime points at a spatial lag specification.

Table 1 LM tests results for the regime model with two equations using a spatial weight matrix based on the queen contiguity criterion

It is also possible to test various types of restrictions. As an example, we can consider restrictions on the coefficients for the same variable in different regimes. The code below shows how to implement those tests for the variable NBATH: \(H_0:\beta _{\texttt {NBATH\_0}} = \beta _{\texttt {NBATH\_1}}\). There are multiple ways to test linear hypotheses in R. We choose the implementation provided by the function linearHypothesis from the car package (Fox and Weisberg 2019).

figure e

The result shows that we can reject the null hypothesis that NBATH has the same effect on housing price regardless of whether or not the dwelling is in Baltimore County.

One can also perform a Wald test on several coefficients jointly using the function wald.test from the package aods3 (Lesnoff and Lancelot 2022). In our example below, the null hypothesis is
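For readers who want to see what such a test computes, the sketch below evaluates the Wald statistic \((R\hat{\beta })^\prime (R\hat{V}R^\prime )^{-1}(R\hat{\beta })\) by hand in base R for a single equality restriction. It uses simulated data and hypothetical names, and it is a stand-in for, not a reproduction of, the linearHypothesis call.

```r
set.seed(2)
n <- 200
regime <- rep(c(0, 1), each = n / 2)
x <- rnorm(n)
y <- 1 + ifelse(regime == 0, 1, 3) * x + rnorm(n)   # slopes differ across regimes

d0 <- as.numeric(regime == 0); d1 <- as.numeric(regime == 1)
X <- cbind(int_0 = d0, x_0 = d0 * x, int_1 = d1, x_1 = d1 * x)
fit <- lm(y ~ 0 + X)
b <- coef(fit)
V <- vcov(fit)

# Restriction matrix for H0: beta_{x_0} - beta_{x_1} = 0.
R <- matrix(c(0, 1, 0, -1), nrow = 1)
Rb <- R %*% b
wald <- drop(t(Rb) %*% solve(R %*% V %*% t(R)) %*% Rb)
p_value <- pchisq(wald, df = nrow(R), lower.tail = FALSE)
c(wald = wald, p_value = p_value)
```

Joint hypotheses on several coefficients only require adding one row to R per restriction; the degrees of freedom of the chi-squared reference distribution equal the number of rows.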

$$\begin{aligned} \begin{aligned} H_0:&\beta _{\texttt {AGE\_0}} = \beta _{\texttt {AGE\_1}}, \\&\beta _{\texttt {LOTSZ\_0}} = \beta _{\texttt {LOTSZ\_1}}, \\&\beta _{\texttt {SQFT\_0}} = \beta _{\texttt {SQFT\_1}}. \end{aligned} \end{aligned}$$
figure f

In the example below, we show that it is possible to identify regimes using a data-driven (clustering) procedure. For illustrative purposes, we use the (scaled) geographical coordinates of the dwellings to identify two regimes using the kmeans function. The results from this model are different from the previous one.

figure g
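The clustering step can be sketched in base R as follows. The coordinates are simulated and every name (coords, km, dat, cluster) is hypothetical; only kmeans and scale are actual functions, both from base R.

```r
set.seed(3)
# Hypothetical coordinates: two loosely separated groups of points.
coords <- rbind(
  cbind(x = rnorm(50, mean = 0), y = rnorm(50, mean = 0)),
  cbind(x = rnorm(50, mean = 6), y = rnorm(50, mean = 6))
)

# Scale the coordinates so that both axes contribute equally to the distances.
km <- kmeans(scale(coords), centers = 2, nstart = 10)

# Store the cluster labels as a factor that can serve as a regimes variable.
dat <- data.frame(coords, cluster = factor(km$cluster))
table(dat$cluster)
```

Under the assumption that the regimes variable is passed as a one-sided formula, as described for rgv above, the new column would be supplied as rgv = ~ cluster.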

4 Endogenous variables and the function ivregimes

4.1 Endogenous variables

The basic (non spatial) model with endogenous variables can be written in a general way as:

$$\begin{aligned} y = \begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \beta _1 \\ \beta _2 \\ \end{bmatrix} + X\beta + \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \pi _1 \\ \pi _2 \\ \end{bmatrix} + Y\pi + \varepsilon , \end{aligned}$$
(2)

where the difference with Equation (1) is given by the presence of the \(n_1 \times q\) matrix \(Y_1,\) the \(n_2 \times q\) matrix \(Y_2\) and the \(n \times r\) matrix Y, with the corresponding vectors of parameters \(\pi _1, \pi _2\) and \(\pi \). Since those three matrices contain endogenous variables, the model is estimated using IV techniques.
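As an illustration of IV estimation in this setting, the base R sketch below runs two-stage least squares on simulated data in which all regressors and instruments vary by regime. It is a generic 2SLS written under our own assumptions, not the internal code of hspm; all names are hypothetical.

```r
set.seed(4)
n <- 400
regime <- rep(c(1, 2), each = n / 2)
d1 <- as.numeric(regime == 1); d2 <- as.numeric(regime == 2)

x <- rnorm(n)                           # exogenous regressor
h <- rnorm(n)                           # external instrument
u <- rnorm(n)                           # structural disturbance
yend <- 0.8 * h + 0.5 * u + rnorm(n)    # endogenous regressor (correlated with u)
y <- ifelse(regime == 1, 1 + 1 * x + 2 * yend,
                         2 - 1 * x + 0.5 * yend) + u

# Regressors [X_j, Y_j] and instruments [X_j, H_j], both split by regime.
Z <- cbind(int_1 = d1, x_1 = d1 * x, yend_1 = d1 * yend,
           int_2 = d2, x_2 = d2 * x, yend_2 = d2 * yend)
H <- cbind(d1, d1 * x, d1 * h, d2, d2 * x, d2 * h)

# Order condition: at least as many instruments as regressors.
stopifnot(ncol(H) >= ncol(Z))

# 2SLS: project Z on H, then solve (Z_hat' Z) b = Z_hat' y.
Z_hat <- H %*% solve(crossprod(H), crossprod(H, Z))
b_iv <- solve(crossprod(Z_hat, Z), crossprod(Z_hat, y))
rownames(b_iv) <- colnames(Z)
round(drop(b_iv), 2)
```

Because yend is correlated with the disturbance, OLS on the same design would be inconsistent for the yend coefficients, which is what motivates the instruments.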

4.2 The function ivregimes

The function ivregimes has four arguments: formula, data, rgv and vc. The right-hand side of the formula has four parts. The first part must contain all the regressors (exogenous and endogenous) that do not vary by regimes. The second part has all the regressors (exogenous and endogenous) that vary by regimes. The third part includes all the exogenous regressors and external instruments that do not vary by regimes. The fourth part has all the exogenous regressors and external instruments that vary by regimes. Let H be the matrix of instruments (exogenous regressors and additional instruments for the endogenous variables). Then the formula for ivregimes has the following structure:

figure h

The following formula states that none of the regressors (exogenous and endogenous) is fixed (note the 0), and they all vary by regime. The instrument matrix is made up of the exogenous variables MA90, PS90, and RD90, and the external instruments FH90, FP89, and GI89. The function ivregimes checks internally that the instruments are at least as many as the endogenous variables.

figure i

The argument vc determines how the variance-covariance matrix should be estimated. Specifically, it takes on three values: "homoskedastic", "robust" and "OGMM".Footnote 11

We use ivregimes to estimate the previous model, form_nse, and we set vc = "robust":

figure j

5 The spatial models and the function spregimes

5.1 The spatial model

A general spatial model is one that contains a spatial lag of the dependent variable, a spatial lag of the error term, and spatial lags of (some of) the regressors. In hspm, this is combined with additional endogenous variables and regimes. For this reason, we decided to present each model separately. It is worth emphasizing again that our presentation of the function is not intended to guide users’ choice in terms of model specification, but rather to illustrate the arguments of the function. The general model is estimated following a series of steps that alternate IV with GMM techniques. These steps are an adaptation of the general cross-sectional model in Kelejian and Prucha (2010a) and Arraiz et al. (2010) to spatial regimes models.Footnote 12

5.2 The function spregimes

spregimes is used to estimate the general model as well as all of the nested specifications that derive from it. The function has eleven arguments. In this section we describe the formula, and we delay the discussion of the other arguments to the next sections. In spregimes, the right-hand side of formula must be specified with six parts. Specifically, the formula for spregimes has the following structure:

figure k

Since the specification of the formula is the trickiest part, we use three examples.

form_sp_b below is based on the Baltimore data. The variables AC, AGE and NROOM are the regressors that do not vary by regimes, while PATIO, FIREPL, and SQFT are those that vary. The third part is used to specify the spatially weighted regressors (in this case, AGE, NROOM and NBATH). It is important to stress that the spatial lag of a regressor varies only if the regressor itself varies. Conversely, if the regressor is fixed, so is its lag. For example, since AGE and NROOM vary by regimes also their lags vary. On the other hand, since NBATH is fixed, also the lag of NBATH will not vary. The next three parts of the formula serve to specify the fixed instruments (part four), the instruments that vary (part five), and the spatial lag of the external instruments (part six). Since there are no endogenous variables in the Baltimore data, part four and part five of the formula are the same as part one and part two. The sixth part is set to 0, indicating that there are no external instruments to be lagged.

figure l

The second and third formulas are specified in terms of natreg data. The formula form_sp_n should be interpreted in the following way. The regressor MA90 is fixed. The intercept, PS90, RD90, and UE90 are the regressors that vary by regimes. The spatial lag of MA90 is also considered among the regressors. Since MA90 is fixed, also its spatial lag is fixed. Next, we have one instrument fixed (MA90), and five instruments that change by regimes, namely PS90, RD90, FH90, FP89, and GI89. None of the additional instruments is spatially lagged (the 0 in the last line below).

figure m

In form_sp_n2 all of the regressors vary by regime (since the first part of the formula is 0). The spatial lag of MA90 is also included. Since MA90 varies, so does its spatial lag. Furthermore, all of the instruments vary by regimes, and none of the external instruments is lagged.

figure n

5.3 Linear model with regimes, spatially lagged regressors and potential endogeneity

The first case that we consider is that of a linear model with regimes that includes spatial lag of the regressors (both exogenous and endogenous):

$$\begin{aligned} \begin{aligned} y =&\begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \beta _1 \\ \beta _2 \\ \end{bmatrix} + X\beta + \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \pi _1 \\ \pi _2 \\ \end{bmatrix} + Y\pi + \\&W\begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \delta _1 \\ \delta _2 \\ \end{bmatrix}+ WX\delta + W \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \theta _1 \\ \theta _2 \\ \end{bmatrix} + WY\theta + \varepsilon . \end{aligned} \end{aligned}$$
(3)

Compared to Equation (2), Equation (3) includes the spatial lags of the exogenous and endogenous variables, whether or not they change by regimes, and the corresponding vectors of parameters. There is no limitation on the specification of the spatial weighting matrix. This means that the spatial weighting matrix does not necessarily need to be block-diagonal, but it can have a structure where observations that are in different regimes are considered neighbours.Footnote 13

For models with endogenous variables, we use the specification reflected in form_sp_n based on the natreg data. The argument model is set to "ols" and the regimes variable is set to ~ REGION.Footnote 14 The spatial weighting matrix is listw = w_6. The argument listw, as in other spatial packages, can be of class listw, or matrix, or Matrix (Bates et al. 2022).
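A toy example in base R may help to see what a spatial weighting matrix that is not block-diagonal implies for the lagged regressors in Eq. (3). The four units, the weights and the values of x below are made up for illustration.

```r
# Four units on a line; units 1-2 belong to regime 1, units 3-4 to regime 2.
regime <- c(1, 1, 2, 2)
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1      # neighbours: 1-2, 2-3, 3-4
W <- W + t(W)
W <- W / rowSums(W)          # row-standardise

x <- c(10, 20, 30, 40)
# Regime-specific columns of the regressor (the block-diagonal part of X).
X_rg <- cbind(x_1 = x * (regime == 1), x_2 = x * (regime == 2))

# Spatial lag of the regime-specific regressors.
WX_rg <- W %*% X_rg
WX_rg
```

Units 2 and 3 are neighbours but sit in different regimes, so the lag of the regime 1 column is non-zero for unit 3 and the lag of the regime 2 column is non-zero for unit 2. With a block-diagonal W both of these entries would be zero.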

figure o

The summary method prints a description reflecting the fact that the model contains endogenous variables. In the bottom part of the output, a list of the endogenous variables and the instruments is given. The function spregimes checks internally that the instruments are at least as many as the endogenous variables.

5.4 Spatial Lag (and Durbin) regimes model

In this section we include both the spatial lag and the spatial Durbin model (with or without additional endogenous variables). Regimes models that include the spatial lag of the dependent variable can be specified in two different ways depending on whether the spatial lag coefficient is allowed to vary by regimes. When the coefficient of the spatial lag is not allowed to change, the model can be written in the following way:

$$\begin{aligned} \begin{aligned} y&= \lambda W y + \begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \beta _1 \\ \beta _2 \\ \end{bmatrix} + X\beta + \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \pi _1 \\ \pi _2 \\ \end{bmatrix} + Y\pi \\&\quad + W\begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \delta _1 \\ \delta _2 \\ \end{bmatrix}+ WX\delta + W \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \theta _1 \\ \theta _2 \\ \end{bmatrix} + WY\theta + \varepsilon \end{aligned} \end{aligned}$$
(4)

where \(\lambda \) is a scalar parameter. On the contrary, when the coefficient of the spatial lag is allowed to vary, the model can be written asFootnote 15:

$$\begin{aligned} \begin{aligned} y&= W\begin{bmatrix} y_1&{} 0 \\ 0 &{} y_2 \\ \end{bmatrix} \begin{bmatrix} \lambda _1 \\ \lambda _2 \\ \end{bmatrix} + \begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \beta _1 \\ \beta _2 \\ \end{bmatrix} + X\beta + \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \pi _1 \\ \pi _2 \\ \end{bmatrix} + Y\pi \\&\quad + W\begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \delta _1 \\ \delta _2 \\ \end{bmatrix}+ WX\delta + W \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \theta _1 \\ \theta _2 \\ \end{bmatrix} + WY\theta + \varepsilon \end{aligned} \end{aligned}$$
(5)

5.4.1 No endogenous variables

In the example below, we are assuming that the spatial process is different by regimes (wy_rg = TRUE), and that the model = "lag".Footnote 16

figure p

Note that since wy_rg is TRUE and there are two regimes, the function estimates two coefficients for the spatially lagged dependent variable (W_PRICE_0 and W_PRICE_1).

5.4.2 With endogenous variables

The model specification below does not allow for a varying \(\lambda \) (wy_rg = FALSE) but allows for heteroskedasticity in the error term (het = TRUE). When het is set to TRUE, a robust estimator of the variance-covariance is calculated.

figure q

The last row in the coefficients table reports the lag of the homicide rate. The other endogenous variables are UE90_2, UE90_0, and UE90_1. As a consequence, the external instruments are also different by regimes.

5.5 Spatial error regimes model

The spatial error regimes model is slightly different from the previous specification. In fact, the spatial error coefficient can be different by regime if and only if all the explanatory variables in the model vary by regimes, that isFootnote 17:

$$\begin{aligned} \begin{aligned} y&= \begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \beta _1 \\ \beta _2 \\ \end{bmatrix} + \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \pi _1 \\ \pi _2 \\ \end{bmatrix} \\&\quad + W\begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \delta _1 \\ \delta _2 \\ \end{bmatrix}+ W \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \theta _1 \\ \theta _2 \\ \end{bmatrix} + \begin{bmatrix} \varepsilon _1 \\ \varepsilon _2 \\ \end{bmatrix}, \end{aligned} \end{aligned}$$
(6)

where

$$\begin{aligned} \begin{bmatrix} \varepsilon _1 \\ \varepsilon _2 \\ \end{bmatrix} =W \begin{bmatrix} \varepsilon _1&{}0 \\ 0&{}\varepsilon _2 \\ \end{bmatrix} \begin{bmatrix} \rho _1 \\ \rho _2 \\ \end{bmatrix} +u, \end{aligned}$$

where \(\rho _1\) and \(\rho _2\) are the spatial error parameters for the first and the second regime, respectively. Alternatively, the hybrid model can include a spatial error process that does not vary by regimes, such as:

$$\begin{aligned} \begin{aligned} y&= \begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \beta _1 \\ \beta _2 \\ \end{bmatrix} + X\beta + \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \pi _1 \\ \pi _2 \\ \end{bmatrix} + Y\pi \\&\quad + W\begin{bmatrix} X_1&{} 0 \\ 0 &{} X_2 \\ \end{bmatrix} \begin{bmatrix} \delta _1 \\ \delta _2 \\ \end{bmatrix}+ WX\delta + W \begin{bmatrix} Y_1&{} 0 \\ 0 &{} Y_2 \\ \end{bmatrix} \begin{bmatrix} \theta _1 \\ \theta _2 \\ \end{bmatrix} + WY\theta + \varepsilon , \end{aligned} \end{aligned}$$
(7)

and

$$\begin{aligned} \varepsilon = \rho W \varepsilon + u, \end{aligned}$$

where the spatial error coefficient \(\rho \) is a scalar parameter.

The spatial error regimes model is obtained from the natreg data by setting the argument model = "error". For the following example we use the formula formula = form_sp_n2, where all the exogenous and endogenous variables are different by regime. We also set weps_rg = TRUE, and we allow for heteroskedasticity by setting het = TRUE:

figure r

In this case we have the endogenous variable UE90 that varies by regimes, and the instruments matrix that includes all the exogenous variables in the model and the external instruments that are also different by regimes.

5.6 Spatial SARAR regimes model

The spatial SARAR regimes model is selected by the argument model = "sarar". One can then choose to set wy_rg and weps_rg in order to have four possible combinations. However, as for the error regimes model, weps_rg can be TRUE only if all variables are different by regimes. Since this model is just a combination of the arguments presented in the previous sections, to save space we do not include output for the SARAR model.

6 Impacts

As noted by LeSage and Pace (2009), models that contain a spatial lag of the dependent variable need to be correctly interpreted. This means that appropriate impact measures have to be calculated. LeSage and Pace (2009) suggested the computation of three average spillover impacts for spatial models including a spatial lag of the dependent variable. These spillover impacts, defined on the j-th variable of a spatial lag model (without regimes), are:

$$\begin{aligned} ATI_j= & {} \beta _j n^{-1}e^\prime (I_n - \lambda W)^{-1}e, \end{aligned}$$
(8)
$$\begin{aligned} ADI_j= & {} \beta _j n^{-1} tr[(I_n - \lambda W)^{-1}], \end{aligned}$$
(9)

and

$$\begin{aligned} AII_j = ATI_j - ADI_j, \end{aligned}$$
(10)

where e is an \(n \times 1\) vector of ones, \(I_n\) is the \(n \times n\) identity matrix, and tr indicates the trace operator. \(ATI_j\), \(ADI_j\) and \(AII_j\) are the average total, direct and indirect impacts, respectively.

For a variety of reasons, the impact measures suggested by LeSage and Pace (2009) may not be straightforwardly extended to spatial regimes models. This is mainly, but not exclusively, related to how the spatial weighting matrix can be defined in this context. A general treatment of impact measures for spatial regimes models is outside the scope of the present paper. However, in what follows, we show a few (simple) situations in which the impacts can be calculated using matrix algebra.
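Equations (8) to (10) translate directly into a few lines of base R. The sketch below uses a small row-standardised W and made-up values for \(\lambda \) and \(\beta _j\); none of these numbers come from the models estimated in this paper.

```r
n <- 4
W <- matrix(0, n, n)
W[cbind(1:3, 2:4)] <- 1      # units on a line: 1-2, 2-3, 3-4
W <- W + t(W)
W <- W / rowSums(W)          # row-standardise

lambda <- 0.4                # hypothetical spatial lag coefficient
beta_j <- 2                  # hypothetical coefficient of the j-th regressor

A_inv <- solve(diag(n) - lambda * W)   # (I_n - lambda W)^{-1}
e <- rep(1, n)

ATI <- beta_j * drop(t(e) %*% A_inv %*% e) / n   # Eq. (8)
ADI <- beta_j * sum(diag(A_inv)) / n             # Eq. (9)
AII <- ATI - ADI                                 # Eq. (10)
c(ADI = ADI, AII = AII, ATI = ATI)
```

With a row-standardised W the rows of \((I_n - \lambda W)^{-1}\) sum to \(1/(1-\lambda )\), so the total impact reduces to \(\beta _j/(1-\lambda )\), which gives a quick check on the computation.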

The first and easiest option relates to the case where all coefficients, including the spatial lag of the dependent variable, vary by regimes. Interestingly, this scenario corresponds to the estimation of two separate equations. In R this can be achieved using the function spreg from the sphet package (Bivand and Piras 2015; Bivand et al. 2021; Piras and Postiglione 2022). Basically, one can take advantage of the function subset that can be applied both to the data and the listw object wlis:

figure s

Since sphet has functions to calculate the effects, one can take advantage of the infrastructure already available in R. For reasons of space, the code below illustrates how to obtain impacts and inference only for the first equation. In particular, the inference for the impacts is obtained using an analytical formula derived in Kelejian and Piras (2020).

figure t

The summary method for the impacts reports the direct, indirect, and total impact for each variable, along with the standard error, a z-value and a p-value to determine the statistical significance of the impacts.

As an alternative, one can calculate the impacts manually taking advantage of sparse matrix representations from the package Matrix. In doing this we will use the model described by the formula = form which is written in such a way that all variables are different by regimes, and there are no spatially weighted regressors:

figure u

In spregimes we set the argument wy_rg to FALSE:

figure v

To calculate the impacts, we proceed in steps. First, we create a sparse spatial weighting matrix with the function listw2dgCMatrix, we separate the betas from the spatial lag coefficient, and we calculate the inverse of \((I_n - \lambda W)\):

figure w

Since we have multiple betas, we use a simple loop to iterate over them and obtain the three impacts.Footnote 18

figure x

An alternative way to produce inference would be to generate Monte Carlo samples for the effects from a multivariate normal distribution. For this purpose, we use the package mvtnorm (Genz et al. 2021; Genz and Bretz 2009). The function rmvnorm generates S samples from a multivariate normal with means equal to the vector of the estimated coefficients, and variance-covariance matrix as the estimated variance-covariance matrix of the model. We apply to the generated samples the function eff which calculates the three impacts. Finally, we compute the standard deviation of the S samples of impacts.

figure y
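The simulation logic just described can be sketched in base R without additional packages. The draws are generated through a Cholesky factor instead of rmvnorm, the function eff is a simplified single-coefficient version written for this illustration, and the coefficient vector and variance-covariance matrix are made up.

```r
set.seed(5)
n <- 4
W <- matrix(0, n, n)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
W <- W / rowSums(W)

theta_hat <- c(beta = 2, lambda = 0.4)                   # hypothetical estimates
V_hat <- matrix(c(0.04, 0.001, 0.001, 0.0025), 2, 2)     # hypothetical vcov

# Direct, indirect and total impacts for one draw theta = (beta, lambda).
eff <- function(theta, W) {
  n <- nrow(W)
  A_inv <- solve(diag(n) - theta[[2]] * W)
  total <- theta[[1]] * sum(A_inv) / n
  direct <- theta[[1]] * sum(diag(A_inv)) / n
  c(direct = direct, indirect = total - direct, total = total)
}

# S draws from N(theta_hat, V_hat): z %*% chol(V_hat) has covariance V_hat.
S <- 2000
draws <- matrix(rnorm(S * 2), S, 2) %*% chol(V_hat)
draws <- sweep(draws, 2, theta_hat, "+")

# Impacts for every draw, then their standard deviation across draws.
sims <- t(apply(draws, 1, eff, W = W))
mc_se <- apply(sims, 2, sd)
rbind(estimate = eff(theta_hat, W), mc_se = mc_se)
```

The number of draws S trades off computing time against the Monte Carlo error of the simulated standard deviations; the value used here is arbitrary.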

7 Conclusions

In this paper we described the main functionality of the newly developed package hspm. We started from a very simple regime model where space is accounted for by specifying a “spatial” regime variable, and then we presented various spatial specifications containing a spatial lag of the dependent variable, spatial lags of the regressors, a spatial lag of the error term, and any possible combination of those lags. Finally, we showed how to manually calculate impacts when the spatial weighting matrix has a block-diagonal structure. Implicitly, we recommend not using the impacts defined by LeSage and Pace (2009) when the structure of the spatial weighting matrix allows for interactions between observations that belong to different regimes.

However, this is only the beginning stage of the package. In the future, we would like to proceed mainly in two directions. On the one hand, we want to expand the package to include more ways to determine the regimes, such as data-driven procedures or endogenous ways of determining the regimes. On the other hand, we intend to explore further the continuous approach, where coefficients are allowed to vary smoothly over space.