Handbook of Regional Science, pp 1-19

# Heterogeneous Coefficient Spatial Regression Panel Models

## Abstract

Space-time panel data samples covering longer time spans are becoming increasingly prevalent, and some recent spatial econometrics research has proposed exploiting sample data along the time dimension to produce estimates for all spatial units or regions. The appeal of these models that have been labeled heterogeneous coefficient models should be clear, since each observation represents a spatial unit or region. Theoretical models that underpin econometric specifications often specify different utility or production functions for each economic agent, and urban and regional economic theories also focus on individual cities or regions. Typical panel regression models contain information on each observation over a number of different time periods in the case of a balanced panel model. Estimates from the typical model average over all observations and time periods, producing a coarse summary of relationships thought to derive from interaction between individual observations. In contrast, heterogeneous coefficient models produce separate estimates of the parameters of the model relationship for each observation.

## Keywords

Regional heterogeneity · Mixture models · Cross-sectional dependence · Panel data · Static panel

## 1 Introduction

Aquaro et al. (2015) make the observation that space-time panel data samples covering longer time spans are becoming increasingly prevalent. They propose exploiting sample data along the time dimension to produce estimates for all spatial units or regions, and label this type of model a *heterogeneous coefficient spatial autoregressive model* (HSAR). For (balanced) spatial panel data sets, we can have observations that reflect a set of *N* regions for which information has been collected over *T* time periods. There is also the possibility that the *N* spatial observations reflect *point-level observations*, for example, a household, firm, etc. located at the same point in space for each of the *T* time periods.

The appeal of these models in the case of *point-level observations* should be clear, since each observation represents a household or firm, and theoretical models that underpin econometric specifications often specify different utility or production functions for each economic agent (e.g., household or firm). It is also the case that urban and regional economic theories often focus on individual cities or regions. Typical (balanced) panel regression models that we can label *homogeneous coefficient models* contain information on each observation over a number of different time periods and produce estimates that average over all observations and time periods. This of course produces a coarse summary of relationships thought to derive from interaction between individual (point-level or regional) observations. In contrast, heterogeneous coefficient models produce separate estimates of the parameters of the model relationship for each of the *N* observations, e.g., household, firm or region.

## 2 Independent Heterogeneous Coefficient Models

In the case of *N* observational units that are *independent,* the heterogeneous coefficient approach to estimation of separate parameters for each unit reduces to a somewhat trivial case of applying ordinary least-squares regression to the time-series observations on each of the *N* units to produce a set of *N* different parameter estimates. Such a situation is shown in (1), where we would need to have observations covering an ample number of time periods to produce good quality estimates of the parameters.

\[ {y}_{it}={\alpha}_i+\sum_{k=1}^K{\beta}_i^k{x}_{it}^k+{\varepsilon}_{it},\qquad i=1,\dots, N,\quad t=1,\dots, T \tag{1} \]

In (1), *y*_{it} is the observation in the *i*th cross section unit at time *t*, and *α*_{i} and \( {\beta}_i^k \) denote coefficients for the *i*th cross section unit. The set of *K* explanatory variables \( {x}_{it}^k \) would need to be exogenous, and the covariance matrices \( E\left({x}_{it}^k{x}_{jt}^{k^{\prime }}\right),\forall i,j,k \) would have to be time-invariant and finite as well as nonsingular. This requirement of time-invariance arises because we are using the time dimension of the sample data to estimate parameters of the (linear) relationship for each regional unit, *i* = 1, … , *N*, so the relationship would need to be unchanging over time. In addition, if we assume independence in the model disturbances across both time and cross-sectional units, the variance-covariance of the set of *NT* disturbances will take the form: \( E\left({\varepsilon}_{it}{\varepsilon}_{jt}\right)\sim \mathcal{N}\left(0,\Omega \right),\forall i,j,k,t, \) where Ω = *I*_{T} ⊗ Σ is an *NT* × *NT* diagonal matrix with Σ an *N* × *N* diagonal matrix with elements \( {\sigma}_1^2,{\sigma}_2^2,\dots, {\sigma}_N^2 \) that do not change over time.

Given a set of observations that are independent over both the *N*- and *T*-dimensions of the sample data, estimates for \( {\hat{\beta}}_i \) would take the least-squares form: \( {\hat{\beta}}_i={\left({X}_i^{\prime }{X}_i\right)}^{-1}\left({X}_i^{\prime }{y}_i\right) \) where *X*_{i} = (*x*_{i1}, *x*_{i2}, … , *x*_{iT})^{′} is the *T* × *K* matrix of regressors on the *i*th regional unit, with *x*_{it} = (*x*_{i1, t}, *x*_{i2, t}, … , *x*_{iK, t})^{′}. Similarly, *y*_{i} = (*y*_{i1}, *y*_{i2}, … , *y*_{iT})^{′} is a *T* × 1 vector of observations covering the time dimension of the dependent variable. An analogous expression can be used to produce estimates for the parameters \( {\hat{\sigma}}_i^2 \) based on the sum-of-squared residuals from each of the *N* regressions.
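The unit-by-unit least-squares calculation can be sketched as follows. This is a minimal illustration under assumptions that are not in the text: the data are simulated, the sample sizes and the helper name `ols_by_unit` are hypothetical, and the variance estimate applies a degrees-of-freedom correction to the sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 5, 200, 2                      # hypothetical sample sizes

# Hypothetical "true" unit-specific intercepts, slopes and noise scales
alpha = rng.normal(size=N)
beta = rng.normal(size=(N, K))
sigma = rng.uniform(0.5, 1.5, size=N)

# x[i] is the T x K regressor matrix for unit i; y[i] is its T-vector of outcomes
x = rng.normal(size=(N, T, K))
y = (alpha[:, None]
     + np.einsum("itk,ik->it", x, beta)
     + sigma[:, None] * rng.normal(size=(N, T)))

def ols_by_unit(y, x):
    """Return (N, K+1) coefficients [alpha_i, beta_i] and (N,) estimates of sigma_i^2."""
    N, T, K = x.shape
    coefs = np.empty((N, K + 1))
    sig2 = np.empty(N)
    for i in range(N):
        Xi = np.column_stack([np.ones(T), x[i]])        # add an intercept column
        b, *_ = np.linalg.lstsq(Xi, y[i], rcond=None)   # least-squares solution
        resid = y[i] - Xi @ b
        coefs[i] = b
        sig2[i] = resid @ resid / (T - K - 1)           # degrees-of-freedom corrected
    return coefs, sig2

coefs, sig2 = ols_by_unit(y, x)
```

Each regression uses only the *T* observations of one unit, which is why a long time dimension is needed for the estimates to be reliable.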

If we assume that Σ is not diagonal, but exhibits non-zero covariance across the *N* regional observations, we have a *seemingly unrelated regression* (SUR) specification, where residuals can be used to produce an estimate \( \hat{\Sigma} \), which leads to estimates for \( {\hat{\beta}}_i \) that take the SUR form: \( {\hat{\beta}}_i={\left({X}_i^{\prime }{\hat{\Omega}}^{-1}{X}_i\right)}^{-1}\left({X}_i^{\prime }{\hat{\Omega}}^{-1}{y}_i\right) \) where \( {\hat{\Omega}}^{-1}={I}_T\otimes {\hat{\Sigma}}^{-1} \).

This approach relies only on observations from each region to produce estimates, but Brunsdon et al. (1996) proposed using observations for each region plus those from nearby regions to produce estimates for all *N* regions in a cross-sectional setting. They label their approach *Geographically Weighted Regression* (GWR), which involves introducing a diagonal matrix *M*_{i} that selects regions that are near region *i,* based on alternative distance metrics. For example, \( {M}_i=\sqrt{\exp \left(-{d}_i/\theta \right)} \) where *d*_{i} is a vector of distances from region *i* to all other regions, and *θ* is a distance decay tuning parameter fixed for all observations that is estimated using cross-validation methods. This leads to the model: *M*_{i}*y* = *M*_{i}*Xβ*_{i} + *M*_{i}*ε* with corresponding estimates: \( {\hat{\beta}}_i={\left({X}^{\prime }{M}_i^2X\right)}^{-1}{X}^{\prime }{M}_i^2y \). Of course, this approach could be extended to a panel data setting by introducing both geographical measures of closeness between regions and temporal measures of closeness in time, something that has been labeled geographically and temporally weighted regression (GTWR) by Crespo et al. (2007).

The GWR and GTWR approaches re-use sample data from nearby regions when estimating *β*_{i} and *β*_{j}, where regions *i* and *j* are nearby. The extent of overlapping data used to produce the *N* estimates *β*_{i}, *i* = 1, … , *N* depends on the distance metric used and the estimated distance decay tuning parameter. Reliance on small distances when defining the diagonal matrix *M*_{i} produces highly variable estimates *β*_{i} with a small amount of data overlap, while use of larger distances to form *M*_{i} leads to more stable estimates *β*_{i} involving more sample data overlap. Sample data overlap means that inference regarding estimates *β*_{i} cannot be carried out using methods from conventional regression models.
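A minimal sketch of the GWR estimator \( {\hat{\beta}}_i={\left({X}^{\prime }{M}_i^2X\right)}^{-1}{X}^{\prime }{M}_i^2y \) with the exponential distance decay weights described above may help fix ideas. The coordinates, data, the value of `theta` and the helper name `gwr_estimates` are all hypothetical; in practice *θ* would be chosen by cross-validation, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 100, 2
coords = rng.uniform(0, 10, size=(N, 2))           # hypothetical region centroids
X = np.column_stack([np.ones(N), rng.normal(size=(N, K))])
y = X @ np.array([1.0, 2.0, -1.0]) + 0.05 * rng.normal(size=N)

def gwr_estimates(y, X, coords, theta):
    """One weighted least-squares fit per region i, reusing data from all regions."""
    N, P = X.shape
    betas = np.empty((N, P))
    for i in range(N):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances from region i
        w = np.exp(-d / theta)                           # diagonal of M_i^2
        XtW = X.T * w                                    # X' M_i^2
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

betas = gwr_estimates(y, X, coords, theta=3.0)
```

Because every fit reuses observations from all other regions with distance-dependent weights, the rows of `betas` are not based on independent samples, which is the source of the inference problem noted above.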

## 3 Spatial Autoregressive Processes and Dependence

In cases where there is dependence between the regional (or point-level) observational units, the heterogeneous coefficient approach becomes more useful. Theoretical models that underpin econometric specifications often specify different levels of utility, production functions and spillover impacts for each economic agent. Spatial dependence is one type of dependence across the regional observations, where observations located nearby in space exhibit dependence. In this type of situation, the dependence parameters from one region will depend on those from nearby regions and (potentially) all other regions. Spatial dependence is a specific type of dependence, with peer group dependence between students in an educational or social environment, or network dependence, being other examples, where dependence is on peers or nearby nodes in the network. Dependence other than spatial could be generally labeled as *cross-sectional dependence*.

Let *w*_{ij} represent the (*i*, *j*)th element of a spatial weight matrix with *w*_{ii} = 0, and non-zero (*i*, *j*) elements indicating that observations *i* and *j* are dependent. More generally, a non-zero (*i*, *j*) element (*j* ≠ *i*) of the weight matrix could reflect: a peer to student *i* in an educational setting; a nearest node to node *i* in a network setting, etc.

The heterogeneous coefficient spatial autoregressive (HSAR) relationship can then be written as in (3),

\[ {y}_{it}={\alpha}_i+{\psi}_i\sum_{j=1}^N{w}_{ij}{y}_{jt}+\sum_{k=1}^K{\beta}_i^k{x}_{it}^k+{\varepsilon}_{it} \tag{3} \]

where *ψ*_{i} is a scalar spatial dependence parameter for the *i*th cross section unit. In addition, \( {\sum}_{j=1}^N{w}_{ij}{y}_{jt} \) reflects spatial lags of the dependent variable for the *i*th cross section unit at time *t*.

The same assumptions regarding time-invariance of the relationship and region-specific variance scalars \( {\sigma}_i^2 \) are made as in the discussion surrounding (1).
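To make the role of the unit-specific dependence parameters concrete, the following sketch simulates data from a relationship of the form \( {y}_{it}={\alpha}_i+{\psi}_i{\sum}_j{w}_{ij}{y}_{jt}+{\sum}_k{\beta}_i^k{x}_{it}^k+{\varepsilon}_{it} \). The ring-shaped neighbor structure and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 50, 2

# Hypothetical weight matrix: units on a ring, each with two equally weighted neighbors
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = 0.5
    W[i, (i + 1) % N] = 0.5

psi = rng.uniform(0.1, 0.8, size=N)        # unit-specific dependence parameters
alpha = rng.normal(size=N)                 # unit-specific intercepts
beta = rng.normal(size=(N, K))             # unit-specific slopes
sigma = rng.uniform(0.5, 1.0, size=N)      # unit-specific noise scales

x = rng.normal(size=(T, N, K))
eps = sigma * rng.normal(size=(T, N))

# Because y_t appears on both sides, solve (I_N - Psi W) y_t = rhs_t for every t
A = np.eye(N) - np.diag(psi) @ W
rhs = alpha + np.einsum("tnk,nk->tn", x, beta) + eps      # shape (T, N)
y = np.linalg.solve(A, rhs.T).T                           # shape (T, N)

lag = y @ W.T        # lag[t, i] = sum_j w_ij * y[t, j], the spatial lag of y
```

The simulated outcomes satisfy the unit-level relationship exactly, with `lag` playing the role of the spatial lag term.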

A related approach is that of Cornwall and Parent (2017), who develop an approach they label a *spatial autoregressive mixture model* that allows for spatially heterogeneous coefficient estimates *α*_{j}, *ψ*_{j}, *β*_{j}, *σ*_{j} associated with *j* = 1, … , *G*, where *G* is the number of distinct groups of observations within the sample data. Estimation of their model requires a statistical approach for assignment of each observation to one of the *G* groups. They do this based on the joint distribution of a set of latent indicators *z*_{1}, … , *z*_{G} for group membership of each observation and the distribution of *y* associated with the data generating process (DGP) for the model in (3) delineated into *G* groups. The latent indicators *z* are treated as model parameters and estimated simultaneously with the other model parameters. The estimated *z*_{1}, … , *z*_{G} parameters basically determine a joint distribution that “best fits” the distribution of *y* associated with the DGP of the *spatial autoregressive mixture model.* We note that the HSAR specification reflects a generalization of the mixture model approach of Cornwall and Parent (2017) that extends the number of groups *G* to equal *N,* the number of regions in the sample. That is, each region is treated as a separate group.

## 4 Details Regarding the Heterogeneous Coefficient Spatial Autoregressive Model

A *heterogeneous coefficient spatial Durbin model* (HSDM) that includes spatial lags of the explanatory variables is shown in (5) (see Aquaro et al. 2015).

\[ {y}_t=\alpha +\Psi W{y}_t+\sum_{k=1}^K{B}^k{x}_t^k+\sum_{k=1}^K{P}^kW{x}_t^k+{\varepsilon}_t,\qquad t=1,\dots, T \tag{5} \]

The matrix expression of the model stacks the *N* regional units for each time period *t,* with *y*_{t} = (*y*_{1t}, *y*_{2t}, … , *y*_{Nt})^{′}, *α* = (*α*_{1}, *α*_{2}, … , *α*_{N})^{′}, Ψ = diag (*ψ*), *ψ* = (*ψ*_{1}, *ψ*_{2}, … , *ψ*_{N})^{′}, *W* = (*w*_{ij}), *i*, *j* = 1, … , *N*, \( {B}^k=\operatorname{diag}\left({\beta}_1^k,{\beta}_2^k,\dots, {\beta}_N^k\right) \), \( {x}_t^k={\left({x}_{1t}^k,{x}_{2t}^k\dots, {x}_{Nt}^k\right)}^{\prime } \), \( {P}^k=\operatorname{diag}\left({\phi}_1^k,{\phi}_2^k,\dots, {\phi}_N^k\right) \), *ε*_{t} = (*ε*_{1t}, *ε*_{2t}, … , *ε*_{Nt})^{′}, \( {\sigma}^2={\left({\sigma}_1^2,{\sigma}_2^2,\dots, {\sigma}_N^2\right)}^{\prime }. \)

In *homogeneous panel models,* the parameters *β*_{0} describing the relationship between *N* regions over *T* time periods in the *NT* × 1 vector *y* and the *NT* × *K* matrix of regional characteristics *X* are assumed the same (homogeneous) for all regions and time periods in the sample. The conventional *homogeneous SAR panel model* can be written in matrix-vector form as: *y* = *ρ*(*I*_{T} ⊗ *W*)*y* + (*ι*_{T} ⊗ *ι*_{N})*α*_{0} + *Xβ*_{0} + *ε* where *β*_{0} is a *K* × 1 vector of parameters, *ρ* a scalar parameter reflecting the strength of spatial dependence, *α*_{0} a scalar intercept parameter, *ι*_{M} an *M* × 1 vector of ones, and *ε* an *NT* × 1 vector of disturbances. Of course, we can allow for region-specific and time-specific fixed effects in an attempt to ameliorate the fact that we rely on homogeneous coefficients in the model. Introducing these allows for region- and time-specific differences in the model intercept *α*_{0}. In spatial autoregressive models, the scalar parameter *ρ* represents the level of spatial interaction between observed outcomes averaged over all time and space observations in the *NT* × 1 vector *y* and neighboring region outcomes in the *NT* × 1 spatial lag vector (*I*_{T} ⊗ *W*)*y.* In the conventional panel model, this parameter is assumed the same for all regions and time periods.

The HSDM can be handled within the HSAR framework by replacing each *x*^{k} with an augmented set containing both (*x*^{k}, *Wx*^{k}), so we can proceed by considering only the HSAR specification.

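The data generating process (DGP) for the HSAR specification is shown in (6).

\[ {y}_t={\left({I}_N-\Psi W\right)}^{-1}\left(\sum_{k=1}^K{B}^k{x}_t^k+{\varepsilon}_t\right) \tag{6} \]

(The intercept *α*_{i} is included in *β*_{i} for simplicity.)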

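The (quasi) log likelihood function for the HSAR model is shown in (7),

\[ \ln L\left(\theta \right)=-\frac{NT}{2}\ln \left(2\pi \right)-\frac{T}{2}\sum_{i=1}^N\ln {\sigma}_i^2+T\ln \left|{I}_N-\Psi W\right|-\frac{1}{2}\sum_{t=1}^T{\left[\left({I}_N-\Psi W\right){y}_t-\sum_{k=1}^K{B}^k{x}_t^k\right]}^{\prime }{\Sigma}_t^{-1}\left[\left({I}_N-\Psi W\right){y}_t-\sum_{k=1}^K{B}^k{x}_t^k\right] \tag{7} \]

where *θ* denotes a vector of model parameters \( \left({\psi}_i,{\beta}_i,{\sigma}_i^2,i=1,\dots, N\right) \) and ln |*I*_{N} − Ψ*W*| represents the log of the determinant of the *N* × *N* matrix *I*_{N} − Ψ*W.* In addition, \( {\Sigma}_t=\operatorname{diag}\left({\sigma}_1^2,{\sigma}_2^2,\dots, {\sigma}_N^2\right). \) Aquaro et al. (2015) point out an alternative expression for the log likelihood function in (8) that will be useful here.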

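\[ \ln L\left(\theta \right)=-\frac{NT}{2}\ln \left(2\pi \right)-\frac{T}{2}\sum_{i=1}^N\ln {\sigma}_i^2+T\ln \left|{I}_N-\Psi W\right|-\frac{1}{2}\sum_{i=1}^N\frac{1}{\sigma_i^2}{\left({y}_i-{\psi}_i{y}_i^{\ast }-{X}_i{\beta}_i\right)}^{\prime}\left({y}_i-{\psi}_i{y}_i^{\ast }-{X}_i{\beta}_i\right) \tag{8} \]

In (8), *X*_{i} = (*x*_{i1}, *x*_{i2}, … , *x*_{iT})^{′} is the *T* × *K* matrix of regressors on the *i*th regional unit, with *x*_{it} = (*x*_{i1,t}, *x*_{i2,t}, … , *x*_{iK,t})^{′}. Similarly, *y*_{i} = (*y*_{i1}, *y*_{i2}, … , *y*_{iT})^{′} is a *T* × 1 vector of observations covering the time dimension of the dependent variable. The vector \( {y}_i^{\ast }={\left({y}_{i1}^{\ast },{y}_{i2}^{\ast },\dots, {y}_{iT}^{\ast}\right)}^{\prime } \) has elements \( {y}_{it}^{\ast }={\sum}_{j=1}^N{w}_{ij}{y}_{jt} \) reflecting spatial lags of the dependent variable for region *i* at time *t.*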

One point regarding the log likelihood function in (8) is that the presence of *simultaneous spatial dependence* between observations results in a non-identity for the Jacobian matrix that transforms the model disturbance to the dependent variable, requiring the log determinant term ln |*I*_{N} − Ψ*W*| in the (log) likelihood function. Another point is that the (quasi) likelihood function needs to be maximized with respect to a large number of parameters, specifically, *N* different sets of parameters *β*_{i}, *ψ*_{i}, \( {\sigma}_i^2 \). Aquaro et al. (2015) describe procedures for producing quasi-maximum likelihood (QML) estimates.
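As an illustration of how the log determinant term enters, the sketch below evaluates a Gaussian log likelihood written unit by unit, using the spatial lags \( {y}_i^{\ast } \), the regressor matrices *X*_{i} and the variances \( {\sigma}_i^2 \) defined above. The function name `hsar_loglik`, the array layout and the simulated inputs are assumptions made for this example, and this is not the QML routine of Aquaro et al. (2015).

```python
import numpy as np

def hsar_loglik(psi, beta, sig2, y, X, W):
    """Gaussian log likelihood of an HSAR panel (intercept folded into beta).

    y: (T, N) outcomes, X: (T, N, K) regressors, W: (N, N) weights,
    psi, sig2: (N,) and beta: (N, K) unit-specific parameters.
    """
    T, N = y.shape
    sign, logdet = np.linalg.slogdet(np.eye(N) - np.diag(psi) @ W)
    if sign <= 0:                       # outside the admissible parameter region
        return -np.inf
    ystar = y @ W.T                     # spatial lags: ystar[t, i] = sum_j w_ij y_jt
    resid = y - psi * ystar - np.einsum("tnk,nk->tn", X, beta)
    return (-0.5 * N * T * np.log(2.0 * np.pi)
            - 0.5 * T * np.sum(np.log(sig2))
            + T * logdet                # Jacobian term T * ln|I_N - Psi W|
            - 0.5 * np.sum(resid**2 / sig2))

# Small simulated example (all values hypothetical)
rng = np.random.default_rng(3)
N, T, K = 4, 25, 2
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)       # row-stochastic, zero diagonal
psi = np.array([0.2, 0.4, -0.3, 0.6])
beta = rng.normal(size=(N, K))
sig2 = np.array([0.5, 1.0, 1.5, 0.8])
X = rng.normal(size=(T, N, K))
eps = np.sqrt(sig2) * rng.normal(size=(T, N))
A = np.eye(N) - np.diag(psi) @ W
y = np.linalg.solve(A, (np.einsum("tnk,nk->tn", X, beta) + eps).T).T

ll = hsar_loglik(psi, beta, sig2, y, X, W)
```

Dropping the `T * logdet` term would amount to treating the spatial lag as exogenous, which is why the Jacobian term cannot be ignored under simultaneous dependence.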

These authors set forth assumptions required for identification of the HSAR model, and prove consistency and asymptotic normality of the QML estimates. The assumptions made are relatively straightforward extensions of those invoked in the spatial econometrics literature to the case of dependence parameters and coefficients for each region, *i* = 1, … , *N*. The disturbances are assumed distributed independently, and they show that QML estimates are robust with respect to two error generating processes, a Gaussian *ε*_{it} ∼ *IIDN*(0, *σ*_{i0}), as well as a non-Gaussian IID chi-square variate where: *ε*_{it}/*σ*_{i0} ∼ *IID*(*χ*^{2}(2) − 2)/2, with *σ*_{i0} generated as independent draws from *χ*^{2}(2)/4 + 0.5, for *i* = 1, … , *N.* This requires use of a variance matrix estimate based on the sandwich formula (which they provide) for their QML procedure.

## 5 Markov Chain Monte Carlo Estimation of the Heterogeneous Coefficients Spatial Autoregressive Panel Model

LeSage and Chih (2018a) provide a Markov Chain Monte Carlo (MCMC) estimation approach that produces Bayesian estimates. Since the Bayesian estimates are likelihood-based, they share the same consistent asymptotic properties as the QML estimates in cases where prior distributions are centered on the true parameters, or where prior variances for the prior distributions approach infinity leading to relatively uninformative prior distributions.

They argue that an advantage of MCMC estimation may be computational speed. As is typical of MCMC estimation, a complicated problem involving *N* × *K* parameters *β*_{i}, and *N* × 1 parameter vectors for \( {\alpha}_i,{\psi}_i,{\sigma}_i^2, \) is decomposed into a sequence of problems involving conditional distributions which are typically simple. MCMC estimation proceeds by sequentially sampling from the complete sequence of conditional distributions of parameters \( {\beta}_i,{\psi}_i,{\sigma}_i^2, \) for each unit *i* = 1, … , *N*, conditional on the parameters of the other units *j* ≠ *i*. For example, \( \left({\beta}_i\left|{\beta}_j,{\psi}_i,{\psi}_j,{\sigma}_i^2,{\sigma}_j^2\right.\right), \) \( \left({\psi}_i\left|{\psi}_j,{\beta}_i,{\beta}_j,{\sigma}_i^2,{\sigma}_j^2\right.\right), \) \( \left({\sigma}_i^2\left|{\sigma}_j^2,\right.{\beta}_i,{\beta}_j,{\psi}_i,{\psi}_j\right). \) Specifically, the problem is decomposed into deriving the conditional distributions for each group of parameters: (1) the *N* different *K*-vectors of parameters *β*_{i}, (2) the *N* different scalar parameters *ψ*_{i} and (3) the scalar noise variances \( {\sigma}_i^2 \).

MCMC estimation proceeds by sampling sequentially from the conditional distributions of each set of parameters. A single pass through the sampler involves evaluating only *3N* different conditional distributions. Each of these conditional distributions is relatively simple to sample from, and involves calculations based on matrices or vectors of small dimensions. In contrast, QML estimation requires likelihood optimization over *N* × (*K* + 2) parameters, which include *NK* parameters *β*_{i}, *N* parameters *ψ*_{i}, and *N* parameters \( {\sigma}_i^2. \) LeSage and Chih (2018a) present results from a Monte Carlo study carried out by Aquaro et al. (2015) to show that equivalent estimation results can be achieved in the case of uninformative prior distributions assigned to the parameters.
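The structure of one pass through such a sampler, with three conditional updates per unit, can be sketched as follows. This is an illustrative Metropolis-within-Gibbs scheme and not the code of LeSage and Chih (2018a): it assumes flat priors on *β*_{i}, a prior proportional to \( 1/{\sigma}_i^2 \), a uniform prior on (−1, 1) for *ψ*_{i} updated by a random-walk Metropolis step, and a row-stochastic *W* so that the determinant stays positive. The function name `hsar_mcmc`, the tuning constant `step` and the simulated data are hypothetical.

```python
import numpy as np

def hsar_mcmc(y, X, W, n_draws=300, step=0.1, seed=0):
    """Metropolis-within-Gibbs sketch. y: (T, N), X: (T, N, K), W: (N, N)."""
    rng = np.random.default_rng(seed)
    T, N = y.shape
    K = X.shape[2]
    ystar = y @ W.T                                   # spatial lags of y
    psi, sig2, beta = np.zeros(N), np.ones(N), np.zeros((N, K))
    draws = {"psi": np.empty((n_draws, N)),
             "beta": np.empty((n_draws, N, K)),
             "sig2": np.empty((n_draws, N))}

    def logdet(p):
        # ln|I_N - Psi W|; assumes the determinant is positive for |psi_i| < 1
        return np.linalg.slogdet(np.eye(N) - np.diag(p) @ W)[1]

    for d in range(n_draws):
        for i in range(N):
            Xi, yi, ysi = X[:, i, :], y[:, i], ystar[:, i]
            # (1) beta_i | rest: normal, centered on OLS for the filtered y_i
            V = np.linalg.inv(Xi.T @ Xi)
            b = V @ Xi.T @ (yi - psi[i] * ysi)
            beta[i] = rng.multivariate_normal(b, sig2[i] * V)
            # (2) sigma_i^2 | rest: inverse gamma(T/2, e'e/2)
            e = yi - psi[i] * ysi - Xi @ beta[i]
            sig2[i] = 0.5 * (e @ e) / rng.gamma(0.5 * T)
            # (3) psi_i | rest: random-walk Metropolis step on (-1, 1)
            prop = psi.copy()
            prop[i] += step * rng.normal()
            if abs(prop[i]) < 1.0:
                e_new = yi - prop[i] * ysi - Xi @ beta[i]
                log_ratio = (T * (logdet(prop) - logdet(psi))
                             - 0.5 * (e_new @ e_new - e @ e) / sig2[i])
                if np.log(rng.uniform()) < log_ratio:
                    psi = prop
        draws["psi"][d] = psi
        draws["beta"][d] = beta
        draws["sig2"][d] = sig2
    return draws

# Simulated data (all values hypothetical)
rng = np.random.default_rng(5)
N, T, K = 4, 100, 2
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)
psi_true = np.array([0.2, 0.5, -0.2, 0.6])
beta_true = rng.normal(size=(N, K))
X = rng.normal(size=(T, N, K))
eps = 0.5 * rng.normal(size=(T, N))
A = np.eye(N) - np.diag(psi_true) @ W
y = np.linalg.solve(A, (np.einsum("tnk,nk->tn", X, beta_true) + eps).T).T

draws = hsar_mcmc(y, X, W)
psi_hat = draws["psi"][100:].mean(axis=0)     # discard the first 100 draws as burn-in
```

Each update touches only a *K* × *K* matrix or a scalar apart from the log determinant, which is the sense in which the conditional distributions involve small-dimensional calculations.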

### 5.1 Bayesian Prior Information

LeSage and Chih (2018a) also argue for advantages associated with the ability of the Bayesian HSAR specification to incorporate prior information regarding the parameters. They suggest alternative approaches to specifying prior distributions, for example, using empirical estimates from a *homogeneous coefficient* static panel SAR specification with fixed effects of the type set forth in Elhorst (2014) as empirical Bayes priors. They note that tight imposition of the *homogeneous coefficient* estimates as a prior mean for the parameters would result in estimates that do not vary greatly across the *N* regions, whereas looser imposition of these prior mean values would allow for more variability of the estimates across regions, or more parameter heterogeneity.

These authors also emphasize that a second advantage of prior information would be in overcoming problems arising from near linear combinations (collinear relationships) between explanatory variables. This might arise when variables for certain regions exhibit little variation over the time dimension of the sample. In addition, they point out that some explanatory variables may be constant over all time periods, leading to a non-invertible explanatory variables matrix. Prior information would overcome this problematical situation by augmenting small eigenvalues of the explanatory variables matrix, resulting in an invertible matrix.

Autant-Bernard and LeSage (2019) provide an illustration of the use of prior information in an application of the HSAR specification to a *knowledge production function* for 94 NUTS3 regions in France. They introduce a ridge-type prior, providing two motivations for use of this type of prior distribution assigned to the parameters *β*_{i}, *i* = 1, … , *N* of the knowledge production function. One is that knowledge production function inputs (explanatory variables) in the HSAR model exhibit collinear relationships because they are highly correlated. Belsley et al. (1980) describe how ridge regression can overcome collinearity between explanatory variables in regression models. Autant-Bernard and LeSage (2019) point out that collinearity between production function inputs is a widely occurring phenomenon, which can be exacerbated when variables for certain regions exhibit little variation over the time dimension of the sample. Ridge-type prior information overcomes this problematical situation by augmenting small eigenvalues of the explanatory variables matrix, which stabilizes estimates and increases precision of estimates. A second motivation given for the ridge-type prior is that innovation is primarily an urban phenomenon, with rural regions not producing a great deal of innovation. Since the HSAR model produces estimates for all regions, use of the ridge-type prior will shrink small coefficients from rural regions toward the prior mean values of zero assigned to the parameters *β*_{i}, which de-emphasizes rural regions. They argue that this will result in relatively more emphasis on larger urban region coefficients.
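The eigenvalue-augmenting role of a ridge-type prior can be illustrated for a single region whose data include a regressor that never varies over time. The sketch assumes a zero-mean normal prior with variance `tau2` on each coefficient and a known noise variance `sigma2`; both values, the variable names and the simulated data are hypothetical and are not taken from Autant-Bernard and LeSage (2019).

```python
import numpy as np

rng = np.random.default_rng(4)
T = 30
x1 = rng.normal(size=T)
x2 = np.full(T, 3.0)                         # regressor that is constant over time
X = np.column_stack([np.ones(T), x1, x2])    # intercept and x2 are perfectly collinear
y = 1.0 + 0.5 * x1 + rng.normal(scale=0.2, size=T)

rank_X = np.linalg.matrix_rank(X)            # 2 rather than 3: X'X is not invertible

tau2 = 10.0      # prior variance (assumed); smaller values shrink more strongly
sigma2 = 0.04    # noise variance, treated as known for this illustration

# Posterior precision adds 1/tau2 to every eigenvalue of X'X / sigma2
post_prec = X.T @ X / sigma2 + np.eye(3) / tau2
post_mean = np.linalg.solve(post_prec, X.T @ y / sigma2)
```

The prior makes the system solvable, but it cannot separate the intercept from the constant regressor; only their combined contribution and the slope on `x1` are pinned down by the data.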

### 5.2 A Matrix Exponential Spatial Specification

LeSage and Chih (2018b) extend the HSAR to the case of a heterogeneous coefficients *matrix exponential spatial specification* (HMESS). The HSAR specification relies on a spatial autoregressive process which applies geometric decay of influence to higher-order neighboring regions, whereas the HMESS relies on a matrix exponential function to apply exponential decay to higher-order neighbors. To see the geometric decay, consider the matrix inverse: (*I*_{N} − Ψ*W*)^{−1} from the data generating process in (6), which can be written as an infinite series: *I*_{N} + Ψ*W* + (Ψ*W*)^{2} + (Ψ*W*)^{3} + … . The matrix *W* reflects first-order spatial neighbors, *W*^{2} second-order neighbors (neighbors to the first-order neighbors), *W*^{3} neighbors to the neighbors of the neighbors (third-order neighbors), and so on. Higher-order neighbors are assigned influence that declines geometrically since the dependence parameters *ψ*_{i}, *i* = 1, … , *N* on the diagonal of the matrix Ψ are constrained to have values less than one. LeSage and Chih (2018b) extend the cross-sectional MESS to the case of a heterogeneous coefficients model, and set forth a Bayesian Markov Chain Monte Carlo estimation scheme.
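The series representation of the inverse can be checked numerically. The sketch below accumulates powers of the product Ψ*W* and compares the sum with the directly computed inverse; the ring-shaped weight matrix and the values in `psi` are hypothetical.

```python
import numpy as np

N = 5
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5   # two neighbors per unit, rows sum to one

psi = np.array([0.2, 0.5, 0.7, 0.3, 0.6])         # hypothetical dependence parameters
PW = np.diag(psi) @ W

exact = np.linalg.inv(np.eye(N) - PW)

# Accumulate I + (Psi W) + (Psi W)^2 + ... term by term
approx = np.zeros((N, N))
term = np.eye(N)
for q in range(200):
    approx += term
    term = term @ PW

# Largest entry of (Psi W)^q for q = 1..5, showing the decay with neighbor order
max_entry_by_order = [np.abs(np.linalg.matrix_power(PW, q)).max() for q in range(1, 6)]
```

With all \( |{\psi}_i|<1 \) and a row-stochastic *W*, each additional order of neighbors contributes less, which is the geometric decay referred to above.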

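The HMESS replaces the spatial autoregressive transformation (*I*_{N} − Ψ*W*) with a matrix exponential *e*^{ΓW}, where Γ is a diagonal matrix containing the spatial dependence parameters *γ*_{1}, *γ*_{2}, … , *γ*_{N} for each region. For comparison, the HSAR and HMESS specifications are shown in (9) and (10), where \( {y}_t,{x}_t^k \) are *N* × 1 vectors containing observations on *N* regions at time period *t.*

\[ \left({I}_N-\Psi W\right){y}_t=\sum_{k=1}^K{B}^k{x}_t^k+{\varepsilon}_t \tag{9} \]

\[ {e}^{\Gamma W}{y}_t=\sum_{k=1}^K{B}^k{x}_t^k+{\varepsilon}_t \tag{10} \]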

Equations (9) and (10) highlight the fact that the HSAR and HMESS specifications can be viewed as applying different types of *spatial smoothing* to the dependent variable vectors for time periods *t* = 1, … , *T*. In (9), an autoregressive spatial smoothing scheme is used, whereas (10) applies a matrix exponential spatial smoothing approach. Specifically, note that: \( {e}^{\Gamma W}={\sum}_{i=0}^{\infty }{\left(\Gamma W\right)}^i/i!,{\left({e}^{\Gamma W}\right)}^{-1}={e}^{-\Gamma W}, \) where (Γ*W*)^{0} = *I*_{N}, so the matrix exponential imposes a pattern of exponential decay of influence on higher-order neighbors.

LeSage and Pace (2007) point to some potential computational advantages of the MESS specification over the SAR specification in a cross-sectional setting. Debarsy, Jin and Lee (2015) study the large sample properties of the matrix exponential spatial specification (MESS), showing that the quasi-maximum likelihood estimator (QMLE) for a MESS specification is consistent under heteroskedasticity, a property not shared by the QMLE of the SAR model. In addition, the spatial dependence parameter in the MESS specification ranges from minus to plus infinity, which allows for use of normal priors assigned to this parameter in a Bayesian setting.
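Two properties of the matrix exponential that bear on computation can be verified with a short sketch: its inverse is obtained by flipping the sign of the exponent, and its determinant equals the exponential of the trace of Γ*W*, which is one when *W* has a zero diagonal. The truncated series helper `expm_series`, the ring-shaped *W* and the values in `gamma` are assumptions made for illustration; a library routine for the matrix exponential could be used instead.

```python
import numpy as np

def expm_series(A, n_terms=40):
    """Truncated power series sum_{q < n_terms} A^q / q! (adequate for small norms)."""
    out = np.zeros_like(A)
    term = np.eye(A.shape[0])
    for q in range(n_terms):
        out += term
        term = term @ A / (q + 1)
    return out

N = 5
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5   # zero diagonal, rows sum to one

gamma = np.array([-0.5, -1.0, 0.3, -0.8, -0.2])   # hypothetical, unrestricted real values
GW = np.diag(gamma) @ W

S = expm_series(GW)          # e^{Gamma W}
S_inv = expm_series(-GW)     # e^{-Gamma W}, the inverse of S
det_S = np.linalg.det(S)     # equals exp(trace(Gamma W)) = 1 here
```

Since the determinant is one under these assumptions, a log determinant term of the kind appearing in the HSAR likelihood would vanish for this specification.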

## 6 Interpreting Model Estimates

Aquaro et al. (2015) do not consider how to interpret estimates from the HSAR model. Both the HSAR and HMESS represent models that result in non-linear impacts arising from changes in the explanatory variables on outcomes in the dependent variables. It is a hallmark of spatial regression models that changes in values of the explanatory variables in a single region *i* will impact outcomes in region *i* as well as other regions *j*, a phenomenon that has been labeled *spatial spillovers*.

These own-region and other-region impacts have been labeled *direct and indirect effects.* In the case of *homogeneous static panel models,* LeSage and Pace (2019) propose an average of the main diagonal elements of the *N* × *N* matrix of partial derivatives for this model shown in (13) as a *scalar summary measure* of own-partial derivatives that LeSage and Pace (2019) label *direct effects.* They also propose a scalar summary measure of the *indirect effects* (spatial spillover) impacts based on the cumulative sum of the off-diagonal elements from each row, averaged over all rows. A scalar summary measure of the total impact of a change in regional outcomes arising from changes in the *k*th regional characteristic is the sum of the scalar direct plus indirect effects estimates. In the case of the homogeneous coefficient models where all *ψ*_{i} = *ψ* and all \( {\beta}_i^k={\beta}^k \), this approach holds intuitive appeal.

\[ \partial {y}_t/\partial {x}_t^{k\prime }={\left({I}_N-\psi W\right)}^{-1}{I}_N{\beta}^k \tag{13} \]

Elhorst (2014) notes that in the case of static spatial panel models, expression (13) arises from recognizing that the coefficients *ψ*, *β* do not change over the time periods of the panel. Main diagonal elements of the matrix (*I*_{N} − *ψW*)^{−1}*I*_{N}*β*^{k} reflect own-partial derivatives and off-diagonal elements represent cross-partial derivatives. Scalar summary measures of own- and cross-partial derivatives simplify the task of interpreting estimates from the homogeneous coefficient model, which take the form of an *N* × *N* matrix for each of the *K* explanatory variables.

LeSage and Chih (2016) extend this approach to the case of the HSAR specification, where the matrix of partial derivatives is shown in (14), which is an *N* × *N* matrix. This arises because a change in a single region’s *k*th characteristic could impact own-region outcomes, plus (potentially) outcomes of all other regions, with the strength of these other-region impacts dependent on the levels of spatial dependence for all regions Ψ = *diag* (*ψ*_{1}, … , *ψ*_{N}), and magnitude of \( {B}^k=\mathit{\operatorname{diag}}\left({\beta}_1^k,\dots, {\beta}_N^k\right) \). This results in an *N* × 1 vector of impacts (in the columns of (14)) from changing each region’s *k*th characteristic. They argue that since we typically interpret regression models by considering changes in each of the *N* observations’ *k*th characteristic, this would lead to a series of *N* different *N* × 1 vectors of impacts, which make up the *N* × *N* matrix in (14).
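For reference, a form for the matrix in (14) that is consistent with the definitions of Ψ and \( {B}^k \) and with the inverse (*I*_{N} − Ψ*W*)^{−1} appearing in the data generating process is sketched below. This is a reconstruction from the surrounding definitions rather than a quotation of the original display, and the element-wise indexing convention shown is an assumption.

```latex
\frac{\partial y_t}{\partial x_t^{k\prime}} = \left(I_N - \Psi W\right)^{-1} B^k,
\qquad
\left[\left(I_N - \Psi W\right)^{-1} B^k\right]_{ij} = \frac{\partial y_{it}}{\partial x_{jt}^k}
```

Under this convention, column *j* collects the responses of all regions to a change in region *j*'s *k*th characteristic, and row *i* collects the responses of region *i* to changes in every region's *k*th characteristic.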

As in the case of the *homogeneous coefficient* model, the main diagonal elements of the matrix represent own-partial derivatives \( \left(\partial {y}_i/\partial {x}_i^k\right), \) showing how a change in the *k*th explanatory variable would *directly* impact each region’s *y*_{i} value, while the off-diagonal elements are cross-partial derivatives \( \left(\partial {y}_j/\partial {x}_i^k,\partial {y}_i/\partial {x}_j^k\right) \) showing impacts on other regions’ outcomes.

For the case of the heterogeneous coefficient panel model where scalar summary measures are not consistent with the notion of parameter heterogeneity, LeSage and Chih (2016) propose use of the *N* diagonal elements of the matrix in (14) to produce *observation-level* direct effects estimates for each of the *N* regions. As estimates of region-specific (observation-level) indirect spill-in and spill-out effects, they propose use of the cumulative sum of off-diagonal elements in each row and column of (14), which are of course consistent with the partial derivatives.

Off-diagonal elements in the *first row*: \( \partial {y}_1/\partial {x}_2^k,\partial {y}_1/\partial {x}_3^k,\dots, \partial {y}_1/\partial {x}_N^k, \) show how changes in the value of the *k*th explanatory variable in regions neighboring region #1 impact outcomes in region #1, (*y*_{1}). LeSage and Chih (2018b) label these *spatial spill-in* effects, since region #1 is experiencing impacts arising from changes in characteristics of neighboring regions. An example might be an increase in cigarette sales in Kentucky (say *y*_{1}) due to (relative) increases in cigarette taxes of neighboring states (*x*_{j}, *j* ≠ 1). The sales increase is presumably due to consumers from neighboring states crossing the border to shop for cigarettes in Kentucky.

Off-diagonal elements in the *first column*: \( \partial {y}_2/\partial {x}_1^k,\partial {y}_3/\partial {x}_1^k,\dots, \partial {y}_N/\partial {x}_1^k, \) show how changes in the value of the *k*th explanatory variable in region #1 impact outcomes in neighboring regions (*y*_{i}, *i* = 2, … , *N*). These show *spill-out* impacts on regions neighboring region #1 arising from a change in region #1’s characteristics. An example might be an increase in cigarette sales *y*_{2}, … , *y*_{N} in states that surround New York (say region #1) because of (relative) increases in cigarette taxes \( \left({x}_1^k\right) \) in New York. The sales increases in neighboring states are presumably due to New York consumers crossing the border to shop for cigarettes in neighboring states.
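The observation-level direct, spill-in and spill-out effects can be computed from a single matrix, as in the sketch below. It assumes the matrix of partial derivatives takes the form (*I*_{N} − Ψ*W*)^{−1}\( {B}^k \), with element (*i*, *j*) equal to \( \partial {y}_i/\partial {x}_j^k \); the helper name `hsar_effects` and the numerical values are hypothetical.

```python
import numpy as np

def hsar_effects(psi, beta_k, W):
    """Direct, spill-in and spill-out effects for the k-th explanatory variable.

    D[i, j] is the response of y_i to a change in x_j^k.
    """
    N = len(psi)
    D = np.linalg.inv(np.eye(N) - np.diag(psi) @ W) @ np.diag(beta_k)
    direct = np.diag(D).copy()              # own-partial derivatives
    spill_in = D.sum(axis=1) - direct       # off-diagonal row sums: impacts received
    spill_out = D.sum(axis=0) - direct      # off-diagonal column sums: impacts sent
    return direct, spill_in, spill_out

N = 4
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)     # hypothetical weights, zero diagonal
psi = np.array([0.1, 0.3, 0.5, 0.7])
beta_k = np.array([1.0, 0.5, -0.5, 2.0])

direct, spill_in, spill_out = hsar_effects(psi, beta_k, W)
```

Summed over all regions, spill-in and spill-out effects coincide because both add up the same off-diagonal elements; the region-by-region values generally differ.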

When cumulating off-diagonal elements of the rows and columns to determine spill-in and spill-out effects estimates, LeSage and Chih (2016) point out that conventional row-stochastic *W*-matrices might be problematical. The importance of the matrix *W* can be seen in (14), whose non-zero elements along with the heterogeneous coefficients *ψ*_{i}, *β*_{i} determine the spill-in and spill-out effects. They propose use of a *doubly stochastic W*-matrix when estimating HSAR and HMESS models. A *doubly stochastic* spatial weight matrix is one whose row and column sums are unity, which they argue would treat spill-in and spill-out effects symmetrically. By this they mean that spill-in and spill-out effects emphasize the heterogeneity in the coefficient estimates *ψ*_{i}, *β*_{i} rather than differences in elements of the matrix *W,* which seems appropriate given the emphasis of the model on coefficient heterogeneity across the regions.
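One way to obtain a weight matrix whose rows and columns both sum to one is iterative row and column scaling (Sinkhorn-Knopp scaling). The text does not say how LeSage and Chih (2016) construct their matrix, so the procedure, the helper name `doubly_stochastic` and the distance-based starting matrix below are assumptions made only for illustration.

```python
import numpy as np

def doubly_stochastic(A, n_iter=1000, tol=1e-12):
    """Alternately rescale rows and columns of a nonnegative matrix until both sum to one."""
    W = np.array(A, dtype=float)
    for _ in range(n_iter):
        W /= W.sum(axis=1, keepdims=True)    # make rows sum to one
        W /= W.sum(axis=0, keepdims=True)    # make columns sum to one
        if np.abs(W.sum(axis=1) - 1.0).max() < tol:
            break
    return W

rng = np.random.default_rng(6)
coords = rng.uniform(size=(8, 2))                        # hypothetical locations
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
A = np.exp(-dist)                                        # distance-decay weights
np.fill_diagonal(A, 0.0)                                 # no self-neighbors

W_ds = doubly_stochastic(A)
```

The scaling preserves the zero diagonal and the pattern of non-zero elements, so the neighbor structure is unchanged while the row and column sums are equalized.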

One advantage of MCMC estimation is that posterior draws from sampling the complete sequence of conditional distributions for the *N* × *K* parameters *β*_{i}, *i* = 1, … , *N* and the *N* × 1 parameters *ψ*_{i}, *i* = 1, … , *N* can be used to construct point estimates of the direct, spill-in and spill-out effects parameters along with empirical measures of dispersion for these estimates that are used for inference in HSAR models (see LeSage and Chih, 2018a). As noted, the own- and cross-partial derivatives for the response of *y* to changes in the explanatory variables *X* reflect a non-linear relationship with the underlying model parameters *β*_{i}, *ψ*_{i} which motivates the need to construct empirical estimates of the dispersion for the effects estimates. In the case of quasi maximum likelihood estimation, empirical estimates of dispersion for the direct, spill-in and spill-out effects parameters can be constructed using a sequence of draws from the estimated asymptotic variance-covariance matrix, in conjunction with the point estimates and the assumption of normality.
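The draw-based construction of dispersion measures can be sketched as follows. Parameter draws are pushed through the non-linear effects calculation one at a time and the resulting effects are summarized. Here the draws come from independent normal distributions centered on hypothetical point estimates, which stands in for either retained MCMC draws or draws from an estimated asymptotic variance-covariance matrix; a full covariance matrix would normally be used in the latter case. The direct effects are taken to be the diagonal of (*I*_{N} − Ψ*W*)^{−1}\( {B}^k \), an assumed form.

```python
import numpy as np

def direct_effects(psi, beta_k, W):
    """Diagonal of (I_N - Psi W)^{-1} B^k for one parameter draw."""
    N = len(psi)
    D = np.linalg.inv(np.eye(N) - np.diag(psi) @ W) @ np.diag(beta_k)
    return np.diag(D).copy()

rng = np.random.default_rng(7)
N = 4
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)

# Hypothetical point estimates and standard errors standing in for real output
psi_hat, psi_se = np.array([0.2, 0.4, 0.1, 0.5]), np.full(N, 0.05)
beta_hat, beta_se = np.array([1.0, 0.8, 1.2, 0.5]), np.full(N, 0.1)

n_draws = 2000
psi_draws = rng.normal(psi_hat, psi_se, size=(n_draws, N))
beta_draws = rng.normal(beta_hat, beta_se, size=(n_draws, N))

# One effects vector per draw, then empirical summaries across draws
effects = np.array([direct_effects(p, b, W) for p, b in zip(psi_draws, beta_draws)])
mean = effects.mean(axis=0)
std = effects.std(axis=0, ddof=1)
lower, upper = np.percentile(effects, [2.5, 97.5], axis=0)
```

The same loop applies to spill-in and spill-out effects by replacing the diagonal with off-diagonal row or column sums.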

## 7 Applications of the Heterogeneous Coefficient Spatial Regression Panel Models

We discuss three applications of the HSAR and HMESS models from the literature that highlight the advantages of observation-level estimates of the parameters in the model relationship.

### 7.1 Gas Station Pricing Application

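LeSage et al. (2017) apply the HSAR model to retail fuel pricing by pairs of neighboring gas stations (a duopoly setting), with the spatial weight matrix *W* taking the block diagonal form in (15), where station #2 is the neighbor to station #1 and so on, with station *N* − 1 a neighbor to station *N* and station *N* a neighbor to station *N* − 1.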

An implication of this spatial configuration is that each station interacts only with its neighbor, so the matrix inverse (*I*_{N} − Ψ*W*)^{−1} is also block diagonal, consisting of a series of 2 × 2 blocks. The main diagonal elements of this matrix inverse reflect feedback effects from the neighboring station, since we can express the inverse as *I*_{N} + Ψ*W* + (Ψ*W*)^{2} + (Ψ*W*)^{3} + … , where *W* has zero elements on the main diagonal, but *W*^{2} has non-zero diagonal elements because station #1 is a neighbor to its neighboring station #2, just as station #2 is a neighbor to station #1 (and so on for all pairs of stations). The non-zero diagonal elements reflect reactions of each station to pricing actions taken by their neighbors.
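A small numerical check of these properties is given below. It is a sketch under assumptions: six stations forming three pairs, a weight of one on each station's single neighbor, and hypothetical values for the station-level dependence parameters in `psi`.

```python
import numpy as np

N = 6                                    # three station pairs (illustrative)
pair = np.array([[0.0, 1.0],
                 [1.0, 0.0]])            # each station's only neighbor is its partner
W = np.kron(np.eye(N // 2), pair)        # block diagonal weight matrix

psi = np.array([0.6, 0.2, -0.3, 0.5, 0.4, 0.0])   # hypothetical station-level values
Psi = np.diag(psi)

A = np.linalg.inv(np.eye(N) - Psi @ W)

# Entries linking stations in different pairs are zero, so the inverse
# keeps the 2 x 2 block diagonal structure of W.
mask = np.kron(np.eye(N // 2), np.ones((2, 2))).astype(bool)
print("inverse is block diagonal:", np.allclose(A[~mask], 0.0))

# W has a zero diagonal, but W @ W does not: each station is its
# neighbor's neighbor.
print("diag(W):  ", np.diag(W))
print("diag(W^2):", np.diag(W @ W))

# For a pair (i, j) the diagonal element of the inverse is 1 / (1 - psi_i * psi_j),
# which is the cumulated feedback through the neighboring station.
print(A[0, 0], 1.0 / (1.0 - psi[0] * psi[1]))
```

The last line shows that feedback depends on the product of the two stations' dependence parameters, so it vanishes when either station ignores its neighbor.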

The matrix (*I*_{N} − Ψ*W*)^{−1} summarizes first-round reactions of the two stations to the other station’s pricing actions plus reactions to the reactions, and so on. LeSage et al. (2017) show that in this spatial configuration, the signs of the parameters *ψ*_{i}, *β*_{i} can be used to identify competition and price leadership cooperation, along with other scenarios where competition versus cooperation cannot be ascertained from the parameter estimates.

This application showcases the advantage of having observation-level estimates for each station. A model that produced homogeneous coefficient estimates by averaging over all stations (observations) would result in an inference that *on average* stations either compete or cooperate on pricing decisions. In reality, it seems more plausible that some stations compete while others cooperate, and still others ignore pricing actions taken by neighboring stations. The heterogeneous coefficient model estimates allow inferences regarding how *individual* observational units *i* = 1, … , *N* react to pricing actions taken by each unit *i*’s neighboring units. Many regional theories predict outcomes that might plausibly vary across the sample of regions. One example is tax rate competition between local governments, where it seems plausible that some local governments react to tax rates set by neighboring governments while others do not. Some governments might engage in tax rate competition, while others cooperate with their neighbors. Spatial econometric models that produce homogeneous coefficients that average over a sample of local governments cannot produce heterogeneous conclusions, since they are constrained to produce estimates summarizing a *global* relationship between a sample of (say *N*) regions/institutions/individuals and the average behavior of neighboring regions/institutions/peers in the sample.

### 7.2 Regional Wage Curve Application

LeSage and Chih (2018b) illustrate the HMESS model using a panel wage curve relationship between quarterly unemployment and wage rates from 261 counties centered on the Bakken shale oil region in North Dakota and Montana. The regional wage curve relationship relates regional wage rates to unemployment rates, with the idea that mobile workers will pursue employment in neighboring regions with higher wage rates in the face of own-region unemployment changes.

Effects estimates for all *N* regions can be difficult to analyze and summarize for readers. This application summarizes direct, spill-in and spill-out effects using Bakken versus non-Bakken counties. LeSage and Chih (2018b) partition the spill-in and spill-out effects between 27 Bakken counties and 234 non-Bakken counties. This partitioning uses county-specific estimates of the partial derivative impacts associated with changes taking place in one county on own-county outcomes plus neighboring county outcomes, plus neighbors to the neighboring counties, and so on. Specifically, they arrange the *N* × *N* matrix of effects estimates that reflects partial derivative responses of wage rates (*w*) to changes in unemployment rates (*u*), as shown in (16). (Recall that *e*^{−Γ*W*} is the inverse of the matrix exponential *e*^{Γ*W*}.) In (16), *B* is a diagonal matrix of the *N* different coefficient estimates for the single explanatory variable, unemployment rates, and Γ is a diagonal matrix of *N* different dependence estimates.

The first 27 observations reflect Bakken counties and the next 234 observations non-Bakken counties, so that *E*_{11} is a 27 × 27 matrix of partial derivatives, *E*_{12} a 27 × 234 matrix, *E*_{21} a 234 × 27 matrix and *E*_{22} a 234 × 234 matrix.

They sum down the column elements of the 234 × 27 *E*_{21} matrix to produce a measure of the cumulative spill-out effects from the 27 Bakken counties on the 234 non-Bakken counties. These show how changes in each of the 27 Bakken county unemployment rates cumulatively impact wage rates in non-Bakken counties, where the cumulation takes place over all non-Bakken counties. Specifically, these cumulative spill-out effects take the form: \( {\sum}_{i=28}^{261}\partial {w}_i/\partial {u}_1 \) for the spill-out impact of Bakken county 1 on non-Bakken counties, \( {\sum}_{i=28}^{261}\partial {w}_i/\partial {u}_2 \) for the spill-out impact of Bakken county 2 on non-Bakken counties, and so on up to \( {\sum}_{i=28}^{261}\partial {w}_i/\partial {u}_{27} \) for the spill-out impact of the 27th Bakken county on non-Bakken counties.

Summing across the row elements of the 27 × 234 *E*_{12} matrix produces a measure of the cumulative spill-in effects from the 234 non-Bakken counties to the 27 Bakken counties. These show how changes in unemployment rates in all of the 234 non-Bakken counties cumulatively impact wage rates in each of the 27 Bakken counties, where the cumulation takes place over all non-Bakken counties. Specifically, these cumulative spill-in effects take the form: \( {\sum}_{j=28}^{261}\partial {w}_1/\partial {u}_j \) for the spill-in impact of non-Bakken counties on Bakken county 1, \( {\sum}_{j=28}^{261}\partial {w}_2/\partial {u}_j \) for the spill-in impact of non-Bakken counties on Bakken county 2, and so on up to \( {\sum}_{j=28}^{261}\partial {w}_{27}/\partial {u}_j \) for the 27th Bakken county.

Spill-in impacts from Bakken to Bakken counties would of course be calculated by summing across the 27 rows of the *E*_{11} block from the partitioned matrix, excluding the diagonal elements, which reflect own-partial derivatives that we label direct effects. Similarly, spill-out impacts from Bakken to Bakken counties are constructed by summing down the columns of the *E*_{11} block, excluding the diagonal elements.
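The block sums described above can be sketched in a few lines. The effects matrix `E` below is filled with random numbers purely as a placeholder for the estimated partial derivatives ∂*w*_{i}/∂*u*_{j}; only the ordering (27 Bakken counties first, then 234 non-Bakken counties) and the row/column summing conventions are taken from the text. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_b, n_nb = 27, 234             # Bakken counties first, then non-Bakken counties
N = n_b + n_nb

# Placeholder effects matrix: E[i, j] stands in for dw_i / du_j.
# In practice it would be built from the model estimates.
E = rng.normal(scale=0.01, size=(N, N))

# Partition into the four blocks used in the text.
E11, E12 = E[:n_b, :n_b], E[:n_b, n_b:]
E21, E22 = E[n_b:, :n_b], E[n_b:, n_b:]

# Spill-out from each Bakken county to all non-Bakken counties: column sums of E21.
spill_out_b_to_nb = E21.sum(axis=0)

# Spill-in to each Bakken county from all non-Bakken counties: row sums of E12.
spill_in_nb_to_b = E12.sum(axis=1)

# Within-Bakken effects: remove the diagonal (direct effects) before summing.
direct_b = np.diag(E11).copy()
off11 = E11 - np.diag(direct_b)
spill_in_b_to_b = off11.sum(axis=1)     # row sums, excluding own-county effect
spill_out_b_to_b = off11.sum(axis=0)    # column sums, excluding own-county effect

print(spill_out_b_to_nb.shape, spill_in_nb_to_b.shape)
```

Each resulting vector holds one value per Bakken county, which is what makes the partitioned summary easier to report than the full 261 × 261 matrix.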

### 7.3 Knowledge Production Function

Autant-Bernard and LeSage (2019) note that past literature has used homogeneous spatial autoregressive panel data models to relate regional patent production output to regional knowledge production inputs. These models ignore research on regional innovation systems that has emphasized regional disparities in the ability of regions to turn their knowledge inputs into innovation output and to access external knowledge as part of this knowledge production process. The HSAR panel model is used to estimate region-specific knowledge production functions for 94 NUTS3 regions in France, using a panel covering 21 years from 1988 to 2008 and four high-technology industries. A great deal of regional heterogeneity exists in the knowledge production function relationship across regions, and the estimates allow analysis of spatial spill-in and spill-out effects between regions.

One point made by Autant-Bernard and LeSage (2019) is that attempts to introduce specific characteristics of regions in the knowledge production relationship in conjunction with the traditional regional R&D inputs (for example, regional specialization and diversity indices, concentration or urbanization, etc.) run a high risk of confounding causes and effects.

They work with a *Cobb Douglas* knowledge production function, where the dependent variable is regional knowledge output and the explanatory variables are industry-specific regional inputs reflecting private and public R&D. The (log-transformed) relationship in (18) between annual patents (*K*) (smoothed over a two-year period) and past-period private and public knowledge inputs is used, with private R&D expenditures (*R*) and scientific publications (*U*) used as proxies for private and public knowledge input, respectively.

In the case of France and many other countries, knowledge production is concentrated in a relatively small group of regions, but spatial econometric investigations require a large group of (contiguous) regions to produce reasonable estimates of the role played by spatial dependence/interaction between regions. This leads to homogeneous panel data models that produce parameter estimates describing the relationship between the different explanatory variables and the dependent variable averaged over *N* regions and *T* time periods. The implicit restriction on the parameters is that the relationship between regional knowledge production and regional inputs to production is the same (homogeneous) for all regions and time periods in the sample. In an attempt to relax this restriction, panel data models typically allow for region-specific and time-specific fixed effects that provide different region-specific and time-specific intercepts, while restricting the level of spatial interaction between observed outcomes over time and space to be the same for all regions and time periods.

Since HSAR panel data models attempt to exploit variation over time to provide region-specific estimates of the model parameters, minimal variation over time in patent outputs or knowledge production inputs can lead to collinear relationships. For example, if there were no variation over time in one of the knowledge production function inputs, we would have an explanatory variable matrix with a perfect linear relationship between the intercept vector and the non-varying input vector. Another possible source of collinearity is the fact that the two knowledge production inputs are likely to be highly correlated. That is, regions with growing private R&D inputs are also likely to have growing public inputs, and vice versa for regions where both inputs reflect a downward trend over time.

Belsley et al. (1980) demonstrate how ridge regression can be used to augment small eigenvalues of the explanatory variables matrix that arise in the face of near linear relationships (collinearity) between explanatory variables. Autant-Bernard and LeSage (2019) use a Bayesian prior that mimics ridge regression. The prior uses a normal distribution with prior means of zero for the parameters *β*_{1}, *β*_{2} and a diagonal prior variance-covariance matrix with a small scalar *ridge parameter*, 0.01 × *I*_{2}. This overcomes the problem of an ill-conditioned matrix inverse (see Belsley et al. 1980 for details and examples).
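The effect of a ridge-type prior on an ill-conditioned problem can be illustrated with a small simulation. This is a sketch under assumptions, not the authors' implementation: the data are simulated, the intercept is omitted, and the constant `k = 0.01` is treated as the quantity added to the diagonal of *X*′*X*. A zero-mean normal prior on the coefficients whose precision equals `k` relative to the noise precision yields this posterior mean, and how the 0.01 in the text maps onto `k` is itself an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 21                                      # time periods available for one region

# Simulated inputs that trend together, so the two columns are nearly collinear.
x1 = np.linspace(0.0, 1.0, T)
x2 = x1 + rng.normal(scale=1e-4, size=T)
X = np.column_stack([x1, x2])
y = 0.5 * x1 + 0.3 * x2 + rng.normal(scale=0.1, size=T)

k = 0.01                                    # assumed ridge constant
XtX = X.T @ X
XtX_ridge = XtX + k * np.eye(2)             # augments the small eigenvalue

# Condition numbers before and after augmenting the diagonal.
print("cond(X'X):       ", np.linalg.cond(XtX))
print("cond(X'X + k I): ", np.linalg.cond(XtX_ridge))

# Least squares versus the ridge-type posterior mean.
beta_ols = np.linalg.solve(XtX, X.T @ y)
beta_ridge = np.linalg.solve(XtX_ridge, X.T @ y)
print("least squares:", beta_ols)
print("ridge-type:   ", beta_ridge)
```

Near collinearity makes the least squares estimates unstable, while the augmented matrix is far better conditioned and the estimates are shrunk toward the prior mean of zero.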

HSAR estimation results suggest that not all regions successfully turn public and private knowledge inputs into patented innovations, nor do all regions generate knowledge spillovers. Only a small set of regions generate positive effects, benefiting the region itself as well as neighboring regions. In addition, public research produces more systematic effects than private R&D both within and between regions, indicating that private R&D expenditures are perhaps not the main input for knowledge creation. Public research that involves more fundamental knowledge creation generates greater knowledge spillovers. Also, significant spill-in impacts are more prevalent than significant spill-out effects, something that was true across all four high-technology industries studied. Finally, public spill-in and spill-out effects were more prevalent in the mechanics field, whereas private spill-in and spill-out effects prevail in the chemistry field.

## 8 Conclusions

The existence of spatial panel data sets covering longer time spans raises the prospect of exploiting sample data along the time dimension to produce estimates for all spatial units or regions. The appeal of these (heterogeneous coefficient) models is partially due to the fact that theoretical models that give rise to econometric specifications often specify different utility or production functions for each economic agent. Urban and regional economic theories also focus on individual cities or regions. Conventional (homogeneous coefficient) models produce estimates that reflect the *average relationship* over all time periods and regions, resulting in a coarse summary of relationships thought to derive from interaction between individual observations.

Heterogeneous coefficient models produce separate estimates of the parameters of the model relationship for each observation, which allow for variation in the nature of interaction across individual economic agents or regions. Inferences based on heterogeneous model estimates allow for observation-level *spill-out* effects showing how changes in region *i* characteristics impact outcomes in all other regions *j* ≠ *i*. We can also draw inferences regarding observation-level *spill-in* effects that show how changes in the characteristics of other regions *j* ≠ *i* impact outcomes in each region *i*. This is appealing because there are a great many regional theories regarding hierarchical structures of regions that involve *leading* or *influential* regions, versus *following* or isolated regions.

The HSAR and HMESS model specifications also allow analysis of interaction patterns between economic agents (consumers, commuters, firms) located at points in space. Conventional homogeneous coefficient models produce estimates that average over all economic agents, not allowing for variation in patterns of interaction. For example, such models imply that either all firms in the sample compete with neighbors located nearby, or all firms cooperate with neighbors. The HSAR model allows for more realistic outcomes where some firms compete with neighbors, other firms cooperate, and still others operate independently, ignoring actions taken by nearby firms.

## References

- Aquaro M, Bailey N, Pesaran MH (2015) Quasi maximum likelihood estimation of spatial models with heterogeneous coefficients. CESifo Working Paper No. 5428, Category 12: Empirical and Theoretical Methods, June 2015
- Autant-Bernard C, LeSage JP (2019) A heterogeneous coefficient approach to the knowledge production function. Spatial Economic Analysis, 1–23
- Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics: identifying influential data and sources of collinearity. Wiley, New York
- Brunsdon C, Fotheringham AS, Charlton ME (1996) Geographically weighted regression: a method for exploring spatial non-stationarity. Geogr Anal 28(4):281–298
- Cornwall GJ, Parent O (2017) Embracing heterogeneity: the spatial autoregressive mixture model. Reg Sci Urban Econ 64:148–161
- Crespo R, Fotheringham S, Charlton ME (2007) Application of geographically weighted regression to a 19-year set of house price data in London to calibrate local hedonic price models. In: Proceedings of the 9th international conference on geocomputation. National University of Ireland Maynooth, Ireland
- Debarsy N, Jin F, Lee LF (2015) Large sample properties of the MESS with an application to FDI. J Econ 188(1):1–21
- Elhorst JP (2014) Spatial econometrics: from cross-sectional data to spatial panels. Springer, Berlin/Heidelberg
- LeSage JP, Chih YY (2016) Interpreting heterogeneous coefficient spatial autoregressive panel models. Econ Lett 142:1–5
- LeSage JP, Chih YY (2018a) A Bayesian spatial panel model with heterogeneous coefficients. Reg Sci Urban Econ 72:58–73
- LeSage JP, Chih YY (2018b) A matrix exponential spatial panel model with heterogeneous coefficients. Geogr Anal 50(4):422–453
- LeSage JP, Pace RK (2007) A matrix exponential spatial specification. J Econ 140(1):190–214
- LeSage JP, Pace RK (2019) Interpreting spatial econometric models. In: Fischer MM, Nijkamp P (eds) Handbook of regional science. Springer, Berlin/Heidelberg
- LeSage JP, Vance C, Chih YY (2017) A Bayesian heterogeneous coefficients spatial autoregressive panel data model of retail fuel duopoly pricing. Reg Sci Urban Econ 62:46–55