# Revisiting estimation methods for spatial econometric interaction models

## Introduction

Spatial interaction models are a mathematical representation of interaction behavior between entities located at an origin and a destination. They are widely applied to explain and predict, for example, passenger flows between airports, trade or migration flows between countries, and flows of customers from residential areas to stores. The most famous spatial interaction model is the gravity model, which was originally inspired by Newton’s law of universal gravitation. The gravity model states that the attraction between an origin and a destination increases as their masses increase and as the distance between them decreases. Young (1924) was among the first to use a simple gravity model to explain migration flows of the agricultural population between farms. Since then, the gravity model has been continuously refined and has become one of the most popular econometric models used across industries and scientific disciplines.

Initially, the success of the gravity model was attributed solely to its excellent agreement with the data in empirical applications. However, by taking a macro perspective on interactions between cities or countries, it lacked a solid theoretical underpinning. Wilson (1967) addressed this problem by showing that the gravity equation can be derived from the behavior of individual actors. An overview of various ways to ground the gravity equation in economic theory is provided by Anderson (2011). The theoretical underpinning increased the recognition of the model in academic circles, but Curry (1972) raised another type of criticism, showing that the distance parameter is not estimated consistently by ordinary least-squares (OLS) estimation. Consistent estimation would only be possible if the flows were independent, an assumption that is rather unrealistic, as many flows start from the same origin or go to the same destination. Moreover, an increasing number of studies report significant spatial dependence in origin-destination (OD) flows (see for example Porojan 2001; Lee and Pace 2005; Chun et al. 2012; Kerkman et al. 2017; Margaretic et al. 2017). Ignoring spatial dependence in such cases compromises the estimation of all parameters (Anselin 1988), and different strategies have emerged to address spatial correlation of OD flows (for a recent overview see Oshan 2020).

LeSage and Pace (2008) account for spatial dependence in gravity-type models using a spatial econometric approach. In general, spatial econometric models relax the independence assumption by allowing observations to influence their neighbors, where the neighborhood structure is defined by spatial weight matrices. These weight matrices typically reflect geographic proximity and make it possible to compute spatial lags that contain information on the average value of a variable in the neighborhood around a location. The spatial lag (LAG) model includes a lag of the dependent variable, and using additional lags of the explanatory variables leads to the so-called spatial Durbin model (SDM). A thorough overview of different types of spatial econometric models is provided by Elhorst (2010). Apart from exceptional cases (see Lee 2002), the LAG and SDM models cannot be estimated consistently by OLS, and much effort has been devoted to the development of alternative estimation procedures, such as MLE (Ord 1975; Anselin 1988), S2SLS estimation (Kelejian and Prucha 1998), and Bayesian MCMC (LeSage 1997). Due to the lack of an analytical solution for the MLE and Bayesian estimators, their computation can be very costly, and a large strand of the literature is devoted to improvements that bring down these estimation costs. Usually, the S2SLS is presented as a computationally cheaper alternative, but this must be qualified for spatial interaction models, whose double spatial index amplifies the computational challenges for two reasons: the number of OD pairs $$N = n^2$$ grows quadratically with the number of origins and destinations, increasing problems related to the dimension of the data. The double index also complicates the definition of a geographical neighborhood, as a flow might be influenced by the neighbors of the origin, the destination, or both.
To model this neighborhood structure LeSage and Pace (2008) use three spatial weight matrices, which augments the number of instruments used for the S2SLS estimator.

This article presents computational techniques that make it possible to efficiently estimate the spatial econometric interaction model of LeSage and Pace (2008). While the expositions regarding the MLE and the Bayesian MCMC estimators constitute refinements of existing methods, the detailed implementation of the S2SLS estimator of this model is novel in the literature. The guiding principle of the improvements is a reformulation of the three estimation procedures in terms of low-dimensional moment matrices that need to be computed only once, prior to the estimation. Consequently, the computational cost becomes, up to the calculation of these moments, independent of the sample size. In fact, this idea can be generalized to all spatial econometric models that use Gaussian residuals, but it is especially efficient for spatial interaction models, whose OD structure allows the model to be represented in a matrix formulation. Using this matrix formulation, which was introduced by LeSage and Pace (2008), we obtain most elements of the required moments using $${\mathcal {O}}(\sqrt{N})$$ instead of $${\mathcal {O}}(N)$$ computations. For the moment-based MLE we derive a simple restructuring of the quadratic term in the likelihood function and a generalization of the mixed numerical-analytical Hessian estimator (see LeSage and Pace 2009) to spatial models with multiple weight matrices. The improved Bayesian MCMC estimator is the one of LeSage and Pace (2009), with an adjusted sampling procedure that expresses all sampling steps in terms of low-dimensional moments. To improve the performance of the S2SLS estimator, we develop its matrix formulation and also address multicollinearity problems of the spatial instruments. One source of these problems is the OD structure of the data, while another is related to the fact that the SDM model already includes spatial lags of the covariates. Hence, solutions to the second type of problem also apply outside the context of OD flows.
Regarding the feasible parameter space, we develop a new method that makes it possible to verify the coherence of the model without substantially increasing the computational cost of the estimation. To facilitate the adoption of the model, all estimators presented in this article are implemented in the R package spflow. These contributions should help to establish spatial econometric interaction models as a standard approach to modeling origin-destination flows. Failing to make this model operational implies that spatial dependence continues to be ignored in many studies, which may have severe consequences. Kerkman et al. (2017), for example, point out that omitted spatial autocorrelation might be a reason for the systematic overestimation of the effects of policy interventions in the public transport sector.

Section 2 introduces the model and the matrix formulation. Section 3 discusses the issue of defining the feasible parameter space. Section 4 details the improved calculation procedures for the three estimators and Section 5 presents benchmarks of these methods. In Section 6, all estimators and the new parameter space solutions are illustrated based on the example of home-to-work commuting flows between the municipalities of Paris. The final section concludes.

## A spatial econometric model for origin-destination flows

LeSage and Pace (2008) develop a model for origin-destination flows that accounts for three types of spatial autocorrelation. Their matrix formulation leads to significant efficiency gains, as it makes it possible to perform most calculations linked to the parameter estimation with objects of dimension n (the number of spatial sites) instead of $$N = n^2$$ (the number of origin-destination pairs). The following paragraphs introduce this model and present a generalized version of the matrix formulation that is more flexible in handling the explanatory variables.

### Representation of the flows

To construct the flow matrix $$Y (n \times n)$$ we have to define an ordering for the spatial sites at the origins and the destinations of the flows. Based on this ordering, all flows are arranged into a matrix, whose columns correspond to the origins and whose rows correspond to the destinations.

\begin{aligned} \left( \begin{array}{cccc} o_{1} \rightarrow d_{1} & o_{2} \rightarrow d_{1} & \cdots & o_{n} \rightarrow d_{1} \\ o_{1} \rightarrow d_{2} & o_{2} \rightarrow d_{2} & \cdots & o_{n} \rightarrow d_{2} \\ \vdots & \vdots & \ddots & \vdots \\ o_{1} \rightarrow d_{n} & o_{2} \rightarrow d_{n} & \cdots & o_{n} \rightarrow d_{n} \end{array}\right) \implies \left( \begin{array}{cccc} y_{1:1} & y_{2:1} & \cdots & y_{n:1} \\ y_{1:2} & y_{2:2} & \cdots & y_{n:2} \\ \vdots & \vdots & \ddots & \vdots \\ y_{1:n} & y_{2:n} & \cdots & y_{n:n} \end{array}\right) = Y \end{aligned}
(1)

Using the $${{\,\mathrm{VEC}\,}}$$-operator, that stacks the columns of a matrix (from left to right), it is then possible to derive the flow vector $$y = {{\,\mathrm{VEC}\,}}(Y)$$. LeSage and Pace (2009) describe this representation of y as an origin-centric flow arrangement.
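As a concrete illustration, the origin-centric arrangement is simply a column-major flatten of the flow matrix. The following minimal NumPy sketch (with invented flow values) makes this explicit:

```python
import numpy as np

# Toy flow matrix for n = 3 sites, arranged as in (1):
# columns are origins, rows are destinations; values are illustrative.
Y = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])

# VEC stacks the columns from left to right, which is a
# column-major ("Fortran order") flatten in NumPy.
y = Y.flatten(order="F")
print(y)  # [11 21 31 12 22 32 13 23 33]
```

The first three entries of y are the flows leaving origin 1 towards destinations 1 to 3, which is exactly the origin-centric ordering used throughout the article.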

### Representation of the spatial neighborhood

To illustrate how LeSage and Pace (2008) derive the three neighborhood matrices of the origin-destination pairs, we consider a toy example of three sites $$(s_1, s_2, s_3)$$ that are located on a line. In this example the neighborhood matrix W of the sites is defined by contiguity.

By convention the diagonal elements of W are zero, implying that no site is its own neighbor. We also assume that W is row-stochastic, which means that its rows sum to one. This row-normalization yields some computational advantages and makes it possible to interpret the spatial lag of a variable as a local weighted average, where the weights correspond to the strength of the neighborhood relation. Based on these assumptions, we obtain the site neighborhood matrix shown in (2). The first and last sites ($$s_1$$ and $$s_3$$) only have the middle site ($$s_2$$) as their neighbor, which in turn has two neighbors.

\begin{aligned} W = \left( \begin{array}{ccc} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{array}\right) \end{aligned}
(2)

From the site neighborhood matrix W we can then derive three OD pair neighborhood matrices

\begin{aligned} W_d = {\mathbf {I}}_{n} \otimes W \qquad \qquad W_o = W \otimes {\mathbf {I}}_{n} \qquad \qquad W_w = W \otimes W, \end{aligned}
(3)

where $$\otimes$$ denotes the Kronecker product. The destination neighborhood matrix $$W_d$$ captures the mutual influence of flows that start at the same origin and end at neighboring destinations. Likewise, the origin neighborhood matrix $$W_o$$ captures dependence between flows that have the same destination and start from neighboring origins. Finally, the origin-to-destination neighborhood matrix $$W_w$$ represents links between flows whose origins and destinations are neighbors. All three matrices $$W_d$$, $$W_o$$ and $$W_w$$ have dimension $$N \times N$$ and inherit the row-stochastic property of $$W \; (n \times n)$$. Below we illustrate how the $$W_d$$ matrix is constructed for the above example of $$n = 3$$ sites and $$N = n^2 = 9$$ origin-destination pairs. The first row of $$W_d$$, for example, reveals that the only destination neighbor of the first pair ($$s_1 \rightarrow s_1$$) is the second pair ($$s_1 \rightarrow s_2$$).
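The construction in (3) is easy to check numerically. The following sketch builds the three OD pair neighborhood matrices for the toy line-graph W and verifies the row-stochastic property; the code is illustrative and independent of the spflow implementation:

```python
import numpy as np

# Row-stochastic contiguity matrix W for three sites on a line (equation (2)).
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
I_n = np.eye(3)

# The three OD pair neighborhood matrices of equation (3).
W_d = np.kron(I_n, W)   # destination neighbors: same origin, neighboring destination
W_o = np.kron(W, I_n)   # origin neighbors: same destination, neighboring origin
W_w = np.kron(W, W)     # origin-to-destination neighbors

# All three are (N x N) with N = 9 and inherit the row-stochastic property.
for M in (W_d, W_o, W_w):
    assert M.shape == (9, 9)
    assert np.allclose(M.sum(axis=1), 1.0)

# First row of W_d: the only destination neighbor of pair (s1 -> s1) is (s1 -> s2).
print(W_d[0])
```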

### The explanatory variables

If we want to exploit the efficiency gains of the matrix formulation, we have to distinguish the attributes of the OD pairs from those associated with the origins and the destinations. The most common OD pair attribute is the vector of geographic distances g, which we derive from the distance matrix $$G \; (n \times n)$$, based on the same ordering as for the flows in (1). Extending the model to account for multiple pair attributes is not mathematically involved, but leads to more complicated moment expressions. For this reason, the article works with a single distance vector, while the accompanying R package allows multiple OD pair attributes.

The attributes of the spatial sites reflect the origins’ capacity to emit and the destinations’ capacity to attract a flow. Because these two effects might be driven by different characteristics of the sites, we use the matrix $$OX \; (n \times k_o)$$ to describe the origins’ emission capacity and the matrix $$DX \; (n \times k_d)$$ to describe the destinations’ attraction capacity.

Another frequent problem during the estimation of spatial interaction models is related to flows that start and end at the same geographical site. Such intra-regional flows are often orders of magnitude larger than inter-regional ones and may dominate the parameter estimation (see for example Tiefelsdorf 2003). Setting the diagonal elements of the flow matrix to zero or removing them from the data appears to be an easy solution, but this also impacts all inter-regional flows that are neighbors of intra-regional ones. This spurious side effect occurs as soon as the model includes spatial lags of the flow vector and leads to distorted parameter estimates. LeSage and Pace (2009) present a more elegant solution that isolates the dominating data points by adding an additional set of parameters for intra-regional flows. The variables used for the intra-regional model are represented in a third matrix, denoted by $$IX \; (n \times k_I)$$. Following a suggestion of LeSage and Pace (2008), this article uses the notation

\begin{aligned} \tilde{OX} = {\left\{ \begin{array}{ll} OX & \text {if spatial structure = LAG}\\ (OX \; W OX) & \text {if spatial structure = SDM} \end{array}\right. } \end{aligned}
(4)

to treat the SDM and the LAG specification in the same framework. The same notation is subsequently applied to all three matrices of site attributes (DX, OX and IX).
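A minimal sketch of this notation, where the attribute values and the helper name `with_durbin` are invented for the illustration:

```python
import numpy as np

# Sketch of the notation in (4): augment a matrix of site attributes with its
# spatial lag when the SDM specification is used. The attribute values and the
# helper name `with_durbin` are illustrative, not part of the original model code.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
OX = np.array([[1.0], [2.0], [4.0]])  # one origin attribute

def with_durbin(X, W, spatial_structure="SDM"):
    """Return X for the LAG case and (X, WX) for the SDM case."""
    if spatial_structure == "SDM":
        return np.hstack([X, W @ X])
    return X

OX_tilde = with_durbin(OX, W)
print(OX_tilde)  # second column holds the spatial lag W @ OX
```

The same helper would be applied unchanged to DX and IX, which is exactly what allows the SDM and LAG specifications to share one framework.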

The matrix $$Z = \begin{pmatrix} \iota _N&\iota _I&{\tilde{X}}_d&{\tilde{X}}_o&{\tilde{X}}_I&g \end{pmatrix}$$ gathers all the explanatory variables in their vectorized formulation: the first two columns of $$Z \; (N \times K)$$ represent the global constant $$\iota _N$$ and an intra-regional constant $$\iota _I$$, which is equal to one if the origin and destination coincide and zero otherwise. The next three sets of variables represent the attributes of the destinations ($${\tilde{X}}_d$$), the origins ($${\tilde{X}}_o$$) and the intra-regional observations ($${\tilde{X}}_I$$). To construct these matrices we have to duplicate the information in $$\tilde{DX}, \tilde{OX}$$ and $$\tilde{IX}$$. The last component is the distance vector g. Based on the ordering introduced in (1), we can derive the six components of the Z matrix as

$$\begin{array}{lll} \iota _N= \iota _n \otimes \iota _n &{\tilde{X}}_d= \iota _n \otimes \tilde{DX} &{\tilde{X}}_I= {{\,\mathrm{Diag}\,}}(\iota _I) (\tilde{IX} \otimes \iota _n) \\ \iota _I= {{\,\mathrm{VEC}\,}}(I_n) &{\tilde{X}}_o= \tilde{OX} \otimes \iota _n &g= {{\,\mathrm{VEC}\,}}(G), \end{array}$$
(5)

where $$I_n$$ is the $$(n \times n)$$ identity matrix, $$\iota _n$$ is a $$(n \times 1)$$ vector of ones, and the $${{\,\mathrm{Diag}\,}}(x)$$ operation places the elements of the vector x on the diagonal of a matrix with otherwise zero entries.
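The six components in (5) translate directly into Kronecker products. The following sketch builds them for the toy example with one attribute per matrix and the LAG case (so that $$\tilde{DX} = DX$$, etc.); all numeric values are illustrative:

```python
import numpy as np

n = 3
iota_n = np.ones(n)
DX = np.array([[1.0], [2.0], [3.0]])  # destination attributes
OX = np.array([[4.0], [5.0], [6.0]])  # origin attributes
IX = np.array([[7.0], [8.0], [9.0]])  # intra-regional attributes
G = np.arange(1.0, 10.0).reshape(n, n)  # distance matrix, illustrative values

iota_N = np.kron(iota_n, iota_n)                         # global constant
iota_I = np.eye(n).flatten(order="F")                    # VEC(I_n): intra-regional dummy
X_d = np.kron(iota_n[:, None], DX)                       # destination attributes repeated
X_o = np.kron(OX, iota_n[:, None])                       # origin attributes repeated
X_I = np.diag(iota_I) @ np.kron(IX, iota_n[:, None])     # masked to intra-regional pairs
g = G.flatten(order="F")                                 # VEC(G)

Z = np.column_stack([iota_N, iota_I, X_d, X_o, X_I, g])
print(Z.shape)  # (9, 6)
```

Note how X_d cycles through DX within each origin block, X_o repeats each origin attribute n times, and X_I is nonzero only on the intra-regional positions.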

Definition (5) of the explanatory variables implies that the parameters of the intra-regional model only capture the difference to the corresponding inter-regional ones. In other words, the direct effect of a site attribute on a flow whose origin and destination coincide is based on the sum of three coefficients if the attribute is present in all three matrices $$\tilde{DX}, \tilde{OX}$$ and $$\tilde{IX}$$. An alternative parametrization requires setting the rows of $$\iota _N, {\tilde{X}}_d$$ and $${\tilde{X}}_o$$ to zero for intra-regional observations, leading to intra-regional parameters that capture the full effect of the intra-regional attributes. However, the parametrization implied by equation (5) simplifies the calculation and reporting of significance tests. Using (5) we can, for example, test the necessity of including an intra-regional effect based on a null hypothesis of the form $$H_0: \alpha _I = 0$$ (where $$\alpha _I$$ could be the coefficient of the intra-regional constant). The corresponding t-test is easy to compute and can be reported for all parameters in tabular form. With the alternative parametrization we would have to test the hypothesis of equality between the intra- and inter-regional parameters (for example $$H_0: \alpha _I = \alpha$$), which is not as easy to calculate and report.

### The spatial econometric interaction model

We consider the following spatial econometric interaction model

\begin{aligned} y = \rho _d W_dy + \rho _o W_o y + \rho _w W_w y + Z\delta + \varepsilon , \end{aligned}
(6)

where the parameters $$\delta$$ capture the influence of the explanatory variables, $$\varepsilon$$ represents a vector of random errors, and $$\rho = \begin{pmatrix}\rho _d&\rho _o&\rho _w \end{pmatrix}'$$ is the vector of autoregressive parameters associated with the three possible forms of spatial dependence. This definition of model (6) is, up to the elements of the Z matrix, identical to the one of LeSage and Pace (2008), who derive a family of nine sub-models from this general specification.

### Matrix formulation and moment expressions

Model (6) explains the flow vector y $$(N \times 1)$$ with some explanatory variables and three autoregressive components. From a theoretical point of view, it is no problem to treat this model as a usual spatial econometric model with multiple weight matrices. However, this vectorized formulation is computationally inefficient because it ignores redundancies that arise from the fact that every site is at the same time origin and destination of many flows. For this reason, the spatial neighborhood matrices and the explanatory variables contain mainly duplicated information. The matrix formulation of LeSage and Pace (2008) avoids these inefficiencies by operating on the flows in their matrix representation and by deriving the elements of moment expressions such as $$Z'Z$$ and $$Z'y$$ directly from the data related to the sites. Hence, when using the matrix formulation, we need neither the large matrix of covariates Z $$(N \times K)$$, nor the even larger neighborhood matrices of the OD pairs $$W_r$$ $$(N \times N)$$ (for $$r = d, o, w$$). The moments presented in the following have the same structure as those of LeSage and Pace (2009, page 224) but do not impose that $$\tilde{OX} = \tilde{DX} = \tilde{IX} = {\tilde{X}}$$.

The $$Z'Z$$ moment is derived from the exogenous covariates defined in (5). Since it is a symmetric matrix, we use the $$\bullet$$ notation to indicate elements that are already given on the upper triangle. The following definition of $$Z'Z$$ also uses the trace operator ($${{\,\mathrm{tr}\,}}$$), the elementwise Hadamard product $$\odot$$ and the $${{\,\mathrm{diag}\,}}$$-operator which extracts the main diagonal of a square matrix.

\begin{aligned} \begin{gathered} Z'Z=\left( \begin{array}{rrrrrc} N & n & n\cdot \iota _{n}'\tilde{DX} & n\cdot \iota _{n}'\tilde{OX} & \iota _{n}'\tilde{IX} & \iota _{n}' G \iota _{n} \\ \bullet & n & \iota _{n}'\tilde{DX} & \iota _{n}'\tilde{OX} & \iota _{n}'\tilde{IX} & {{\,\mathrm{tr}\,}}(G) \\ \bullet & \bullet & n\cdot \tilde{DX}' \tilde{DX} & (\iota _{n}'\tilde{DX})'(\iota _{n}'\tilde{OX}) & \tilde{DX}' \tilde{IX} & \tilde{DX}' G \iota _n \\ \bullet & \bullet & \bullet & n\cdot \tilde{OX}' \tilde{OX} & \tilde{OX}' \tilde{IX} & \tilde{OX}' G' \iota _n \\ \bullet & \bullet & \bullet & \bullet & \tilde{IX}' \tilde{IX} & \tilde{IX}' {{\,\mathrm{diag}\,}}(G) \\ \bullet & \bullet & \bullet & \bullet & \bullet & \iota _{n}'(G \odot G)\iota _{n} \end{array}\right) \end{gathered} \end{aligned}
(7)
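A quick numerical check of (7) against the direct $${\mathcal {O}}(N)$$ product can build confidence in the moment expressions. The sketch below verifies a handful of entries for toy data (illustrative values, LAG case with one attribute per site matrix):

```python
import numpy as np

# Spot-check of the moment formula (7) against the direct O(N) product Z'Z.
# Toy data: n = 3 sites, one attribute per site matrix, LAG case.
n = 3
N = n * n
iota_n = np.ones(n)
DX = np.array([[1.0], [2.0], [3.0]])
OX = np.array([[4.0], [5.0], [6.0]])
IX = np.array([[7.0], [8.0], [9.0]])
G = np.arange(1.0, 10.0).reshape(n, n)  # distance matrix

iota_I = np.eye(n).flatten(order="F")
Z = np.column_stack([
    np.ones(N),                                      # global constant
    iota_I,                                          # intra-regional constant
    np.kron(iota_n[:, None], DX),                    # X_d
    np.kron(OX, iota_n[:, None]),                    # X_o
    np.diag(iota_I) @ np.kron(IX, iota_n[:, None]),  # X_I
    G.flatten(order="F"),                            # g
])
ZtZ = Z.T @ Z  # the O(N) reference computation

# Selected O(n) moment expressions from (7):
assert np.isclose(ZtZ[0, 0], N)                          # iota_N' iota_N
assert np.isclose(ZtZ[0, 1], n)                          # iota_N' iota_I
assert np.isclose(ZtZ[0, 2], n * (iota_n @ DX).item())   # n * iota_n' DX
assert np.isclose(ZtZ[0, 5], iota_n @ G @ iota_n)        # iota_n' G iota_n
assert np.isclose(ZtZ[1, 5], np.trace(G))                # tr(G)
assert np.isclose(ZtZ[2, 2], n * (DX.T @ DX).item())     # n * DX' DX
assert np.isclose(ZtZ[5, 5], iota_n @ (G * G) @ iota_n)  # iota_n'(G o G) iota_n
print("moment formula (7) matches the direct product")
```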

The $$Z'y$$ moment is related to the empirical covariance between the flows and the exogenous variables. Such a covariance is also required for the three spatial lags of the flow vector. LeSage and Pace (2008) note the similarity in these four covariance terms and derive them efficiently according to the equations in (8) below, where $$y^{(t)}$$ ($$t = 1,2,3,4$$) represents the original flow vector as well as the lagged versions ($$y^{(1)} = y, \quad y^{(2)} = W_dy, \quad y^{(3)} = W_oy, \quad y^{(4)} = W_wy$$).

\begin{aligned} y^{(t)} = {{\,\mathrm{VEC}\,}}(Y^{(t)}) \qquad Y^{(t)} = {\left\{ \begin{array}{ll} Y & \text {if } t = 1\\ WY & \text {if } t = 2\\ YW' & \text {if } t = 3\\ WYW' & \text {if } t = 4 \end{array}\right. } \qquad Z'y^{(t)} = \left( \begin{array}{c} \iota _{n}' Y^{(t)} \iota _{n} \\ {{\,\mathrm{tr}\,}}(Y^{(t)}) \\ \tilde{DX}' Y^{(t)} \iota _{n} \\ \tilde{OX}' Y^{(t)\prime } \iota _{n} \\ \tilde{IX}' {{\,\mathrm{diag}\,}}(Y^{(t)}) \\ \iota _{n}'(G \odot Y^{(t)})\iota _{n} \end{array}\right) \end{aligned}
(8)
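The equivalence in (8) can likewise be verified numerically. The following sketch compares the direct product $$Z'(W_d y)$$ with the moment computed from the small matrix $$Y^{(2)} = WY$$, using illustrative toy data:

```python
import numpy as np

# Compare the direct product Z'(W_d y) with the moment computed from the small
# matrix Y2 = W Y, as in equation (8) for t = 2. All numeric values are toy
# illustrations.
n = 3
iota_n = np.ones(n)
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
Y = np.arange(1.0, 10.0).reshape(n, n)  # flow matrix
G = Y + 1.0                              # distance matrix
DX = np.array([[1.0], [2.0], [3.0]])
OX = np.array([[4.0], [5.0], [6.0]])
IX = np.array([[7.0], [8.0], [9.0]])

# Direct computation with the large N-dimensional objects
iota_I = np.eye(n).flatten(order="F")
Z = np.column_stack([
    np.ones(n * n), iota_I,
    np.kron(iota_n[:, None], DX), np.kron(OX, iota_n[:, None]),
    np.diag(iota_I) @ np.kron(IX, iota_n[:, None]), G.flatten(order="F"),
])
y = Y.flatten(order="F")
W_d = np.kron(np.eye(n), W)
direct = Z.T @ (W_d @ y)

# Moment computation with n-dimensional objects only
Y2 = W @ Y
moment = np.concatenate([
    [iota_n @ Y2 @ iota_n],
    [np.trace(Y2)],
    DX.T @ Y2 @ iota_n,
    OX.T @ Y2.T @ iota_n,
    IX.T @ np.diag(Y2),
    [iota_n @ (G * Y2) @ iota_n],
])
assert np.allclose(direct, moment)
print("Z'(W_d y) recovered from n-dimensional objects")
```

The key identity behind this check is $$W_d y = ({\mathbf {I}}_{n} \otimes W){{\,\mathrm{VEC}\,}}(Y) = {{\,\mathrm{VEC}\,}}(WY)$$, so the large Kronecker product never needs to be formed.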

LeSage and Pace (2008) use the moment expressions in (7) and (8) to improve the performance of the MLE and also apply them to a Bayesian MCMC estimator (LeSage and Pace 2009). In Sect. 4, we adjust the estimation procedures for the MLE, the Bayesian MCMC, and the S2SLS to derive the parameter estimates and their standard errors exclusively from such moments.

## The feasible parameter space

Unfortunately, the spatial interaction model in (6) cannot be consistently defined for all values of the autoregressive parameters, and we have to consider a feasible parameter space that ensures the coherence of the model. The following paragraphs give an overview of the problems we encounter when defining this feasible parameter space and discuss some of the available solutions. We treat this issue first from the general perspective of a higher order spatial lag or Durbin model. Afterwards, we develop a new solution to this problem for the special case of the spatial interaction model in (6).

In general, a spatial model of order l is characterized by the fact that it uses l distinct neighborhood matrices associated to l autoregressive parameters. We can define this model as

\begin{aligned} y = \rho _1 W_1y + ... + \rho _l W_l y + Z\delta + \varepsilon \iff Ay = Z\delta + \varepsilon , \end{aligned}
(9)

where $$A = I - \rho _1 W_1 - ... - \rho _l W_l$$ denotes the spatial filter matrix and $$\varepsilon$$ is an independently distributed error. To clarify the problem inherent to this family of models, let us first recapitulate an argument of Kelejian and Robinson (1995). In model (9), the variance-covariance matrices of the dependent variable and the error term are related by the equation $$A{{\,\mathrm{Var}\,}}(y)A' = {{\,\mathrm{Var}\,}}(\varepsilon )$$. Thus, if A is singular, $${{\,\mathrm{Var}\,}}(\varepsilon )$$ must also be singular for any given Var(y), contradicting the assumption of independent errors used to define the model in the first place.

The above discussion shows that higher order SDM and LAG models can be defined coherently only if A is non-singular, which is equivalent to all its eigenvalues being different from zero. If we denote $$W_F = \rho _1 W_1 + ... + \rho _l W_l$$, the same condition requires all eigenvalues $$\lambda (W_F)$$ of $$W_F$$ to be different from one. In practice, this condition is rarely used because computing the eigenvalues of $$W_F$$ is very costly and because the associated parameter space is not compact (Hepple 1995). However, we can derive more restrictive alternatives with more convenient properties. For the order one model these restrictions have simple analytical solutions, but their extension to the higher order model is not obvious.

Table 1 illustrates how the most commonly applied constraints on the parameter space for the order one model have been extended to higher order spatial models. The presented formulas and the discussion in the following paragraph are valid under the condition that the neighborhood matrices have positive, real entries. Stricter assumptions such as symmetry are not necessary if we interpret the absolute value of a complex number as the modulus. The first column of Table 1 defines five constraints on the feasible space for the autoregressive parameters $$\rho = (\rho _1, ..., \rho _l)$$ in an SDM or LAG model of order l. Columns two and three of Table 1 indicate whether a given constraint is necessary or sufficient to ensure the coherence of the model, and the last column shows an expression of the feasible parameter space for $$l = 1$$. The first four constraints are ordered in decreasing restrictiveness, while the fifth one cannot be compared to the others because it is rather an ad hoc rule than a mathematically justified restriction.

An advantage of the order one model is that we can separate the autocorrelation parameters from the eigenvalue decomposition of the $$W_F$$ matrix: $$\lambda (W_F) = \rho _1 \lambda (W_1)$$. In contrast, for higher order models the eigenvalue decomposition of $$W_F$$ must be recomputed for every change in the autoregressive parameters, which makes it very costly to define the parameter spaces implied by the first three constraints. This difficulty explains why many empirical studies resort to the naive extensions described by constraint (IV) and (V).

Constraint (I) is a necessary and sufficient condition for the coherence of the model, because it only excludes the parameter values for which A is singular. This condition was already mentioned by Martin (1992), but due to the complexity of the associated parameter space and the high computational cost, most spatial econometric models employ one of the more restrictive alternatives given by the latter four constraints.

The parameter space associated to constraint (II) corresponds to the largest compact set of feasible parameter values that admits zero autocorrelation and is therefore intuitively appealing. It requires that all real eigenvalues of $$W_F$$ are smaller than one, which is equivalent to $$\lambda _{max}(W_F) < 1$$. This restriction also implies that the spatial filter matrix is positive definite if no complex eigenvalue of $$W_F$$ exceeds 1 in magnitude. In the order one model, the largest real eigenvalue $$\lambda _{max}(W_F)$$ is given by $$\lambda _{max}(W_1) \cdot \rho _1$$ if $$\rho _1 > 0$$ and $$\lambda _{min}(W_1) \cdot \rho _1$$ if $$\rho _1 \le 0$$. From this definition of $$\lambda _{max}(W_F)$$ we can easily derive the simple interval in column four and row two of Table 1. Using constraint (II) in a higher order model ($$l \ge 2$$) is much more complicated, since we can no longer separate the autocorrelation parameters from the eigenvalue decomposition of $$W_F$$. Consequently, to identify the space of feasible autoregression parameters (for which $$\lambda _{max}(W_F) < 1$$), we have to resort to a numerical search over a grid of possible values for $$(\rho _1, ..., \rho _l)$$. Elhorst et al. (2012) develop a method that allows to identify this space based on a grid with dimension $$l-1$$ instead of l, but computing $$\lambda _{max}(W_F)$$ for all points in this grid is still impractical for large data sets.

Constraint (III) suffers from the same computational problems as (II). It requires all eigenvalues of $$W_F$$ to be less than one in absolute value, and is often used to deal with the log-determinant term that appears in the likelihood function of most spatial models. When the constraint (III) is satisfied, this log-determinant can be approximated based on a truncated Taylor series (see for example Martin 1992; Barry and Pace 1999; Pace and LeSage 2004; Smirnov and Anselin 2009; Bivand et al. 2013). Most spatial models estimated by MLE or Bayesian methodology make use of such an approximation and thereby restrict (sometimes implicitly) the feasible parameter space according to constraint (III).

Constraints (IV) and (V) can be considered naive higher order extensions of the previous two constraints. Their main advantage is that they do not require complex computations. For row-stochastic neighborhood matrices, constraints (III), (IV), and (V) are equivalent when the order of the model is $$l = 1$$ or when all autoregressive parameters are positive. However, the naive constraints lead to problems when we allow negative autocorrelation in a higher order spatial model. Constraint (IV) restricts the space of feasible solutions much more than necessary, and constraint (V), although it works well in most applications, does not permit any conclusions about the coherence of the model.

The spatial econometric interaction model in (6) contains three neighborhood matrices, which is usually a stumbling block for defining the feasible parameter space according to the first three constraints. However, thanks to the special neighborhood structure in this model, we can verify constraints (II) or (III) using $${\mathcal {O}}(n)$$ methods. In model (6) we have $$W_F = \rho _d ({\mathbf {I}}_{n} \otimes W) + \rho _o (W \otimes {\mathbf {I}}_{n}) + \rho _w (W \otimes W)$$ and we can compute the eigenvalue vector $$\lambda (W_F) \; (N \times 1)$$ directly from the eigenvalue vector $$\lambda (W) \; (n \times 1)$$ of the site neighborhood matrix W.

\begin{aligned} \lambda (W_F) = \rho _d (\iota _{{n}} \otimes \lambda (W)) + \rho _o (\lambda (W) \otimes \iota _{{n}}) + \rho _w (\lambda (W) \otimes \lambda (W)) \end{aligned}
(10)

We could use (10) to verify constraint (I), but this would require up to $${\mathcal {O}}(n^3)$$ computations to obtain $$\lambda (W)$$, and another $${\mathcal {O}}(n^2)$$ calculations for every set of values of the autoregressive parameters that we want to verify. Thus, the computational effort makes constraint (I) unusable for large sample applications.
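Equation (10) itself is straightforward to confirm on a small example. The sketch below compares the spectrum of $$W_F$$ with the Kronecker combination of $$\lambda (W)$$ for the toy line-graph W (which is similar to a symmetric matrix, so all eigenvalues are real) and illustrative autoregressive parameters:

```python
import numpy as np

# Check equation (10): the spectrum of W_F is a Kronecker combination of the
# spectrum of W. The rho values are illustrative; the toy W is similar to a
# symmetric matrix, so its eigenvalues are real.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
n = W.shape[0]
rho_d, rho_o, rho_w = 0.4, 0.3, -0.2

I = np.eye(n)
W_F = rho_d * np.kron(I, W) + rho_o * np.kron(W, I) + rho_w * np.kron(W, W)

lam_W = np.linalg.eigvals(W)
iota = np.ones(n)
lam_formula = (rho_d * np.kron(iota, lam_W)
               + rho_o * np.kron(lam_W, iota)
               + rho_w * np.kron(lam_W, lam_W))

# The two spectra agree as multisets (eigenvalue ordering is arbitrary)
a = np.sort_complex(np.asarray(np.linalg.eigvals(W_F), dtype=complex))
b = np.sort_complex(np.asarray(lam_formula, dtype=complex))
assert np.allclose(a, b)
print("equation (10) confirmed on the toy example")
```

This only requires the n eigenvalues of W, whereas the direct route would decompose the $$N \times N$$ matrix $$W_F$$.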

However, if we restrict our attention to the case where W is similar to a symmetric matrix, we can verify constraints (II) or (III) without having to consider the full eigenvalue vector in (10); by focusing on the relevant elements we can reduce the computational effort drastically. Let us first note that each of the N equations in (10) has the coordinate expression $$\rho _d \lambda _j(W) + \rho _o \lambda _i(W) + \rho _w \lambda _j(W) \lambda _i(W)$$, for $$i,j = 1,...,n$$. This coordinate form helps to see that the two extreme eigenvalues of $$W_F$$ are always contained in the same four elements of the vector $$\lambda (W_F)$$: the relevant entries correspond to the case where $$\lambda _j(W)$$ and $$\lambda _i(W)$$ are both chosen from the set $$\{\lambda _{max}(W), \lambda _{min}(W)\}$$. This argument also applies to a general W if none of its complex eigenvalues exceeds $$\min (|\lambda _{max}(W)| , |\lambda _{min}(W)|)$$ in magnitude. In the following we suppose that this condition is met and denote the set of relevant eigenvalues by $$\Lambda (\rho )$$. Equation (11) below defines the four elements of this set using a small abuse of notation.

\begin{aligned} \Lambda (\rho ) = \{\rho _d \lambda _j(W) + \rho _o \lambda _i(W) + \rho _w \lambda _j(W) \lambda _i(W), \quad \text {for} \quad i,j = min,max\} \end{aligned}
(11)

Verifying constraints (II) and (III) with this reduced set of relevant eigenvalues has the double advantage that we only need to compute the largest and the smallest real eigenvalue of W and that we only have to find the minimum and the maximum among four instead of N elements.

\begin{aligned} \begin{aligned}&\text {Constraint (II) } \iff \max (\Lambda (\rho ))< 1 \\&\text {Constraint (III) }\iff -1< \min (\Lambda (\rho )) \; \text { and } \; \max (\Lambda (\rho )) < 1 \end{aligned} \end{aligned}
(12)

The set of relevant eigenvalues $$\Lambda (\rho )$$ allows us to verify the two constraints in (12) almost instantaneously. For the Bayesian MCMC estimator, it is not difficult to enforce such a restriction during the estimation, but for the MLE and S2SLS this requires constrained optimization methods, which go beyond the scope of the present article. A simpler strategy is to use an unconstrained estimator and to check the feasibility of the parameters a posteriori. To quantify the gains from choosing the spaces associated with (II) and (III) as opposed to the classical approach, we can compare their volumes with the one implied by constraint (IV), which requires $$|\rho _d| + |\rho _o| + |\rho _w| < 1$$. For the example presented in Sect. 6, this comparison reveals that we increase the feasible parameter space by a factor of 2.4 when choosing (III) and by a factor of 7.25 when choosing (II).
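As a minimal sketch (in Python with NumPy; the function names are ours, not part of the article), the feasibility checks in (12) can be implemented from just the two extreme real eigenvalues of W:

```python
import numpy as np

def relevant_eigenvalues(rho_d, rho_o, rho_w, lam_min, lam_max):
    """The four relevant elements of the eigenvalue set in (11)."""
    return np.array([
        rho_d * lj + rho_o * li + rho_w * lj * li
        for lj in (lam_min, lam_max)
        for li in (lam_min, lam_max)
    ])

def satisfies_II(rho, lam_min, lam_max):
    """Constraint (II): max of the relevant eigenvalues below one."""
    return relevant_eigenvalues(*rho, lam_min, lam_max).max() < 1

def satisfies_III(rho, lam_min, lam_max):
    """Constraint (III): all relevant eigenvalues inside (-1, 1)."""
    lam = relevant_eigenvalues(*rho, lam_min, lam_max)
    return (lam.min() > -1) and (lam.max() < 1)
```

For a row-stochastic W with $$\lambda _{max}(W) = 1$$ and, say, $$\lambda _{min}(W) = -0.6$$, the point $$\rho = (0.6, 0.5, -0.2)$$ satisfies (III) even though $$|\rho _d| + |\rho _o| + |\rho _w| = 1.3$$ violates the classical constraint (IV), illustrating the enlarged feasible space.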

## Improved estimation methods

Maximum likelihood estimation is the traditional estimation method for spatial econometric models and dates back to Ord (1975). Anselin (1988) provides an in-depth analysis of the MLE estimator for a general spatial model that may include dependence in the error term, the response variable and the explanatory variables. Much later, Lee (2004) proves asymptotic normality and consistency of the MLE and quasi-MLE and shows that the estimator attains $$\sqrt{N}$$ convergence when the neighborhood matrix is sufficiently sparse. Although Anselin (1988) already discusses alternatives to the MLE, it took another decade until Bayesian estimation (LeSage 1997) and S2SLS estimation (Kelejian and Prucha 1998) received broader attention.

All three estimators were initially proposed for a standard spatial model, but they have been adapted to the case of OD flows. LeSage and Pace (2008) develop an MLE procedure for the spatial interaction model given in (6) and LeSage and Pace (2009) describe an MCMC estimator of the same model. Badinger and Egger (2011) extend the S2SLS estimator of Kelejian and Prucha (1998) to models with multiple weight matrices and Tamesue and Tsutsumi (2016) use this extension for a model of migration flows in Japan. In the following subsections, we derive the three estimators exclusively from moment matrices with low dimensions. First, we discuss a reformulation of the likelihood function that yields computational benefits for the MLE and the Bayesian MCMC estimator. We then derive the improved estimation procedures, starting with the two likelihood-based ones, and finally present the S2SLS estimator.

### Refining the likelihood expression

We treat the reformulation of the likelihood function in a separate section, because computational techniques that speed up its evaluation benefit the MLE as well as Bayesian estimators. The likelihood function $$f_y(y)$$ of model (6) is derived from the density $$f_{\varepsilon }(\varepsilon )$$ of the errors using a transformation of variables (Shao 2003, page 23), which exploits the link $$y = A^{-1}Z\delta + A^{-1}\varepsilon$$ between the errors and the flows. The Jacobian term of this transformation turns out to be $$|A|$$, the determinant of the spatial filter matrix.

\begin{aligned} f_y(y) = f_{\varepsilon }(\varepsilon ) |A| = (\frac{1}{2\pi \sigma ^2})^{(N/2)} \exp \left\{ - \frac{1}{2\sigma ^2} (Ay - Z\delta )'(Ay - Z\delta ) \right\} |A| \end{aligned}
(13)

Expression (13) shows that only the Jacobian term |A| and the residual sum of squares (RSS) term $$RSS(\rho ,\delta ) = (Ay-Z\delta )'(Ay-Z\delta )$$ lead to computational challenges during the evaluation of the likelihood. The RSS term can be expanded to depend only on the parameters and three low-dimensional moment matrices

\begin{aligned} RSS(\rho ,\delta ) = \tau (\rho )'TSS_{\bullet }\tau (\rho ) - 2\delta 'Z'y_{\bullet }\tau (\rho ) + \delta 'Z'Z\delta , \end{aligned}
(14)

where $$\tau (\rho )' = \begin{pmatrix}1&-\rho _d&-\rho _o&-\rho _w \end{pmatrix}$$, $$y_{\bullet } = \begin{pmatrix} y&W_dy&W_oy&W_wy \end{pmatrix}$$ and $$TSS_{\bullet } = y_{\bullet }'y_{\bullet }$$. The three moments in (14) can be derived from a matrix formulation of the model using the expressions (7), (8) and $$TSS_{\bullet } = \iota _n'(Y^{(t)} \odot Y^{(s)})\iota _n$$, for $$t,s = 1,2,3,4$$.
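To make the decomposition concrete, the following sketch (Python/NumPy, our own illustration with simulated data) verifies (14) on a small example, constructing $$W_d$$, $$W_o$$ and $$W_w$$ as Kronecker products of a row-stochastic site matrix W:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5, 3
N = n * n

# row-stochastic site neighborhood matrix W with zero diagonal
W = rng.random((n, n)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
Wd, Wo, Ww = np.kron(np.eye(n), W), np.kron(W, np.eye(n)), np.kron(W, W)

y, Z = rng.random(N), rng.random((N, K))
rho, delta = np.array([0.3, 0.2, -0.1]), rng.random(K)

# direct evaluation of the RSS term
A = np.eye(N) - rho[0]*Wd - rho[1]*Wo - rho[2]*Ww
r = A @ y - Z @ delta
rss_direct = r @ r

# moment-based evaluation following (14)
y_lags = np.column_stack([y, Wd @ y, Wo @ y, Ww @ y])   # y_bullet
tau = np.array([1.0, -rho[0], -rho[1], -rho[2]])         # tau(rho)
TSS = y_lags.T @ y_lags                                  # TSS_bullet
Zy = Z.T @ y_lags                                        # Z'y_bullet
rss_moment = tau @ TSS @ tau - 2 * delta @ Zy @ tau + delta @ (Z.T @ Z) @ delta

assert np.isclose(rss_direct, rss_moment)
```

Once the moments $$TSS_{\bullet }$$, $$Z'y_{\bullet }$$ and $$Z'Z$$ are available, evaluating the RSS for new parameter values no longer involves any N-dimensional object.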

To handle the Jacobian term, we use the techniques outlined by LeSage and Pace (2008) and LeSage (2020), who adapt the log-determinant approximation of Martin (1992) to model (6). We only sketch the intuition of this approximation here, while more detailed explanations can be found in the previously mentioned articles. If the autoregression parameters satisfy constraint (III) in Table 1, we can express the log-determinant as a Taylor series

\begin{aligned} \ln |A| = -\sum _{t=1}^{\infty } \frac{{\text {tr}}\left( (\rho _dW_d + \rho _oW_o + \rho _wW_w)^{t}\right) }{t} \end{aligned}
(15)

and the approximation corresponds to the first m terms of this series. The linearity of the trace operator then allows us to separate the computation of the traces from the autoregression parameters for any power $$t = 1,...,m$$. If we further exploit the sparsity and Kronecker product structure of $$W_d$$, $$W_o$$, and $$W_w$$, we can derive the required traces using $${\mathcal {O}}(n)$$ methods.
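A dense, brute-force sketch of the truncated series (15) is given below (Python/NumPy; our own illustration, not the article's implementation). It checks the approximation against the exact log-determinant on a small example; a production implementation would instead precompute the traces once, exploiting sparsity and the Kronecker structure as described above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
N = n * n
W = rng.random((n, n)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
Wd, Wo, Ww = np.kron(np.eye(n), W), np.kron(W, np.eye(n)), np.kron(W, W)

rho = np.array([0.2, 0.15, -0.05])
B = rho[0]*Wd + rho[1]*Wo + rho[2]*Ww   # so that A = I - B

# exact log-determinant of the spatial filter matrix
exact = np.linalg.slogdet(np.eye(N) - B)[1]

# truncated Taylor series (15): ln|A| = -sum_t tr(B^t) / t
m = 30
approx, P = 0.0, np.eye(N)
for t in range(1, m + 1):
    P = P @ B                    # B^t, computed iteratively
    approx -= np.trace(P) / t

assert abs(exact - approx) < 1e-8
```

The series converges quickly here because the spectral radius of B is well below one, so even a moderate truncation order m reproduces the exact value to high precision.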

### The MLE

The main computational challenge of the MLE is that the autoregression parameters do not have an analytical solution, and to find the estimators we are forced to use numerical optimization techniques. In the following subsections, we recapitulate how to use the concentrated log-likelihood function to speed up the parameter estimation and develop a generalized version of the mixed numerical analytical Hessian approximation of LeSage and Pace (2009) that allows us to compute the standard errors of the estimators with almost no additional effort.

#### Parameter estimation

Applying the logarithm to the density of the dependent variable in (13) leads to the log-likelihood function expressed in terms of the model parameters.

\begin{aligned} {\mathcal {L}}(\rho ,\delta ,\sigma ^2) = \ln (f_y) = -\frac{N}{2}\ln (2\pi ) -\frac{N}{2}\ln (\sigma ^2) -\frac{1}{2\sigma ^2} (Ay - Z\delta )'(Ay - Z\delta ) + \ln |A| \end{aligned}
(16)

By differentiating the log-likelihood in (16) with respect to the parameters $$\rho$$, $$\delta$$ and $$\sigma ^2$$, we obtain the following three sets of first order conditions.

\begin{aligned} \begin{aligned} \frac{\partial {\mathcal {L}}}{\partial \rho } =&\,{\text {tr}}\left( A^{-1}\frac{\partial A}{\partial \rho }\right) - \frac{1}{\sigma ^2}(Ay - Z\delta )'\frac{\partial A}{\partial \rho }y&= 0 \implies&\rho \text { has no analytical expression } \\ \frac{\partial {\mathcal {L}}}{\partial \delta } =&\,\frac{1}{\sigma ^2}(Z'Ay-Z'Z\delta )&= 0 \implies&{\hat{\delta }}(\rho ) = (Z'Z)^{-1}Z'Ay \\ \frac{\partial {\mathcal {L}}}{\partial \sigma ^2}=&-\frac{N}{2\sigma ^2}+\frac{(Ay - Z\delta )'(Ay - Z\delta )}{2\sigma ^4}&= 0 \implies&{\hat{\sigma }}^2(\rho ) = \frac{(Ay - Z{\hat{\delta }}(\rho ))'(Ay - Z{\hat{\delta }}(\rho ))}{N} \end{aligned} \end{aligned}
(17)

The first set of equations in (17) reveals that the optimization problem does not yield an analytical expression for the autoregressive parameters. However, the solutions for the parameters $$\delta$$ and $$\sigma ^2$$ are found explicitly as a function of $$\rho$$. We can use the notation $$y_{\bullet }$$ in (14) to simplify the solutions to the first order conditions in (17) and the expression of the RSS term in (14).

\begin{aligned} {\hat{\delta }}(\rho )= (Z'Z)^{-1}(Z'y_{\bullet })\tau (\rho ) &\qquad RSS(\rho ,{\hat{\delta }}(\rho ))= \tau (\rho )'RSS_{\bullet }\tau (\rho ) \\ {\hat{\sigma }}^2(\rho )= \frac{RSS(\rho ,{\hat{\delta }}(\rho ))}{N} \qquad& RSS_{\bullet }= TSS_{\bullet } - (Z'y_{\bullet })'(Z'Z)^{-1}(Z'y_{\bullet }) \end{aligned}
(18)

The factorization $$\tau (\rho )'RSS_{\bullet }\tau (\rho )$$ of the RSS term is not new to the literature, but it is usually represented in terms of the residuals of an orthogonal projection, which corresponds to $$RSS_{\bullet } = \varepsilon _{\bullet }'\varepsilon _{\bullet }$$, where $$\varepsilon _{\bullet } = y_{\bullet } - Z(Z'Z)^{-1}Z'y_{\bullet }$$. For model (6), this operation requires computing at least four additional vectors of residuals and fitted values with N rows, which is much less efficient than deriving $$RSS_{\bullet }$$ directly from an inner product as in (18). Substituting the simplified solutions in (18) into (16) leads to the concentrated log-likelihood $${\mathcal {L}}^c(\rho ) \propto -\frac{N}{2} \ln (\tau (\rho )'RSS_{\bullet }\tau (\rho )) +\ln |A|$$, which we can quickly optimize using the log-determinant approximation portrayed in Sect. 4.1.

#### Hessian estimation

The Hessian matrix is required to conduct inference on the parameter estimates, which is an important part of any estimation procedure. Although most spatial econometric models have an analytical solution for the Hessian matrix (see for example Anselin 1988), it is usually more practical to use a numerical approximation instead. In the following, we extend the mixed numerical analytical Hessian estimator of LeSage and Pace (2009) to spatial models with multiple weight matrices. Furthermore, we restructure the expression of this Hessian estimator to depend only on quantities that are already available from the parameter estimation step, allowing us to derive the variances of the parameters without increasing the computational effort.

The Hessian matrix corresponds to the second order derivatives of the log-likelihood function $${\mathcal {L}}(\rho ,\delta ,\sigma ^2)$$ in (16) with respect to the model parameters. If we define $$\theta = \begin{pmatrix}\delta '&\sigma ^2 \end{pmatrix}'$$, we obtain a Hessian matrix H that is composed of four blocks.

\begin{aligned} H= \left[ \begin{array}{cc} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho \partial \rho '}} &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho \partial \theta ^{\prime }}} \\ {\frac{\partial ^{2} {\mathcal {L}}}{\partial \theta \partial \rho '}} &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \theta \partial \theta ^{\prime }}} \end{array}\right] = \left[ \begin{array}{cc} H_{11}&{} H_{12} \\ H_{21}&{} H_{22} \end{array}\right] \end{aligned}
(19)

For a spatial model of order one, LeSage and Pace (2009) demonstrate that three of the four blocks of the Hessian matrix have very simple analytical solutions. The developments in Appendix 1 show that their argument carries over to the case of a model with multiple weight matrices. The closed form solution of these blocks is given by

\begin{aligned} H^{\prime }_{12}=H_{21}= \begin{bmatrix} -\frac{Z'L}{\sigma ^2} \\ \frac{\delta '(Z'L) - \tau (\rho )'(y_{\bullet }'L) }{\sigma ^4}\end{bmatrix} \qquad H_{22} =\left[ \begin{array}{ccc} -{\frac{Z'Z}{\sigma ^2}} &{} 0 \\ 0 &{} -{\frac{N}{2\sigma ^4}} \end{array}\right] , \end{aligned}
(20)

where the matrix $$L = \begin{pmatrix}W_dy&W_oy&W_wy \end{pmatrix}$$ groups the three spatial lags of the flow vector. An analytical estimate of $$H_{22}$$ can be derived from the $$Z'Z$$ moment and the MLE of $$\sigma ^2$$. To estimate $$H_{12}$$, we only need the estimated parameters and the matrices $$Z'L$$ and $$y_{\bullet }'L$$, corresponding to the last three columns of $$Z'y_{\bullet }$$ and $$TSS_{\bullet }$$, which have already been used for the reformulated RSS term in (14). Computing the exact solution of $$H_{11}$$ is infeasible in large sample applications because it depends on $$A^{-1}$$, the inverse of the spatial filter matrix. For this reason, LeSage and Pace (2009) propose to use an approximation of $$H_{11}$$, which exploits a link between $$H_{{\mathcal {L}}^{c}}$$, the Hessian of the concentrated likelihood, and the blocks of the full Hessian in (19) (for details see Davidson and MacKinnon 1993, page 269).

\begin{aligned} {{H}}_{11} = {H}_{{\mathcal {L}}^{c}} + {H}_{21}' ({H}_{22})^{-1} {H}_{21} \end{aligned}
(21)

If we insert the analytical solutions of $${H}_{21}$$ and $${H}_{22}$$ given in (20) and a numerical estimate of $$H_{{\mathcal {L}}^{c}}$$ into the right-hand side of equation (21), we obtain the mixed numerical analytical estimator of $${{H}}_{11}$$. LeSage and Pace (2009) already point out that Newtonian optimization methods are based on numerical approximations of $$H_{{\mathcal {L}}^{c}}$$. Hence, if we choose such an optimization method, all quantities required to compute the mixed numerical analytical Hessian approximation are available as a by-product of the parameter estimation step.
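Identity (21) can be checked numerically on generic blocks. In the sketch below (Python/NumPy, our own illustration), a random negative-definite matrix stands in for the full Hessian; concentrating out the second block and then applying (21) recovers $$H_{11}$$ exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

# a negative-definite stand-in for the full Hessian, with a 3x3 block H11
# (autoregressive parameters) and a 4x4 block H22 (remaining parameters)
M = rng.random((7, 7))
H = -(M @ M.T + 7 * np.eye(7))
H11, H12, H21, H22 = H[:3, :3], H[:3, 3:], H[3:, :3], H[3:, 3:]

# Hessian of the concentrated objective (Davidson & MacKinnon identity)
H_conc = H11 - H12 @ np.linalg.solve(H22, H21)

# mixed estimator (21): recover H11 from H_conc and the analytic blocks
H11_mixed = H_conc + H21.T @ np.linalg.solve(H22, H21)
assert np.allclose(H11_mixed, H11)
```

In the estimation procedure, $$H_{conc}$$ would be the numerical Hessian returned by the optimizer, while $$H_{21}$$ and $$H_{22}$$ come from the closed forms in (20).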

### The Bayesian MCMC estimator

A Bayesian estimator of spatial econometric models is already mentioned by Anselin (1988), but due to its prohibitively difficult estimation, it took further progress in computer technology and theory until it became a practical alternative to the MLE. LeSage (1997) was among the first to use a variant of the Metropolis-Hastings (M-H) algorithm to estimate a Bayesian spatial econometric model. LeSage and Pace (2009) further develop this idea and estimate a spatial econometric interaction model using Bayesian MCMC methodology. In a more recent article, Fischer and LeSage (2020) develop an extension of the spatial econometric interaction model to panel data and estimate it by Bayesian MCMC. Similar to Debarsy and LeSage (2018), they use block sampling in conjunction with an expression of the marginal likelihood to generate draws for the autoregression parameters. This article follows more closely the methodology of LeSage and Pace (2009), because we want to show that it is possible to exploit the efficiency gains of the restructured RSS term also when using the full conditional likelihood instead of the marginal likelihood, and because we prefer to use sequential rather than block sampling.

Block sampling is often advocated on the grounds that it reduces the serial correlation between draws of the parameters. It also decreases the computational cost of each iteration of the sampler as it uses a single M-H step instead of one for each estimated parameter. However, block sampling suffers from low acceptance rates and requires more iterations to produce the same number of effective draws. Although the hybrid approach of Debarsy and LeSage (2018) reduces this problem, it increases the complexity of the sampling procedure (for implementation details see also Debarsy and LeSage 2020). In any case, we can express all the steps needed for both the sequential and block sampling approaches in terms of low-dimensional matrices, which means that the choice for either of the two is relatively unimportant for the efficiency of the estimator when used for large data sets. The concern that high serial correlation might affect the convergence of the estimator also appears to be unfounded, as shown by the results in Sect. 6 and the trace plots in Appendix 1.

#### The Bayesian model

Estimating a Bayesian model corresponds to finding a posterior distribution of the parameters that is informed by the data $${\mathcal {D}} = (y , Z)$$. Up to a proportionality constant, we derive this posterior from the conditional distribution of the data and the unconditional distribution of the parameters.

\begin{aligned} p(\rho , \delta , \sigma ^2 | {\mathcal {D}}) \propto p({\mathcal {D}}|\rho , \delta , \sigma ^2) \cdot p(\rho , \delta , \sigma ^2) \end{aligned}
(22)

The conditional distribution of the data coincides with the likelihood function used for the MLE, because it is derived from the same distributional assumption on the error term $$\varepsilon \sim N (0, \sigma ^2 I_N )$$.

\begin{aligned} p({\mathcal {D}}| \rho , \delta , \sigma ^2) = (\frac{1}{2\pi \sigma ^2})^{(N/2)} \exp \{- \frac{1}{2\sigma ^2} (Ay - Z\delta )'(Ay - Z\delta ) \}|A| \end{aligned}
(23)

The second density on the right hand side of (22) is defined by the prior assumptions on the model parameters. For these priors we follow LeSage and Pace (2009), who use a uniform distribution for $$\rho$$ and a normal-inverse gamma prior for $$\delta$$ and $$\sigma ^2$$. This prior is conjugate with respect to the Gaussian distribution of the errors and implies that the inverse of $$\sigma ^2$$ follows a gamma distribution and that $$\delta$$ follows a normal distribution if $$\sigma ^2$$ is held constant. We define the uniform prior for $$\rho$$ on the parameter space $$\Theta \subset {\mathbb {R}}^3$$ whose bounds are chosen to ensure that constraint (III) in (12) is satisfied. The resulting density is expressed using the indicator function $$\mathbbm {1}_{\{x\}}$$, which is one if the event x is true and zero otherwise. LeSage and Pace (2009) note that these assumptions imply prior independence between $$\rho$$ and the other parameters, which we should consider as a hypothesis that the estimation results will confirm or reject.

\begin{aligned} \delta | \sigma ^{2} \sim {\mathcal {N}}(c, T \sigma ^{2}) \implies&\pi \left( \delta | \sigma ^{2}\right) =(\frac{1}{2\pi \sigma ^2})^{(K/2)} |T|^{(-1/2)} \exp \{-\frac{1}{2\sigma ^2}(\delta -c)'T^{-1}(\delta - c)\} \\ \sigma ^{2} \sim \ I G(a, b) \implies&\pi (\sigma ^{2}) =\frac{b^{a}}{\Gamma (a)}(\frac{1}{\sigma ^2})^{(a+1)}\exp \{-b / \sigma ^{2}\} \\ \rho \sim U(\Theta ) \implies&\pi (\rho ) \propto\mathbbm {1}_{\{ \rho \in \Theta \}} \end{aligned}
(24)

The parameter values of a, b and c are set to zero, and $$T^{-1} \approx {\mathbf {0}}$$ is approximated by a zero matrix. This parametrization ensures that all prior distributions are uninformative and allows us to derive the following simplified expression of the posterior density.

\begin{aligned} p(\delta , \sigma ^2, \rho | {\mathcal {D}}) \propto \sigma ^{(-N-1)} \exp \{-\frac{1}{2 \sigma ^{2}}(A y-Z \delta )^{\prime }(A y-Z \delta )\} |A| \mathbbm {1}_{\{ \rho \in \Theta \}} \end{aligned}
(25)

The main problem of the classical Bayesian estimator for the spatial econometric interaction model is that the posterior in (25) is not analytically tractable. Therefore, in order to find the posterior expectation or variance of the parameters, we would have to use numerical integration techniques, which have prohibitive computational cost.

#### The Bayesian MCMC estimator

To avoid the computational bottleneck of the classical Bayesian estimator, LeSage and Pace (2009) estimate the model defined by equations (22), (23) and (24) using MCMC methodology. Their procedure splits the joint posterior in (25) into three conditional distributions and combines Gibbs-sampling for the parameters $$\delta$$ and $$\sigma ^2$$ with Metropolis-Hastings sampling for the autoregression parameters. In the following, we show that each step of this sampling procedure can be expressed in terms of low-dimensional moments that are calculated only once to initiate the sampler. To distinguish these moments from quantities that are updated throughout the sampling procedure, we make extensive use of the index (i) for MCMC iterations.

#### Gibbs-sampling for the parameters $$\delta$$

LeSage and Pace (2009) show that the conditional distribution $$p(\delta |\rho ,\sigma ^2,{\mathcal {D}})$$ of $$\delta$$ simplifies to a normal distribution $$\delta \sim {\mathcal {N}}(c^{*},T^{*}\sigma ^2)$$, where $$T^{*} = (Z'Z+T^{-1})^{-1}$$ and $$c^{*} = (Z'Z+T^{-1})^{-1}(Z'A y+T^{-1} c)$$. If we recall the prior parameter values $$c = 0$$ and $$T^{-1} \approx {\mathbf {0}}$$ and define $$\tau (\rho _{(i-1)}) = ( 1 \; -\rho _{d(i-1)} \; -\rho _{o(i-1)} \; -\rho _{w(i-1)})$$, we can easily draw the updated value of $$\delta$$.

\begin{aligned} \delta _{(i)} = (Z'Z)^{-1}(Z'y_{\bullet }) \tau (\rho _{(i-1)}) + \epsilon _{(i)} \qquad \text { with } \qquad \epsilon _{(i)} \sim {\mathcal {N}}(0,(Z'Z)^{-1}\sigma ^2_{(i-1)}) \end{aligned}
(26)

#### Gibbs-sampling for the variance parameter $$\sigma ^2$$

The conditional distribution $$p(\sigma ^2|\rho ,\delta ,{\mathcal {D}})$$ of $$\sigma ^2$$ simplifies to an inverse gamma $$\sigma ^2 \sim IG(a^{*},b^{*})$$, where $$a^{*} = a + \frac{N}{2}$$ and $$b^{*} = b+ \frac{1}{2} (Ay- Z\delta )'(Ay- Z\delta )$$. To draw the updated parameter value $$\sigma ^2_{(i)}$$, we require $$b^{*}_{(i)}$$, which we derive by inserting $$\delta _{(i)}$$ into the RSS decomposition (14).

\begin{aligned} \sigma ^2_{(i)} \sim IG(\frac{N}{2},b^{*}_{(i)}) \qquad \text { with } \qquad b^{*}_{(i)} = \frac{1}{2} \tau (\rho _{(i-1)})' RSS_\bullet \tau (\rho _{(i-1)}) + \frac{1}{2}\epsilon _{(i)}' (Z'Z) \epsilon _{(i)} \end{aligned}
(27)

#### Metropolis-Hastings-sampling for the autoregressive parameters $$\rho$$

Sampling the autoregressive parameters is more complicated than the previous two steps because the conditional posterior distribution of $$\rho$$ is unknown. LeSage and Pace (2009) use an M-H sampler that incorporates the tuned random walk procedure of Holloway et al. (2002) to draw candidate values for the autoregression parameters from a Gaussian proposal distribution. The variance of this proposal distribution is monitored and adjusted to ensure that the candidates are accepted with a probability between $$40\%$$ and $$60\%$$, making the estimator flexible enough to deal with varying sample sizes. At this stage, it is easy to enforce that the autoregression parameters lie in the feasible space, as we can simply discard and re-sample candidates that do not fulfill Constraint (III) in Table 1.

Given a valid candidate, we calculate the probability of accepting it as an actual draw from the conditional posterior distribution. This calculation is presented below only for the draws of $$\rho _d$$, since the steps to update $$\rho _o$$ and $$\rho _w$$ are nearly identical. While the sampling order is arbitrary, it is important to consider the updated values of all parameters when updating the next one. Below, we define the probability to accept the candidate $$\rho ^c_{d(i)}$$ as the updated parameter value.

\begin{aligned} \begin{aligned} P(\rho _{d(i)}&= \rho ^c_{d(i)}) = \min \left\{ 1,LR(\rho ^c_{d(i)}) \right\} \\~\\ LR(\rho ^c_{d(i)})&= \frac{p(\rho ^c_{d(i)}| \delta _{(i)}, \sigma _{(i)}^2,\rho _{o(i-1)},\rho _{w(i-1)},{\mathcal {D}})}{p(\rho _{d(i-1)}| \delta _{(i)}, \sigma _{(i)}^2,\rho _{o(i-1)},\rho _{w(i-1)},{\mathcal {D}})} \end{aligned} \end{aligned}
(28)

Most of the computational cost of the sampling procedure is caused by the likelihood ratio in (28), whose logarithm is more convenient to evaluate. This requires computing the difference between two RSS terms, denoted $$\Delta RSS(\rho ^c_{d(i)})$$, and two log-determinant terms

\begin{aligned} \begin{aligned} \ln LR(\rho ^c_{d(i)})&= \ln |A(\rho ^c_{d(i)},\rho _{o(i-1)},\rho _{w(i-1)})| - \ln |A(\rho _{(i-1)})| - \frac{\Delta RSS(\rho ^c_{d(i)})}{2 \sigma ^{2}_{(i)}}, \end{aligned} \end{aligned}
(29)

where $$A(\rho ) = A(\rho _d, \rho _o, \rho _w) = I_N - \rho _dW_d - \rho _o W_o-\rho _w W_w$$. We can compute the difference in the RSS terms almost instantaneously using

\begin{aligned} \Delta RSS(\rho ^c_{d(i)}) = \tau ^{c\prime }_{d(i)} RSS_{\bullet } \tau ^c_{d(i)} - \tau (\rho _{(i-1)})' RSS_{\bullet } \tau (\rho _{(i-1)}), \end{aligned}
(30)

where $$\tau ^c_{d(i)} = ( 1 \; -\rho ^c_{d(i)} \; -\rho _{o(i-1)} \; -\rho _{w(i-1)})$$. Expression (30) is derived from the reformulation of the RSS term, as already shown for the computation of $$b^{*}_{(i)}$$ in (27). We should further note that one of the log-determinants in (29) is always available from the previous parameter update, which means that the cost of updating one parameter is not much higher than the cost of a single log-determinant approximation.
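The full sampling cycle can be sketched as follows (Python/NumPy; simulated data, our own illustration). All updates rely only on the moments $$TSS_{\bullet }$$, $$Z'y_{\bullet }$$ and $$Z'Z$$ computed once before the loop; for brevity we evaluate the conditional RSS terms directly via decomposition (14), use the exact log-determinant instead of the approximation of Sect. 4.1, and substitute the simpler constraint (IV) for the feasibility check:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 8, 2
N = n * n
W = rng.random((n, n)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
Wd, Wo, Ww = np.kron(np.eye(n), W), np.kron(W, np.eye(n)), np.kron(W, W)

# simulate flows with known parameters
Z = np.column_stack([np.ones(N), rng.random(N)])
rho0, delta0 = np.array([0.3, 0.2, -0.1]), np.array([1.0, 2.0])
A0 = np.eye(N) - rho0[0]*Wd - rho0[1]*Wo - rho0[2]*Ww
y = np.linalg.solve(A0, Z @ delta0 + 0.05 * rng.standard_normal(N))

# low-dimensional moments, computed once before sampling starts
y_lags = np.column_stack([y, Wd @ y, Wo @ y, Ww @ y])
TSS, Zy, ZZ = y_lags.T @ y_lags, Z.T @ y_lags, Z.T @ Z
ZZ_inv = np.linalg.inv(ZZ)
chol = np.linalg.cholesky(ZZ_inv)

tau = lambda r: np.array([1.0, -r[0], -r[1], -r[2]])
logdet = lambda r: np.linalg.slogdet(np.eye(N) - r[0]*Wd - r[1]*Wo - r[2]*Ww)[1]
rss = lambda r, d: tau(r) @ TSS @ tau(r) - 2 * d @ Zy @ tau(r) + d @ ZZ @ d

rho, sigma2, ld = np.zeros(3), 1.0, 0.0   # ln|A| = 0 at rho = 0
draws = []
for it in range(3000):
    # Gibbs step (26): delta | rho, sigma2 is Gaussian
    delta = ZZ_inv @ (Zy @ tau(rho)) + np.sqrt(sigma2) * chol @ rng.standard_normal(K)
    # Gibbs step (27): sigma2 | rho, delta is inverse gamma with b* = RSS / 2
    sigma2 = 0.5 * rss(rho, delta) / rng.gamma(N / 2.0)
    # M-H steps (28)-(29): update the autoregressive parameters one at a time
    for j in range(3):
        cand = rho.copy()
        cand[j] += 0.05 * rng.standard_normal()
        if np.abs(cand).sum() >= 1.0:     # crude stand-in for constraint (III)
            continue
        ld_cand = logdet(cand)
        log_ratio = ld_cand - ld - (rss(cand, delta) - rss(rho, delta)) / (2 * sigma2)
        if np.log(rng.random()) < log_ratio:
            rho, ld = cand, ld_cand
    if it >= 1000:                        # discard the burn-in phase
        draws.append(rho.copy())

rho_post = np.mean(draws, axis=0)         # posterior mean of rho
```

A production sampler would additionally tune the proposal variance towards the target acceptance rate and reuse the log-determinant approximation, but the structure of the three updates is as shown.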

### The S2SLS estimator

The spatial two-stage least-squares estimator is essentially an instrumental variable estimation that treats the spatial lags of the dependent variables as endogenous regressors. This framing of the estimation problem is already developed by Anselin (1988), but the S2SLS became operational only after Kelejian and Prucha (1998) demonstrate that suitable instrumental variables can be derived from higher order spatial lags of the exogenous covariates. The authors prove consistency and asymptotic normality of the estimator and subsequently extend their methodology to more complex models (see for example Kelejian and Prucha 2004, 2010). Badinger and Egger (2011) extend the S2SLS estimator to spatial models with multiple weight matrices and Tamesue and Tsutsumi (2016) use this extension of the S2SLS to estimate a model of migration flows in Japan. As the authors focus on the issue of missing values for intra-regional migration, they do not discuss potential multicollinearity problems of the instrumental variables. Such problems arise systematically when the S2SLS estimator is used for spatial interaction models, and the following paragraphs show how they can be avoided. After defining the spatial instruments, we adapt the two stages of the estimation procedure to exploit the efficiency gains provided by the matrix formulation.

#### Instrumental variables for interaction models

When estimating a spatial interaction model by S2SLS, we consider the spatial lags of the dependent variable $$L = (W_dy \; W_oy \; W_wy)$$ as endogenous regressors in an ordinary linear model, leading to the following reformulation of model (6).

\begin{aligned} y = L\rho + Z\delta + \varepsilon \end{aligned}
(31)

We can solve the endogeneity issue by using instrumental variables for the endogenous regressors. Kelejian and Prucha (1998) show that suitable instruments for this type of model can be easily found among higher order spatial lags of the covariates in the matrix Z. While the notion of higher order spatial lags is straightforward for models with a single weight matrix, it becomes somewhat more complicated in higher order models. Badinger and Egger (2011) explain how to generate such lags for models with an arbitrary number of spatial weight matrices. For the model in (31) we derive the required lag-operators from interactions of the terms

\begin{aligned} {\mathbf {I}}_{N}^{\omega _1} \times W_d^{\omega _2} \times W_o^{\omega _3} \times W_w^{\omega _4}, \end{aligned}
(32)

where the powers associated with the weight matrices, as well as their sum, should not exceed the maximal lag order m; $$\omega _t = 0,1, ...,m$$ (for $$t = 1,2,3,4$$) and $$\sum _{t = 1}^4 \omega _t \le m$$. For their model of Japanese migration flows, Tamesue and Tsutsumi (2016) set $$m = 2$$, which is the common choice of lag order used to generate instruments for the S2SLS estimator. They rightfully note that many of the lag operators generated by the interactions in (32) are redundant and reduce them to the set of unique lag operators. For $$m = 2$$, we denote this set $$W_U^{2}$$.

\begin{aligned} W_U^{2} = \begin{Bmatrix} {\mathbf {I}}_{{N}}&{W_d}&{W_o}&{W_d}^2&{W_o}^2&{W_w}&{W_w}^2&{W_d}{W_w}&{W_o}{W_w}\ \end{Bmatrix} \end{aligned}
(33)

However, even if we only use this reduced set of lag operators, many of the instruments that we derive from Z will be duplicated, leading to collinearity problems during the estimation. For this reason, we associate specific lag operators in $$W_U^{2}$$ to each of the six sets of variables contained in the matrix Z (see equations (5)). The resulting instrumental variables are grouped into four sets $$U = \begin{pmatrix}U_{\alpha }&U_{\alpha _I}&U_{\beta }&U_{\gamma } \end{pmatrix}$$, which are described in the sequel.

The first set of instruments $$U_{\alpha }$$ corresponds to the global constant. Since constants are invariant to multiplication by row stochastic matrices, we derive no instruments from this variable.

\begin{aligned} U_{\alpha } = \iota _{{n}} \otimes \iota _{{n}} = \iota _{{N}} \end{aligned}
(34)

The instruments in $$U_{\alpha _I}$$ are derived from the intra-regional constant $$\iota _{{I}} = {{\,\mathrm{VEC}\,}}({\mathbf {I}}_{{n}})$$ which is multiplied by all the lag operators in $$W_U^2$$.

\begin{aligned} U_{\alpha _I} = \begin{Bmatrix} {\mathbf {I}}_{{N}}&{W_d}&{W_o}&{W_d}^2&{W_o}^2&{W_w}&{W_w}^2&{W_d}{W_w}&{W_o}{W_w}\ \end{Bmatrix} \times \iota _{{I}} \end{aligned}
(35)

The resulting variables can be derived from the $$n \times n$$ site neighborhood matrix W and do not require us to actually construct any of the $$N \times N$$ lag operators contained in $$W_U^2$$.

\begin{aligned} \begin{aligned} {\mathbf {I}}_{{N}}\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({\mathbf {I}}_{n}) \\ {W_d}\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W}) \\ {W_o}\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W}') \end{aligned} \qquad \qquad \begin{aligned} {W_d}^2\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W^2}) \\ {W_o}^2 \iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W^{2\prime }}) \\ {W_w}\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W}{W}') \end{aligned} \qquad \qquad \begin{aligned} {W_w}^2\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W^2}{W^{2\prime }}) \\ {W_d}{W_w}\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W^2}{W}') \\ {W_o}{W_w}\iota _{{I}}&= {{\,\mathrm{VEC}\,}}({W}{W^{2\prime }}) \end{aligned} \end{aligned}
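These identities are easy to verify numerically. The sketch below (Python/NumPy, our own illustration) assumes the ordering $$W_d = {\mathbf {I}}_n \otimes W$$, $$W_o = W \otimes {\mathbf {I}}_n$$ and $$W_w = W \otimes W$$ together with the column-stacking VEC operator:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
W = rng.random((n, n)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

# OD lag operators under the assumed Kronecker ordering
Wd, Wo, Ww = np.kron(np.eye(n), W), np.kron(W, np.eye(n)), np.kron(W, W)

vec = lambda M: M.flatten(order="F")   # column-stacking VEC operator
iota_I = vec(np.eye(n))                # intra-regional constant VEC(I_n)

# the identities for the intra-regional instruments
assert np.allclose(Wd @ iota_I, vec(W))
assert np.allclose(Wo @ iota_I, vec(W.T))
assert np.allclose(Ww @ iota_I, vec(W @ W.T))
assert np.allclose(Wd @ Ww @ iota_I, vec(W @ W @ W.T))
```

Each identity follows from the rule $$(B' \otimes A) {{\,\mathrm{VEC}\,}}(X) = {{\,\mathrm{VEC}\,}}(AXB)$$, which is what allows all nine instruments to be built from n-dimensional objects.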

The matrix $$U_{\beta }$$ contains all instruments derived from the origin and destination attributes. When constructing these instruments, we have to pay attention to two mechanisms which, if ignored, create duplicated columns in the instrument matrix. One of these mechanisms results from the row-stochastic property of the W matrix, a property that makes it unnecessary to apply the origin lag operator $$W_o$$ to the destination attributes $$\tilde{X_d}$$ (as $$W_o^t\tilde{X_d} =\tilde{X_d}$$ for $$t = 0,1,2,...$$). The same argument holds when applying powers of the destination lag operator $$W_d$$ to the origin attributes $$\tilde{X_o}$$. For the same reason, it is also superfluous to apply the origin-to-destination lag operator $$W_w$$ to the site attributes (as $$W_w^t\tilde{X_d} =W_d^t\tilde{X_d}$$ and $$W_w^t\tilde{X_o} =W_o^t\tilde{X_o}$$ for $$t = 0,1,2,...$$). The second mechanism causes duplication in the SDM specification, where we use spatial lags of the covariates as additional explanatory variables. In this case, instruments generated as $$W_d^t\tilde{X_d}$$ and $$W_o^t\tilde{X_o}$$, for $$t = 1,2,...,m$$, duplicate the first $$m-1$$ lags.

We can avoid duplication of instrumental variables altogether if we define them according to the equations in (36) below. These instruments are derived from lags of the site attributes (DX, OX and IX), where the lag order is incremented by one ($$\tilde{m} = m + 1$$) if the variables are used according to the SDM specification. Given the lagged site attributes, we use Kronecker products to derive the instruments $$U_{\beta } = (\tilde{\tilde{X_d}} \; \tilde{\tilde{X_o}} \; \tilde{\tilde{X_I}})$$.

\begin{aligned} \tilde{\tilde{DX}}&= ({W}^0 DX\quad{{W}^1 DX}\quad{...}\quad{{W}^{\tilde{m}} DX)} \quad \tilde{\tilde{X_d}}= \iota _{{n}} \otimes \tilde{\tilde{DX}} \\ \tilde{\tilde{OX}}&= ({W}^0 OX\quad{{W}^1 OX}\quad{...}\quad{{W}^{\tilde{m}} OX)}\quad \tilde{\tilde{X_o}}= \tilde{\tilde{OX}} \otimes \iota _{{n}} \\ \tilde{\tilde{IX}}&= ({W}^0 IX\quad{{W}^1 IX}\quad{...}\quad{{W}^{\tilde{m}} IX)} \quad \tilde{\tilde{X_I}}= Diag(\iota _{{I}})\iota _{{n}} \otimes \tilde{\tilde{IX}} \end{aligned}
(36)

The fourth set of instruments $$U_{\gamma }$$ is derived from the distance vector g. Applying all the lag operators in (33) to this variable would not lead to duplicated instruments. However, using all these lags is not necessary and increases the computational burden of the whole estimation procedure significantly. For this reason, we will only use powers of the origin-to-destination lag operator $$W_w$$ to derive instruments from g. The choice of $$W_w$$ over $$W_o$$ and $$W_d$$ is intuitively more appealing because attributes such as the distance are only defined for OD pairs, but not for the origins and the destinations. Additionally, if we use $$W_w$$ to derive higher order lags from attributes that have a symmetric matrix form, we preserve this symmetry, which yields computational advantages. We can compute the lags of the distance matrix G as $${\check{G}}= WGW'$$ and $$\check{{\check{G}}}= W{\check{G}}W'$$, and we use these to define the instruments in $$U_{\gamma }$$.

\begin{aligned} U_{\gamma } = \begin{Bmatrix} {\mathbf {I}}_{{N}}&{W_w}&{W_w}^2 \end{Bmatrix} \times g = \begin{pmatrix} {{\,\mathrm{VEC}\,}}(G)&{{\,\mathrm{VEC}\,}}({\check{G}})&{{\,\mathrm{VEC}\,}}(\check{{\check{G}}}) \end{pmatrix} \end{aligned}
(37)
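A small numerical sketch of (37), assuming $$W_w = W \otimes W$$ and a column-major VEC operator (all data here are hypothetical stand-ins), also confirms that the matrix-form lag $$\check{G} = WGW'$$ preserves the symmetry of G:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)        # row-stochastic neighborhood matrix
G = rng.random((n, n))
G = (G + G.T) / 2                        # symmetric distance matrix (stand-in)

G_check = W @ G @ W.T                    # first lag, computed in matrix form
G_check2 = W @ G_check @ W.T             # second lag

vec = lambda M: M.reshape(-1, order="F") # column-major VEC operator
Ww = np.kron(W, W)                       # origin-to-destination lag operator

# the matrix form reproduces the vectorized lag: (W (x) W) vec(G) = vec(W G W')
assert np.allclose(Ww @ vec(G), vec(G_check))
assert np.allclose(G_check, G_check.T)   # symmetry is preserved

U_gamma = np.column_stack([vec(G), vec(G_check), vec(G_check2)])
```

Working with the $$n \times n$$ matrices G, $$\check{G}$$ and $$\check{\check{G}}$$ instead of the $$N \times N$$ operator $$W_w$$ is what yields the computational advantage mentioned in the text.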

#### Moment based S2SLS estimation

Given the instrument matrix U, we can estimate the parameters of model (31) by S2SLS. The first step of the estimation procedure corresponds to a projection of the endogenous regressors onto the space spanned by the columns of the instrument matrix.

\begin{aligned} {\hat{L}} = U(U'U)^{-1}U'L \end{aligned}
(38)

During the second stage, we use the fitted values $${\hat{L}}$$ from the first stage as additional covariates and estimate the model parameters $$\mu ' = \begin{pmatrix} \rho '&\delta ' \end{pmatrix}$$ by an OLS regression of y on $${\hat{Z}} = \begin{pmatrix} {\hat{L}}&Z \end{pmatrix}$$.

\begin{aligned} {\hat{\mu }} = ({\hat{Z}}'{\hat{Z}})^{-1} {\hat{Z}}'y \end{aligned}
(39)
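The two stages in (38) and (39) amount to a few lines of linear algebra. A minimal sketch on random stand-in data (we let Z be a subset of the columns of U, as the instrument construction implies):

```python
import numpy as np

rng = np.random.default_rng(2)
N, k_extra, k_L, k_Z = 200, 5, 3, 4      # stand-in dimensions

U = rng.random((N, k_Z + k_extra))       # instrument matrix
Z = U[:, :k_Z]                           # exogenous covariates, a subset of U
L = rng.random((N, k_L))                 # spatial lags of the flows (endogenous)
y = rng.random(N)

# stage 1, eq. (38): project L onto the column space of U
L_hat = U @ np.linalg.solve(U.T @ U, U.T @ L)

# stage 2, eq. (39): OLS of y on (L_hat, Z)
Z_hat = np.hstack([L_hat, Z])
mu_hat = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)
```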

Although the above two stages are simple to understand and implement, they involve computations with the matrices U, Z and $${\hat{L}}$$, which are large for applications with many OD pairs. In the following, we adjust the estimation procedure to avoid the explicit construction of these matrices.

The second stage equation (39) shows that we can estimate the model based on the two moments $${\hat{Z}}'{\hat{Z}}$$ and $${\hat{Z}}'y$$, which are composed of the five blocks shown below.

\begin{aligned} {\hat{Z}}'{\hat{Z}} = \begin{pmatrix} {\hat{L}}'{\hat{L}} &{} {\hat{L}}'Z \\ \bullet &{} Z'Z \end{pmatrix} \qquad \qquad {\hat{Z}}'y = \begin{pmatrix} {\hat{L}}'y \\ Z'y \end{pmatrix} \end{aligned}
(40)

Two of the five blocks in (40) coincide with the moment matrices $$Z'Z$$ and $$Z'y$$, defined by (7) and (8). The first block of the $${\hat{Z}}'{\hat{Z}}$$ moment corresponds to the inner product of the fitted values of the first stage in (38). We can derive this inner product directly from the two moments $$U'U$$ and $$U'L$$ without having to compute the fitted values $${\hat{L}}$$ $$(N \times 3)$$ explicitly.

\begin{aligned} {\hat{L}}'{\hat{L}} = (U'L)'(U'U)^{-1}(U'L) \end{aligned}
(41)

To compute the $${\hat{L}}'Z$$ block in the $${\hat{Z}}'{\hat{Z}}$$ moment we also do not require the fitted values of the first stage. In fact, we can omit the projection $$P_U = U(U'U)^{-1}U'$$ entirely because the matrix Z represents a subset of the columns of the matrix U.

\begin{aligned} {\hat{L}}'Z = L'P_UZ = L'Z \end{aligned}
(42)

The last block required to define the second stage moments in (40) is $${\hat{L}}'y$$, which we can also formulate in terms of moment matrices.

\begin{aligned} {\hat{L}}'y = (U'L)'(U'U)^{-1}(U'y) \end{aligned}
(43)

Equations (43), (42), and (41) show that it is possible to derive the matrices required for the second stage of the estimation procedure directly from the three moments $$U'U, U'L$$ and $$U'y$$. Since $$Z'Z$$ and $$Z'y$$ are subsets of the columns and rows of these three moments, they do not add to the computational burden of the estimation procedure. Therefore, to make the S2SLS estimation as efficient as possible, we need to derive the three moments $$U'U$$, $$U'L$$, and $$U'y$$ from the matrix formulation of the model. The columns of $$U'L$$ and $$U'y$$ have identical structures, and we can use definition (8) of $$y^{(t)}$$ $$(t = 1,2,3,4)$$ to compute them efficiently. The block structure of the required moments is derived from the four instrument groups $$U_{\alpha }$$, $$U_{\alpha _I}$$, $$U_\beta$$ and $$U_\gamma$$.

\begin{aligned} U'U = \begin{pmatrix} U_{\alpha \alpha } &{} U_{\alpha \alpha _I} &{} U_{\alpha \beta } &{} U_{\alpha \gamma } \\ \bullet &{} U_{\alpha _I\alpha _I} &{} U_{\alpha _I\beta } &{} U_{\alpha _I\gamma } \\ \bullet &{} \bullet &{} U_{\beta \beta } &{} U_{\beta \gamma } \\ \bullet &{} \bullet &{} \bullet &{} U_{\gamma \gamma } \end{pmatrix} \qquad \qquad U'y^{(t)} = \begin{pmatrix} U_{\alpha }'y^{(t)} \\ U_{\alpha _I}'y^{(t)} \\ U_\beta 'y^{(t)} \\ U_\gamma 'y^{(t)} \end{pmatrix} \end{aligned}
(44)

Detailed calculations for all 14 blocks of the above moments can be found in Appendix 1. Given the moments in (44), we can easily define the matrices required for the second stage in (39) and estimate the parameters.
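The following sketch verifies, on random stand-in data, that the second stage can be computed from the three moments $$U'U$$, $$U'L$$ and $$U'y$$ alone, reproducing the explicit two-stage estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k_Z, k_L = 200, 4, 3
U = rng.random((N, k_Z + 5))
Z = U[:, :k_Z]                          # Z is a subset of the columns of U
L = rng.random((N, k_L))
y = rng.random(N)

# only three small moments are required
UU, UL, Uy = U.T @ U, U.T @ L, U.T @ y

LhLh = UL.T @ np.linalg.solve(UU, UL)   # eq. (41)
LhZ = L.T @ Z                           # eq. (42): L'P_U Z = L'Z
Lhy = UL.T @ np.linalg.solve(UU, Uy)    # eq. (43)

ZhZh = np.block([[LhLh, LhZ], [LhZ.T, Z.T @ Z]])       # eq. (40)
Zhy = np.concatenate([Lhy, Z.T @ y])
mu_moments = np.linalg.solve(ZhZh, Zhy)

# the explicit two-stage computation gives the same estimate
L_hat = U @ np.linalg.solve(UU, UL)
Z_hat = np.hstack([L_hat, Z])
mu_explicit = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)
assert np.allclose(mu_moments, mu_explicit)
```

The moment route never forms an $$N \times \bullet$$ array beyond the raw data, which is the source of the efficiency gains discussed in the text.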

The variance-covariance matrix of the parameters can be computed as $$\widehat{Var({\hat{\mu }})} = {\hat{\sigma }}^2({\hat{Z}}'{\hat{Z}})^{-1}$$, which requires a consistent estimator $${\hat{\sigma }}^2$$ of the error variance. To estimate this variance, we use the inner product of the residuals $${\hat{\varepsilon }} = y - {\tilde{Z}} {\hat{\mu }}$$, where $${\tilde{Z}} = (L \; Z)$$ contains the original lags of the flows rather than their fitted values.

\begin{aligned} \begin{aligned} {\hat{\sigma }}^2&= \frac{1}{N}\left( y'y - 2({\tilde{Z}}'y)'{\hat{\mu }} + {\hat{\mu }}'({\tilde{Z}}'{\tilde{Z}}){\hat{\mu }} \right) \end{aligned} \end{aligned}
(45)

We obtain the moments $${\tilde{Z}}'y$$ and $${\tilde{Z}}'{\tilde{Z}}$$ in the above expressions by replacing $${\hat{L}}$$ with L in (40), which only changes two of the five existing blocks. The new elements $$L'L$$ and $$L'y$$, as well as $$y'y$$, correspond to subsets of the rows and columns of the $$TSS_\bullet$$ matrix that is defined in (14).
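A quick numerical check of (45) on random stand-in data confirms that the moment formulation reproduces the inner product of the residuals:

```python
import numpy as np

rng = np.random.default_rng(4)
N, k_L, k_Z = 150, 3, 4
L = rng.random((N, k_L))                 # original (not fitted) lags of the flows
Z = rng.random((N, k_Z))
y = rng.random(N)
mu = rng.random(k_L + k_Z)               # stand-in for the parameter estimate

Z_tilde = np.hstack([L, Z])
eps = y - Z_tilde @ mu
sigma2_direct = eps @ eps / N            # inner product of the residuals

# eq. (45): the same quantity from the moments alone
sigma2_moments = (y @ y
                  - 2 * (Z_tilde.T @ y) @ mu
                  + mu @ (Z_tilde.T @ Z_tilde) @ mu) / N
assert np.isclose(sigma2_direct, sigma2_moments)
```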

## A simulation study of computational efficiency

We evaluate the performance of the estimators presented in the previous section based on simulated flow data between the IRIS regions of France. These regions are created by the French Institutes of Statistics and Economic Studies (INSEE) and Geographic and Forest Information (IGN) and divide metropolitan France into 48590 territorial units. They are often used in practice because they are more homogeneous in terms of population than other administrative areas. For each IRIS we create two region attributes and for each OD pair a distance vector and a flow vector. Since we are only interested in the run time, all the data is generated randomly. We vary the sample size by including a growing number of IRIS around a center point, going up to a maximum of 10000 regions and 100 million OD pairs. Figure 1 illustrates this procedure.

In addition to the MLE, Bayesian MCMC and S2SLS estimators of Sect. 4, we use two implementations of the OLS estimator. One of them is based on a matrix formulation and the other one on a vectorized formulation of the model. For each of the five estimators, we estimate three different models that include an increasing number of variables. The most comprehensive model corresponds to an SDM that uses the additional parameters for intra-regional observations, as well as spatial lags of all the site attributes

\begin{aligned} \begin{aligned} y_{SDM}&= \rho _d W_dy + \rho _o W_oy + \rho _w W_wy \\&\quad + \alpha \iota _N + \alpha _I \iota _I + X_d\beta _d + X_d^L\beta _d^L + X_o\beta _o + X_o^L\beta _o^L + X_I\beta _I + X_I^L\beta _I^L +\gamma g, \end{aligned} \end{aligned}
(46)

where the superscript L indicates parameters and variables linked to the spatial lags of a variable. In total, model (46) includes 18 parameters (as all $$\beta$$-parameters have two entries). When model (46) is estimated by OLS, we omit the three autoregression parameters, resulting in an SLX (spatial lag of X) model. The second model is also an SDM, but it does not use the additional intra-regional parameters. Removing all spatial lags of the origin and destination attributes leads to the smallest model that matches a LAG specification. For the OLS estimators, the smallest model is actually an ordinary linear model (OLM).

Figure 2 shows the execution timeFootnote 5 of the five estimators under the different model specifications and increasing sample sizes. To make the differences in smaller samples more visible, the time axis is square-root transformed. Hence, the linear slopes indicate that time grows proportionally to $$n^2$$, the number of OD pairs. Not surprisingly, OLS estimation in matrix form is the fastest method for all model specifications. Comparing MLE and Bayesian MCMC estimation, we see that they scale almost equally well to large samples, and the maximum estimation time of these two methods is between two and three minutes for all model specifications. However, for smaller samples, Bayesian MCMC appears to be the slower method. This behavior is explained by the 5500 iterations of the MCMC sampling procedure, which is relatively costly compared to the three-dimensional optimization used for the MLE. The S2SLS performs almost as well as the MLE and Bayesian MCMC for the two smaller models, but for the most extensive one it scales significantly worse to large samples. This is a striking contrast to the benchmark results of Arbia et al. (2019), who compared estimators of standard spatial models and found that the S2SLS is the best performing method. In our case, the computational cost of S2SLS increases sharply when we include the intra-regional terms. In particular, the intra-regional constant requires many additional instruments that do not have a Kronecker product form and therefore do not benefit as much from the matrix formulation.

## Applied example of commuting flows

We illustrate the three estimation methods and the different parameter space constraints based on an example of home-to-work commuting flows. Our example includes 5041 OD pairs derived from the 71 municipalities that are located closest to the center of Paris. The data used for this example is available in the spflow package, and combines various datasets originally disseminated by the INSEE and the IGN.

Figure 3 illustrates the spatial distribution of the population, the number of companies and the median income in these municipalities. The graphic clearly shows that all three variables are spatially correlated, suggesting that the independence assumption of the traditional gravity model is inadequate for our data.

We estimate three different models that explain the commuting flows between the municipalities using the pairwise distances and also include a subset of the variables shown in Fig. 3 as origin, destination, and intra-regional characteristics. All variables are log-transformed, and to avoid logarithms of zero we add 1 to the flows and the distances before the transformation. Furthermore, since zero is not a realistic value for the median income, we center it, which gives more meaning to the constant terms. For spatial models, we suppose that the neighborhood structure is defined by the contiguity of the municipalities. The most exhaustive model corresponds to an SDM that includes the three spatial autoregressive components shown in (6) and some spatial lags of the exogenous variables. This SDM is estimated by the three estimation methods of Sect. 4. Additionally, we estimate an SLX model that uses the same covariates but does not account for spatial autocorrelation in the flows. The third model corresponds to the traditional gravity model without spatially lagged variables and will be referred to as an OLM.

The resulting parameter estimates and their standard errors are presented in Table 2. The first three rows show the results for the autoregressive parameters of the SDM. The parameter $$\rho _d$$ of destination dependence is significant and positive for MLE and Bayesian MCMC, but insignificant for S2SLS estimation. All three estimators of the SDM model report a significant and positive origin dependence parameter $$\rho _o$$ and an insignificant origin-to-destination dependence parameter $$\rho _w$$. For the remaining parameters, the MLE and the Bayesian MCMC estimation of the spatial Durbin model lead to almost identical results. The S2SLS estimates of the same parameters have similar magnitudes and the same sign, but the associated standard errors tend to be higher. From the OLS estimation of the OLM and SLX models, we obtain parameters that have the same sign but quite different magnitudes, which is not surprising since the SDM captures much of the variation in the flows through spatial autoregression. To compare the goodness-of-fit of the five estimations, we use the $$R^2_{\text {corr}}$$ measure, which is computed as the squared correlation between the observed flows and their fitted values.Footnote 6 For the two models that are estimated by OLS, this measure coincides with the usual coefficient of determination ($$R^2$$) and reveals that the OLM explains 80.4% and the SLX 82.9% of the variation in the data. For all three estimations of the SDM the $$R^2_{\text {corr}}$$ is 91.9%. In terms of computation time,Footnote 7 the OLS is the fastest method, but we see that in small samples the performance of the MLE is comparable to that of the OLS.

Figure 4 shows a matrix of bi-variate scatter plots of the residuals against their spatial lags. The rows correspond to the five estimations, while the columns represent the three neighborhood matrices that are used to compute the lags $$W_d\varepsilon$$, $$W_o\varepsilon$$ and $$W_w\varepsilon$$. The slopes of the regression lines in the first three rows of Fig. 4 indicate that the residuals of the SDM are not spatially correlated for any of the estimators presented in Sect. 4. However, for the OLM and SLX models there is noticeable positive spatial autocorrelation for all three definitions of the spatial neighborhood (see rows four and five of Fig. 4). The strongest spatial dependence is observed for the origin neighborhood displayed in the second column of Fig. 4. Destination and origin-to-destination dependence in the residuals of the SLX and the OLM are weaker in magnitude but still clearly different from zero.

Turning to the question of whether the estimated models are coherent, we first note that the absolute values of the autoregression parameters sum to less than one for all estimators. Thus, constraint (IV) suffices to conclude that all estimations yield a coherent model. However, in another model or a different application, it may well be that the parameters are within the feasible parameter space but outside the bounds of constraint (IV).
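This sufficient check is cheap to apply to any estimate. A minimal sketch (the function name and the example values are hypothetical; the condition itself is the sum constraint described above):

```python
def satisfies_constraint_iv(rho_d, rho_o, rho_w):
    """Sufficient (not necessary) coherence check:
    |rho_d| + |rho_o| + |rho_w| < 1."""
    return abs(rho_d) + abs(rho_o) + abs(rho_w) < 1

# estimates whose absolute values sum below one pass the check
assert satisfies_constraint_iv(0.35, 0.40, 0.05)
# parameters may be feasible yet fail this sufficient condition
assert not satisfies_constraint_iv(0.60, 0.50, -0.05)
```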

Figure 5 illustrates the volume of the parameter spaces implied by the contiguity neighborhood matrix for the municipalities and the constraints (II) to (V) in Table 1. The graphic shows a projection of the three-dimensional spheres onto the plane created by $$\rho _d$$ and $$\rho _o$$, with the height in the third dimension ($$\rho _w$$) represented by darker colors. Normalizing the volumes of these spheres with respect to the smallest one shows that the space of feasible solutions increases by a factor of 2.4 if we choose (III), and by a factor of 7.25 if we choose (II) instead of (IV). For constraint (V), we obtain a parameter space four times larger than for (IV), but we can also see that only a part of this space overlaps with space (II). This is problematic because space (II) corresponds to the largest compact space of feasible parameters around the origin, and parameter values at the edge of this space lead to an incoherent model (Fig. 6).

## Conclusion

This article presents improved calculation techniques for three estimators of spatial econometric interaction models and develops an efficient method to test whether the estimated parameters belong to the feasible parameter space. We show that the MLE, Bayesian MCMC, and S2SLS estimators can be derived exclusively from moment matrices with small dimensions, which makes the estimation time, once the moments are calculated, independent of the sample size. To compute these moment matrices, we exploit the efficiency gains of the matrix formulation, which reduces the number of operations from $${\mathcal {O}}(N)$$ to $${\mathcal {O}}(\sqrt{N})$$ for most of their elements. Our benchmarks confirm that the spatial econometric interaction model scales very well to large samples, requiring only a couple of minutes to estimate models that include 100 million flows.

The results of our empirical application suggest that an SDM specification of the spatial interaction model is superior to both the traditional gravity model and the SLX model in the context of commuting flows. Regardless of whether the SDM is estimated by MLE, Bayesian MCMC, or S2SLS, we obtain significant autoregression parameters and a better fit to the data than with the non-spatial or the SLX model. Moreover, the SDM is the only model that results in spatially uncorrelated regression residuals, demonstrating that the other specifications are inconsistent with the independent error assumption used to define the models in the first place.

Much of the methodology developed in this article is not specific to spatial econometric interaction models. In fact, the moment based formulation of the estimators can be used for any higher-order SDM with Gaussian errors, and extensions to models with spatially correlated errors seem possible too. Also, the method to avoid duplicated instrumental variables for the S2SLS estimator of the SDM is not specific to interaction models. The efficiency gains linked to the matrix formulation are not so easy to generalize because they arise from the OD structure of the data. However, some other problems, such as space-time panel models, have a similar structure. Hence, there is reason to be optimistic that many computation techniques presented in this article can also be fruitfully applied to reduce the estimation costs of spatial econometric models in much broader contexts than OD flows.

## Data Availability

The data is available in the R package spflow that is published on CRAN (https://cran.r-project.org/package=spflow).

## Code Availability

Most of the software used for model estimation is contained in the spflow package. Additional code linked to the example application is available on request.

1.

2.

In fact, Kelejian and Robinson (1995) discuss a model with a single weight matrix and spatial autocorrelation in the error term. However, most of their considerations are also relevant to higher order SDM and LAG models, since they have a very similar expression of the variance-covariance matrix of the dependent variable.

3.

It is probably possible to generalize the developments related to the definition of the feasible parameter space of model (6) to the case where W has large complex eigenvalues, but this enterprise is not the purpose of the present article. In fact, many neighborhood matrices considered in practice fulfill the requirement that none of its complex eigenvalues exceed $$\min (|\lambda _{max}(W)| , |\lambda _{min}(W)|)$$ in magnitude. For example, when the neighborhood matrix is derived from contiguity or from distances it is similar to a symmetric matrix which implies that all its eigenvalues are real. Furthermore, when the neighborhood matrix W is row normalized its largest eigenvalue is real and equal to one, in which case we only have to verify that none of its complex eigenvalues exceeds $$\lambda _{min}(W)$$ in magnitude.

4.

Efficient algorithms that allow extracting the smallest or largest eigenvalues of large matrices are available in many software packages. Examples include the MATLAB function eigs (https://www.mathworks.com/help/matlab/ref/eigs.html#description) and the C++ library Spectra (https://spectralib.org/), which offers an interface to the R software.

5.

All benchmarks were run on a 2.20 GHz Intel Xeon Gold 5120 CPU without parallel execution.

6.

The fitted values are calculated according to the in-sample trend signal (TS) predictor, whose statistical properties are described by Goulard et al. (2017).

7.

The models were estimated on a laptop with 2.50 GHz Intel Core i5-7200U CPU.

## References

1. Anderson J (2011) The gravity model. Annual Rev Econ 1:133–160

2. Anselin L (1988) Spatial econometrics: methods and models. Springer, Netherlands

3. Arbia G, Ghiringhelli C, Mira A (2019) Estimation of spatial econometric linear models with large datasets: How big can spatial Big Data be? Reg Sci Urban Econ, pp 67–73

4. Badinger H, Egger P (2011) Estimation of higher-order spatial autoregressive cross-section models with heteroscedastic disturbances. Papers Reg Sci 1:213–235

5. Barry RP, Pace RK (1999) Monte Carlo estimates of the log determinant of large sparse matrices. Linear Alg Appl 1:41–54

6. Bivand R, Hauke J, Kossowski T (2013) Computing the Jacobian in Gaussian spatial autoregressive models: an illustrated comparison of available methods. Geogr Anal 2:150–179

7. Chun Y, Kim H, Kim C (2012) Modeling interregional commodity flows with incorporating network autocorrelation in spatial interaction models: an application of the US interstate commodity flows. Comput Environ Urb Syst 6:583–591

8. Curry L (1972) A spatial analysis of gravity flows. Reg Stud 2:131–147

9. Davidson R, MacKinnon JG (1993) Estimation and inference in econometrics. Oxford University Press, UK

10. Debarsy N, LeSage JP (2018) Flexible dependence modeling using convex combinations of different types of connectivity structures. Reg Sci Urban Econ, pp 48–68

11. Debarsy N, LeSage JP (2020) Bayesian model averaging for spatial autoregressive models based on convex combinations of different types of connectivity matrices. J Bus Econ Stat 1–12

12. Elhorst JP (2010) Applied Spatial econometrics: raising the bar. Spat Econ Anal 1:9–28

13. Elhorst JP, Lacombe DJ, Piras G (2012) On model specification and parameter space definitions in higher order spatial econometric models. Reg Sci Urb Econ 1:211–220

14. Fischer MM, LeSage JP (2020) Network dependence in multi-indexed data on international trade flows. J Spat Econ 1:4

15. Goulard M, Laurent T, Thomas-Agnan C (2017) About predictions in spatial autoregressive models: optimal and almost optimal strategies. Spat Econ Anal 2–3:304–325

16. Hepple LW (1995) Bayesian techniques in spatial and network econometrics: 2. Computational methods and algorithms. Environ Plan A 4:615–644

17. Holloway G, Shankar B, Rahmanb S (2002) Bayesian spatial probit estimation: a primer and an application to HYV rice adoption. Agric Econ 3:383–402

18. Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 1:99–121

19. Kelejian HH, Prucha IR (2004) Estimation of simultaneous systems of spatially interrelated cross sectional equations. J Econ 1:27–50

20. Kelejian HH, Prucha IR (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J Econ 1:53–67

21. Kelejian HH, Robinson DP (1995) Spatial correlation: a suggested alternative to the autoregressive model. In: Anselin L, Florax RJGM (eds) New directions in spatial econometrics. Springer, pp 75–95

22. Kerkman K, Martens K, Meurs H (2017) A multilevel spatial interaction model of transit flows incorporating spatial and network autocorrelation. J Transp Geogr 155–166

23. Lee L-F (2002) Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Econ Theory 2:252–277

24. Lee L-F (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 6:1899–1925

25. Lee M-L, Pace RK (2005) Spatial distribution of retail sales. J Real Estate Finance Econ 1:53–69

26. LeSage JP (1997) Bayesian estimation of spatial autoregressive models. Int Reg Sci Rev 1–2:113–129

27. LeSage JP (2020) Fast MCMC estimation of multiple W-matrix spatial regression models and Metropolis–Hastings Monte Carlo log-marginal likelihoods. J Geogr Syst 1:47–75

28. LeSage JP, Pace RK (2008) Spatial econometric modeling of origin-destination flows. J Reg Sci 5:941–967

29. LeSage JP, Pace RK (2009) Introduction to spatial econometrics. CRC Press, Florida

30. Margaretic P, Thomas-Agnan C, Doucet R (2017) Spatial dependence in (origin-destination) air passenger flows. Pap Reg Sci 2:357–380

31. Martin RJ (1992) Approximations to the determinant term in Gaussian maximum likelihood estimation of some spatial models. Commun Stat Theory Methods 1:189–205

32. Ord K (1975) Estimation methods for models of spatial interaction. J Am Stat Assoc 349:120–126

33. Oshan TM (2020) The spatial structure debate in spatial interaction modeling: 50 years on. Prog Hum Geogr, 0309132520968134

34. Pace RK, LeSage JP (2004) Chebyshev approximation of log-determinants of spatial weight matrices. Comput Stat Data Anal 2:179–196

35. Porojan A (2001) Trade flows and spatial effects: the gravity model revisited. Open Econ Rev 3:265–280

36. Shao J (2003) Mathematical statistics. Springer, New York

37. Smirnov O, Anselin L (2009) An O(N) parallel method of computing the Log-Jacobian of the variable transformation for models with spatial interaction on a lattice. Comput Stat Data Anal 8:2980–2988

38. Tamesue K, Tsutsumi M (2016) Dealing with intraregional flows in spatial econometric gravity models. In: Patuelli R, Arbia G (eds) Spatial econometric interaction modelling. Springer International Publishing, chap 6, pp 105–119

39. Tiefelsdorf M (2003) Misspecifications in interaction model distance decay relations: a spatial structure effect. J Geograp Syst 1:25–50

40. Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 3:253–269

41. Young EC (1924) The movement of farm population. Cornell University Agricultural Experiment Station, Ithaca

## Acknowledgements

I acknowledge funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010, from the French National Association of Research and Technology (ANRT) and from the market research agency BVA Group.

## Author information

Authors

### Corresponding author

Correspondence to Lukas Dargel.

## Ethics declarations

### Conflict of interest

The author declares that there is no conflict of interest.

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Appendices

### Appendix A Hessian matrix

This appendix develops the simplified analytical expressions of the elements of the Hessian matrix that are presented in Sect. 4.2.2. These expressions are derived from the second order derivatives of the likelihood function given in (16). LeSage and Pace (2009) derive similar expressions for a spatial model of order one. Another treatment of the Hessian matrix for spatial econometric models can be found in Anselin (1988, see pp. 74ff.), who provides many computation steps linked to derivatives of the spatial filter matrix that are similar to the ones shown below.

\begin{aligned} \begin{aligned} \frac{\partial A}{\partial \rho _d}&= -W_d\\ \frac{\partial A'A}{\partial \rho _d}&= -2A'W_d \end{aligned} \qquad \begin{aligned} \frac{\partial ln|A|}{\partial \rho _d} = {{\,\mathrm{tr}\,}}(A^{-1}\frac{\partial A}{\partial \rho _d}) = -{{\,\mathrm{tr}\,}}(A^{-1}W_d)\\ \frac{\partial A^{-1}}{\partial \rho _d} = -A^{-1}\frac{\partial A}{\partial \rho _d}A^{-1}=A^{-1}W_dA^{-1} \end{aligned} \end{aligned}
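The first identity for the log-determinant can be checked numerically. The sketch below uses a small random row-stochastic $$W_d$$ and, for simplicity, a spatial filter $$A = I - \rho_d W_d$$ that varies only in $$\rho_d$$ (an illustrative stand-in for the full filter of model (6)):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
Wd = rng.random((n, n))
Wd /= Wd.sum(axis=1, keepdims=True)      # row-stochastic weight matrix
rho_d, h = 0.3, 1e-6

A = lambda r: np.eye(n) - r * Wd         # spatial filter, varying in rho_d only

# analytical derivative: d ln|A| / d rho_d = -tr(A^{-1} W_d)
analytic = -np.trace(np.linalg.solve(A(rho_d), Wd))

# central finite-difference approximation of the same derivative
logdet = lambda r: np.linalg.slogdet(A(r))[1]
numeric = (logdet(rho_d + h) - logdet(rho_d - h)) / (2 * h)
assert np.isclose(analytic, numeric, atol=1e-5)
```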

The following paragraphs show that the Hessian matrix of higher order spatial models, in particular the one of model (6), also admits very simple analytical solutions for most of its elements. Since the expression of the likelihood function is invariant under permutation of the terms $$\rho _oW_o$$, $$\rho _dW_d$$, $$\rho _wW_w$$, many elements of the Hessian matrix are identical up to the indexes (d, o, w). This property allows us to shorten the following argumentation, since we only have to develop the expressions related to $$\rho _d$$, while the expressions related to $$\rho _o$$ and $$\rho _w$$ are inferred by exchanging the indexes.

### Block $$H_{11}$$

The Hessian block $$H_{11}$$ given in (20) is the second order derivative of the likelihood function with respect to $$\rho$$

\begin{aligned} H_{11} = \frac{\partial ^{2} {\mathcal {L}}}{\partial \rho \partial \rho '}= \left[ \begin{array}{ccc} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho _d^{2}}} &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho _d \partial \rho _o}} &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho _d \partial \rho _w}} \\ \bullet &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho _o^2}} &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho _o \partial \rho _w}} \\ \bullet &{} \bullet &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \rho _w^2}} \end{array}\right] = \left[ \begin{array}{ccc} V_{dd} &{} V_{do} &{} V_{dw} \\ \bullet &{} V_{oo} &{} V_{ow} \\ \bullet &{} \bullet &{} V_{ww} \end{array}\right] , \end{aligned}

where for $$j,k \in (d,o,w)$$, $$V_{jk} = -{{\,\mathrm{tr}\,}}(W_jA^{-1}W_kA^{-1}) -\frac{1}{\sigma ^2}y'W_j'W_ky$$. The steps below derive the expression of $$V_{dd}$$; all other elements follow from the same arguments after exchanging the indexes.

\begin{aligned} \begin{aligned} \frac{\partial ^2 {\mathcal {L}}}{\partial \rho _d^2}&= \frac{\partial ^2}{\partial \rho _d^2}(\ln |A|) + \frac{\partial ^2}{\partial \rho _d^2}(-\frac{1}{2\sigma ^2}(Ay-Z\delta )'(Ay-Z\delta ))\\&= \frac{\partial }{\partial \rho _d}({{\,\mathrm{tr}\,}}(A^{-1}\frac{\partial }{\partial \rho _d}A)) + \frac{\partial ^2}{\partial \rho _d^2}(-\frac{1}{2\sigma ^2}y'A'Ay) \\&= \frac{\partial }{\partial \rho _d}(-{{\,\mathrm{tr}\,}}(A^{-1}W_d)) + \frac{\partial }{\partial \rho _d}(-\frac{1}{2\sigma ^2}2y'A'(-W_d)y) \\&= -{{\,\mathrm{tr}\,}}(-A^{-1}(-W_d)A^{-1}W_d) -\frac{y'W_d'W_dy}{\sigma ^2} \\&= -{{\,\mathrm{tr}\,}}(W_dA^{-1}W_dA^{-1}) -\frac{y'W_d'W_dy}{\sigma ^2} := V_{dd} \end{aligned} \end{aligned}

### Block $$H_{21}$$

The Hessian block $$H_{21}$$ given in (20) corresponds to the cross derivative of the likelihood function which we differentiate first with respect to $$\theta = (\delta ' \sigma ^2)'$$ and then with respect to $$\rho$$

\begin{aligned} H_{21}= \frac{\partial ^{2} {\mathcal {L}}}{\partial \theta \partial \rho '}= \begin{bmatrix} \frac{\partial ^{2} {\mathcal {L}}}{\partial \delta \partial \rho _d } &{} \frac{\partial ^{2} {\mathcal {L}}}{\partial \delta \partial \rho _o } &{} \frac{\partial ^{2} {\mathcal {L}}}{\partial \delta \partial \rho _w } \\ \frac{\partial ^{2} {\mathcal {L}}}{\partial \sigma ^2 \partial \rho _d } &{} \frac{\partial ^{2} {\mathcal {L}}}{\partial \sigma ^2 \partial \rho _o } &{} \frac{\partial ^{2} {\mathcal {L}}}{\partial \sigma ^2 \partial \rho _w } \end{bmatrix} = \begin{bmatrix} -\frac{Z'L}{\sigma ^2} \\ \frac{\delta '(Z'L) - \tau (\rho )'(y_{\bullet }'L) }{\sigma ^4} \end{bmatrix}, \end{aligned}

where $$L = (W_dy \; W_oy \; W_wy)$$. The equations below derive analytical expressions for the elements in the first column of above matrix.

\begin{aligned} \begin{aligned} \frac{\partial ^{2}{\mathcal {L}}}{\partial \delta \partial \rho _d }&= \frac{\partial ^{2}}{\partial \delta \partial \rho _d }(-\frac{1}{2\sigma ^2}(Ay-Z\delta )'(Ay-Z\delta )) \\&= \frac{\partial ^{2}}{\partial \delta \partial \rho _d }(-\frac{1}{2\sigma ^2}(-2(Z\delta )'Ay)) \\&= \frac{\partial }{\partial \delta }(\frac{(Z\delta )'(-W_d)y}{\sigma ^2}) \\&= \frac{-Z'W_dy}{\sigma ^2} \end{aligned} \quad \begin{aligned} \frac{\partial ^{2}{\mathcal {L}}}{\partial \sigma ^2 \partial \rho _d }&= \frac{\partial ^{2}}{\partial \sigma ^2 \partial \rho _d }(-\frac{1}{2\sigma ^2}(Ay-Z\delta )'(Ay-Z\delta )) \\&= \frac{\partial ^{2}}{\partial \sigma ^2 \partial \rho _d }(-\frac{1}{2\sigma ^2}(y'A'Ay-2(Z\delta )'Ay)) \\&= \frac{\partial }{\partial \rho _d}(\frac{y'A'Ay-2(Z\delta )'Ay}{2\sigma ^4}) \\&= \frac{-2(Ay)'W_dy+2\delta 'Z'W_dy}{2\sigma ^4} \end{aligned} \end{aligned}

The previously introduced permutation-invariance of the likelihood function then allows us to apply the same steps to derive the other two columns. These expressions are simplified further using the multivariate formulation L of the spatial lags of the flow vector.

\begin{aligned} H_{21}&= \begin{bmatrix} -\frac{Z'W_dy}{\sigma ^2} &{} -\frac{Z'W_oy}{\sigma ^2} &{} -\frac{Z'W_wy}{\sigma ^2} \\ \frac{\delta 'Z'W_dy-(Ay)'W_dy}{\sigma ^4} &{} \frac{\delta 'Z'W_oy-(Ay)'W_oy}{\sigma ^4} &{} \frac{\delta 'Z'W_wy-(Ay)'W_wy}{\sigma ^4} \end{bmatrix} \\&= \begin{bmatrix} -\frac{Z'L}{\sigma ^2} \\ \frac{\delta '(Z'L) - \tau (\rho )'(y_{\bullet }'L) }{\sigma ^4} \end{bmatrix} \end{aligned}

### Block $$H_{22}$$

The Hessian block $$H_{22}$$ given in (20) contains the second-order derivatives of the likelihood function with respect to $$\theta$$.

\begin{aligned} H_{22} = {\frac{\partial ^{2} {\mathcal {L}}}{\partial \theta \partial \theta ^{\prime }}} =\left[ \begin{array}{cc} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \delta \partial \delta '}} &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial \delta \partial \sigma ^2}} \\ {\frac{\partial ^{2} {\mathcal {L}}}{\partial \sigma ^2 \partial \delta '}} &{} {\frac{\partial ^{2} {\mathcal {L}}}{\partial (\sigma ^2)^2}} \end{array}\right] =\left[ \begin{array}{cc} -{\frac{Z'Z}{\sigma ^2}} &{} 0 \\ 0 &{} -{\frac{N}{2\sigma ^4}} \end{array}\right] \end{aligned}

The above block has the same expression as the Hessian matrix of a non-spatial linear model when it is evaluated at the maximum likelihood estimators in (17), in particular $${\hat{\sigma }}^2(\rho ) = (Ay - Z\delta (\rho ))'(Ay - Z\delta (\rho ))N^{-1}$$ and $${\hat{\delta }}(\rho ) = (Z'Z)^{-1}Z'Ay$$.

\begin{aligned} \begin{aligned} \frac{\partial ^{2}{\mathcal {L}} }{\partial \delta \partial \delta '}&= \frac{\partial ^{2}}{\partial \delta \partial \delta '} (-\frac{1}{2\sigma ^2}(Ay-Z\delta )'(Ay-Z\delta )) \\&= \frac{\partial ^{2}}{\partial \delta \partial \delta '} (\frac{-\delta 'Z'Z\delta }{2\sigma ^2}) \\&= \frac{-Z'Z}{\sigma ^2} \\~\\ \frac{\partial ^{2}{\mathcal {L}} }{\partial \delta \partial \sigma ^2}&= \frac{\partial ^{2}}{\partial \delta \partial \sigma ^2} (-\frac{1}{2\sigma ^2} (Ay - Z\delta )'(Ay - Z\delta )) \\&= \frac{\partial }{\partial \delta } (\frac{1}{2\sigma ^4} (Ay - Z\delta )'(Ay - Z\delta )) \\&= \frac{-Z'(Ay - Z\delta )}{\sigma ^4} \\&= \frac{-Z'Ay + Z'Z(Z'Z)^{-1}Z'Ay}{\sigma ^4} \\&= 0 \\~\\ \frac{\partial ^{2} {\mathcal {L}}}{\partial (\sigma ^2)^2}&= \frac{\partial ^{2}}{\partial (\sigma ^2)^2} (-\frac{N}{2}\ln (\sigma ^2)-\frac{1}{2\sigma ^2} (Ay - Z\delta )'(Ay - Z\delta )) \\&= \frac{\partial }{\partial \sigma ^2} (-\frac{N}{2\sigma ^2}+\frac{(Ay - Z\delta )'(Ay - Z\delta )}{2\sigma ^4})\\&= \frac{N}{2\sigma ^4} - \frac{(Ay - Z\delta )'(Ay - Z\delta )}{\sigma ^6} \\&= \frac{N}{2\sigma ^4} - \frac{N\sigma ^2}{\sigma ^6} \\&= -\frac{N}{2\sigma ^4} \end{aligned} \end{aligned}
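The vanishing cross-derivative can be checked numerically. The sketch below (using numpy with arbitrary simulated stand-ins for the filtered flows $$Ay$$ and the regressors $$Z$$) confirms that $$Z'(Ay - Z{\hat{\delta }}) = 0$$ once $$\delta$$ is replaced by its maximum likelihood (i.e., least squares) estimator.

```python
import numpy as np

rng = np.random.default_rng(42)
N, k = 50, 3

# Simulated stand-ins for the filtered flow vector Ay and the regressors Z
Ay = rng.normal(size=N)
Z = rng.normal(size=(N, k))

# ML estimator delta_hat = (Z'Z)^{-1} Z'Ay (OLS of Ay on Z)
delta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Ay)
residuals = Ay - Z @ delta_hat

# At delta_hat the term Z'(Ay - Z delta) is zero, so the
# cross-derivative -Z'(Ay - Z delta)/sigma^4 vanishes as well
print(np.allclose(Z.T @ residuals, 0.0))  # True
```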

### Appendix B: A matrix formulation of spatial instruments

This appendix shows how to derive the moments $$U'y$$, $$U'L$$ and $$U'U$$ of Sect. 4.4.2 that are required to perform S2SLS estimation based on the matrix formulation of the spatial interaction model. The first subsection presents the ten blocks of the moment $$U'U$$ that is linked to the empirical variance of the instrumental variables. The second subsection derives the moments $$U'y$$ and $$U'L$$, which are proportional to the empirical covariances of the instrumental variables with the original and lagged versions of the flow vector.

### The variance moment $$U'U$$

The previously introduced groups of instrumental variables $$U = (U_{\alpha } \; U_{\alpha _I} \; U_\beta \; U_\gamma )$$ allow us to express the moment matrix $$U'U$$ in terms of 16 blocks.

\begin{aligned} U'U = \begin{pmatrix} U_{\alpha \alpha } &{} U_{\alpha \alpha _I} &{} U_{\alpha \beta } &{} U_{\alpha \gamma } \\ \bullet &{} U_{\alpha _I\alpha _I} &{} U_{\alpha _I\beta } &{} U_{\alpha _I\gamma } \\ \bullet &{} \bullet &{} U_{\beta \beta } &{} U_{\beta \gamma } \\ \bullet &{} \bullet &{} \bullet &{} U_{\gamma \gamma } \end{pmatrix} \end{aligned}

The following four paragraphs derive the diagonal blocks of the above matrix. For the six off-diagonal blocks we start with those in the first row and move from top to bottom and from left to right. Since $$U'U$$ is a symmetric matrix, we can focus on the upper triangle of the above matrix and infer all blocks on the lower triangle by symmetry.

#### Block 1 ($$\alpha ^2$$)

The first block of the $$U'U$$ moment contains only the inner product of the intercept term $$U_{\alpha } = \iota _{{N}}$$.

\begin{aligned} U_{\alpha }'U_{\alpha } := U_{\alpha \alpha } = (\iota _{{n}}' \otimes \iota _{{n}}') (\iota _{{n}} \otimes \iota _{{n}}) = (\iota _{{n}}'\iota _{{n}}) (\iota _{{n}}'\iota _{{n}}) = n \times n = N \end{aligned}
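The Kronecker identity behind this block is easy to verify numerically; the following minimal numpy sketch (with an arbitrary n) checks that the inner product of $$\iota _n \otimes \iota _n$$ with itself equals $$N = n \times n$$.

```python
import numpy as np

n = 7
N = n * n
iota_n = np.ones(n)

# U_alpha = iota_n (x) iota_n is the length-N vector of ones
U_alpha = np.kron(iota_n, iota_n)

# (iota_n' (x) iota_n')(iota_n (x) iota_n) = (iota_n'iota_n)^2 = n * n = N
print(U_alpha @ U_alpha == N)  # True
```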

#### Block 2 ($$\alpha _I^2$$)

The second diagonal block of the covariance moment $$U_{\alpha _I}'U_{\alpha _I} := U_{\alpha _I\alpha _I}$$ contains the inner products of the instruments derived from the intra-regional constant. All columns in $$U_{\alpha _I}$$ are obtained by applying the $${{\,\mathrm{VEC}\,}}$$-operator to the matrices $$\mathrm {A}_i^U$$, for $$i = 1, ..., 9$$, which are defined below.

\begin{aligned} U_{\alpha _I} = \begin{pmatrix} VEC(\mathrm {A}_1^U)&...&VEC(\mathrm {A}_9^U) \end{pmatrix} \end{aligned}
\begin{aligned} \begin{aligned} \mathrm {A}_1^U&= {\mathbf {I}}_{n}\\ \mathrm {A}_2^U&= {W}\\ \mathrm {A}_3^U&= {W}' \end{aligned} \qquad \qquad \begin{aligned} \mathrm {A}_4^U&= {W^2}\\ \mathrm {A}_5^U&= {W^{2\prime }}\\ \mathrm {A}_6^U&= {W}{W}' \end{aligned} \qquad \qquad \begin{aligned} \mathrm {A}_7^U&= {W^2}{W^{2\prime }}\\ \mathrm {A}_8^U&= {W^2}{W}' \\ \mathrm {A}_9^U&= {W}{W^{2\prime }}\end{aligned} \end{aligned}
(47)

This structure allows us to compute each element of $$U_{\alpha _I\alpha _I}$$ as $$U_{\alpha _I\alpha _I,ij} = \iota _n'(\mathrm {A}_i^U \odot \mathrm {A}_j^U)\iota _n$$, for $$i,j = 1,...,9$$. Since the block is symmetric we only need to compute the elements on the upper triangle. We can further reduce the number of elements to compute if we exploit the following properties: for two square matrices A and B of compatible dimensions, we have $$\iota _{{n}}'(A \odot B)\iota _{{n}}$$ = $$\iota _{{n}}'(A' \odot B')\iota _{{n}} = \iota _{{n}}'(B \odot A)\iota _{{n}}$$. In addition, if either A or B is symmetric we have $$\iota _{{n}}'(A \odot B)\iota _{{n}} = \iota _{{n}}'(A' \odot B)\iota _{{n}}$$. Exploiting all symmetries of the problem, we only need to compute 21 of the 81 elements that constitute the matrix $$U_{\alpha _I\alpha _I}$$ when W itself is not symmetric. When W is symmetric we have to define $$U_{\alpha _I}$$ in terms of the five instruments $${{\,\mathrm{VEC}\,}}(W^m)$$ for $$m=0,1,2,3,4$$, since four of the nine vectors in $$\begin{pmatrix} {{\,\mathrm{VEC}\,}}(\mathrm {A}_1^U)&...&{{\,\mathrm{VEC}\,}}(\mathrm {A}_9^U) \end{pmatrix}$$ would be redundant.
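The symmetry properties used above can be confirmed with a short numpy sketch on random matrices: the sum $$\iota '(A \odot B)\iota$$ is invariant under jointly transposing both factors, under swapping them, and, when one factor is symmetric, under transposing the other factor alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
iota = np.ones(n)
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))

def had_sum(A, B):
    """iota'(A . B)iota: the sum over the elementwise (Hadamard) product."""
    return iota @ (A * B) @ iota

# iota'(A . B)iota = iota'(A' . B')iota = iota'(B . A)iota
print(np.isclose(had_sum(A, B), had_sum(A.T, B.T)))  # True
print(np.isclose(had_sum(A, B), had_sum(B, A)))      # True

# If one factor is symmetric, transposing the other leaves the sum unchanged
S = B + B.T
print(np.isclose(had_sum(A, S), had_sum(A.T, S)))    # True
```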

#### Block 3 ($$\beta ^2$$)

The next diagonal block contains the inner products of the instruments derived from the site attributes $$U_{\beta } = ( \tilde{\tilde{X_d}} \; \tilde{\tilde{X_o}} \; \tilde{\tilde{X_I}} )$$. We can exploit the structure of the three matrices $$\tilde{\tilde{X_r}} (r = d,o,I)$$ to derive this moment block from products of the much smaller matrices $$\tilde{\tilde{DX}}$$, $$\tilde{\tilde{OX}}$$ and $$\tilde{\tilde{IX}}$$.

\begin{aligned} U_{\beta }'U_{\beta } := U_{\beta \beta } = \begin{pmatrix} n \tilde{\tilde{DX}}' \tilde{\tilde{DX}} &{}&{} (\iota _{{n}}'\tilde{\tilde{DX}})'(\iota _{{n}}'\tilde{\tilde{OX}}) &{}&{} \tilde{\tilde{DX}}'\tilde{\tilde{IX}} \\ \bullet &{}&{} n \tilde{\tilde{OX}}' \tilde{\tilde{OX}} &{}&{} \tilde{\tilde{OX}}'\tilde{\tilde{IX}} \\ \bullet &{}&{} \bullet &{}&{} \tilde{\tilde{IX}}'\tilde{\tilde{IX}} \end{pmatrix} \end{aligned}
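The factor n in the diagonal entries follows from the mixed-product property of the Kronecker product. The sketch below illustrates this for the destination block, assuming (as a stand-in for the stacking convention of the main text) that the destination attributes are replicated as $$\iota _n \otimes \tilde{\tilde{DX}}$$; the same reduction applies to the origin block with the factors reversed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
iota = np.ones((n, 1))
DX = rng.normal(size=(n, k))

# Assumed stacking: the n x k attribute matrix replicated over all origins
X_d = np.kron(iota, DX)  # N x k with N = n^2

# (iota (x) DX)'(iota (x) DX) = (iota'iota) (x) (DX'DX) = n * DX'DX
print(np.allclose(X_d.T @ X_d, n * DX.T @ DX))  # True
```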

#### Block 4 ($$\gamma ^2$$)

The last diagonal block contains the inner products of the instruments derived from the pair attributes $$U_{\gamma } = ( {{\,\mathrm{VEC}\,}}(G) \; {{\,\mathrm{VEC}\,}}({\check{G}}) \; {{\,\mathrm{VEC}\,}}(\check{{\check{G}}}) )$$, where $${\check{G}}= WGW'$$ and $$\check{{\check{G}}}= W{\check{G}}W'$$. This structure allows us to express the moment block in terms of the Hadamard product. When the matrix G represents the geographic distance, it is usually symmetric, which implies that $${\check{G}}$$ and $$\check{{\check{G}}}$$ are also symmetric. In this case, we can exploit that for two symmetric matrices A and B of compatible dimensions, we only require the upper triangles of both to compute $$\iota _n'(A \odot B)\iota _n$$. This reduces the memory and computational requirements by about 50%, which may be worthwhile because this block accounts for a major part of the computational burden of the S2SLS estimator.

\begin{aligned} U_{\gamma }'U_{\gamma } := U_{\gamma \gamma } = \begin{pmatrix} \iota _n'(G \odot G)\iota _n &{} \iota _n'(G \odot {\check{G}})\iota _n &{} \iota _n'(G \odot \check{{\check{G}}})\iota _n \\ \bullet &{} \iota _n'({\check{G}}\odot {\check{G}})\iota _n &{} \iota _n'({\check{G}}\odot \check{{\check{G}}})\iota _n \\ \bullet &{} \bullet &{} \iota _n'(\check{{\check{G}}}\odot \check{{\check{G}}})\iota _n \end{pmatrix} \end{aligned}
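The upper-triangle shortcut for symmetric factors works as follows: since $$A \odot B$$ is itself symmetric, its element sum equals twice the sum over the strict upper triangle plus the diagonal sum. A minimal numpy check:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
iota = np.ones(n)
# Two symmetric matrices, e.g. distance-like matrices
A = (lambda M: M + M.T)(rng.normal(size=(n, n)))
B = (lambda M: M + M.T)(rng.normal(size=(n, n)))

full = iota @ (A * B) @ iota  # iota'(A . B)iota over the full matrices

# Upper-triangle shortcut: A . B is symmetric, so its element sum is
# twice the strict upper triangle plus the diagonal
P = A * B
upper = 2 * np.triu(P, k=1).sum() + np.trace(P)
print(np.isclose(full, upper))  # True
```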

#### Block 5 ($$\alpha \alpha _I$$)

The fifth block of $$U'U$$ is obtained as the inner product of the global constant $$U_{\alpha }$$ and the instruments derived from the intra-regional constant $$U_{\alpha _I}$$. Since all elements of $$U_{\alpha }$$ are one, computing this inner product corresponds to summing the elements in each of the nine vectors in $$U_{\alpha _I}$$. As W is assumed to be row-stochastic, we can directly conclude that the first five entries of $$U_{\alpha \alpha _I}$$ are equal to n. The remaining four entries are computed from the vectors of column sums of $${W}$$ and $${W^2}$$.

\begin{aligned} U_{\alpha }'U_{\alpha _I} := U_{\alpha \alpha _I} = \begin{pmatrix}n&n&n&n&n&(\iota _{{n}}'W)(\iota _{{n}}'W)'&(\iota _{{n}}'{W^2})(\iota _{{n}}'{W^2})'&(\iota _{{n}}'W)(\iota _{{n}}'{W^2})'&(\iota _{{n}}'W)(\iota _{{n}}'{W^2})' \end{pmatrix} \end{aligned}
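These entries can be verified numerically: for a row-stochastic W the element sums of $${\mathbf {I}}_n$$, $$W$$, $$W'$$, $$W^2$$ and $$W^{2\prime }$$ all equal n, while $$\iota _n'WW'\iota _n$$ reduces to the squared norm of the column-sum vector of W. A sketch with a random row-stochastic matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
iota = np.ones(n)

# Random row-stochastic W: nonnegative rows that sum to one
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)
W2 = W @ W

# Element sums of I, W, W', W^2, W^2' are all equal to n
for A in (np.eye(n), W, W.T, W2, W2.T):
    print(np.isclose(iota @ A @ iota, n))  # True (five times)

# iota'WW'iota = (iota'W)(iota'W)': squared norm of the column sums of W
colsums = iota @ W
print(np.isclose(iota @ (W @ W.T) @ iota, colsums @ colsums))  # True
```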

#### Block 6 ($$\alpha \beta$$)

Block six is computed from the constant $$U_{\alpha }$$ and the instruments $$U_{\beta }$$ that are derived from the site attributes. Computing the elements of this block only requires the scalar n and the column sums of the matrices $$\tilde{\tilde{DX}}, \tilde{\tilde{OX}}$$ and $$\tilde{\tilde{IX}}$$.

\begin{aligned} U_{\alpha }'U_{\beta } := U_{\alpha \beta } = \begin{pmatrix} n \cdot \iota _{{n}}'\tilde{\tilde{DX}}&\,&n \cdot \iota _{{n}}'\tilde{\tilde{OX}}&\,&\iota _{{n}}'\tilde{\tilde{IX}} \end{pmatrix} \end{aligned}

#### Block 7 ($$\alpha \gamma$$)

This block contains the inner product of the constant $$U_{\alpha }$$ with the instruments derived from the exogenous attributes of the origin-destination pairs $$U_{\gamma }$$.

\begin{aligned} U_{\alpha }' U_{\gamma } := U_{\alpha \gamma } = \begin{pmatrix} \iota _n' G \iota _n &{}&{} \iota _n' {\check{G}}\iota _n &{}&{} \iota _n' \check{{\check{G}}}\iota _n \\ \end{pmatrix} \end{aligned}

#### Block 8 ($$\alpha _I \beta$$)

The eighth block of $$U'U$$ relates to the empirical covariance between $$U_{\alpha _I}$$, the instruments derived from the intra-regional constant, and $$U_{\beta }$$, the instruments derived from the site attributes. The structure of $$U_{\alpha _I}$$ allows us to express each of the nine rows of $$U_{\alpha _I}'U_{\beta } := U_{\alpha _I\beta }$$ using $$\mathrm {A}_i^U$$, for $$i = 1, ... , 9$$, as defined in (47).

\begin{aligned} U_{\alpha _I\beta ,i} = \begin{pmatrix} (\iota _n'\mathrm {A}_i^{U\prime }) \tilde{\tilde{DX}}&\,&(\iota _n'\mathrm {A}_i^U) \tilde{\tilde{OX}}&\,&{{\,\mathrm{diag}\,}}(\mathrm {A}_i^U) \tilde{\tilde{IX}} \end{pmatrix} \end{aligned}

#### Block 9 ($$\alpha _I \gamma$$)

This block is calculated as the inner product of the instruments derived from the intra-regional constant $$U_{\alpha _I}$$ and those derived from the exogenous origin-destination pair attributes $$U_{\gamma }$$. It is again possible to derive $$U_{\alpha _I}'U_{\gamma } := U_{\alpha _I\gamma }$$ for all rows ($$i = 1,...,9$$) using the notation in (47).

\begin{aligned} U_{\alpha _I\gamma ,i} = \begin{pmatrix} \iota _n'(\mathrm {A}_i^U \odot G)\iota _n &{}&{} \iota _n'(\mathrm {A}_i^U \odot {\check{G}})\iota _n &{}&{} \iota _n'(\mathrm {A}_i^U \odot \check{{\check{G}}})\iota _n \\ \end{pmatrix} \end{aligned}

We can reduce the computational burden of this block if we pay attention to symmetries that were already mentioned in the sections on the diagonal blocks $$U_{\alpha _I\alpha _I}$$ and $$U_{\gamma \gamma }$$.

#### Block 10 ($$\beta \gamma$$)

The last block of the $$U'U$$ moment matrix is computed as the inner product of $$U_{\beta }$$, the instruments derived from the site attributes, with $$U_{\gamma }$$, the instruments derived from the pair attributes. For compactness, it is stated below via its transpose $$U_{\beta \gamma }' = U_{\gamma }'U_{\beta }$$.

\begin{aligned} U_{\beta \gamma }'= \begin{pmatrix} (\iota _{{n}}'G') \tilde{\tilde{DX}} &{}&{} (\iota _{{n}}'G) \tilde{\tilde{OX}} &{}&{} {{\,\mathrm{diag}\,}}(G) \tilde{\tilde{IX}} \\ (\iota _{{n}}'{\check{G}}') \tilde{\tilde{DX}} &{}&{} (\iota _{{n}}'{\check{G}}) \tilde{\tilde{OX}} &{}&{} {{\,\mathrm{diag}\,}}({\check{G}}) \tilde{\tilde{IX}} \\ (\iota _{{n}}'\check{{\check{G}}}') \tilde{\tilde{DX}} &{}&{} (\iota _{{n}}'\check{{\check{G}}}) \tilde{\tilde{OX}} &{}&{} {{\,\mathrm{diag}\,}}(\check{{\check{G}}}) \tilde{\tilde{IX}} \end{pmatrix} \end{aligned}

### The covariance moments $$U'y_{\bullet }$$

To construct the moments $$U'y$$ and $$U'L$$ we use the notations $$Y^{(t)}$$ and $$y^{(t)}$$ defined in (8). The four moments $$U'y^{(t)}$$, for $$t = 1,2,3,4$$, form the columns of $$U'y = U'y^{(1)}$$ and $$U'L = (U'y^{(2)} \; U'y^{(3)} \; U'y^{(4)})$$.

\begin{aligned} U'y^{(t)} = \begin{pmatrix} (U_{\alpha }'y^{(t)})'&(U_{\alpha _I}'y^{(t)})'&(U_\beta 'y^{(t)})'&(U_\gamma 'y^{(t)})'\end{pmatrix}' \end{aligned}

The elements of the above moments are derived below, where the elementwise notation for the nine entries of $$U_{\alpha _I}'y^{(t)}$$ uses definition (47) of $$\mathrm {A}_i^U$$, for $$i = 1, ... , 9$$.

\begin{aligned} \begin{aligned} U_{\alpha }'y^{(t)}&= \begin{pmatrix} \iota _{{n}}'Y^{(t)}\iota _{{n}} \end{pmatrix} \\~\\ (U_{\alpha _I}'y^{(t)})_i&= \begin{pmatrix} \iota _{{n}}'(\mathrm {A}_i^U \odot Y^{(t)}) \iota _{{n}} \end{pmatrix}, \text { for } i = 1,...,9 \\~\\ U_{\beta }'y^{(t)}&= \begin{pmatrix} \tilde{\tilde{DX}}'(Y^{(t)} \iota _{{n}}) \\ \tilde{\tilde{OX}}'(Y^{(t)\prime } \iota _{{n}}) \\ \tilde{\tilde{IX}}' {{\,\mathrm{diag}\,}}(Y^{(t)}) \end{pmatrix} \\~\\ U_{\gamma }'y^{(t)}&= \begin{pmatrix} \iota _{{n}}'(G \odot Y^{(t)}) \iota _{{n}} \\ \iota _{{n}}'({\check{G}}\odot Y^{(t)}) \iota _{{n}} \\ \iota _{{n}}'(\check{{\check{G}}}\odot Y^{(t)}) \iota _{{n}} \end{pmatrix} \end{aligned} \end{aligned}
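All of the above expressions rest on the identity $${{\,\mathrm{VEC}\,}}(A)'{{\,\mathrm{VEC}\,}}(Y) = \iota _n'(A \odot Y)\iota _n$$, i.e. the inner product of two vectorized matrices equals the sum over their Hadamard product. A quick numpy check (using column-major flattening for the VEC-operator):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
iota = np.ones(n)
A = rng.normal(size=(n, n))
Y = rng.normal(size=(n, n))

vec = lambda M: M.flatten(order="F")  # column-stacking VEC operator

# VEC(A)'VEC(Y) = iota'(A . Y)iota: both equal the sum of A_ij * Y_ij
lhs = vec(A) @ vec(Y)
rhs = iota @ (A * Y) @ iota
print(np.isclose(lhs, rhs))  # True

# The intercept instrument: iota_N' VEC(Y) = iota'Y iota
iota_N = np.kron(iota, iota)
print(np.isclose(iota_N @ vec(Y), iota @ Y @ iota))  # True
```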