## Abstract

This paper focuses on cross-sectional dependence in panel trade flow models. We propose alternative specifications for modeling time-invariant factors such as sociocultural indicator variables, e.g., common language and currency. These are typically treated as a source of heterogeneity that is eliminated using fixed effects transformations, but we find evidence of cross-sectional dependence after eliminating country-specific and time-specific effects. These findings suggest the use of alternative simultaneous dependence model specifications that accommodate cross-sectional dependence, which we set forth along with Bayesian estimation methods. Ignoring cross-sectional dependence implies biased estimates from panel trade flow models that rely on fixed effects.

## Introduction

The empirical trade literature has largely ignored the issue of cross-sectional dependence between countries in econometric estimation of empirical trade flow models (Baltagi et al. 2014). In a cross-sectional setting, trade costs are incorporated using geographical distance between origin and destination dyads involved in trade flows, as well as sociocultural factors. These might include: common language and currency, historical colonial relationships, common borders, trade agreements, etc. The latter are perceived as representing a generalization of distance that also influences trade costs. For example, common language and common currency should reduce trade costs.

In a panel data model setting, distance as well as sociocultural factors (which we label generalized distance variables) are generally time invariant, so they are modeled using fixed effects. In a conventional panel setting, the impact of time-invariant variables reflects a source of heterogeneity, and introduction of appropriate fixed effects transformations is used to control for differences in the level of flows attributable to these country-specific time-invariant factors.

This paper argues that generalized distance variables can be viewed as transmission channels and modeled as a source of cross-sectional dependence, frequently observed in trade flows (see Porojan 2001). The objective is to introduce alternative simultaneous dependence specifications for modeling time-invariant factors such as generalized distance variables. These model specifications accommodate cross-sectional dependence, which we set forth along with Bayesian estimation methods. Ignoring cross-sectional dependence implies biased estimates from panel trade flow models that rely on fixed effects.

The idea of our modeling approach is clearest for the case of a dummy variable reflecting common borders, which is often introduced as a generalized distance variable that impacts trade costs. When introduced as an indicator variable, the implication is that higher levels of flows exist between countries with common borders, a heterogeneity effect. As an alternative treatment, common borders could be introduced as a first-order contiguity spatial weight matrix. A first-order contiguity spatial weight matrix, say \(W_b\), for exports from a sample of *N* countries would be of dimension \(N \times N\) with nonzero elements in the \((i,j)\hbox {th}\) position if countries *i* and *j* share a common border, and zeros on the main diagonal. Multiplying the \(N \times N\) spatial weight matrix with an \(N \times 1\) vector of export/import flows *f*, or vector of income *X* produces a linear combination of *neighboring country* export/import flows \(W_b f\), or income \(W_b X\). Of course, we can take the same approach to forming an \(N \times N\) matrix (say) \(W_c\) having nonzero elements in the \((i,j)\hbox {th}\) position if countries *i* and *j* share a common currency or language, or exhibit colonial ties, etc. We will have more to say about this later, but we note that a vector \(W_c X\) in this context represents a linear combination of income from countries showing a sociocultural similarity measured in terms of common currency, language, colonial ties, and so on.
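As a minimal illustration of the products \(W_b f\) described above, the following sketch builds a border-based weight matrix for four hypothetical countries. The border pattern and flow values are invented for illustration, and the matrix is row-normalized here (an assumption at this point in the text; row normalization is introduced later in the paper), so that \(W_b f\) is an average over bordering countries.

```python
import numpy as np

# Hypothetical border indicator for four countries (1 = shared border).
# The pattern and the flow values are illustrative, not taken from the data.
border = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)


def row_normalize(A):
    """Scale each row to sum to one; rows of zeros are left as zeros."""
    sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, sums, out=np.zeros_like(A), where=sums > 0)


W_b = row_normalize(border)             # zero diagonal, unit row sums

f = np.array([10.0, 20.0, 30.0, 40.0])  # illustrative flow vector
Wb_f = W_b @ f                          # average flow of bordering countries
print(Wb_f)
```

Replacing `border` with an indicator matrix for common currency or language gives the analogous product \(W_c X\) when `f` is replaced by an income vector.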

These vectors can be used to specify a model of cross-sectional dependence that reflects: (1) cross-sectional dependence reflecting interaction between neighboring countries, neighbors to the neighboring countries, etc., which result in global spillover impacts, and (2) contextual effects arising from neighboring countries, which result in local spillover impacts. This type of model has been labeled a spatial Durbin model (SDM) specification in the spatial econometrics literature.

Of course, it is possible that trade flows reflect both a heterogeneity impact from time-invariant fixed effects and impacts of the type set forth in (1) and (2) above. We can test our alternative cross-sectional dependence specification for consistency with sample data on trade flows by eliminating fixed effects (through a transformation) and testing the transformed model for: cross-sectional dependence, contextual effects or a combination of these. It is worth noting that our SDM specification allows for the presence/absence of cross-sectional dependence, and/or contextual effects as well as a combination of these. Using data transformed to eliminate time-invariant fixed effects, we estimate a Bayesian panel SDM model to determine whether cross-sectional dependence, contextual effects, or a specification with both of these is most consistent with a panel of imports and exports from a sample of 74 countries over the 38-year period from 1963 to 2000. Specifically, we consider estimates from a single panel data model specification applied to 74 sets of import flows and 74 sets of export flows. One set of estimates reflects the panel data relationship between imports to a single country from all other 73 countries over the 38-year period in our sample. This model relationship is estimated for all 74 countries in our sample. Another set of estimates reflects the panel data relationship between exports from a single country to all other 73 countries over the 38-year period in our sample. This model relationship is also estimated for all 74 countries in our sample.

Another methodological innovation is use of convex combinations of cross-sectional dependence weight matrix structures (see Pace and LeSage 2002; Hazir et al. 2018; Debarsy and LeSage 2018). The weight matrix structures are constructed to reflect: spatial proximity between countries, as well as numerous types of sociocultural proximity such as common currency, language, and colonial ties. A convex combination of these multiple weight matrices (with associated parameters) is used to form a single weight matrix, where the parameters assign relative importance to each type of cross-sectional dependence. This approach allows us to treat sociocultural factors (for example, common currency, common language, historical colonial relationships, trade agreements, and so on) that have been traditionally modeled as time-invariant fixed effects as sources of cross-sectional dependence.

Constructing weight matrices from indicator variables reflecting sociocultural factors allows our SDM specification to model time-invariant factors as network links between countries that impact trade costs, rather than simply a source of heterogeneity. We set forth Bayesian MCMC estimation methods for our model specification that allows for cross-sectional dependence reflecting interaction and global spillover impacts as well as contextual effects arising from neighboring countries. Throughout the paper, we use the label “spatial” when referring to the SDM model specification, but the reader should note that a more appropriate term would be cross-sectional dependence, since connectivity between countries consists of both pure spatial distance and sociocultural proximity. Of course, we draw on the methodology and terminology set forth in the spatial econometrics literature.

Our Bayesian estimation approach allows for estimation and posterior inference on a vector of parameters that determine the relative importance of each type of cross-sectional dependence. Estimates are based on data transformed using an approach from Lee and Yu (2010) that eliminates both time-specific and country-specific fixed effects using an orthogonality transformation. If the generalized distance variables reflect only time-invariant fixed effects, our model estimates should indicate no cross-sectional dependence or contextual effects. If this is not the case, we have evidence that these generalized distance variables have a greater impact on trade flows than the conventional heterogeneity view suggests.

Section 2 introduces conventional cross-sectional gravity models as used in the empirical trade literature, along with the notion of cross-sectional dependence. Section 3 discusses the formation of convex combinations of spatial and a host of sociocultural proximity structures, and these are discussed in the context of the panel cross-sectional dependence specifications.

Section 4 outlines computationally efficient expressions for the static panel variant of the spatial Durbin model that we wish to estimate. Bayesian MCMC estimation and inference for the model specifications are discussed in Sect. 5. Focus is on inference regarding the scalar parameters that determine the relative influence of five types of proximity that we consider (spatial, common language, common currency, trade agreements, and colonial ties) in our cross-sectional dependence specification. LeSage (2019) points to three computational challenges that arise for this type of model where the weight matrix \(W_c\) is a function of estimated parameters \(\gamma _{\ell }\)\(({\ell }=1,\ldots ,L=5)\) indicating the relative importance assigned to each type of connectivity structure. Each of these is discussed in Sect. 6 along with approaches set forth in LeSage (2019) for overcoming these challenges. Given the mixture of multiple proximity channels of transmission, interpretation of the estimates from our specification differs from that in conventional spatial models. Section 6 discusses interpretation of estimates from the cross-sectional model specification.

Section 7 applies the approach to panel data on trade flows covering the 38 years from 1963 to 2000. We provide empirical estimates for the scalar parameters reflecting the mixture of spatial and sociocultural measures of proximity and test our cross-sectional model specification for consistency with the sample data. The magnitude of bias arising from cross-sectional dependence is assessed by examining estimates of local and global spillover effects, since these are restricted to zero in conventional panel trade models.

Section 8 provides conclusions. The Appendix presents information on the data used and their sources.

## Empirical cross-sectional trade models

Most trade models specify aggregate bilateral demand equations of consumers in countries \(j=1,\ldots ,N\) from producers in countries \(i=1,\ldots ,N\) in the general form:^{Footnote 1}

where \(f_{ijt}\) are bilateral exports of country *i* to country *j* at time *t*, \(l_{it}\) are exporter time-specific factors, \(m_{jt}\) are importer time-specific factors, while \(c_{ijt}\) is a measure of all bilateral trade costs from *i* to *j* at time *t*, with \(\tau\) reflecting the partial elasticity of trade flows with respect to trade costs (see Baltagi et al. 2014).

Specifics regarding what \(l_{it}\) and \(m_{jt}\) represent depend on the particular trade model. For example, using the model from Anderson and van Wincoop (2003) with a single sector, \(\tau\) reflects a measure of the elasticity of substitution between products from different countries, and \(l_{it}, m_{jt}\) correspond to size measures such as gross domestic product (which we denote using *X*). This occurs since aggregated trade flows \(F_{it} = \sum _{j=1}^N f_{ijt}\) represent total sales in country *i* at time period *t* which corresponds to gross domestic product. Finally, trade flows \(f_{ijt}\) are assumed to be inversely related to the bilateral trade costs \(c_{ijt}\).

For the cross-sectional case where we have a single year, the model in Eq. (1) is double-indexed, resulting in a balanced panel in our case where the number of importing and exporting countries is equal. Applying a log-transformation to the deterministic part \(h_{ij} = \ln (l_i, m_j, c_{ij}^\tau )\) of the model in Eq. (1) and also to the trade flows \(\tilde{f}_{ij} = \ln (f_{ij})\) results in:

with \(u_i, v_j\) reflecting exporter- and importer-specific effects when the data are organized first by exporting and then by importing countries. In matrix/vector notation, we can write:

where \(y=vec(\tilde{F})\) is an \(N^2 \times 1\) vector of the trade flow matrix logged and the matrices \(\Delta _u, \Delta _v\) are \(N^2 \times N\), while the vectors *u*, *v* are \(N \times 1\). The matrices \(\Delta _u, \Delta _v\) map elements from the \(N \times 1\) vectors of country-specific exporter and importer effects in *u*, *v* to the appropriate origin–destination combination of countries reflected in the \((i,j)\hbox {th}\) flow dyads in \(vec(\tilde{F})\). The matrix \(H = \left( \begin{array}{ccc} X_d&X_o&vec(C) \end{array} \right)\), where LeSage and Pace (2008) show that \(X_d = \iota _N \otimes X,\, X_o = X \otimes \iota _N\), with *X* being an \(N \times 1\) vector of gross domestic product (gdp) for the countries, and \(\iota _N\) an \(N \times 1\) vector of ones. The Kronecker product (\(\otimes\)) applied to the country-level (gdp) vector strategically arranges country-level incomes to match the export–import dyads of the dependent variable vector that arises from vectorizing the flow matrix. The term *vec*(*C*) is often simply a pairwise distance matrix vectorized as a proxy for trade costs between origin and destination dyads. A conformable vector \(\delta\) contains parameters \(\beta _d, \beta _o\) and *c* associated with the variable vectors \(X_d, X_o\) and *vec*(*C*).
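The Kronecker arrangement of incomes can be checked on a small case. The sketch below uses a hypothetical income vector for \(N=3\) countries and a placeholder flow matrix whose entries only label positions; the variable names are illustrative.

```python
import numpy as np

N = 3
X = np.array([[1.0], [2.0], [3.0]])   # hypothetical N x 1 income vector
iota = np.ones((N, 1))                # N x 1 vector of ones

X_d = np.kron(iota, X)   # iota (x) X: the whole vector X stacked N times
X_o = np.kron(X, iota)   # X (x) iota: each element of X repeated N times

# Placeholder N x N matrix whose entries encode their own (row, col) position.
F = np.arange(N * N, dtype=float).reshape(N, N)
y = F.flatten(order="F")  # vec(F): stack the columns of F

# Under column stacking, element k of vec(F) is F[k % N, k // N], so
# X_d follows the row index and X_o follows the column index of F.
rows = np.arange(N * N) % N
cols = np.arange(N * N) // N
print(np.column_stack([rows, cols, X_d.ravel(), X_o.ravel()]))
```

Which of the two indices is the origin and which is the destination depends on how the flow matrix is laid out before vectorizing, so the sketch only verifies the mechanical alignment.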

As noted in the introduction, we can generalize proxies for trade costs to include not only distance (*vec*(*C*)), but also, for example, common borders and language. These binary indicator variables can be represented using \(N \times N\) matrices \(W_b\) and \(W_l\). The matrix *H* can be extended to include these indicator variables: \(H=\left( \begin{array}{ccccc} X_d&X_o&vec(C)&vec(W_b)&vec(W_l) \end{array} \right)\), along with the extended vector \(\delta\).

A cross-sectional dependence specification that has been labeled the spatial Durbin model (SDM) is shown in Eq. (4), where we redefine \(H = (X_d, X_o)\), \(\beta =(\beta _d, \beta _o)'\) and \(\theta =(\theta _d, \theta _o)'\):

Here \(W_{c}\) is an \(N \times N\) matrix reflecting a convex combination of the two weight matrices \(W_{b}\) and \(W_{l}\).

The SDM specification allows for contextual effects as well as global spillovers from changes in country-level incomes reflected by elements contained in vectors \(X_d, X_o\) in the matrix *H*. This can be seen by noting that a change in income of country *i*, \(X_i\), will have a partial derivative impact that involves the matrix inverse: \((I_{N^2} - \rho W_{c})^{-1} = I_{N^2} + \rho W_{c} + \rho ^2 W_{c}^2 + \ldots\) as shown in Eq. (5):

LeSage and Thomas-Agnan (2015), and LeSage and Fischer (2016) provide specifics regarding the nature of these partial derivatives, but for our purposes we simply note that changes taking place in one country will have global spillover impacts on trade flows in neighboring countries, neighbors to the neighbors, and so on. There will also be feedback effects arising from matrices such as \(W_{c}^2\), since the diagonal elements of this matrix contain nonzero elements. These reflect the fact that country *i* is a neighbor to its neighboring country *j*, or a second-order neighbor to itself.
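The series expansion and the feedback effect can be verified numerically. The following sketch uses a hypothetical row-normalized weight matrix for four countries and an illustrative value of \(\rho\); neither is taken from the paper.

```python
import numpy as np

# Hypothetical row-normalized contiguity matrix for four countries on a line.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
rho = 0.5  # illustrative dependence parameter with |rho| < 1


def series_inverse(W, rho, order):
    """Truncated expansion I + rho W + rho^2 W^2 + ... + rho^order W^order."""
    total = np.eye(W.shape[0])
    term = np.eye(W.shape[0])
    for _ in range(order):
        term = rho * (term @ W)
        total += term
    return total


exact = np.linalg.inv(np.eye(4) - rho * W)
approx = series_inverse(W, rho, order=40)
max_gap = np.abs(exact - approx).max()

# Diagonal of W^2: each country is a second-order neighbor to itself,
# which is the source of the feedback effects.
feedback = np.diag(W @ W)
print(max_gap, feedback)
```

The nonzero diagonal of `W @ W` shows that a change in country *i* returns to country *i* through its neighbors, even though `W` itself has a zero diagonal.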

## Panel data models

In a panel setting, explanatory variables from the matrix *H* in the cross-sectional model in Eq. (4) that do not vary over time between countries must be eliminated. Variables such as distance *vec*(*C*), and indicator variables for common borders, language, currency and other sociocultural measures of similarity (\(vec(W_b)\) and \(vec(W_l))\) do not vary over time. Transformations such as the *within transformation* or that suggested by Lee and Yu (2010) can be used to eliminate country-specific and time-specific effects. Given the motivation for cross-sectional dependence set forth above, a question arises whether time-invariant factors reflect heterogeneity that is eliminated by fixed effects transformations.

We consider panel model specifications that use the \(i\hbox {th}\) column of the flow matrix \(\tilde{F}\) representing exports from country *i* to all other countries as the dependent variable vector *y* over the \(T=38\) years from 1963 to 2000. Specifically, we consider estimates from a single panel data model specification applied to 74 sets of import flows and 74 sets of export flows. One set of estimates reflects the panel data relationship between imports to a single country from all other 73 countries over the 38-year period in our sample. This model relationship is estimated for all 74 countries in our sample. Another set of estimates reflects the panel data relationship between exports from a single country to all other 73 countries over the 38-year period in our sample. This model relationship is also estimated for all 74 countries in our sample. We also consider two panel data model specifications, the spatial autoregressive (SAR) and spatial Durbin model (SDM). Given our sample of \(N=74\) countries, this results in 74 different sets of panel data model estimates based on import and export data sets having dimension \((N-1) \times T\).^{Footnote 2}

An advantage of this approach is that we allow for different coefficient estimates for the parameters of each panel model relationship, for each of the *N* origin (exporting) countries, and for a set of time-invariant fixed effects for each destination (importing) country with respect to each origin country. This set of heterogeneous coefficients contrasts with typical empirical trade panel data models that impose a restriction that coefficients on all explanatory variables are the same for all countries and time periods, with heterogeneity accounted for by the fixed effects parameters. A conventional empirical trade panel model would stack the \(N^2 \times T\) flow matrices as noted in the previous section and rely on a matrix *H* containing destination and origin incomes in the \(N^2 \times 1\) vectors \(X_d, X_o\). In our set of \((N-1) \times T\) panel models, we used the within transformation to eliminate country-specific effects. In addition, we transformed the data to deviation from the cross-sectional means at each time period to eliminate time-specific fixed effects.
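The two demeaning steps described above can be sketched as follows. This is plain two-way demeaning on a small simulated array (rows are partner countries, columns are years), not the Lee and Yu (2010) orthogonal transformation, and the function name is illustrative.

```python
import numpy as np


def two_way_demean(Y):
    """Remove country means over time, then cross-sectional means per period."""
    within = Y - Y.mean(axis=1, keepdims=True)        # country-specific effects
    return within - within.mean(axis=0, keepdims=True)  # time-specific effects


rng = np.random.default_rng(0)
n, T = 5, 4                        # illustrative sizes, far below 73 x 38
Y = rng.normal(size=(n, T))

# Add arbitrary country and time effects; the transformation removes both.
country_fx = rng.normal(size=(n, 1))
time_fx = rng.normal(size=(1, T))
Y_with_fx = Y + country_fx + time_fx

Y_tilde = two_way_demean(Y_with_fx)
print(np.abs(Y_tilde - two_way_demean(Y)).max())
```

Because additive country and time effects drop out, any dependence that remains in the transformed data cannot be attributed to those fixed effects.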

### Convex combinations of proximity structures

We focus on convex combinations of weight matrices that result in a single weight matrix reflecting multiple types of connectivity, where coefficients from the convex combination can be used for inference regarding the relative importance of each type of connectivity. For example, in our case of \(L=5\) weight matrices, \(W_{\ell }, \ell =1,\ldots ,L\) reflecting *L* different types of dependence between our cross section of countries:

The matrix \(W_c\) reflects a convex combination of the *L* weight matrices, with the scalar parameters \(\gamma _{\ell }\) indicating the relative importance assigned to each type of dependence. We wish to consider both conventional spatial dependence, which represents one type of cross-sectional dependence, and multiple types of sociocultural dependence (specifically, common currency, common language, trade agreements and colonial ties).
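Written out, a convex combination consistent with this description takes the following form. This is a reconstruction from the surrounding text, with the constraints on \(\gamma _{\ell }\) taken from the later discussion of the estimation scheme, and the notation \(\Gamma\) for the stacked parameters follows its use in later sections:

\[
W_c(\Gamma ) = \sum _{\ell =1}^{L} \gamma _{\ell } W_{\ell }, \qquad 0 \le \gamma _{\ell } \le 1, \qquad \sum _{\ell =1}^{L} \gamma _{\ell } = 1, \qquad \Gamma = (\gamma _1, \ldots , \gamma _L)'.
\]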

The spatial weight matrix \(W_{\ell =1}\) reflects spatial proximity of countries (specifically some number of nearest neighbors). We rely on six nearest neighbors to form \(W_{\ell =1}\). The other matrices \(W_{\ell =2,\ldots ,5}\), are constructed to reflect sociocultural proximity based on: common currency \(W_{\ell =2}\), common language \(W_{\ell =3}\), membership in a trade agreement (excluding WTO membership) \(W_{\ell =4}\), and direct historical colonial ties \(W_{\ell =5}\).

There are some points to note regarding this approach. First, the matrices \(W_{\ell }\) must be distinct, but can be highly correlated. If, for example, \(W_{\ell =1} = W_{\ell =2}\), the parameters \(\gamma _1\) and \(\gamma _2\) will not be properly identified. Second, the matrices \(W_{\ell }\) are row-normalized to have row-sums of unity and zero diagonal elements. Zero diagonal elements exclude a country *i* from being a neighbor to itself. Row normalization ensures that the scalar cross-sectional dependence parameter \(\rho\) must be less than one, a condition required for convergence of the infinite series expansion: \((I_N - \rho W_c)^{-1} = I_N + \rho W_c + \rho ^2 W_c^2 + \ldots\).

Another point is that no individual row in the matrix \(W_{c}\) can contain only zeros. In the case of a spatial weight matrix \(W_{\ell =1}\) based on some number (say) *s* nearest neighboring countries, all rows will by definition consist of nonzero elements. However, this results in nonzero rows in the matrix \(W_c\) only if \(\gamma _{\ell =1}\) is nonzero. We allow zero values for the parameters \(\gamma _{\ell }\). To prevent zero rows in the matrix \(W_c\), we restricted our sample of countries to those for which all \(L=5\) matrices had rows with nonzero elements. This resulted in elimination of countries such as South Korea and Japan that do not have a common language, common currency, direct colonial ties, etc. with any other country in the sample.
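The reason for this sample restriction can be seen in a small example. In the sketch below, two hypothetical weight matrices for three countries are combined; the second has a zero row for a country with no partner of that type. The matrices, the parameter values, and the function name are all illustrative.

```python
import numpy as np


def convex_combination(weights, gammas):
    """Return sum_l gamma_l * W_l after checking the convexity constraints."""
    gammas = np.asarray(gammas, dtype=float)
    if np.any(gammas < 0.0) or not np.isclose(gammas.sum(), 1.0):
        raise ValueError("gammas must be nonnegative and sum to one")
    return sum(g * W for g, W in zip(gammas, weights))


# Spatial matrix: every country has neighbors, so every row sums to one.
W1 = np.array([[0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5],
               [0.5, 0.5, 0.0]])
# Hypothetical sociocultural matrix: the third country has no partner.
W2 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]])

mixed = convex_combination([W1, W2], [0.4, 0.6])
corner = convex_combination([W1, W2], [0.0, 1.0])

print(mixed.sum(axis=1))    # third row sum falls below one
print(corner.sum(axis=1))   # third row is entirely zero
```

When every \(W_{\ell }\) has unit row sums, any admissible \(\Gamma\) yields a \(W_c\) with unit row sums, which is what restricting the sample guarantees.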

## Computationally efficient expressions for the model

We extend the approach taken by LeSage (2019) that deals with cross-sectional specifications to the case of a static panel data setting. The static panel variant of the spatial Durbin model (SDM) that we wish to estimate is shown in Eq. (7), where each \(W_{\ell }\) represents an \((N-1) \times (N-1)\) weight matrix whose main diagonal contains zero elements and row-sums of the off-diagonal elements equal to one, with *N* denoting the number of countries. Note also that we have eliminated country-specific effects by applying the demeaning transformation to the vector *y* and matrix *X*, but for notational simplicity we use *y*, *X*. In addition, we eliminate time-specific effects by demeaning as well. Nonzero (off-diagonal) weight matrix elements (*i*, *j*) of each \(W_{\ell }\) reflect that observation *j* exhibits interaction with observation *i*, with different weight matrices describing different possible types of interaction (e.g., spatial and different types of sociocultural interaction).

with \(\rho\) denoting the scalar dependence parameter. The \((N-1) T \times 1\) vector *y* contains observations on exports (imports) from (to) country *i* to (from) all \((N-1)\) other countries for all time periods. These are organized with those for all (other) countries for the first time period, then all countries for the second time period, and so on. The \((N-1)T \times (N-1)T\) matrix \((I_T \otimes W_c)\) uses the Kronecker product to replicate the weight matrix for each time period. The \((N-1) T \times K\) matrix *X* in Eq. (7) contains the explanatory variables arranged in the same fashion as the dependent variable vector *y*, with \(\beta\) being the associated \(K \times 1\) vector of parameters. In our case, the explanatory variable vector *X* reflects gdp pc (gross domestic product per capita) of the destination or origin countries in the case of exports or imports, respectively, with \(\theta\) being the associated \(K \times 1\) vector of parameters. The \((N-1) T \times K\) matrices \((I_T \otimes W_1) X, (I_T \otimes W_2) X, \ldots , (I_T \otimes W_L) X\) reflect (logged) gdp pc in spatial neighbors to the origin/destination in the case of \(W_1\), and countries with common currency, common language, trade agreements and direct colonial ties in the cases of \(W_{2}\) to \(W_5\). In the social networking literature, these variable vectors are referred to as *contextual effects*, representing characteristics of peer groups defined by the matrix products \(W_{\ell } X\) \((\ell =1,\ldots ,L)\) which create averages of peers’ characteristics that might influence the outcomes vector *y*. In our model, these variables allow for (average) income in spatial and sociocultural neighbors produced by the matrix-vector products \(W_{\ell } X\) to influence trade flows. Finally, the \((N-1) T \times 1\) vector \(\varepsilon\) represents a constant-variance, normally distributed disturbance term (\(\varepsilon \sim {\mathcal {N}}(0, \sigma ^2 I_{(N-1)T})\)).
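Collecting the terms described above, the specification can be written in a form consistent with these definitions. This is a reconstruction from the surrounding text rather than a verbatim copy of Eq. (7); in particular, attaching a separate coefficient \(\theta _{\ell }\) to each contextual term is inferred from the parameter vector \(\delta = (\beta , \theta _1, \ldots , \theta _L)'\) used in the estimation section:

\[
y = \rho \, (I_T \otimes W_c)\, y + X \beta + \sum _{\ell =1}^{L} (I_T \otimes W_{\ell })\, X \theta _{\ell } + \varepsilon , \qquad W_c = \sum _{\ell =1}^{L} \gamma _{\ell } W_{\ell }.
\]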

The model in Eq. (7) can be expressed as shown in Eq. (8), that is computationally convenient because it isolates the parameters \(\rho , \gamma _{\ell }, \ell = 1,\ldots ,L\) in the \((L+1) \times 1\) vector \(\omega\). We use: \(\tilde{W}_{\ell } = (I_T \otimes W_{\ell })\) in Eq. (8) to simplify notation.

A related model labeled the spatial autoregressive (SAR) model can be constructed by redefining the matrix \(Z = X\). This type of model excludes *contextual effects* embedded in the various types of neighboring countries income represented by the variable vectors \(W_{\ell } X\).

The value of isolating the parameter vector \(\omega\) is that this allows us to pre-calculate the \((N-1)T \times L\) matrix *My* prior to beginning the Markov chain Monte Carlo (MCMC) sampling loop. It also leads to quadratic form expressions for crucial terms that arise during MCMC sampling from the sequence of conditional distributions for the parameters. Quadratic forms produce computationally fast and efficient calculations.
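The benefit of isolating \(\omega\) can be illustrated numerically. The sketch below assumes, following the layout used in this literature, that the pre-calculated matrix stacks \(y\) alongside the products \(\tilde{W}_{\ell } y\) and that \(\omega = (1, -\rho \gamma _1, \ldots , -\rho \gamma _L)'\); both the layout and the names `Y_stack` and `omega` are assumptions, since Eq. (8) is not reproduced here. In this sketch the stacked matrix therefore has \(L+1\) columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, L = 4, 3, 2   # illustrative sizes; n plays the role of N - 1


def random_weight(n):
    """Random row-normalized weight matrix with a zero diagonal."""
    A = rng.random((n, n))
    np.fill_diagonal(A, 0.0)
    return A / A.sum(axis=1, keepdims=True)


W = [random_weight(n) for _ in range(L)]
W_tilde = [np.kron(np.eye(T), Wl) for Wl in W]   # I_T (x) W_l
y = rng.normal(size=n * T)

# Computed once, before the sampling loop: no parameters are involved.
Y_stack = np.column_stack([y] + [Wt @ y for Wt in W_tilde])

# Inside the loop only omega changes with the current draws of rho and gamma.
rho, gamma = 0.6, np.array([0.3, 0.7])
omega = np.concatenate(([1.0], -rho * gamma))
filtered = Y_stack @ omega

# Direct evaluation of y - rho (I_T (x) W_c) y for comparison.
W_c = sum(g * Wl for g, Wl in zip(gamma, W))
direct = y - rho * (np.kron(np.eye(T), W_c) @ y)
print(np.abs(filtered - direct).max())
```

Each new draw of \(\rho\) or \(\Gamma\) then costs one small matrix-vector product instead of rebuilding \(W_c\) and multiplying an \((N-1)T \times (N-1)T\) matrix.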

## The Markov chain Monte Carlo estimation scheme

Here again, we extend the approach taken by LeSage (2019) for estimation in a cross-sectional setting to the case of the static panel data specification. Prior distributions along with conditional posterior distributions for the model parameters required to implement MCMC estimation of the SDM panel data specification in Eq. (7) are set forth here.

We rely on a normal prior for the parameters \(\delta = \left( \begin{array}{cccc} \beta&\theta _1&\ldots&\theta _L \end{array} \right) '\):

where \(\bar{\delta }\) is a \((K + L)\times 1\) vector of prior means and \(\bar{\Sigma }_{\delta }\) is a \((K + L) \times (K + L)\) prior variance–covariance matrix.^{Footnote 3}

We employ a uniform prior for \(\rho\) since this scalar dependence parameter is constrained to lie in the open interval: \((-1,1)\).^{Footnote 4} The constraint \((-1< \rho < 1)\) is imposed during MCMC estimation using rejection sampling.

Since the parameters \(\gamma _{\ell }, \ell = 1,\ldots ,L\) are a focus of inference, we do not impose a prior distribution on these parameters, but impose the closed interval [0, 1] for \(\gamma _{\ell }\), \(\ell =1,\ldots , L\) during MCMC estimation, and also impose \(\sum _{\ell =1}^{L} \gamma _{\ell } =1\), by setting \(\gamma _L = (1 - \sum _{\ell =1}^{L-1}\gamma _{\ell })\). We discuss how proposal values for the vector of parameters \(\Gamma\) are generated later.

For the parameter \(\sigma ^2\), we use an inverse gamma(\(\bar{a},\bar{b}\)) distribution shown in Eq. (12). We note that as values of \(\bar{a}, \bar{b} \rightarrow 0\), this prior distribution becomes uninformative, which might be important in applied practice since there would be little basis for assigning prior values for the parameter \(\sigma ^2\).

As is traditional in the literature, we assume that priors for the parameters \(\delta , \rho , \Gamma , \sigma ^2\) are independent. Given these priors, we require the conditional distributions for the parameters \(\delta , \sigma ^2, \rho , \Gamma\) from which we sample to implement MCMC estimation. The conditional distribution for the parameters \(\delta\) is multivariate normal with mean and variance–covariance shown in Eq. (13):

The conditional posterior for \(\sigma ^2\) (given \(\delta , \rho , \Gamma\)) takes an inverse gamma (IG) form in Eq. (14), when we set the prior parameters \(\bar{a} = \bar{b} = 0\):
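A draw from an inverse gamma conditional of this kind can be sketched as follows. The code uses the standard result for a normal likelihood with an inverse gamma(\(\bar{a},\bar{b}\)) prior, namely shape \(\bar{a} + n/2\) and scale \(\bar{b} + e'e/2\) for a residual vector \(e\) of length \(n\); whether Eq. (14) uses exactly this parameterization is an assumption, and the function name is illustrative.

```python
import numpy as np


def draw_sigma2(resid, rng, a_bar=0.0, b_bar=0.0):
    """One draw from IG(a_bar + n/2, b_bar + e'e/2) given residuals e."""
    n = resid.size
    shape = a_bar + 0.5 * n
    scale = b_bar + 0.5 * float(resid @ resid)
    # If g ~ Gamma(shape, 1), then scale / g ~ inverse gamma(shape, scale).
    return scale / rng.gamma(shape)


rng = np.random.default_rng(42)
resid = rng.normal(scale=2.0, size=200)   # simulated residuals, true sigma^2 = 4
draws = np.array([draw_sigma2(resid, rng) for _ in range(4000)])

# Mean of IG(shape, scale) is scale / (shape - 1) = e'e / (n - 2) when a = b = 0.
expected_mean = float(resid @ resid) / (resid.size - 2)
print(draws.mean(), expected_mean)
```

In the sampler, `resid` would be recomputed from the current values of \(\delta\), \(\rho\) and \(\Gamma\) before each draw.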

The (log) conditional posterior for \(\rho\) (given \(\delta , \Gamma , \sigma ^2\)) has the form in Eq. (15), where we use \(T\text{ ln }|I_{N-1} - \rho W_c(\Gamma )|\) to show that the log-determinant term in this model depends on the vector \(\Gamma\). For example, considering a convex combination of three matrices, we need to calculate: \(T \text{ ln }|I_{N} - \rho W_c(\gamma )| = T \text{ ln }|I_{N} - \rho (\gamma _1 W_1 + \gamma _2 W_2 + \gamma _3 W_3)|\) with \(\gamma _3 = 1-\gamma _1 - \gamma _2\). (We provide details regarding a computationally efficient approach to calculating the log-determinant term in the next section.)

where we use the expression \(\omega (\rho )\) to indicate that only the parameter \(\rho\) in the vector \(\omega\) varies, with the parameter vector \(\Gamma\) fixed.

This distribution does not reflect a known form as in the case of the conditional distributions for \(\delta\) and \(\sigma ^2\). We sample the parameter \(\rho\) from this conditional distribution using a Metropolis–Hastings sampling approach. Details are described in the next section where we outline our approach to avoid repeated calculation of the log-determinant term in this conditional distribution.
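A single Metropolis–Hastings update for \(\rho\) can be sketched in a few lines. This is a generic random-walk step, not necessarily the tuned scheme used in the paper: the proposal standard deviation `step`, the function names, and the toy target standing in for the log conditional of Eq. (15) are all assumptions. Proposals outside \((-1,1)\) are rejected, which is one common way of imposing the interval constraint.

```python
import numpy as np


def mh_step_rho(rho, log_cond, step, rng):
    """One random-walk Metropolis-Hastings update of rho on (-1, 1)."""
    proposal = rho + step * rng.standard_normal()
    if not (-1.0 < proposal < 1.0):
        return rho, False                    # outside the support: reject
    log_ratio = log_cond(proposal) - log_cond(rho)
    if np.log(rng.random()) < log_ratio:
        return proposal, True                # accept the move
    return rho, False                        # keep the current value


def toy_log_cond(rho):
    """Stand-in for the log conditional posterior: normal kernel at 0.3."""
    return -0.5 * ((rho - 0.3) / 0.1) ** 2


rng = np.random.default_rng(7)
rho, chain, accepted = 0.0, [], 0
for _ in range(6000):
    rho, moved = mh_step_rho(rho, toy_log_cond, step=0.2, rng=rng)
    chain.append(rho)
    accepted += moved

chain = np.array(chain[1000:])               # discard burn-in draws
print(chain.mean(), accepted / 6000)
```

In the actual sampler, `toy_log_cond` would be replaced by a function that evaluates the log-determinant term and the quadratic form in \(\omega (\rho )\) for the current \(\Gamma\), \(\delta\) and \(\sigma ^2\).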

The (log) conditional posterior for \(\Gamma\) (given \(\delta , \rho , \sigma ^2\)) takes the form in Eq. (16), where we also have a log-determinant that depends on values taken by the vector \(\Gamma\). We use the expression \(\omega (\Gamma )\) to indicate that this parameter vector depends on \(\Gamma\), with the scalar parameter \(\rho\) held fixed.

As in the case of the conditional distribution for \(\rho\), this distribution does not reflect a known form. We sample the parameter vector \(\Gamma\) as a block from this conditional distribution using a reversible jump procedure to produce proposal values for the vector \(\Gamma\) in conjunction with Metropolis–Hastings sampling. Details are described in the next section.

## A computationally efficient approach based on trace approximations

Debarsy and LeSage (2018) point to three computational challenges arising for this type of model where the weight matrix \(W_c\) is a function of estimated parameters \(\gamma _{\ell }\). One is that the log-determinant term in the conditional distributions for \(\rho\) and \(\Gamma\) in Eqs. (15) and (16) cannot be pre-calculated over a range of values for the dependence parameter \(\rho\) as is conventionally done in single weight matrix spatial regression models. A second issue relates to dealing with the restriction imposed on the parameters \(\sum _{\ell =1}^L \gamma _{\ell } = 1\). The third challenge arises when calculating measures of dispersion for the partial derivatives \(\partial y / \partial X\) that LeSage and Pace (2009) label *effects estimates*. An empirical measure of dispersion for the effects is typically constructed by evaluating the partial derivatives using a large number (say 1,000) of MCMC draws for the parameters.^{Footnote 5} The expressions for the partial derivatives involve the inverse of an \((N-1) \times (N-1)\) matrix. For the case of a single weight matrix, LeSage and Pace (2009) show how to use a trace approximation to avoid calculating the matrix inverse thousands of times, but this approach does not apply to the model developed here.

In Sect. 6.1, a Taylor series approximation for the log-determinant term is set forth. The log-determinant term arises in the conditional distributions [see Eqs. (15) and (16)] required to sample the dependence parameter \(\rho\) and the parameters \(\gamma _{\ell },\, \ell = 1,\,\ldots ,\,L\) that serve as weights in the convex combination. Section 6.2 describes a reversible jump approach to block sampling the parameters \(\gamma _{\ell },\, \ell = 1,\,\ldots ,\,L\). Calculation of the effects estimates which represent partial derivatives of the dependent variable with respect to changes in the explanatory variables is the subject of Sect. 6.3.

### A Taylor series approximation for the log-determinant

Pace and LeSage (2002) set forth a Taylor series approximation for the log-determinant of a matrix like our expression: \(\text{ ln }|I_{N-1} - \rho \tilde{W}_c|\). They show that for a *symmetric* nonnegative weight matrix \(\tilde{W}_c\) with eigenvalues \(\lambda _{\min } \ge -1, \lambda _{\max } \le 1\), and \(1/\lambda _{\min }< \rho < 1\), and \(\textit{tr}(\tilde{W}_c) = 0\), where \(\textit{tr}\) represents the trace:

Golub and van Loan (1996, p. 566) provide the expression in Eq. (17), while Pace and LeSage (2002) note that due to the linearity of the trace operator we have expression (18). We note that the first-order trace involves \(\textit{tr} (W_c)\) which is zero for any convex combination of weight matrices that have zero diagonal elements. For symmetric matrices \(W_{\ell }\), we can express the second-order trace as a quadratic form in Eq. (19) involving the vector of parameters \(\Gamma\) and all pairwise multiplications of the individual matrices \(W_{\ell }\) as shown in Eq. (20):
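For reference, the standard form of this expansion can be written as follows. This is a sketch reconstructed from the surrounding definitions rather than a verbatim copy of the numbered displays; the truncation order \(q\) and the label \(Q^2\) for the matrix of pairwise traces are notational assumptions.

```latex
\begin{align*}
\ln|I_{N-1} - \rho \tilde{W}_c|
  &= -\sum_{j=1}^{\infty} \frac{\rho^{j}\, tr(\tilde{W}_c^{\,j})}{j}
  \;\approx\; -\sum_{j=1}^{q} \frac{\rho^{j}\, tr(\tilde{W}_c^{\,j})}{j}, \\
tr(W_c) &= \sum_{\ell=1}^{L} \gamma_{\ell}\, tr(W_{\ell}) = 0, \\
tr(W_c^{2}) &= \sum_{i=1}^{L}\sum_{j=1}^{L} \gamma_i \gamma_j\, tr(W_i W_j)
  = \Gamma' Q^{2}\, \Gamma,
  \qquad Q^{2}_{ij} = tr(W_i W_j).
\end{align*}
```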

This formulation separates the parameters in the vector \(\Gamma\) from the matrix of traces, which allows pre-calculation of the matrix of traces for a given set of weight matrices \(W_{\ell }\) prior to MCMC sampling. For the case of asymmetric matrices, we use matrix products \(\sum _i^L \sum _j^L W_i \odot W_j'\). We note that row-normalized weight matrices would be an example of asymmetric matrices. Our sociocultural weight matrices are by definition symmetric, because countries *i* and *j* with common language, common currency, and so on, would result in countries *j* and *i* having common language, common currency, and so on.
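A minimal sketch of this pre-calculation, under stated assumptions: the weight matrices are small random symmetric 0/1 matrices generated purely for illustration, and the function names are hypothetical. It uses the identity that \(tr(W_i W_j)\) equals the sum of the elements of \(W_i \odot W_j'\), so no matrix products are needed, and it checks the quadratic form against a brute-force trace.

```python
import numpy as np


def pairwise_trace_matrix(weights):
    """Return Q with Q[i, j] = tr(W_i W_j) for a list of square matrices."""
    L = len(weights)
    Q = np.empty((L, L))
    for i in range(L):
        for j in range(L):
            # tr(A B) equals the sum of the elementwise product of A and B'.
            Q[i, j] = np.sum(weights[i] * weights[j].T)
    return Q


def second_order_trace(gamma, Q):
    """Evaluate tr(W_c^2) = Gamma' Q Gamma without forming W_c."""
    return float(gamma @ Q @ gamma)


def random_symmetric_weights(n, L, rng):
    """Hypothetical symmetric 0/1 matrices with zero diagonals (toy data)."""
    weights = []
    for _ in range(L):
        upper = np.triu((rng.random((n, n)) < 0.2).astype(float), k=1)
        weights.append(upper + upper.T)
    return weights


rng = np.random.default_rng(0)
weights = random_symmetric_weights(n=30, L=3, rng=rng)
gamma = np.array([0.5, 0.3, 0.2])            # illustrative weights summing to one

Q = pairwise_trace_matrix(weights)           # computed once, before sampling
W_c = sum(g * W for g, W in zip(gamma, weights))
trace_direct = float(np.trace(W_c @ W_c))    # brute-force check
trace_quadratic = second_order_trace(gamma, Q)
```

Only `second_order_trace` would need to be re-evaluated inside a sampler, since `Q` does not depend on \(\Gamma\).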

LeSage (2019) emphasizes that a more efficient computational expression is \((\Gamma \otimes \Gamma ) vec (Q^2)\), where \(\otimes\) is the Kronecker product and *vec* the operator that stacks the columns of the matrix \(Q^2\). Using this approach leads to a similar expression for the third-order trace, which in the case of \(L = 2\) takes the form involving \(L^3\) matrix products:

where again, we can use sums of matrix products to produce the \(L^3\) matrix products required:

We rely on a fourth-order Taylor series approximation, since LeSage et al. (2018) provide results from a Monte Carlo experiment, showing that this produces the desired accuracy in a cross-sectional model setting.

A fourth-order Taylor series approximation to the log-determinant \(T\, \text{ ln }|I_{N-1} - \rho W_c|\) takes the form in Eq. (24).
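The following sketch illustrates how such a fourth-order approximation might be evaluated from pre-computed trace arrays and compared with the exact log-determinant. It is an illustration under assumptions, not the estimation code behind the paper: the weight matrices are toy symmetric matrices scaled to have spectral radius one, and the names `trace_tensors` and `logdet_taylor4` are hypothetical.

```python
import numpy as np


def trace_tensors(weights):
    """Pre-compute tr(W_i W_j), tr(W_i W_j W_k) and tr(W_i W_j W_k W_l)."""
    Ws = np.stack(weights)                        # shape (L, n, n)
    P2 = np.einsum("iab,jbc->ijac", Ws, Ws)       # all pairwise products W_i W_j
    Q2 = np.einsum("ijaa->ij", P2)
    Q3 = np.einsum("ijab,kba->ijk", P2, Ws)
    Q4 = np.einsum("ijab,klba->ijkl", P2, P2)
    return Q2, Q3, Q4


def logdet_taylor4(rho, gamma, Q2, Q3, Q4, T=1):
    """Fourth-order series for T * ln|I - rho W_c|; the first-order trace is zero."""
    t2 = np.einsum("i,j,ij->", gamma, gamma, Q2)
    t3 = np.einsum("i,j,k,ijk->", gamma, gamma, gamma, Q3)
    t4 = np.einsum("i,j,k,l,ijkl->", gamma, gamma, gamma, gamma, Q4)
    return float(-T * (rho**2 * t2 / 2 + rho**3 * t3 / 3 + rho**4 * t4 / 4))


def scaled_symmetric_weights(n, L, rng):
    """Toy symmetric matrices, zero diagonal, scaled to spectral radius one."""
    weights = []
    for _ in range(L):
        upper = np.triu((rng.random((n, n)) < 0.3).astype(float), k=1)
        W = upper + upper.T
        weights.append(W / np.abs(np.linalg.eigvalsh(W)).max())
    return weights


rng = np.random.default_rng(1)
n = 20
weights = scaled_symmetric_weights(n, L=3, rng=rng)
gamma = np.array([0.6, 0.3, 0.1])                 # illustrative values
rho = 0.3

Q2, Q3, Q4 = trace_tensors(weights)               # done once, prior to sampling
approx = logdet_taylor4(rho, gamma, Q2, Q3, Q4)

W_c = sum(g * W for g, W in zip(gamma, weights))
sign, exact = np.linalg.slogdet(np.eye(n) - rho * W_c)
```

The trace arrays grow as \(L^2\), \(L^3\) and \(L^4\), which is small for \(L = 5\); the approximation error grows with \(|\rho|\) because the omitted terms are of order \(\rho ^5\) and higher.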

The conditional distribution for the parameter \(\rho\) consists of the log-determinant term as well as a term involving the sum-of-squared errors: \((1/2 \sigma ^2) e'e\), where \(e = \left( My \, \omega (\rho ) - Z \delta \right)\), and a third term: \(((N-1)T/2)\, ln \, \sigma ^2\). We use \(\omega (\rho )\) to indicate that only the scalar parameter \(\rho\) in the vector \(\omega\) is varied with the vector \(\Gamma\) fixed when evaluating the conditional distribution for \(\rho\). We note that \(e'e = \left( My \, \omega (\rho ) - Z \delta \right) ' \left( My \, \omega (\rho ) - Z \delta \right)\) results in quadratic forms with the parameters as outer vectors: \(e'e = \omega (\rho )' y' M' M y \, \omega (\rho ) - \omega (\rho )' y' M' Z \delta - \delta ' Z' My \, \omega (\rho ) + \delta ' Z'Z \delta\). Since the conditional distribution is evaluated twice when carrying out the Metropolis–Hastings step for sampling the parameter \(\rho\), once at the current value of \(\rho\) (which we label \(\rho ^c)\) and a second time at the proposed value (which we label \(\rho ^p\)), the quadratic forms plus the Taylor series trace approximation to the log-determinant allow for rapid calculations.^{Footnote 6}
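To illustrate why the quadratic forms are cheap to evaluate, the sketch below pre-computes the cross-product matrices once and then evaluates \(e'e\) for given parameter values. Several things here are assumptions made for illustration: a single cross-section stands in for the panel, the matrix `Y_tilde` (playing the role of \(My\)) is assumed to stack \(y\) and each \(W_{\ell } y\) as columns, and the data are random.

```python
import numpy as np

rng = np.random.default_rng(1)
n, L, k = 40, 3, 2
y = rng.normal(size=n)
Z = rng.normal(size=(n, k))
weights = [rng.random((n, n)) for _ in range(L)]
for W in weights:
    np.fill_diagonal(W, 0.0)                       # zero diagonals, as in the text

# Hypothetical stand-in for My: columns y, W_1 y, ..., W_L y.
Y_tilde = np.column_stack([y] + [W @ y for W in weights])

# Cross-products of known data, computed once before sampling.
T1 = Y_tilde.T @ Y_tilde
T2 = Y_tilde.T @ Z
T4 = Z.T @ Z


def omega(rho, gamma):
    """Parameter vector (1, -rho*gamma_1, ..., -rho*gamma_L)."""
    return np.concatenate(([1.0], -rho * gamma))


def sse_quadratic(rho, gamma, delta):
    """e'e from the pre-computed quadratic forms (small matrices only)."""
    w = omega(rho, gamma)
    return float(w @ T1 @ w - 2.0 * (w @ T2 @ delta) + delta @ T4 @ delta)


def sse_direct(rho, gamma, delta):
    """e'e computed the slow way, forming W_c and the residual vector."""
    W_c = sum(g * W for g, W in zip(gamma, weights))
    e = y - rho * (W_c @ y) - Z @ delta
    return float(e @ e)


gamma = np.array([0.5, 0.3, 0.2])
delta = np.array([0.7, -0.2])
fast = sse_quadratic(0.4, gamma, delta)
slow = sse_direct(0.4, gamma, delta)
```

Each evaluation of `sse_quadratic` involves only \((L+1)\)- and \(k\)-dimensional vectors, so its cost does not depend on the sample size.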

The (log) conditional distribution for \(\rho\) is shown in Eq. (25), where the expression \(e' e(\rho )\) indicates that only the parameter \(\rho\) in the vector \(\omega\) varies, with elements in the vector \(\Gamma\) fixed:

Eq. (25) is evaluated at both the current value \(\rho ^c\) and a proposal value \(\rho ^p\). The proposal value is generated using a tuned random-walk procedure: \(\rho ^p = \rho ^c + \kappa \, {\mathcal {N}}(0,1)\), where \(\kappa\) is a tuning parameter and \({\mathcal {N}}(0,1)\) denotes a draw from the standard normal distribution. The tuning parameter is adjusted based on monitoring the acceptance rates, with \(\kappa\) adjusted downward using \(\kappa ' = \kappa /1.1\) if the acceptance rate falls below 40%, and adjusted upward using \(\kappa ' = (1.1) \kappa\) when the acceptance rate rises above 60% (see LeSage and Pace 2009, p. 137). The (non-logged) conditional distributions are then used in expression (26) to calculate a Metropolis–Hastings acceptance probability \(\psi _{MH}\), where we use \((\cdot )\) to denote the conditioning parameters \((\delta , \Gamma , \sigma ^2)\):

If \((p(\rho ^p | \cdot ) - p(\rho ^c | \cdot )) > \text{ exp }(1)\), the Metropolis–Hastings probability \((MH_{p})\) is set to one; otherwise, \(MH_{p}\) is calculated using \(\psi _{MH}(\rho ^c,\rho ^p)\). This probability is compared to a uniform(0, 1) random draw to make the accept/reject decision: the proposal is accepted if the uniform draw is smaller than \(MH_{p}\), and rejected otherwise.
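A minimal sketch of a tuned random-walk Metropolis–Hastings update of this kind is given below. It is not the sampler used in the paper: the log conditional is a hypothetical toy function rather than Eq. (25), proposals outside \((-1, 1)\) are simply rejected (an assumption), the acceptance probability uses the textbook rule \(\min (1, p(\rho ^p)/p(\rho ^c))\) computed on the log scale, and the 100-draw monitoring window is arbitrary.

```python
import numpy as np


def tune_kappa(kappa, acceptance_rate):
    """Shrink the step below 40% acceptance, enlarge it above 60%."""
    if acceptance_rate < 0.40:
        return kappa / 1.1
    if acceptance_rate > 0.60:
        return kappa * 1.1
    return kappa


def mh_step(rho_c, log_cond, kappa, rng):
    """One random-walk Metropolis-Hastings update for rho."""
    rho_p = rho_c + kappa * rng.standard_normal()
    if not -1.0 < rho_p < 1.0:
        return rho_c, False                 # assumption: reject out-of-range proposals
    log_ratio = log_cond(rho_p) - log_cond(rho_c)
    accept_prob = 1.0 if log_ratio >= 0.0 else np.exp(log_ratio)
    accepted = rng.uniform() < accept_prob
    return (rho_p if accepted else rho_c), accepted


def toy_log_cond(rho):
    """Hypothetical log conditional peaked at rho = 0.4 (not Eq. 25)."""
    return -0.5 * ((rho - 0.4) / 0.1) ** 2


rng = np.random.default_rng(2)
rho, kappa = 0.0, 0.5
draws, window = [], []
for _ in range(5000):
    rho, accepted = mh_step(rho, toy_log_cond, kappa, rng)
    draws.append(rho)
    window.append(accepted)
    if len(window) == 100:                  # monitor acceptance over the last 100 draws
        kappa = tune_kappa(kappa, np.mean(window))
        window = []

posterior_mean = float(np.mean(draws[1000:]))   # discard the first 1,000 draws
```

Working with differences of log conditionals avoids overflow when the non-logged densities are very large or very small.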

### A reversible jump approach to block sampling \(\Gamma\)

A second computational challenge for MCMC estimation of the model is sampling parameters in the vector \(\Gamma\), which must sum to one. We rely on a block-sampling approach set forth in LeSage (2019). This involves a proposal vector of candidate values for \(\gamma _{\ell }, \ell = 1,2,\ldots ,L-1\), with \(\gamma _{L} = 1 - \sum _{\ell =1}^{L-1} \gamma _{\ell }\). Since a vector of proposal values is produced, it is easy to impose the restriction that \(\sum _{\ell } \gamma _{\ell } = 1\). The conditional distributions for the current and proposed vectors that we label \(\Gamma ^c, \Gamma ^p\) are evaluated with a Metropolis–Hastings step used to either accept or reject the newly proposed vector \(\Gamma ^p\). Block sampling the parameter vector \(\Gamma\) has the virtue that accepted vectors obey the summing-up restriction, and it reduces autocorrelation in the MCMC draws for these parameters. However, block sampling is known to produce lower acceptance rates, which may require more MCMC draws in order to collect a sufficiently large sample of draws for posterior inference regarding \(\Gamma\).

LeSage (2019) uses a reversible jump procedure to produce the proposal values for the vector \(\Gamma\). This involves (for each \(\gamma _{\ell }, \ell = 1,\ldots ,L-1\)) a three-headed coin flip. By this, we mean a uniform random number on the open interval \((0,1)\), \(\textit{coin}\ \textit{flip} \sim U(0,1)\), with head #1 equal to a value \(\le 1/3\), head #2 a value \(> 1/3\) and \(\le 2/3\), and head #3 equal to a value \(> 2/3\) and smaller than one. Given a head #1 result, we set a proposal for \(\gamma ^p_{\ell }\) using a uniform random draw on the open interval \((0, \gamma ^c_{\ell })\), where \(\gamma ^c_{\ell }\) is the current value. A head #2 results in setting the proposal value equal to the current value \((\gamma ^p_{\ell } = \gamma ^c_{\ell })\), while a head #3 selects a proposal value based on a uniform random draw on the open interval \((\gamma ^c_{\ell }, 1)\).^{Footnote 7}
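The proposal step described above can be sketched as follows. The function name `propose_gamma` is hypothetical, and one detail is an assumption not stated in the text: a proposal whose implied last element \(\gamma _L\) falls outside \([0, 1]\) is flagged as invalid so that the caller can reject it.

```python
import numpy as np


def propose_gamma(gamma_c, rng):
    """Three-headed coin-flip proposal for the first L-1 elements of Gamma."""
    gamma_p = gamma_c.copy()
    for ell in range(gamma_c.size - 1):
        coin = rng.uniform()
        if coin <= 1.0 / 3.0:
            # Head 1: move down, uniform between zero and the current value.
            gamma_p[ell] = rng.uniform(0.0, gamma_c[ell])
        elif coin > 2.0 / 3.0:
            # Head 3: move up, uniform between the current value and one.
            gamma_p[ell] = rng.uniform(gamma_c[ell], 1.0)
        # Head 2: keep the current value (nothing to do).
    # The last element is implied by the sum-to-one restriction.
    gamma_p[-1] = 1.0 - gamma_p[:-1].sum()
    valid = bool(np.all(gamma_p >= 0.0) and np.all(gamma_p <= 1.0))
    return gamma_p, valid


rng = np.random.default_rng(3)
gamma = np.full(5, 0.2)                      # illustrative starting values
n_valid = 0
for _ in range(200):
    proposal, valid = propose_gamma(gamma, rng)
    if valid:
        n_valid += 1
        gamma = proposal                     # a real sampler would apply the MH test here
```

In an actual sampler the valid proposal would still pass through the Metropolis–Hastings accept/reject step; the loop above accepts every valid proposal only to exercise the function.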

The (non-logged) conditional distributions in expression (27) are used to calculate a Metropolis–Hastings acceptance probability, where we use \((\cdot )\) to denote the conditioning parameters \((\delta , \rho , \sigma ^2)\):

The (log) conditional posterior for the (say the proposal) vector \(\Gamma ^p\) (given \(\delta , \rho , \sigma ^2\)) in Eq. (28) can be rapidly evaluated using the log-determinant approximation and the quadratic forms representation of the sum-of-squared errors. We use \(\omega (\Gamma ^p)\) in Eq. (29) to indicate that only the vector \(\Gamma\) changes in the vector \(\omega\), with the value of \(\rho\) fixed.

There are further computational gains from calculating some matrices prior to MCMC sampling.^{Footnote 8}

### Calculating effects estimates

The third computational challenge tackled by LeSage (2019) relates to constructing an empirical posterior distribution for the effects estimates representing the model partial derivatives. LeSage and Pace (2009) point out that for the case of (our) SDM model, partial derivatives take the form in Eq. (30) for the single explanatory variable vector *X* (logged gdp pc). They propose scalar summary measures of the own- and cross-partial derivatives that they label *direct* and *indirect* effects, shown in Eqs. (31) and (33), where *tr* represents the trace operator and \(\iota _{N-1}\) is an \((N-1) \times 1\) vector of ones.^{Footnote 9}

While the expressions in Eqs. (31), (32) and (33) produce point estimates for the scalar summary measures of effects (own- and cross-partial derivatives) used to interpret the impact of changes in the SDM model explanatory variables on dependent variable outcomes, we also require measures of dispersion for the purpose of statistical tests regarding the significance of these effects. Use of an empirical distribution constructed by simulating the nonlinear expressions in Eq. (30) using (say 1,000) draws from the posterior distribution of the underlying parameters \(\rho , \beta _r, \theta _r, \gamma _{\ell },\ell = 1,\ldots ,L\) is suggested by LeSage and Pace (2009). Note that a naive approach to such a simulation-based empirical distribution would require calculation of the \((N-1) \times (N-1)\) matrix inverse \((I_{N-1} - \rho W_c)^{-1}\) a large number of times, for varying values of the parameters \(\rho , \gamma _{\ell }, \ell = 1,\ldots ,L\), which would be computationally intensive.

The required quantity for constructing the empirical distribution of the effects is \(tr(S(W_c))\), which can be estimated without a great deal of computational effort (see LeSage and Pace 2009 for details). In the case described in LeSage and Pace (2009), the SDM model relies on a single weight matrix *W*, allowing use of estimated traces \(tr(W^2)/(N-1), tr(W^3)/(N-1), \ldots , tr(W^q)/(N-1)\) calculated once prior to simulation of the effects estimates. This allows simulation of the empirical distribution for the effects estimates using only vector products involving draws of the parameters \(\rho , \delta\) taken from their posterior distributions.

Our situation differs because the matrix \(W_c\) depends on estimated parameters \(\gamma _{\ell }, \ell = 1,\ldots ,L\) ruling out use of estimated traces calculated prior to the simulation. We could rely on posterior means for \(\gamma _{\ell }\), i.e., \(\bar{\gamma }_{\ell },\) to create a single matrix \(\hat{W}_c (\bar{\gamma }_{\ell })\), for which estimated traces could be calculated prior to simulation. However, this would ignore stochastic variation in the effects estimates that arise from the fact that there is uncertainty regarding the parameters \(\gamma _{\ell }\). Ideally, we would like to use draws for the \(\gamma _{\ell }\) parameters from their posterior distributions when simulating the empirical distribution of effects estimates.

LeSage (2019) points out that since we have already calculated trace expressions for \(j=2, 3, 4\) in Eq. (18) to produce the Taylor series approximation to the log-determinant term based on the quadratic forms in Eq. (34), these can be used to replace low-order traces estimated based on posterior means \((\bar{\gamma }_{\ell })\) used to construct a single matrix \(W_c\). Higher-order traces decline in magnitude, so low-order traces are most important for accurate estimates of the effects.

Specifically, this approach estimates \(q=100\) traces using the method of LeSage and Pace (2009), based on a single weight matrix \(\hat{W}_c = \sum _{\ell =1}^L \bar{\gamma }_{\ell } W_{\ell }\) constructed using posterior means for \(\gamma _{\ell }\), then replaces the estimated first-order trace with zero (a known value), and the second- through fourth-order traces with the terms shown in Eq. (34). The MCMC-sampled parameters \(\Gamma\) are used in the expressions (34) during the simulation that produces the empirical distribution of effects estimates. LeSage (2019) notes that this incorporates uncertainty regarding the parameters \(\gamma _{\ell }\) for low-order traces, since MCMC draws for these parameters are used. Because higher-order terms involve increasingly smaller magnitudes of the parameters \(\rho\) and \(\Gamma\), low-order traces are most important for accurate estimates of the effects.
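The idea can be illustrated with a simplified sketch. Everything below is an assumption made for illustration: it computes only a SAR-type average direct effect \(\beta \, tr((I - \rho W_c)^{-1})/n\) rather than the full SDM expressions, it uses exact traces from repeated multiplication on a toy matrix instead of estimated traces, and it refreshes the low-order traces by forming \(W_c\) at the draw rather than through the quadratic forms of Eq. (34).

```python
import numpy as np


def exact_traces(W, q):
    """tr(W^j) for j = 0..q by repeated multiplication (toy-sized W only)."""
    n = W.shape[0]
    traces, P = [float(n)], np.eye(n)
    for _ in range(q):
        P = P @ W
        traces.append(float(np.trace(P)))
    return np.array(traces)


def direct_effect(rho, beta, traces):
    """beta * tr((I - rho W)^{-1}) / n via the power series in rho."""
    powers = rho ** np.arange(traces.size)
    return float(beta * np.sum(powers * traces) / traces[0])


def scaled_symmetric_weights(n, L, rng):
    """Toy symmetric matrices, zero diagonal, scaled to spectral radius one."""
    weights = []
    for _ in range(L):
        upper = np.triu((rng.random((n, n)) < 0.3).astype(float), k=1)
        W = upper + upper.T
        weights.append(W / np.abs(np.linalg.eigvalsh(W)).max())
    return weights


rng = np.random.default_rng(4)
n, q = 20, 100
weights = scaled_symmetric_weights(n, L=3, rng=rng)
gamma_bar = np.array([0.50, 0.30, 0.20])     # hypothetical posterior means
gamma_draw = np.array([0.40, 0.35, 0.25])    # one hypothetical MCMC draw
rho, beta = 0.4, 1.0

W_hat = sum(g * W for g, W in zip(gamma_bar, weights))
W_draw = sum(g * W for g, W in zip(gamma_draw, weights))

traces = exact_traces(W_hat, q)              # computed once from posterior means
traces[1] = 0.0                              # first-order trace is known to be zero
traces[2:5] = exact_traces(W_draw, 4)[2:5]   # orders 2-4 refreshed for this draw

approx = direct_effect(rho, beta, traces)
exact = float(beta * np.trace(np.linalg.inv(np.eye(n) - rho * W_draw)) / n)
series_check = direct_effect(rho, beta, exact_traces(W_draw, q))
```

Here `approx` differs from `exact` only through traces of order five and above, which are weighted by \(\rho ^5\) and higher powers; this is the sense in which the low-order traces carry most of the information.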

Of course, this is a computational compromise relative to calculating an empirical distribution for the effects estimates based on the exact formula, which would require thousands of evaluations of the \((N-1) \times (N-1)\) matrix inverse. A series of Monte Carlo experiments reported by LeSage et al. (2018) shows that this approach produces effects estimates with very little bias except in cases where the level of spatial dependence is very high (e.g., values of \(\rho \le -0.9\) or \(\rho \ge 0.9\)).

## Application of the cross-sectional dependence panel models

We consider panel model specifications that use the \(i\hbox {th}\) column (\(i\hbox {th}\) row) of the flow matrix representing exports (imports) from (to) country *i* to (from) all other \((N-1)\) countries *j* as the dependent variable vector *y* over the 38 years from 1963 to 2000. The explanatory variable is (logged) gross domestic product per capita lagged one year to cover the period from 1962 to 1999. The trade flows are from Feenstra et al. (2005), while the gdp data at market prices (current US$) and population data come from World Bank’s (2002) World Development Indicators. A usable sample of 74 countries (see Table 9 in Appendix) was constructed for which gdp, population and trade flows were available over the 38 years.^{Footnote 10}

Given our sample of 74 countries over the 38-year period from 1963 to 2000, this results in 74 different sets of estimates for the import panel data model relationship and another 74 sets of estimates for the export panel data model relationship. Specifically, we consider estimates from a single panel data model specification applied to 74 sets of import flows and 74 sets of export flows over the 38-year time period. One set of estimates reflects the panel data relationship between imports to a single country from all other 73 countries over the 38-year period in our sample. This single model relationship is estimated for all 74 countries in our sample, resulting in 74 different sets of estimates for the import-model parameters. Another set of estimates reflects the panel data relationship between exports from a single country to all other 73 countries over the 38-year period in our sample. Again, this model relationship is estimated for all 74 countries in our sample, resulting in 74 different sets of estimates for the export-model parameters.

This approach allows panel estimation based on the *T* time periods for each country’s exports/imports relationship such that we have heterogeneous coefficients across countries. Specifically, we allow for different (country-specific) dependence parameters \(\rho _i\) reflecting different levels of dependence, different responses \(\delta _i\) to own- and neighboring countries’ income, different parameters \(\gamma _{\ell }, \ell = 1,\ldots ,L\), and different noise variance estimates \(\sigma _{\varepsilon ,i}^2\). The specification also implicitly allows for fixed effects between each dyad of countries, since there will be a set of \(N-1\) fixed effects for each country *i*’s exports to (imports from) all other countries *j*. As already noted, we eliminated time-specific effects prior to estimation by demeaning the vector of *N* trade flows for each time period.

### Evidence of cross-sectional dependence

The first question we examine is whether trade flows exhibit cross-sectional dependence, which is a different phenomenon than heterogeneity modeled by the fixed effects transformations. In the presence of cross-sectional dependence, estimates from conventional models that ignore cross-sectional dependence can be shown to be biased and inconsistent.

The presence of cross-sectional dependence also implies spillover impacts arising from changes in neighboring countries \(j \ne i\) income on country *i*’s trade flows. In our model, neighbors are defined broadly to include both spatial neighbors and sociocultural neighbors. Specifically, changes in income of countries *j* that have spatial, common language, currency, trade agreements, or colonial ties with country *i* will impact export or import flows in the SAR model, provided that the scalar dependence parameter \(\rho\) is different from zero and the parameter \(\beta\) is nonzero. In the case of the SDM model, the scalar dependence parameter \(\rho\) could be zero but there will still be spillover impacts if the parameters \(\theta _{\ell }, \ell = 1,\ldots ,L\) are nonzero.

Figure 1 shows a histogram of the distribution of estimates for the 74 different scalar dependence parameters \(\rho\) from the SAR model estimates, and Figure 2 shows that for the SDM model estimates of \(\rho\). Recall, we have estimates from 74 export and 74 import data sets, and the figures show histograms for both import and export estimates. From the figures, it should be clear that all 148 sets of SAR and 148 sets of SDM estimates are positive. They were also all statistically different from zero based on lower 0.05 and upper 0.95 credible intervals constructed from the (empirical) posterior distribution based on MCMC draws.
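As a sketch of how such an interval check might be carried out on a vector of retained MCMC draws, consider the following. The draws are simulated stand-ins, not the paper's output, and the function name `excludes_zero` is hypothetical.

```python
import numpy as np


def excludes_zero(draws, lower=0.05, upper=0.95):
    """Return whether the [lower, upper] credible interval excludes zero."""
    lo, hi = np.quantile(draws, [lower, upper])
    return bool(lo > 0.0 or hi < 0.0), (float(lo), float(hi))


rng = np.random.default_rng(5)
# Simulated stand-ins for retained draws of rho for two hypothetical countries.
draws_positive = rng.normal(loc=0.4, scale=0.05, size=5000)
draws_near_zero = rng.normal(loc=0.0, scale=0.05, size=5000)

flag_positive, interval_positive = excludes_zero(draws_positive)
flag_near_zero, interval_near_zero = excludes_zero(draws_near_zero)
```

Applying the same function to each of the 148 sets of draws and counting the `True` flags would reproduce the kind of tally reported in the text.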

Table 1 shows the mean value for \(\rho\) over all 74 countries along with standard deviations of the distribution across countries and a \(t\)-statistic constructed using the mean divided by the standard deviation. These results are consistent with the notion that we have a distribution of cross-sectional dependence estimates for our sample of 74 countries that is different from zero.

We note that estimates for the parameters \(\gamma _{\ell }\) that are discussed in the next section are not well-identified for values of \(\rho\) near zero. Intuitively, in the absence of cross-sectional dependence, estimates of the relative importance/weights assigned to different types of cross-sectional connectivity structures are meaningless. Since estimates of the cross-sectional dependence parameters \(\rho\) were positive and different from zero for all countries, we can appropriately turn attention to the estimates for \(\gamma _{\ell }\) that provide an indication of the relative importance of each of the five types of dependence.

### Relative importance of spatial and sociocultural connections

As motivated, the relative sizes of the parameter estimates for \(\gamma _{\ell }\) allow us to draw conclusions about what types of connectivity are important. Figure 3 shows a histogram of these five sets of parameter estimates for the 74 countries determined using the import flows data and SDM relationship. Figure 4 shows these estimates for the export flows data and SDM relationship.

For the set of estimates based on the import data relationships we see a relatively large number of countries (48) where estimates for \(\gamma\) associated with common currency take on small values less than 0.1, and the same is true for common language where we see 30 countries in this range of small values. In the case of estimates from the export data relationships shown in Figure 4, we also see evidence that \(\gamma\) estimates associated with common language and currency weight matrices take on small values less than 0.1 for a large number (over 50) of the 74 countries.

A more formal approach to examining this issue involves counting countries where lower 0.05 bounds of the (truncated) distribution of MCMC draws for the parameters \(\gamma\) are greater than zero.^{Footnote 11} Table 2 shows these counts of countries for both the SAR and SDM estimates based on import and export flow relationships. From the table, we see that for the case of the SDM relationship, spatial dependence and colonial ties were among the most important types of dependence in estimates from both import and export relationships. Of the 74 countries, the \(\gamma\) parameters on \(W_{space}\) were nonzero in 58 and 59 countries for import and export estimates, respectively. In the case of colonial ties, there were 52 countries with nonzero weight placed on this type of dependence for the set of import estimates and 61 countries for the export estimates. This suggests that colonial ties are slightly more important for explaining variation in export flows than import flows.

For the SDM models, common currency was the least important type of dependence for the import relationship, since only 18 countries had nonzero \(\gamma\) estimates, and common language for export estimates with 14 nonzero countries. Common currency was next least important for export relationships, while import models treated common language more importantly with 38 nonzero countries. Of course, imported consumer goods may require common language marketing labels and instruction manuals, partially explaining this type of result. The existence of trade agreements between countries seems to be important for both imports and exports in slightly more than half of the 74 countries examined (nonzero estimates for 37 and 41 countries, respectively). A similar pattern arose for the counts arising from the SAR relationship as discussed for the SDM relationship.

Table 3 shows the means and standard deviations \(\sigma _{\gamma }\) for the 74 countries’ posterior estimates of \(\gamma _{\ell }\), for both the SAR and SDM estimates based on import and export data. We note that since the posterior means across \(\gamma _{\ell }, \ell =1,\ldots ,5\) sum to unity for each country, the means across our sample of 74 countries reported in the table also sum to one.

The magnitudes reported reflect the patterns of counts from Table 2, with average \(\gamma\) values for the spatial weights being the largest (around 0.33), for SAR and SDM estimates based on both import and export data. In the case of import data, the second most important type of connectivity between countries was the existence of trade agreements (except the SDM export model), with an average value around 0.23 for both SAR and SDM relationships based on both import and export data. The SAR import estimates give roughly equal weight of 0.13 to the remaining three types of connectivity structures (common currency, language and colonial ties) with SDM import estimates also roughly equal with slightly less weight given to common currency. We also see agreement between the SAR and SDM estimates with regard to the importance of the remaining three types of connectivity (common currency, language and colonial ties) for exports. Trade agreements and colonial ties were most important (around 0.23) and common language least important (around 0.07).

A more complete picture of the \(\gamma _{\ell }\) weights assigned to the various types of dependence is provided in Tables 4, 5, 6, 7, 8 in the Appendix. Country-level estimates for each of the five \(\gamma _{\ell }\) parameters are sorted from low to high. It is important to note when considering the magnitudes of these estimates that simultaneous cross-sectional dependence implies that changes in income in country *i* will impact neighboring countries (first-order nodes in the connectivity structure/network) as well as higher-order neighboring nodes. That is, neighbors to the neighboring countries, neighbors to the neighbors of the neighbors, and so on, with the magnitude of impact declining for higher-order neighboring relations.

An implication of this is that (for example) colonial ties could reflect an important connectivity structure for countries like Sweden or Finland who do not have immediate (first-order) colonial ties. Nonetheless, cross-sectional dependence suggests that if colonial ties are important for major trading partners of Sweden or Finland, then this type of connectivity structure would also be important (receive a large \(\gamma\) estimate) for Sweden or Finland. Similar statements could be made about other types of connectivity structures; important higher-order links/nodes in the network of trading partners can mean that these connectivity structures represent an important source of cross-sectional dependence.

An unfortunate aspect of models such as that set forth here that rely on multiple types of connectivity (simultaneous dependence weight matrices) is that we cannot separate out the spillover/network impacts arising from each type of connection. This can be seen by considering the matrix inverse: \((I_N - \rho W_c)^{-1} = I_N + \rho W_c + \rho ^2 W_c^2 + \ldots\) which will contain numerous cross-products involving the different matrices \(W_{\ell }, \ell = 1,\ldots ,L\). Higher-order powers will in general involve an increasingly large number of matrix cross-products. The spirit of the model specification is that (say) spatial proximity to countries whose trade patterns rely heavily on (say) colonial ties might lead to multiple transmission channels that ultimately impact the observed patterns of trade flows.
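To make the role of the cross-products concrete, consider the simplest illustrative case of \(L = 2\) (chosen here for brevity; the application uses five matrices), with \(W_c = \gamma _1 W_1 + \gamma _2 W_2\). The second-order term of the expansion is then:

```latex
\begin{align*}
\rho^{2} W_c^{2}
  &= \rho^{2}\left(\gamma_1 W_1 + \gamma_2 W_2\right)^{2} \\
  &= \rho^{2}\left[\gamma_1^{2} W_1^{2}
     + \gamma_1\gamma_2\left(W_1 W_2 + W_2 W_1\right)
     + \gamma_2^{2} W_2^{2}\right].
\end{align*}
```

The mixed terms \(W_1 W_2\) and \(W_2 W_1\) capture paths that use one type of connection followed by the other, so they cannot be attributed to either type alone. The \(j\)th-order term contains \(L^j\) such products, which is why the contributions of the individual connectivity types cannot be disentangled.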

### Empirical estimates of bias from ignoring cross-sectional dependence

Ignoring spatial and sociocultural dependence when estimating empirical trade flow models will lead to bias in estimates of the impact arising from income on trade flows. The magnitude of the bias can be quantified by examining the size and significance of the indirect effects estimates from the SAR and SDM specifications. The size of the indirect effects depends on the magnitude of the dependence parameter \(\rho\) as well as the coefficient on income \(\beta\) in the case of the SAR specification. Intuitively, in cases where there is an absence of cross-sectional dependence \((\rho = 0)\) we will not see a large amount of bias.

For the SDM specification, the size of indirect effects is determined by the dependence parameter \(\rho\), the coefficient on income \(\beta\) as well as coefficients \(\theta _{\ell }, \ell = 1,\ldots ,L\). Here even in the absence of cross-sectional dependence, nonzero values for the parameters \(\theta _{\ell }\) would indicate omitted variable bias arising from contextual effects ignored by traditional models that do not include explanatory variables measuring these influences. Cross-sectional dependence reflects the fact that trade takes place in the context of a worldwide network of flows.

Since conventional trade models ignore cross-sectional dependence of the type captured by the SAR specification by assuming that \(\rho = 0\), this implies an assumption of no spillovers (indirect effects of zero). If the SAR specification is consistent with the data, omitted variables bias will arise, and estimates of the coefficients representing the impact of country-level income on trade flows will likely overstate this impact by inappropriately attributing variation in trade flows to own-country income. In cases where the SDM specification is most consistent with the data, conventional models ignore the influence of neighboring countries’ income, where neighboring countries are broadly defined to include spatial as well as sociocultural neighbors. In cases where the SDM specification is the data generating process, bias in conventional models can be attributed to ignoring both interaction between countries (assuming \(\rho = 0\)) and contextual effects (assuming \(\theta _{\ell } = 0\)).

Figure 5 shows a frequency distribution of the posterior mean indirect effects estimates from the SAR relationship involving both import and export data across the 74 countries, and Figure 6 displays these effects for the SDM relationship. For the SAR import estimates, we see 17 countries where indirect effects are near zero and in the case of the export estimates 14 countries with near-zero indirect effects. Remaining countries exhibit positive spillovers reflecting the magnitude of bias that would arise from ignoring cross-sectional dependence. In the case of the SDM specification, there are 17 of the 74 countries with (near) zero spillovers in the case of both the import and export estimates, with mostly positive spillovers.

## Closing remarks

We raise questions about the role played by time-invariant country-specific factors in explaining variation in trade flows. These factors are typically modeled using fixed effects, or eliminated using fixed effects transformations, to capture the heterogeneity they introduce in a panel data model of trade flows.

Our findings indicate that conventional approaches to eliminating fixed effects associated with time-invariant factors leave a great deal of variation in trade flows unexplained. This unexplained variation takes the form of: (1) cross-sectional dependence of trade flows on neighboring country flows and/or (2) contextual effects from neighboring country income levels. Using data transformed to eliminate time-invariant fixed effects, we use a panel data extension of the Bayesian SAR and SDM models set forth in LeSage (2019) to examine the question of cross-sectional dependence and contextual effects using a panel of imports and exports from a sample of 74 countries over the 38-year period from 1963 to 2000. Specifically, we consider 148 different sets of estimates for the panel data model, 74 sets of estimates for a panel data import relationship between each country and all other 73 countries over the 38-year period in our sample. Another set of 74 estimates from the panel data model relating exports from each country to all other 73 countries, covering the 38-year time period is also considered.

The SAR and SDM panel data model utilizes a convex combination of different types of connectivity between countries. We consider: spatial proximity, common currency and language connections, trade agreements and colonial ties. The model produces estimated weights for each of the five types of connectivity that sum to unity, allowing a posterior inference regarding the relative importance of the various types of connectivity. Our findings indicate that the most important type of connectivity is spatial proximity to neighboring countries, with the next most important types of connectivity being trade agreements and colonial ties. Common currency and language represent the least important connections between countries.

The spatial autoregressive (SAR) and spatial Durbin model (SDM) panel data specifications capture simultaneous cross-sectional dependence between trade flows, with significant cross-sectional dependence pointing to biased and inconsistent estimates for model specifications that ignore the presence of this type of dependence. Simultaneous cross-sectional dependence implies spillovers from changes in one country’s income to other countries, with the pattern of impacts falling on *neighboring countries*. In our model, that utilizes a convex combination of different types of connectivity, neighboring countries are broadly defined to include countries: (1) located nearby in space, having (2) common currency, (3) common language, (4) trade agreements, or (5) colonial ties. The spillovers can impact immediately neighboring countries, neighbors to the neighboring countries, neighbors to the neighbors of the neighbors, and so on, with impact declining for higher-order neighboring relations.

The implications of our findings are twofold. One is that conventional treatment of generalized distance factors such as common language, free trade and stronger forms of agreements, common currency, and so on, as time-invariant sources of heterogeneity in empirical panel trade model specifications ignore potential cross-sectional dependence and/or contextual effects (characteristics of neighboring countries). We explored the magnitude of bias that arises from this problem. A second implication is that from a theoretical perspective sociocultural proximity of countries seems as important as pure geographical proximity. Our estimates point to spatial proximity receiving around 1/3 weight and sociocultural proximity around 2/3 weight.

The results presented here suggest that more attention should be given to panel model specifications that allow for cross-sectional dependence in trade flows, as well as to models that incorporate neighboring country characteristics. This calls for more emphasis on theoretical and empirical models of the type introduced by Lebreton and Roi (2011), Koch and LeSage (2015) for bilateral trade flows, LeSage and Pace (2008), Baltagi et al. (2007, 2008) for bilateral migration, and Behrens et al. (2012) for foreign direct investment.

## Notes

- 1.
We deal only with the case where the number of importing and exporting countries is the same.

- 2.
We exclude exports from country *i* to itself, which would be on the main diagonal of the trade flow matrix, since we have no information on intra-country flows, resulting in \(N-1\).

- 3.
We do not introduce an intercept vector and associated parameter since use of the within transformation to eliminate fixed effects precludes an intercept.

- 4.
A value of −1 is often used in practice as this ensures that the matrix inverse \((I_{(N-1)T} - \rho (I_T \otimes W_c))^{-1}\) exists. This has the advantage that we do not have to calculate the minimum eigenvalue of \(W_c\) which changes as a function of the values taken by \(\gamma\).

- 5.
In the case of maximum likelihood estimation, parameter draws (say 1000) are taken from a normal distribution using the mean estimates and the estimated covariance matrix based on a numerical or analytical Hessian.

- 6.
Note also that we pre-compute *My* prior to MCMC sampling.

- 7.
See Debarsy and LeSage (2018) for a discussion of the reversible jump nature of this procedure.

- 8.
Specifically, \(T_1 = y' M' My\), \(T_2 = y' M' Z\), \(T_3 = Z' M y\), \(T_4 =Z' Z\) can be calculated since they consist of known quantities (sample data), so the quadratic forms are: \(e'e = \omega (\Gamma )' T_1 \omega (\Gamma ) - \omega (\Gamma )' T_2 \delta - \delta ' T_3 \omega (\Gamma ) + \delta ' T_4 \delta\). As noted, \(\omega (\Gamma )\) indicates that \(\omega (\Gamma )' = \left( \begin{array}{ccccc} 1&-\rho \gamma _1&-\rho \gamma _2&\cdots&-\rho \gamma _L \end{array} \right)\), where the parameter \(\rho\) is conditioned on (fixed).

- 9.
- 10.
In addition, we eliminated countries from our sample that had one or more zero rows in any of the five weight matrices. As noted earlier, this is necessary to ensure that the matrix \(W_c\) does not contain zero rows, when we allow individual \(\gamma _{\ell }, \ell = 1,\ldots ,L\) parameters to take values of zero. This resulted in a few countries such as South Korea and Japan for which data were available to be excluded from our sample.

- 11.
Technically, although we allow for the open interval \((0< \gamma < 1)\), we consider a \(\gamma\) parameter to be nonzero when the lower 0.05 value of its MCMC draws lies above 0.01.
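The cross-product shortcut described in note 8 can be sketched as follows. The example assumes that the pre-computed matrix *My* stacks the columns \(y, W_1 y, \ldots, W_L y\), so that the residual vector is \(e = My\,\omega(\Gamma) - Z\delta\); this layout, the simulated data, and the helper name `sse` are assumptions made for illustration rather than details confirmed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, k = 12, 3, 2  # hypothetical sizes: observations, weight matrices, regressors

# Simulated stand-ins for the sample data (not the trade data used in the paper).
y = rng.normal(size=n)
Z = rng.normal(size=(n, k))
W_list = []
for _ in range(L):
    A = rng.random((n, n))
    np.fill_diagonal(A, 0.0)
    W_list.append(A / A.sum(axis=1, keepdims=True))  # row-normalize

# Assumed layout of the pre-computed matrix: columns y, W_1 y, ..., W_L y.
My = np.column_stack([y] + [W @ y for W in W_list])

# Cross-products of known quantities, computed once before any sampling.
T1, T2, T3, T4 = My.T @ My, My.T @ Z, Z.T @ My, Z.T @ Z

def sse(rho, gammas, delta):
    """Return e'e from the stored cross-products for given rho, gammas, delta."""
    omega = np.concatenate(([1.0], -rho * np.asarray(gammas)))
    return (omega @ T1 @ omega - omega @ T2 @ delta
            - delta @ T3 @ omega + delta @ T4 @ delta)

# Illustrative parameter values only.
rho, gammas, delta = 0.5, np.array([0.2, 0.3, 0.5]), np.array([1.0, -0.5])

# Direct computation from the residual vector, for comparison.
W_c = sum(g * W for g, W in zip(gammas, W_list))
e = y - rho * (W_c @ y) - Z @ delta
direct = e @ e
fast = sse(rho, gammas, delta)
```

The point of the shortcut is that `sse` only touches small \((L+1)\)- and \(k\)-dimensional arrays, so evaluating the quadratic form for new candidate values of \(\gamma\) does not require rebuilding \(W_c\) or revisiting the full sample.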

## References

Anderson JE, van Wincoop E (2003) Gravity with gravitas: a solution to the border puzzle. Am Econ Rev 93(1):170–192

Baltagi BH, Egger P, Pfaffermayr M (2007) Estimating models of complex FDI: are there third-country effects? J Econom 140(1):260–281

Baltagi BH, Egger P, Pfaffermayr M (2008) Estimating regional trade agreement effects of FDI in an interdependent world. J Econom 145(1–2):194–208

Baltagi BH, Egger P, Pfaffermayr M (2014) Panel data gravity models of international trade, 31 Jan 2014. CESifo Working Paper Series No. 4616. SSRN: http://ssrn.com/abstract=2398292

Behrens K, Ertur C, Koch W (2012) Dual gravity: using spatial econometrics to control for multilateral resistance. J Appl Econom 27(5):773–794

Debarsy N, LeSage JP (2018) Flexible dependence modeling using convex combinations of different types of connectivity structures. Reg Sci Urban Econ 69(2):46–68

Elhorst P (2013) Spatial econometrics: from cross-sectional data to spatial panels. Springer, Berlin Heidelberg

Feenstra RC, Lipsey RE, Deng H, Ma AC, Mo H (2005) World trade flows: 1962–2000. NBER Working Paper Series 11040. http://www.nber.org/papers/w11040

Golub GH, van Loan CF (1996) Matrix computations. Johns Hopkins University Press, Baltimore

Hazir CS, LeSage JP, Autant-Bernard C (2018) The role of R&D collaboration networks on regional innovation performance. Pap Reg Sci 97(3):549–567

Koch W, LeSage JP (2015) Latent multilateral trade resistance indices: theory and evidence. Scott J Polit Econ 62(3):264–290

Krisztin T, Fischer MM (2015) The gravity model for international trade: specification and estimation issues. Spat Econ Anal 10(4):451–470

Lebreton M, Roi L (2011) A spatial interaction model with spatial dependence for trade flows in Oceania: a preliminary analysis. Unpublished manuscript, Université Montesquieu Bordeaux IV

Lee L-F, Yu J (2010) Estimation of spatial autoregressive panel data models with fixed effects. J Econom 154(2):165–185

LeSage JP (2019) Fast MCMC estimation of multiple W-matrix spatial regression models and Metropolis-Hastings Monte Carlo log-marginal likelihoods. J Geogr Syst. https://doi.org/10.1007/s10109-019-00294-2

LeSage JP, Fischer MM (2016) Spatial regression-based model specifications for exogenous and endogenous spatial interaction. In: Patuelli R, Arbia G (eds) Spatial econometric interaction modelling. Springer, Berlin, pp 37–68

LeSage JP, Pace RK (2008) Spatial econometric modeling of origin-destination flows. J Reg Sci 48(5):941–967

LeSage JP, Pace RK (2009) Introduction to spatial econometrics. Taylor & Francis/CRC Press, Boca Raton

LeSage JP, Thomas-Agnan C (2015) Interpreting spatial econometric origin-destination flow models. J Reg Sci 55(2):188–208

LeSage JP, Chih Y-Y, Vance C (2018) Spatial dynamic panel models for large samples. Paper presented at the North American Meetings of the Regional Science Association International, San Antonio, TX, November 2018

Pace RK, LeSage JP (2002) Semiparametric maximum likelihood estimates of spatial dependence. Geogr Anal 34(1):75–90

Porojan A (2001) Trade flows, spatial effects: the gravity model revisited. Open Econ Rev 12(3):265–280

World Bank (2002) World development indicators. World Bank, Washington, DC

WTO (2014) WTO Regional trade agreements database. WTO, Geneva

## About this article

### Cite this article

LeSage, J.P., Fischer, M.M. Cross-sectional dependence model specifications in a static trade panel data setting. *J Geogr Syst* **22**, 5–46 (2020). https://doi.org/10.1007/s10109-019-00298-y

### Keywords

- Bayesian
- MCMC estimation
- Sociocultural distance
- Origin–destination flows
- Treatment of time-invariant variables
- Panel models

### JEL Classification

- C18
- C51
- R11