Network dependence in multi-indexed data on international trade flows

Faced with the problem that conventional multidimensional fixed effects models only focus on unobserved heterogeneity, but ignore any potential cross-sectional dependence due to network interactions, we introduce a model of trade flows between countries over time that allows for network dependence in flows, based on sociocultural connectivity structures. We show that conventional multidimensional fixed effects model specifications exhibit cross-sectional dependence between countries that should be modeled to avoid simultaneity bias. Given that the source of network interaction is unknown, we propose a panel gravity model that examines multiple network interaction structures, using Bayesian model probabilities to determine those most consistent with the sample data. This is accomplished with the use of computationally efficient Markov Chain Monte Carlo estimation methods that produce a Monte Carlo integration estimate of the log-marginal likelihood that can be used for model comparison. Application of the model to a panel of trade flows points to network spillover effects, suggesting the presence of network dependence and biased estimates from conventional trade flow specifications. The most important sources of network dependence were found to be membership in trade organizations, historical colonial ties, common currency, and spatial proximity of countries.


Introduction
In recent years fixed effects model specifications have enjoyed great popularity in the panel gravity literature as a way to take into account the specific threedimensional nature of international trade flow data sets. 1 These models, differing in the specification of the fixed effects, focus on unobserved heterogeneity, but ignore any potential cross-sectional dependence due to network interactions. Faced with this problem, we propose a panel gravity model that examines multiple network interaction structures, using Bayesian model probabilities to determine those most consistent with the sample data. This is accomplished with the use of computationally efficient Markov Chain Monte Carlo (MCMC) estimation methods that produce a Monte Carlo integration estimate of the log-marginal likelihood that can be used for model comparison.
Empirical international trade modeling has focused on multilateral resistance terms based on van Wincoop (2003, 2004), which Feenstra (2002Feenstra ( , 2004 argued could be proxied by inclusion of origin-and destinationspecific fixed effects parameters in the model specification. Ranjan and Tobias (2007) model the effects parameters as a (semi-parametric) function of an exogenous variable measuring bilateral trade resistance between origin-destination dyads. Koch and LeSage (2015) make a convincing argument that all of these models relax multilateral aspects of the dependence structure in favor of bilateral dependence, or a specification based on heterogeneity rather than simultaneous/multilateral dependence. They argue this is important given Behrens et al. (2012) demonstration that model specifications that control for regional heterogeneity do not necessarily capture the full structure of interdependence between trade flows implied by theory. Behrens et al. (2012) derive a dual version of the Anderson and van Wincoop (2003) model using a model based on quantity competition that also implies a simultaneous structure of interdependence between observed trade flows. Koch and LeSage (2015) show how linearizing the system of nonlinear multilateral resistance relationships around a free trade world leads to a structured random effects interdependence structure that can be used as a Bayesian prior to produce statistical estimates of the inward and outward multilateral resistance indices. They argue their statistical approach has advantages over the non-stochastic numerical approach used by Anderson and van Wincoop (2003) to solve for the multilateral resistance indices.
In contrast to the empirical trade literature, the regional science literature regarding modeling of flows between regions has placed more emphasis on use of spatial autoregressive models that include spatial lags of flows from regions neighboring the origin and destination, an idea promoted by LeSage and Pace (2008), who devote an entire chapter to these models in their book (LeSage and Pace 2009, Chapter 8). This paper extends this type of spatial autoregressive models to a panel data model setting. The literature discussed above did not deal with panel data models where we have a series of flow matrices covering numerous time periods, but rather more conventional gravity models consisting of a single flow matrix.
In a panel data model setting, distance as well as sociocultural factors (which we can view as generalized distance variables) are generally time invariant, so they are treated as fixed effects. Balazsi et al. (2018) explore econometric implications of using a host of alternative multidimensional fixed effects in panel gravity models, but assume trade flows to be independent. We apply two of the more widely used multidimensional fixed effects transformations from the empirical trade literature to a panel of N ¼ 70 countries trade flows covering the T ¼ 38 years from 1963 to 2000, in a model specification that allows for the presence of simultaneous crosssectional dependence.
One approach uses N origin and N destination country fixed effects plus T timespecific effects proposed by Mátyás (1997) as an extension of the conventional fixed effects panel data model (e.g., Baltagi 2005) to the multidimensional situation that arises in the case of gravity models. These models take an N 2 T Â 1 vector of dependent variables reflecting the matrix of trade flows between the N countries (assuming flows between all countries) at each time period, resulting in a dummy matrix of fixed effects with column rank of 2N þ T À 2. The second approach makes use of fixed effects proposed by Cheng and Wall (2005) that introduce fixed effects for origin-destination dyads as well as time periods, resulting in a dummy matrix with column rank of N 2 þ T À 1, frequently adopted in the empirical trade literature.
Apart from accommodating heterogeneity in flows using fixed effects, the dependent variable vector of N 2 Â 1 trade flows for each time period are assumed to be independent, so flows between countries that have a common currency, language, membership in trade organizations, border or colonial ties are no more likely than flows between countries having nothing in common. Cross-sectional dependence in flows suggests that flows between countries with sociocultural similarity are likely to exhibit dependence as opposed to independence. We set forth a model specification that allows for this type of dependence in flows across the N 2 T country-time dyads, along with computationally efficient MCMC estimation methods. 2 The nature of dependence that we model would be better labeled network dependence rather than spatial or cross-sectional dependence, because we introduce dependence between network nodes involving origin-and destination-dyads as well as covariance across these. We use the terms network and cross-sectional dependence interchangeably, but note that the network dependence specification introduced here reflects a special case of cross-sectional dependence that can arise 2 Sarafidis and Wansbeek (2012) provide an overview of econometric specifications for dealing with cross-sectional dependence, consisting of two main approaches, spatial econometric and common factor models. In this paper we take the spatial econometric approach, but note that a common factor specification could also be employed to address the issue we raise. Common factor cross-sectional dependence specifications would need to be extended to address the type of dependence that we consider here. See also Baltagi and Maasoumi (2013), who provide an introductory discussion for a series of articles in a special issue devoted to dependence in cross-section, time series and panel data models.
in the case of origin-destination flows that has not received a great deal of attention in the literature.
An innovative aspect of our MCMC estimation approach is use of Metropolis-Hastings guided samples from the joint posterior distribution of the dependence parameters to construct a Monte Carlo integration estimate of the log-marginal likelihood useful for model comparison. The MCMC estimation approach allows for estimation and posterior inference on a vector of dependence parameters that determines the relative importance of network dependence. In our case, we rely on Markov Chain Monte Carlo sampling to estimate the model parameters with the dependence parameters sampled using a Metropolis-Hastings procedure. Since this approach produces draws of the dependence parameters that are steered by Metropolis-Hastings accept/reject decisions to areas of high density of the joint posterior, we can produce an efficient Monte Carlo integration of the log-marginal likelihood. 3 Another methodological innovation is use of multiple network interaction structures constructed to reflect spatial proximity between countries, as well as numerous types of sociocultural proximity such as common currency, language, colonial ties, and so on. An equally weighted combination of these multiple weight matrices is used to form a single weight matrix. This approach allows us to treat sociocultural factors as sources of network dependence in the panel gravity model.
The remainder of this paper is organized as follows. Section 2 outlines conventional panel gravity models as used in the empirical trade literature, along with a discussion of the two multidimensional fixed effects specifications that we explore. 4 Section 3 introduces network dependence in a conventional multidimensional fixed effects model specification, and describes the prior setup of the MCMC estimation approach. Section 4 sets forth computational challenges to estimation along with MCMC procedures to overcome these challenges. Section 5 applies our model to panel data on trade flows between 70 countries covering the 38 years from 1963 to 2000. In the application of the model we consider combinations of multiple sociocultural connectivity structures, that can be used in conjunction with logmarginal likelihood estimates to determine the relative importance of each type of connectivity. We find strong evidence of network dependence in trade flows pointing to network spillover effects, and suggesting that ignoring the presence of this type of cross-sectional dependence will result in biased estimates from conventional trade flow model specifications. Section 6 concludes the article and a technical appendix provides further details on the MCMC estimation approach.
In multidimensional panel data sets, the dependent variable of a panel gravity model is observed along three indices, such as y ijt ; i ¼ 1; . . .; N i ; j ¼ 1; . . .; N j ; t ¼ 1; . . .; T. (1997) made an early attempt to introduce multidimensional fixed effects for (log-linear) gravity model specifications such as that in (1). 5 The dependent variable y ijt in (1) reflects an N 2 T Â 1 vector of (logged) trade flows between N countries i and j at time t, so i ¼ 1; . . .; N; j ¼ 1; . . .; N, and t ¼ 1; . . .; T.

Mátyás
where the a i , c j parameters are the origin and destination country-specific fixed effects, and k t are time-specific fixed effects. b is a 2K Â 1 vector of parameters on the N 2 T Â 2K (logged) covariates X ijt ¼ ðX 1 it ; X 1 jt ; . . .; X K it ; X K jt Þ with X k it (X k jt ) denoting the kth measure for the economic size of the origin country i (destination country j) in the country dyads (i, j) at time t. We note that distance between the countries is time-invariant and not in the set of covariates. It is assumed that e ijt are normal i.i.d. idiosyncratic disturbances with zero mean and scalar (r 2 e ) variance. 6

The Cheng and Wall model
There are of course other specifications for the fixed effects. For example, Egger and Pfaffermayr (2003) propose bilateral specific fixed effects c ij . A variant of this, proposed by Cheng and Wall (2005), popular in the empirical trade literature is shown in (2), or in matrix form y ¼ Xb þ D c c þ D k k þ e with corresponding dummy design matrices D c ¼ ð I N I N i T Þ and D k ¼ ð i N i N I T Þ. Balazsi et al. (2018) point out that the model in (1) represents a special case of that in (2), and there is an analogy of this 3-dimensional (3D) situation in (2) to 2-dimensional (2D) panel data models, where individuals in the 2D situation are treated as (i, j) pairs in the 3D setting. In other words, individual effects are now assigned to (i, j) dyads.
We note that the model specifications in (1) and (2) assume that the dependent variable vector of N 2 Â 1 trade flows for each time period are statistically independent, so flows between countries that have a common currency, language, membership in trade organizations, border or colonial ties are no more likely than flows between countries having nothing in common. Network dependence in flows suggests that flows between countries with sociocultural similarity are likely to exhibit dependence as opposed to independence. In the next section, we set forth a model specification that allows for this type of dependence in flows across the N 2 T country-time dyads.

The model
We set forth an extension of the conventional panel gravity model that allows for origin-, destination-and origin-destination-based network dependence. The matrix expressions in (3) represent the network dependence panel gravity model for origindestination flows introduced in matrix form.
where y is the N 2 T Â 1 dependent variable vector of origin-destination flows for each time period, organized with t being the slow index for elements y ijt in the vector y. The N 2 T Â K matrix X ot contains origin-specific covariates with the associated K Â 1 parameter vector b o , where X ot ¼ X t i N , and similarly for X dt ¼ i N X t with X t being an N Â K matrix of covariates measuring the (economic) size of each destination country at time t, with associated b d . The N 2 T Â 1 vector e represents the normally distributed i.i.d. scalar variance disturbances. The matrices of fixed effects D c c þ D k k are as defined after (2), or we could use fixed effects from Mátyás (1997) shown in (1) as well. The model in (3) indicates that flows at each time period t ¼ 1; . . .; T exhibit dependence on flows of countries that are sociocultural neighbors to the origin country captured by the N 2 T Â 1 vector I T ðW I N Þy with the associated scalar parameter q o measuring the strength of that dependence. The matrix W is an N Â N weight matrix that defines neighboring countries based on sociocultural connectivity structures. A neighboring country is indicated by a non-zero (i, j) element in the N Â N matrix W, which has zeros on the main diagonal. The matrix W is normalized to have row-sums of unity, resulting in the N 2 Â 1 vector ðW I N Þy t reflecting a linear combination of trade flows at time t from countries that are sociocultural neighbors to the origin country.
The model also allows for dependence of flows in each time period from countries that are sociocultural neighbors to the destination country, captured by the vector I T ðI N WÞy, with associated scalar parameter q d , and we note that this vector relies on the same matrix W used to define (sociocultural) neighbors. The matrix ðI N W) defines neighbors to the destination, the matrix ðW I N Þ identifies those to the origin, when the vector of flows for time t arises from a conventional N Â N origin-destination flow matrix, organized with dyads (i, j) representing flows from origin j to destination i.
Another type of dependence is also included in the model, reflected by the N 2 T Â 1 vector I T ðW WÞy and associated scalar parameter q w , which captures dependence of flows from countries that are sociocultural neighbors to both the origin and destination countries. Following ideas expressed in LeSage and Pace (2008) we motivate this type of dependence using the (cross-sectional) specification in (4), where they argue that the matrix A can be viewed as a filter. For notational convenience in (4) (as in the sequel), we have defined the matrix Z to include X ot ; X dt and the fixed effects, with the parameters b adjusted appropriately.
The argument is that the existence of origin-and destination-based dependence between trade flows (W o y; W d y), logically implies a covariance between these two types of dependence which is reflected in W w y. LeSage and Pace (2008) note that this filter implies a restriction that q w ¼ Àq o q d , but argue this restriction needs not be imposed during estimation, so we address the more general case here and allow for an unrestricted parameter q w . 7

MCMC estimation
The model can be written in a computationally efficient form as shown in (5).
A key feature ofỹ is that this expression separates dependence parameters to be estimated from sample data describing the simultaneous dependence, with the scalar dependence parameters in the vector x. We assume normally distributed, zero mean, constant variance (r 2 ) disturbances.

Likelihood and priors
The likelihood is shown in (6), where jRðxÞj is the determinant that depends on the dependence parameters in x, as does the expression e 0 e. f ðy; x; r 2 ; bÞ ¼ jRðxÞj ð2pr 2 Þ ÀN 2 T=2 exp À e ¼ỹ x À Zb; RðxÞ ¼ I N 2 T À q o I T ðW I N Þ À q d I T ðI N WÞ À q w I T ðW WÞ: To ensure that RðxÞ is non-singular, restrictions need to be placed on the dependence parameters x to ensure that RðxÞ À1 exhibits an underlying stationary process. Specifically, q o þ q d þ q w \1. The parameter space N for the set of parameters ðx; r 2 ; bÞ is: Since our focus is on large samples involving N 2 T observations, we rely on uninformative priors for the parameters b and r 2 , as these would not likely impact posterior estimates. By uninformative prior we mean a ''flat'' prior that is within the range of valid values for these parameters, so all intervals are of equal length, indicating all values are equally likely. 9 Since the dependence parameters in x are a focus of inference, uniform priors for these dependence parameters are used, which must obey a stability constraint.
Mathematically, the flat or uniform prior for x; b can be represented as pðxÞ / 1; pðbÞ / 1. The noise variance r 2 , is restricted to positive values, with a flat prior assigned to the log-transformed value which is denoted pðr 2 Þ / 1=r 2 . Given this prior information, and prior independence, one can write: pðxÞ Â pðr 2 Þ Â pðbÞ / 1=r 2 . While this flat prior is improper since the integral over the parameter space N is not finite, the joint posterior distribution for the dependence parameters x is proper under relatively unrestrictive assumptions. This joint posterior is derived by analytically integrating out the parameters b; r 2 , with details regarding this integration found in Hepple (1995a, b).
To derive the joint posterior for the dependence parameters x, begin with the full joint posterior pðx; b; r 2 Þ and analytically integrate out b; r 2 . This relies on standard techniques from the Bayesian regression literature (Zellner 1971). Combining the likelihood function in (7) with the flat priors (and ignoring the constant 2p N 2 T=2 ) leads to the joint posterior in (8), from which r can be integrated out, leading to (9), where Cð:Þ denotes the Gamma function.
The number of fixed effects M in the matrix Z differs for the Mátyás (1997)  pðx; bjyÞ / jRðxÞj To integrate out the 2K þ M different b parameters, properties of the multivariate tÀdistribution in conjunction with 'completing the square' are used (see Zellner 1971). This leads to a joint distribution for the dependence parameters x shown in (10), with the term jZ 0 Zj À1=2 and the exponent ÀðN 2 T À 2K À MÞ=2 arising from this integration (see Hepple 1995a, b). This expression must be numerically integrated to arrive at the log-marginal likelihood for these models.
Dittrich et al. (2017) show there is no problem regarding posterior propriety for the cross-sectional model case and this result carries over to our model in a relatively straightforward and trivial way. The conditional distributions for the model parameters b; r 2 ; x needed to implement MCMC estimation are described in ''Appendix A''. A motivation for having analytically integrated out the parameters b; r 2 is that further integration of the joint conditional posterior over the set of dependence parameters in x, yields the log-marginal likelihood for these models. We can use Monte Carlo integration to accomplish this task. Monte Carlo integration evaluates the expression to be integrated using random draws of the parameter values. A drawback to this approach is inefficiency because many of the random draws for the parameters are not in areas of high density of the function being integrated. In our case, the Metropolis-Hastings sampling procedure used to produce draws of the dependence parameters steers these parameter values to areas of high density of the joint posterior (described later in Sect. 4.1). This allows us to produce an efficient Monte Carlo integration of the log-marginal likelihood.
Given an estimate of the log-marginal likelihood for a model m i (log m i ), we can calculate: Of course, there is a great deal of interest in comparing alternative models, for example, models based on different weight matrices, or different fixed effects specifications.
4 Computational challenges to estimating the model One issue that arises when considering estimation of the model in (3) is that multiple dependence parameters q o ; q d ; q w would require use of a multivariate optimization routine to produce estimates based on maximum likelihood. It is also the case that the dependence parameters are (well) defined over the ðÀ1; 1Þ interval, meaning that constrained optimization would be required to ensure values À1\q o þ q d þ q w \1. 10 Another challenge to maximum likelihood estimation is the log-determinant term that arises in the (log) likelihood function, specifically (log): jI N 2 T À q o I T ðW I N Þ À q d I T ðI N WÞ À q w I T ðW WÞj. In the case of conventional spatial regression models involving a single weight matrix, there is a great deal of literature on approaches to efficiently calculating or approximating the log-determinant term that appears in the (log) likelihood jI N À qWj, (see LeSage and Pace 2008, Chapter 4). These approaches are not directly applicable to the model specifications considered here, complicating maximum likelihood estimation, since the log-determinant expression needs to be evaluated for multiple dependence parameter values during optimization. In the case of Markov Chain Monte Carlo estimation, the log-determinant term appears in the conditional distribution for the dependence parameters requiring multiple evaluations during sampling and could be demanding and quickly hit a computational bottleneck. ''Appendix B'' shows how to side step this computational bottleneck.
Because of the issues outlined above, we set forth estimation based on Markov Chain Monte Carlo, with no prior distributions assigned to the parameters b; r 2 . Parameter restrictions are imposed on the dependence parameters during MCMC sampling using methods described in Sect. 4.1. Since emphasis is on modeling situations involving large samples of observations, prior information would not play a role in determining posterior estimates of the parameters, so MCMC is used as a computational device to produce estimates that should be identical to those from maximum likelihood estimation. MCMC estimation involves sequentially sampling each parameter (or set of parameters) from their conditional distributions (or joint conditional distribution in the case of a set of parameters). Expressions for the conditional distributions are frequently easier to calculate than those required to evaluate the (log) likelihood, which is true for the model specifications considered here.
Another aspect of our network dependence model relates to proper interpretation of the partial derivative impacts on the dependent variable vector arising from changes in the explanatory variables, e.g., oEðyÞ=oX r for the rth explanatory variable. We take this issue up in Sect. 4.2. MCMC estimation proceeds by sampling sequentially from the conditional distributions of each parameter (or set of parameters).

Block sampling the dependence parameters
One motivation for working with the joint conditional posterior distribution for the dependence parameters is the need to impose stability restrictions on these parameters. Specifically, À1\q o þ q d þ q w \1. Working with the joint conditional posterior distribution for these parameters allows us to adopt a block sampling Metropolis-Hastings scheme for the dependence parameters. Block sampling means that a vector of dependence parameters in x is proposed and compared to the current vector of dependence parameters. The proposed vector is either accepted or rejected. This allows proposals of dependence parameters that obey the stability restriction, so any vectors that are accepted by the Metropolis-Hastings procedure will always obey the needed restrictions.
Debarsy and LeSage (2018) set forth a block-sampling approach that proposes a vector of candidate values for a similar set of dependence parameters in the context of a model involving a convex combination of weight matrices. Dependence parameters that do not meet the stability restriction can be rejected, so any values accepted are consistent with stability.
The conditional distributions for the current and proposed dependence vectors that we can label x c ; x p are evaluated with a Metropolis-Hastings step used to either accept or reject the newly proposed vector x p . Block sampling the dependence parameter vector x has the virtue that accepted vectors will obey any restrictions and reduce autocorrelation in the MCMC draws for these parameters. However, block sampling is known to produce lower acceptance rates which may require more MCMC draws in order to collect a sufficiently large sample of draws for posterior inference regarding x. To address this issue, Debarsy and LeSage (2018) propose a hybrid approach that begins with a reversible jump sampling procedure and switches to a tuned random-walk proposal procedure for proposing vectors x after some initial number of start-up samples are drawn (see also LeSage et al. 2019).
We rely here on a reversible jump procedure to produce proposal values for the vector of parameters q o ; q d ; q w . In particular, we rely on a three-headed coin flip for each scalar parameter. By this we mean a uniform random number on the open interval coin flip = U(0, 1), with head #1 equal to a value smaller or equal to 1/3, head #2 a value larger than 1/3, but smaller or equal to 2/3 and head #3 a value larger than 2/3 and smaller than one. Given a head #1 result, we set a proposal q Of course, a similar approach is used to produce proposals for the parameters q d ; q w . Proposed vectors of these parameters inconsistent with the stability restrictions are eliminated via rejection sampling.
The reversal jump approach to proposing the block of dependence parameters has the virtue that accepted vectors will obey the stability restriction and will also reduce autocorrelation in the MCMC draws for these parameters. However, proposals from the reversible jump procedure based on the large intervals between ðÀ1\q c o Þ and ðq c o \1Þ will not produce candidates likely to be accepted when these parameters are estimated with a great deal of precision, as would be the case for problems involving large N 2 T. This can result in a failure to move the chain adequately over the parameter space. To address this issue, standard deviations, r q o ; r q d ; r q w for each parameter are calculated based on the first 1000 draws (and updated thereafter using an interval of m ¼ 1000 draws). These are used in a tuned random-walk procedure to produce candidate/proposed values. Specifically, we use a tuning scalar c for each parameter that is adjusted based on acceptance rates for each parameter. This is used in conjunction with the standard deviations to produce proposals: q p o ¼ q c o þ c N ð0; 1Þ r q o , with the same approach used for q d ; q w . The proposed estimation method relies on a great many approximations, raising the issue of whether resulting estimates have desirable properties such as small bias and mean-squared error as well as good coverage. By coverage we mean that the (say) 2.5% and 97.5% intervals from the empirical distributions of the effects estimates on which practitioners base conclusions regarding statistical significance of the effects estimates cover the true values 95% of the time. Debarsy and LeSage (2020) provide Monte Carlo evidence on this type of procedure.

Interpreting the model
The partial derivatives used to interpret how changes in (say the rth) explanatory variable of the model impacts changes in the dependent variable vector are nonlinear matrix expressions. The sequence of partial derivatives for this model are shown in (11), where we record the N Â N matrices of changes in (logged) flows arising from changing the rth variable in each country i X r i using Y i ; i ¼ 1; . . .; N, to denote the N Â N flow matrices associated with changing the rth variable in each country i. We note that because the matrix W does not change over time in our static panel data model, we have a set of N 2 Â N matrices describing the partial derivative impacts.
In (11), Jd i (i ¼ 1; . . .; N) is an N Â N matrix of zeros with the ith row equal to i 0 N b d , and Jo i is an N Â N matrix of zeros with the ith column equal to i N b o , where b o and b d denote parameters associated with origin and destination size measures. We have N sets of N Â N outcomes, (one for each change in X r i ; i ¼ 1; . . .; N) resulting in an N 2 Â N matrix of partial derivatives reflecting the total effect on flows from changing the rth characteristic of all N regions, which in the gravity modeling literature is labelled the total effect (LeSage and Thomas-Agnan 2015, and LeSage and Fischer 2016).
LeSage and Thomas-Agnan (2015) provide a motivation for the expression in (11), noting that changes in the (size) characteristics of a single country i will (potentially) produce impacts on all elements of the N Â N flow matrix. Intuitively, a change in (say) income of a single country can impact trade flows involving immediate trading partners, as well as, trade flows involving partners to the trading partners, partners to the partners of the trading partners, and so on, potentially impacting the entire N Â N flow matrix.
Since regression models typically consider changes in characteristics (say income) of all i ¼ 1; . . .; N observations/countries, this produces a set of N different N Â N matrices of partial derivatives associated with changes in each explanatory variable in the model. While LeSage and Thomas-Agnan (2015) propose scalar summary measures for the various types of effects that average over certain dimensions of the sequence of N different N Â N matrices, we adopt a simpler strategy here for producing scalar summary measures of the partial derivative impacts. We take an average of the diagonal elements of the N different N Â N matrices in (11) as a measure of own-partial derivative impacts reflecting owncountry changes in flows arising from changes in (say) the typical country's income. And use an average of the cumulative off-diagonal elements from each row of the N different N Â N matrices in (11) to summarize network effects arising from changes in (say) income in a typical country. Network effects are measured by a scalar summary measure of the spillover impacts on other countries associated with changes in an explanatory variable in the model (say income). The scalar summary averages over all countries, and since the model is a static panel data model, over all time periods as well. We can delineate between origin-and destination-specific effects using the expressions involving b o ; b d , which allows us to determine the relative importance of changes in (say) income at origin versus destination countries on trade flows.
In addition to point estimates of the partial derivative impacts, there might also be a need to calculate empirical measures of dispersion for the effects that could be used for inference. An empirical distribution of the scalar own-and cross-partial derivatives (labeled direct and network effects here) can be constructed using MCMC draws for the parameters q o ; q d ; q w ; b o ; b d in expression (11). However, this would require inversion of the N 2 Â N 2 matrix ðI N 2 À q o W o À q d W d À q w W w Þ thousands of times for each set of draws for q o ; q d ; q w , making this computationally intensive. 11 A compromise approach would be to use posterior means of the estimated parameters q o ; q d ; q w to calculate a single matrix inverse: ðI N 2 À q o W o À q d W d À q w W w Þ À1 in conjunction with the MCMC draws for the parameters b o ; b d . However, this would ignore stochastic variation in the effects estimates that arise from the fact that there is uncertainty regarding the parameters q o ; q d ; q w . Ideally, we would like to use draws for these dependence parameters from their posterior distributions when simulating the empirical distribution of effects estimates.

Context and different interaction structures
We consider panel model specifications that use a panel of trade flows as the dependent variable vector y over the 38 years from 1963 to 2000. The (single) explanatory variable is (logged) gross domestic product per capita (GDP) lagged one year to cover the period from 1962 to 1999. The trade flows are from Feenstra et al. (2005), while the GDP data at market prices (current US$) and population data come from World Bank's (2002) World Development Indicators. A usable sample of 70 countries (see Table 6 in ''Appendix C'') was constructed for which GDP, population and trade flows were available over the 38 years. 12 Given our sample of 70 countries and 38 years, this results in N 2 T ¼ 186; 200; with 2N þ T À 2 ¼ 176 fixed effects parameters for the case of the Mátyás (1997) model in (1), and N 2 þ T À 1 ¼ 4; 937 fixed effects parameters in the Cheng and Wall (2005) approach set forth in the model from (2).
We used five different definitions for the matrix W describing alternative structures of network dependence, specifically, W space based on the three nearest spatial neighboring countries, W language based on countries sharing a common language, W currency based on common currency, W colony based on countries with direct historical colonial ties, and W trade based on membership in the same trade union (excluding the WTO). Details regarding countries with common borders, language, currency, colonial ties and trade union membership can be found in Tables 7, 8, 9 and 10 in ''Appendix C''. 13

Model comparison of alternative structures of network dependence
Estimates from the model in (12) where the parameters q o ; q d ; q w are (significantly) different from zero point to the existence of cross-sectional dependence.
In the presence of dependence, estimates from conventional models that ignore cross-sectional/network dependence can be shown to be biased and inconsistent (see LeSage and Fischer 2020). The presence of network dependence also implies spillover impacts arising from changes in neighboring countries j 6 ¼ i income on country i's trade flows. In our model, neighbors are defined to include spatial neighbors in the case where W space is used when estimating the model. More broadly, sociocultural neighbors arise when the matrix W used is based on common language, currency, trade union membership or direct colonial ties. Specifically, changes in income of countries j that have spatial, common language, currency, trade agreements, or colonial ties with country i will impact flows in the spatial 12 We eliminated countries from our sample that had one or more zero rows in any of the five weight matrices. This resulted in a few countries such as South Korea, Japan and India for which data was available to be excluded from our sample. 13 A reviewer pointed out that matrices such as W currency ; W trade might be viewed as variable over time. Introducing weight matrices that vary over time greatly complicates estimation as well as interpretation as shown in Lee and Yu (2012). Consider that in this case, the partial derivative effect of a change in X ot on the outcome variable vector y would require that we take into account any (non-linear) impact arising from change in the connectivity structure over time.
autoregressive model, provided that the scalar dependence parameters q o ; q d ; q w are different from zero and the parameters b o ; b d are non-zero. Table 1 shows log-marginal likelihood function values for models based on the alternative definitions of the weight matrix as well as the two alternative approaches to including fixed effects. The sum of posterior means for q o þ q d þ q w are also reported, since non-zero values of these parameters point to significant network dependence. From the table, we see that models using the Cheng and Wall (2005) fixed effects have higher log-marginal likelihoods than the corresponding Mátyás (1997) model based on the same weight matrix, indicating these models are more consistent with our sample data. A second finding indicated by the estimated logmarginal likelihoods in the table is that the rank-ordering of preferred models for the various types of weight matrices is very similar for both types of fixed effects. Specifically, the model specifications based on W trade have the highest log-marginal likelihood, those based on W space are next highest, followed by W language . Turning to estimates for the dependence parameters, we see that the sums of these are substantially positive, pointing to the presence of network dependence. Table 2 presents estimates for the best models based on W trade using both the Mátyás (1997) fixed effects and those of Cheng and Wall (2005). The table presents the mode of the parameter estimates evaluated using the joint posterior distribution as well as the mean and median based on 5000 retained MCMC draws (with an initial 5000 excluded for burn-in of the sampler). Monte Carlo (MC) error estimates are reported along with Geweke's diagnostic that compares draws from the first ten percent of the MCMC sampling (after burn-in) and the last 50% of the draws. The test is whether the batched means are equal, which indicates convergence.

Parameter estimates and partial derivative impacts for the best model
From the estimates we see that the dependence parameters q o ; q d ; q w are different from zero based on the credible intervals calculated from the MCMC draws. As noted in the discussion of model interpretation in Sect. 4.2, the parameters b o ; b d do not represent partial derivative impacts of the elasticity response of trade flows to changes in origin and destination country GDP. These need to be calculated using the non-linear matrix expressions for the own-and cross-partial derivatives. The  Table 3, where we see substantial network effects. The network effects reflect cumulated off-diagonal elements of the matrix of partial derivatives (cross-partial derivatives) averaged over all countries as described in Sect. 4.2. These estimates show larger direct and network impacts arising from changes in destination country than origin country income on trade flows in the case of both types of fixed effects. The Cheng and Wall (2005) fixed effects lead to larger direct and network destination effects than those from the Mátyás (1997) fixed effects specification, but smaller origin-specific direct and network effects than those from the Mátyás (1997) Cheng and Wall (2005) specification. The total effects estimates from the cross-sectional dependence models would be comparable to the least-squares estimates, and we see that ignoring network effects that arise from cross-sectional dependence lead to a substantial downward bias in the least-squares estimates.

Model comparison of extended versions of the model
We produced estimates for models based on averages of all 26 possible combinations of two or more weight matrices. For example, we define the combined weight matrix: where W c is row-normalized to have row-sums of unity. Note that we (implicitly) assign equal weight to all matrices in each model, so a model containing three matrices (e.g., W space ; W trade ; W language ) assigns a weight of 1/3 to each of these. 14 Log-marginal likelihoods are presented for these models in Table 4 for the specification based on Mátyás (1997) fixed effects, and in Table 5 for the Cheng and Wall (2005) fixed effects specification.
In the Table 4 results, model #19 dominates all others leading to a posterior model probability of one assigned to this specification, based on W currency þ W colony þ W trade . We also note that a comparison of the log-marginal likelihood for the best single weight matrix model from Table 1 shows that combinations of weight matrices produce a specification more consistent with our sample data. That is, the log-marginal likelihood for the model based on W trade alone was À5:1401e þ 05, compared to that for model #19 based on three weight matrices of À5:0210e þ 05. The next best model was model #10 based on W colony þ W trade and the 3rd best model was model #23 based on W space þ W currency þ W colony þ W trade . The Table 5 results are based on the extended set of fixed effects from Cheng and Wall (2005) where we see that the best model (#23) is one based on W space þ W currency þ W colony þ W trade , and the next best model (#26) included all five weight matrices, with the third-best model (#16) including W space þ W colony þ W trade . What seems clear from the results in Tables 4 and 5 is that membership in trade unions and historical colonial ties are an important source of interaction between countries' trade flows. The results from Table 5 place emphasis on W space not found for the model based on simpler fixed effects. Recall that the model based on Cheng and Wall (2005) fixed effects whose results are presented in Table 5 represents the preferred model as it has higher log-marginal likelihood values. A computationally efficient approach to MCMC estimation of a multi-indexed network dependence panel gravity model was set forth and used to examine the presence of a specific type of network dependence in trade flows. The network dependence model specifications are based on a combination of multiple weight matrices reflecting different sources of network dependence such as common currency, language, colonial ties, membership in trade unions or spatial proximity of countries. In cross-sectional gravity models these are typically treated as generalized distance variables, with the interpretation being that they reflect heterogeneity impacting the intercept term. In a panel data specification, these types of We show that after including commonly used fixed effects of the type suggested by Mátyás (1997) or Cheng and Wall (2005), there is evidence that significant network dependence in trade flows remains. Conventional gravity models assume the variable vector of N 2 Â 1 trade flows for each time period are independent, so trade flows between countries that have a common currency, language, border, colonial ties or are members of a trade union are no more likely than flows between countries having nothing in common.
Our panel gravity model allows these sociocultural factors to represent a basis for trade interaction between countries, with more similar flows between countries that share common borders, currency, language etc. Application of the model to a panel of trade flows covering 38 years and 70 countries provides evidence that this is the case. Network dependence produces simultaneous dependence, which means that flows from country dyad (i, j) depend on flows from other country dyads (say (k, l) with k 6 ¼ i, l 6 ¼ j), where the dependence structure is based on sociocultural factors. The most important sources of cross-sectional dependence were found to be trade organizations, historical colonial ties, common currency and spatial proximity of countries.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommons.org/licenses/by/4.0/.
Funding Open access funding provided by Vienna University of Economics and Business (WU). pðbjr 2 ; xÞ ¼ N ðb;R b Þ; ðA:1Þ b ¼ ðZ 0 ZÞ À1 ðZ 0ỹ xÞ; We note that ðZ 0 ZÞ À1 Z 0ỹ consists of only sample data information, so this expression can be calculated once prior to MCMC sampling, and this is true of ðZ 0 ZÞ À1 as well. This means that sampling new values of the parameters b (given values for the parameters r 2 ; x) can take place in a rapid, computationally efficient way.
The conditional posterior for r 2 (given b; x) takes the form in (A.2), with the uninformative prior.
pðr 2 jb; xÞ / ðr 2 Þ Àð N 2 T 2 Þ exp À 1 2r 2 ðe 0 eÞ ; ðA:2Þ The joint conditional distribution for the dependence parameters in x can be obtained by analytically integrating out b; r 2 leading to a (log kernel) expression for the joint posterior of the dependence parameters in x. log pðxjy; Z; WÞ / log½DðxÞ À ðN 2 T=2Þ logðx 0 FxÞ; ðA:3Þ where log½DðxÞ is a Taylor series approximation to the log-determinants in the model, described in detail in ''Appendix B''. We note here that this log-determinant term depends on the dependence parameters in the vector x, indicated by DðxÞ. F consists of only sample data, so this expression can be calculated prior to MCMC sampling, leading to a computationally efficient expression reflecting a quadratic form: logðx 0 FxÞ, that can be easily evaluated for any vector of dependence parameters x.

Appendix B: A Taylor's series approximation to the log-determinant term
In ''Appendix A'', we have motivated that (A.3) represents a computationally efficient expression for the joint posterior, but this involves the log-determinant term log½DðxÞ in (B.1), where: W 1 ¼ I T ðW I N Þ; W 2 ¼ I T ðI N WÞ; W 3 ¼ I T ðW WÞ, which could be inherently demanding and quickly hit a computational bottleneck.
logjI N 2 T À q o W 1 À q d W 2 À q w W 3 j: ðB:1Þ One way to side step this computational bottleneck is by approximation to the logdeterminant term. Pace and LeSage (2002) set forth a Taylor series approximation for the log-determinant of a matrix like our expression: logjI NT ÀWj, wherẽ W ¼ ð q o W 1 þ q d W 2 þ q w W 3 Þ. They show that for a symmetric nonnegative weight matrixW with eigenvalues k min ! À 1; k max 1, and trðWÞ ¼ 0 where tr represents the trace: logjI N 2 T À q o W 1 À q d W 2 À q w W 3 j ¼ logjI N 2 T ÀWj; ' À X q j¼1 trðW j Þ=j; trðWÞ ¼ trðq o W 1 þ q d W 2 þ q w W 3 Þ ¼ q o trðW 1 Þ þ q d trðW 2 Þ þ q w trðW 3 Þ: ðB:3Þ Golub and van Loan (1996, p. 566) provide the expression in (B.2), while (B.3) arises from linearity of the trace operator. Note that the first-order trðWÞ is zero, given the definitions of W 1 ; W 2 ; W 3 . Let g ¼ ð q o q d q w Þ 0 , and first consider the case of symmetric matrices W 1 ; W 2 ; W 3 , which allows the second-order trace to be expressed as a quadratic form in (B.5) involving the vector of parameters g and all pairwise multiplications of the individual matrices inW as shown in (B.4). i P 3 j W i W j ; i ¼ 1; 2; 3; j ¼ 1; 2; 3. For the case of asymmetric matrices, matrix products P 3 i P 3 j W i W 0 j can be used, and the weight matrices in the multiindexed panel gravity data model would be an example of asymmetric matrices. Note that this formulation separates the parameters in the vector g from the matrix of traces, which allows pre-calculation of the matrix of traces prior to MCMC sampling. A more efficient computational expression is ðg gÞvecðQ 2 Þ, where is the Kronecker product and vec the operator that stacks the columns of the matrix Q 2 .

ðB:6Þ
A key aspect of these calculations is that traces of products of the weight matrices can be pre-calculated prior to MCMC sampling. This means that updating the logdeterminant expression for any set of dependence parameters (q o ; q d ; q w Þ involves simple multiplications, where the dependence parameters in g can be separated from these matrix products.