Genetic distance, cultural differences, and the formation of regional trade agreements


Genetic distance between countries’ populations has been shown to proxy cross-country differences in cultures and preferences. In an unbalanced panel of 133 countries from 1970 to 2012, the study finds that higher genetic distance between two countries decreases their probability of having a trade agreement, even after controlling for geographic distance and other determinants. The impact of cultural differences proxied by genetic distance is persistent over time and economically significant: While increasing the geographic distance between two countries by 1% decreases the probability of a regional trade agreement by 0.11% points, increasing their genetic distance by 1% decreases the probability by 0.06% points.

Fig. 1
Fig. 2


  1.

    Knack and Keefer (1997) find that countries which are ethnically more homogeneous have higher levels of trust.

  2.

    Zou et al. (2009) show that individuals’ behavior depends on what they perceive to be the consensus or “common sense” view within their culture; for similar arguments see also Roth et al. (1991). Henrich (2000) and Henrich et al. (2001) show that behavior in the ultimatum game depends on the culture of the experiment subjects.

  3.

    Buchan et al. (2002) find that Japanese experiment subjects have a lower level of trust than their American counterparts. Gächter et al. (2010) and Herrmann et al. (2008) find significant differences in the willingness to punish non-cooperative players in experiments in different cultural backgrounds. These are not isolated findings: Cross-cultural differences in behavior in trust games are corroborated in a meta-analysis by Johnson and Mislin (2011).

  4.

    Brander (1986) is probably the first to characterize trade negotiations as an attempt to escape the prisoner’s dilemma of unilateral strategic trade policy.

  5.

    Roth et al. (1991) find that while subjects in different countries exhibit similar behavior in experimental markets, individual bargaining behavior varies considerably across countries. Gelfand et al. (2015) find that strategies which lead to successful negotiations in the United States are detrimental in Egypt. For a literature survey on cultural differences and negotiations, see Gelfand et al. (2012).

  6.

    For example, cross-cultural differences such as norms around kinship correlate with human genetic diversity, see Jones (2003). For a general introduction to the relationship between human genetic and cultural diversity, see Stone and Lurquin (2007).

  7.

    Ahlerup and Olsson (2012) provide an overview of this literature; see also Ashraf and Galor (2013).

  8.

    FDI data are often missing for many country pairs, restricting Leblang’s (2010) analysis to 28 FDI-receiving countries. Our sample comprises more countries and over 40 years.

  9.

    All the cited papers use probit models in their analysis. Besides probit models, a plethora of methods have been used to analyze the determinants of RTAs: Egger and Larch (2008) use spatial econometric probit models and Márquez-Ramos et al. (2011) use ordered probit models to explain the drivers of different levels of trade integration between countries. Kohl and Brouwer (2014) use a clustering algorithm to identify “natural” trade integration blocs and estimate the impact of determinants of these blocs using a probit model.

  10.

    We use Mario Larch’s Regional Trade Agreements Database from Egger and Larch (2008) in its updated version rta_20170310.dta which can be accessed at

  11.

    The data are available at Spolaore and Wacziarg (2009) use the original genetic distance data from Cavalli-Sforza et al. (1994) which covers only 42 populations.

  12.

    For details on the calculation of these measures, see Spolaore and Wacziarg (2016a). They also show that genetic distance is correlated with a cultural difference measure based on question-specific distances from the World Values Survey (WVS) for 98 questions. Unlike genetic distance, which is available for 180 countries, this measure is only available for 74 countries. To maintain a large sample, we do not include it in our regressions.

  13.

    During the stalled negotiations for a potential trade agreement between the European Union and the United States, a commonly repeated argument was that differences in legal philosophies in consumer protection law (precautionary principle in the EU versus risk assessment and cost–benefit principles in the US) made an agreement difficult to reach, see Bergkamp and Kogan (2013).

  14.

    Data are available at Legal systems are categorized as either civil law, common law, Muslim law, customary, or a mixture of these categories. We treat mixed legal systems as a separate category.

  15.

    Data are from Kreutz (2010) and contain information about armed conflicts between 1946 and 2005. \((\text{War Duration})_{ij}=(\text{War End Date})_{ij}-(\text{War Start Date})_{ij}\) is the number of days of war between countries i and j after 1945. We focus on wars after World War II because it marks the beginning of the current international order and because our analysis covers RTA formation between 1970 and 2012.

  16.

    Note that country-year fixed effects automatically control for year fixed effects, i.e., across-the-board differences in RTA formation across years which affect all countries in a similar way.

  17.

    Baier et al. (2014) approximate these multilateral resistance terms by GDP-weighted averages of bilateral distances with trade partners. These terms also control for a country’s remoteness, i.e., for its average trade costs across all its trade partners, similar to the approximation proposed by Baier and Bergstrand (2009) in a trade gravity context. Our fixed effects control for these terms, circumventing the need to construct proxy indices.

  18.

    For earlier years, our regressors and country-year fixed effects perfectly separate the dependent variable, so maximum likelihood estimates of logit or probit models do not exist and using a linear probability model does not make sense. For a discussion of perfect separation, see, e.g., Mansournia et al. (2018).

  19.

    This is well-known in the gravity literature, see, e.g., Head and Mayer (2014), p. 140: In a bilateral gravity equation of symmetric bilateral trade flows regressed on symmetric trade cost measures, estimated importer and exporter dummies are identical. This also applies in our setting. Including origin and destination-specific dummies or country-specific dummies delivers numerically identical coefficients.

  20.

    Baier and Bergstrand (2004) discuss correlation of errors across countries within an RTA (e.g., across EU member countries) but do not consider the more general case that we consider, namely correlation of a given country’s trade policy across all its potential partner countries. The correlation within an RTA of Baier and Bergstrand (2004) is modelled on the value of the dependent variable, introducing endogeneity bias in the calculation of the standard errors. Our approach avoids this.

  21.

    The variance–covariance estimator by Cameron et al. (2011) assumes \(E\left(\varepsilon_{ijgh}\varepsilon_{lmg'h'}\mid x_{ijgh},x_{lmg'h'}\right)=0\) unless \(g=g'\) or \(h=h'\), where ij and lm refer to two country pairs (i.e., observations in the data) and g and h explicitly indicate the two groups (i.e., clusters), in our application the first and the second country in a country pair. If \(g=g'\) or \(h=h'\), i.e., within an origin or destination country, and \(\varepsilon_{ij}=\varepsilon_{ji}\ \forall i,j\), then \(E\left(\varepsilon_{ijgh}\varepsilon_{jig'h'}\mid x_{ijgh},x_{jig'h'}\right)=E\left(\varepsilon_{ijgh}\varepsilon_{ijgh}\mid x_{ijgh},x_{ijgh}\right)\), and hence the estimator allows for arbitrary correlation between \(\varepsilon_{ijgh}\) and \(\varepsilon_{jig'h'}\), including perfect correlation.

  22.

    The dependent variable is in levels and the regressor is in logarithms, i.e., if genetic distance increases by 1%, the probability of an RTA changes by \(\beta_{1}/100\) units, i.e., by \(\frac{\beta_{1}}{100}\times 100=\beta_{1}=-0.090\)% points, see Wooldridge (2002), p. 656.

  23.

    In unreported regressions, we estimated the columns of Table 1 on the larger samples which are possible when not including all regressors. The effect of genetic distance remains very similar.

  24.

    The persistent negative effect of distance on bilateral trade flows has been referred to as the distance puzzle. It has spurred a large literature which tries to explain this fact, e.g., Lin and Sim (2012), Yotov (2012), and Larch et al. (2016). None of these papers investigates the impact of genetic distance over time on bilateral trade flows.

  25.

    For an overview of EU decision making and its history concerning trade policy issues, see chapter 12 in Baldwin and Wyplosz (2015).

  26.

    In our sample, the correlation between genetic distance and geographic distance in levels across all years is 0.410, and 0.514 in logarithms.

  27.

    Using a different specification, Melitz and Toubal (2019) do find that genetic distance matters even for trade flows between European countries.

  28.

    The countries included are Austria, Belgium, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Ireland, Italy, the Netherlands, Norway, Poland, Portugal, Russia, Spain, Sweden, Switzerland, and the United Kingdom.
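
The war duration measure in footnote 15 is a plain date difference expressed in days. A minimal sketch in Python, assuming a hypothetical helper name `war_duration_days` and made-up dates that are not taken from the Kreutz (2010) data:

```python
from datetime import date


def war_duration_days(start: date, end: date) -> int:
    """Days between a war's start and end date (hypothetical helper)."""
    if end < start:
        raise ValueError("end date precedes start date")
    return (end - start).days


# Made-up dates for illustration only; 1980 is a leap year.
example_days = war_duration_days(date(1980, 1, 1), date(1980, 3, 1))
print(example_days)
```

Country pairs without any recorded conflict would presumably receive a duration of zero, but that coding choice is an assumption here rather than something the footnote states.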
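
Footnote 21 relies on the two-way cluster-robust estimator of Cameron et al. (2011), which adds the one-way clustered covariance matrices for each grouping and subtracts the one for their intersection. The sketch below illustrates that construction for plain OLS on simulated data. It is not the paper's estimation code: the function names, the simulated country pairs, the omission of small-sample corrections, and the requirement of non-negative integer cluster ids are all assumptions made for illustration.

```python
import numpy as np


def cluster_meat(X, resid, ids):
    """Sum over clusters c of (X_c' e_c)(X_c' e_c)'."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for c in np.unique(ids):
        mask = ids == c
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    return meat


def twoway_cluster_vcov(X, y, g, h):
    """OLS coefficients and two-way clustered covariance (no small-sample correction).

    g and h are assumed to be non-negative integer cluster ids.
    """
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    # One id per (g, h) combination, i.e., the intersection clusters.
    gh = g * (h.max() + 1) + h
    meat = (cluster_meat(X, resid, g)
            + cluster_meat(X, resid, h)
            - cluster_meat(X, resid, gh))
    return beta, bread @ meat @ bread


# Simulated country pairs (i, j) with i < j; g is the first country, h the second.
rng = np.random.default_rng(0)
n_countries = 6
pairs = [(i, j) for i in range(n_countries) for j in range(i + 1, n_countries)]
g = np.array([i for i, _ in pairs])
h = np.array([j for _, j in pairs])
x = rng.normal(size=len(pairs))
y = 1.0 + 0.5 * x + rng.normal(size=len(pairs))
X = np.column_stack([np.ones(len(pairs)), x])

beta, vcov = twoway_cluster_vcov(X, y, g, h)
print(beta)
print(vcov)
```

Because the intersection term is subtracted, the resulting matrix is not guaranteed to be positive semi-definite in small samples, so applied work usually relies on an established implementation rather than a hand-rolled one.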
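
The level-log interpretation in footnote 22 can be checked numerically. The sketch below compares the footnote's approximation with the exact change implied by a logarithmic regressor, assuming the RTA probability is measured on a 0 to 1 scale and using the coefficient value quoted in the footnote; the variable names are illustrative.

```python
import math

beta_1 = -0.090  # coefficient on log genetic distance quoted in footnote 22

# Approximation from the footnote: a 1% increase changes the probability
# by beta_1 / 100 units, i.e., by beta_1 percentage points.
approx_pct_points = beta_1 / 100 * 100

# Exact change for a 1% increase: beta_1 * ln(1.01), converted to percentage points.
exact_pct_points = beta_1 * math.log(1.01) * 100

print(f"approximate: {approx_pct_points:.4f}, exact: {exact_pct_points:.4f}")
```

For a change as small as 1%, the two figures differ only in the third decimal place, which is why the approximation is standard for this kind of specification.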


Heid gratefully acknowledges financial support from the Australian Research Council (DP190103524) and Lu from the National Natural Science Foundation of China (91846301). We thank Ralph-Christopher Bayer and Laura Márquez-Ramos for useful comments. All remaining errors are our own.



See Tables 4 and 5.

Table 4 List of countries
Table 5 Descriptive statistics of the different samples

