Which Immigrants Promote Trade with Third Party Countries? On the Role of Geographic and Linguistic Proximity

  • Original Article
Eastern Economic Journal

Abstract

This study investigates how geographic and linguistic characteristics of immigrants’ origin countries affect the spillover effect on trade with third party countries. The results provide clear evidence of an inter-ethnic spillover effect and, within it, separately identify the role of spoken language, official language, and geographic proximity of the origin country to the trading partner. We also distinguish the ethnic spillover effect from the inter-ethnic spillover effect by focusing on the role of native tongue. Lastly, we document a trade diversion effect by immigrants who are neither geographically nor linguistically close to the trading partner country.

Notes

  1. United Nations Conference on Trade and Development Trade Analysis Information System (TRAINS) via World Bank World Integrated Trade Solutions (WITS).

  2. Being from a bordering country often means a higher probability of speaking the same language; the correlation coefficient between the two measures is 0.3 (Table 2).

  3. \(I^{\mathrm{bord,excl}}_{\mathrm{scti}}\) means proximate by the geographic but not the linguistic measure and \(I^{\mathrm{lang,excl}}_{\mathrm{scti}}\) and subsequent linguistic measures mean proximate by the language measure but not the geographic one.

  4. As is standard in the literature, we add 1 to all the observations before taking logs to account for zeros. This produces little distortion, as almost all the nonzero values of all the variables are large enough that adding 1 has a negligible effect.

  5. Immigration variables are not instrumented for here, since PPML with instrumental variables and fixed effects suffers from the incidental parameter problem and also fails to reach convergence. Relatedly, we do not use Tobit here since not only does a large number of fixed effects hamper computation, but the zeros here are likely “real” zeros, rather than a result of censoring, since negative trade volume is not possible.

Author information

Corresponding author

Correspondence to Oleg Firsin.

Appendix

Free Trade Agreement Countries and Start Years

Australia (2005), Bahrain (2006), Costa Rica (2009), Dominican Republic (2007), El Salvador, Guatemala, Honduras, and Nicaragua (all 2006), Chile (2004), Colombia (2012), Israel (1985), Jordan (2001), Morocco (2006), Mexico and Canada (1994), Oman (2009), Panama (2011), Peru (2009), Singapore (2004), South Korea (2012).

Tariff Calculation

Import tariffs facing US exporters by trading partner country and year at the HS6 product level are taken from the World Integrated Trade Solutions (World Bank 2020). We assign HS6-level products to three-digit NAICS industries and calculate the export-volume-weighted average rates. Import tariffs by the US, produced by the US Census Bureau, are available at the HS10 level via Peter Schott’s data website (Schott 2020). HS10-level products are matched to three-digit NAICS industries. We calculate the import-volume-weighted average rates analogously to the tariffs facing US exporters.
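
The aggregation step described above can be illustrated with a minimal Python sketch. It assumes that each product record already carries its three-digit NAICS code, tariff rate, and trade volume; the function name `industry_tariffs` and the sample figures are hypothetical and are not taken from the WITS or Schott data.

```python
from collections import defaultdict

def industry_tariffs(products):
    """Volume-weighted average tariff per industry.

    products: iterable of (naics3, tariff_rate, trade_volume) tuples,
    one per HS-level product (illustrative layout, not the source data format).
    """
    weighted_sum = defaultdict(float)  # sum of rate * volume per industry
    total_volume = defaultdict(float)  # sum of volume per industry
    for naics3, rate, volume in products:
        weighted_sum[naics3] += rate * volume
        total_volume[naics3] += volume
    # Industries with zero total volume are skipped: the weighted average is undefined.
    return {k: weighted_sum[k] / total_volume[k]
            for k in total_volume if total_volume[k] > 0}

# Made-up numbers: two products in industry 311, one in industry 334.
example = [("311", 0.10, 200.0), ("311", 0.02, 800.0), ("334", 0.05, 50.0)]
tariffs = industry_tariffs(example)
```

In this made-up case the high-tariff product in industry 311 carries only a fifth of the volume, so the industry rate sits much closer to the low-tariff product's rate.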

Calculation of Immigrant Proximity Measures

Geography

Our chief goal is to explore how immigrants from third party countries affect trade and the corresponding channels. Our strategy is to calculate the number of immigrants proximate to the residents of the trading partner country based on geography and language. The geographic proximity measure of choice should reflect the likelihood of having business networks and knowledge of country characteristics helpful for trade facilitation. For the geographic proximity measure, we use a simple rule of assigning a value of 1 to a pair of countries that share a border and 0 to those that do not, and denote the value as B. Let us define country c as the export destination country and j and h as some other two countries. There could be a country h that does not border country c, but is by some measure closer to its economic center than country j that does border c, especially in the case of large countries; most of the time, however, a common border is a good measure of proximity to economically important areas, and is more straightforward than arbitrary distance zone measures. Furthermore, international trade literature suggests that trade is greater among bordering countries even controlling for distance.

The estimated number of immigrants in state s, at time t, in industry i, who are from countries bordering country c, is calculated as follows:

$$\begin{aligned} \begin{aligned} I^{\mathrm{bord}}_{\mathrm{scti}}=\sum _{j\in C; j\ne c} I_{\mathrm{sjti}}*B_{\mathrm{cj}}, \end{aligned} \end{aligned}$$
(8)

where \(B_{\mathrm{cj}}=1\) if c and j share a common border and 0 otherwise, \(I_{\mathrm{sjti}}\) is the number of immigrants from country j, and the summation is over all the countries other than c.
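
As an illustration of Eq. (8), the following Python sketch computes the count for a single state, year, and industry cell. The data layout (a dictionary of immigrant counts by origin country and a set of bordering pairs), the function name, and the numbers are assumptions made for this example only.

```python
def bordering_immigrants(immigrants, border_pairs, c):
    """Eq. (8) for one (state, year, industry) cell.

    immigrants: {origin country j: number of immigrants from j} (hypothetical layout)
    border_pairs: set of frozenset({a, b}) for country pairs sharing a border
    c: the trading partner country
    """
    return sum(
        count
        for j, count in immigrants.items()
        if j != c and frozenset((c, j)) in border_pairs  # B_cj = 1
    )

# Made-up counts; immigrants from c itself are excluded from the sum.
immigrants = {"DEU": 100, "ESP": 50, "JPN": 70, "FRA": 30}
border_pairs = {frozenset(("FRA", "DEU")), frozenset(("FRA", "ESP"))}
result = bordering_immigrants(immigrants, border_pairs, "FRA")
```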

Language

The language proximity measure is more nuanced. Intuitively, a common spoken language is vital for the communication involved in international trade transactions, be it marketing, logistics, or more informal exchanges. Yet language usually enters the estimated gravity equation as a dummy for a common official language between the trading countries, interacted with the immigrant stock variable. One limitation of this strategy is that it is not accurate to assume that people from countries with the same official language will be able to communicate or that people from countries with different official languages will not. Realizing this, Melitz and Toubal (2014) further the network specificity literature by showing that not only common official language, but also all common native and spoken languages, when spoken by a substantial portion of the population in each country, matter for trade. Their results indicate that all the relevant languages together have double the effect of just the official language and that native language is especially important, since in addition to basic ability to communicate it allows more nuanced communication and potentially ensures more trust. We adapt the measures they used to characterize countries in order to characterize immigrants.

To obtain the most inclusive measure of linguistic proximity, we create a measure that captures both the common spoken language probability (which is greater than or equal to the common native language probability) and common official language. We also provide separate analysis for each measure to better identify the underlying mechanisms through which linguistic proximity operates. The combined measure calculates the number of immigrants in state s, at time t, in industry i, who are linguistically proximate to residents of country c. This is done by first calculating the number of linguistically proximate immigrants for each country j and then summing over all the countries in the following manner:

$$\begin{aligned} \begin{aligned} I^{\mathrm{lang}}_{\mathrm{scti}}=\sum _{j\in C; j\ne c} I_{\mathrm{sjti}}\max ({\text{COL}}_{\mathrm{cj}},{\text{CSL}}_{\mathrm{cj}}), \end{aligned} \end{aligned}$$
(9)

where COL takes the value of 1 if j and c share the same official language and 0 otherwise, and CSL is the estimated probability that a randomly chosen person from country j speaks the same language as a randomly chosen person from country c. In constructing CSL we use the methodology and data of Melitz and Toubal (2014) but adapt them to the immigrant population. For a large number of countries, they compiled data on the official language, the share of people who report each language as native (with a 4% threshold), and the same measure for spoken language. Following their methodology, we compute a common spoken language (CSL) value for each country pair, a measure that should be highly correlated with the true (unknown) probability that any two randomly chosen people from the two countries would share the same spoken language. The common spoken language score for a pair of countries is calculated as

$$\begin{aligned} \begin{aligned} {\text{CSL}}_{\mathrm{cj}}=\mathop {{\max }}\limits _{k}(L_{\mathrm{kc}}L_{\mathrm{kj}})+(\alpha -\mathop {{\max }}\limits _{k}(L_{\mathrm{kc}}L_{\mathrm{kj}}))(1-\mathop {{\max }}\limits _{k}(L_{\mathrm{kc}}L_{\mathrm{kj}})), \end{aligned} \end{aligned}$$
(10)

where \(L_{\mathrm{kc}}\) is the share of people in country c who speak language k, \(\alpha =\sum ^n_{k=1}L_{\mathrm{kc}}L_{\mathrm{kj}}\) is the sum across languages of the products of the two countries' shares of speakers, and \(\max _{k}(L_{\mathrm{kc}}L_{\mathrm{kj}})\) is the largest such product. For example, if in country 1, 90% of people report speaking French, 50% report German, and 0% report Spanish, while in country 2, 80% report French, 90% report German, and 10% report Spanish, then \(\alpha =0.72+0.45+0=1.17\) and \({\text{CSL}}_{\mathrm{cj}}=0.72+(1.17-0.72)(1-0.72)=0.72+0.45\times 0.28=0.846\); in practice, this adjustment keeps CSL between 0 and 1. The \({\text{CNL}}_{\mathrm{cj}}\) calculation is analogous.
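
The calculation in Eq. (10) can be checked with a short Python sketch. The function name and the dictionary layout of the language shares are assumptions made for illustration; the shares are those of the French, German, and Spanish example above.

```python
def common_language_score(shares_c, shares_j):
    """Eq. (10): max product plus the remaining overlap, scaled by (1 - max product).

    shares_c, shares_j: {language: share of the population speaking it}
    (hypothetical layout; languages missing from either country contribute zero).
    """
    common = set(shares_c) & set(shares_j)
    products = [shares_c[k] * shares_j[k] for k in common]
    if not products:
        return 0.0  # no language in common
    alpha = sum(products)    # sum of products across languages
    largest = max(products)  # max_k L_kc * L_kj
    return largest + (alpha - largest) * (1 - largest)

country1 = {"French": 0.9, "German": 0.5, "Spanish": 0.0}
country2 = {"French": 0.8, "German": 0.9, "Spanish": 0.1}
csl = common_language_score(country1, country2)  # approximately 0.846
```

Passing native-language shares instead of spoken-language shares would give the analogous CNL value.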

Geography Versus Language

Since our geographic and linguistic proximity measures overlap, they do not allow identifying each channel separately. Hence, we construct “exclusive” measures (as opposed to the “inclusive” measures above) that capture only people who meet one criterion but not the other. To estimate the requisite number of linguistically proximate immigrants, we sum over only the non-bordering countries:

$$\begin{aligned} \begin{aligned} I^{\mathrm{lang,excl}}_{\mathrm{scti}}=\sum _{j\in C_{\mathrm{NB}}; j\ne c; } I_{\mathrm{sjti}}*\max ({\text{COL}}_{\mathrm{cj}},{\text{CSL}}_{\mathrm{cj}}). \end{aligned} \end{aligned}$$
(11)

For the exclusive border-based measure, we first exclude countries that share the same official language; then, we subtract the estimated number of those able to speak the same language as residents of country c and sum over all countries j, so that

$$\begin{aligned} \begin{aligned} I^{\mathrm{bord,excl}}_{\mathrm{scti}}=\sum _{j\in C_{-o}; j\ne c } I_{\mathrm{sjti}}*\max (B_{\mathrm{cj}}-{\text{CSL}}_{\mathrm{cj}},0), \end{aligned} \end{aligned}$$
(12)

where \(C_{-o}\) refers to countries that do not have the same official language as c. Additionally, we construct a measure of distant immigrants, that is, those who are neither linguistically nor geographically proximate. It is calculated as follows:

$$\begin{aligned} \begin{aligned} I^{\mathrm{NB, NL}}_{\mathrm{scti}}=\sum _{j\in C ; j\ne c; } I_{\mathrm{sjti}}*(1-(\max (B_{\mathrm{cj}},{\text{COL}}_{\mathrm{cj}},{\text{CSL}}_{\mathrm{cj}})) ), \end{aligned} \end{aligned}$$
(13)

where NB and NL refer to neither sharing a common border nor being linguistically proximate.
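
To make the relationship between Eqs. (9) and (11) to (13) concrete, the sketch below computes all four counts for one state, year, and industry cell. The function name, the dictionary layout (each of `border`, `col`, and `csl` maps an origin country j to its value for the pair with c), and the numbers are illustrative assumptions.

```python
def proximity_measures(immigrants, border, col, csl, c):
    """Inclusive language, exclusive language, exclusive border, and distant counts."""
    out = {"lang": 0.0, "lang_excl": 0.0, "bord_excl": 0.0, "distant": 0.0}
    for j, n in immigrants.items():
        if j == c:
            continue  # sums run over countries other than c
        b, o, s = border[j], col[j], csl[j]
        lang = max(o, s)
        out["lang"] += n * lang                   # Eq. (9)
        if b == 0:                                # non-bordering countries only
            out["lang_excl"] += n * lang          # Eq. (11)
        if o == 0:                                # different official language only
            out["bord_excl"] += n * max(b - s, 0) # Eq. (12)
        out["distant"] += n * (1 - max(b, o, s))  # Eq. (13)
    return out

# Made-up origin countries X (borders c), Y (shares official language), Z (neither).
immigrants = {"X": 100, "Y": 200, "Z": 50}
border = {"X": 1, "Y": 0, "Z": 0}
col = {"X": 0, "Y": 1, "Z": 0}
csl = {"X": 0.25, "Y": 0.5, "Z": 0.0}
measures = proximity_measures(immigrants, border, col, csl, "C")
```

In this made-up case, a quarter of the immigrants from bordering country X are expected to share a spoken language with residents of c, so only the remaining 75 enter the exclusive border-based count.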

Linguistic Channels

Because the different measures of linguistic proximity are correlated, as shown in Table 2, we need to take steps to tease out the separate effects of each component of linguistic proximity. We focus on the exclusive measures to isolate the geographic proximity effect. Because the share of people speaking a given language is generally greater than or equal to the share speaking it as a native language, we can calculate the (expected) number of people who speak the same non-native language in countries with a different official language as

$$\begin{aligned} \begin{aligned} I^{\mathrm{csl,-on,excl}}_{\mathrm{scti}}=\sum _{j\in C_{\mathrm{NB}}\cap C_{-o}; j\ne c; } I_{\mathrm{sjti}}*({\text{CSNNL}}_{\mathrm{cj}}), \end{aligned} \end{aligned}$$
(14)

where \({\text{CSNNL}}_{\mathrm{cj}}\) refers to the common spoken non-native language probability between c and j. The \({\text{CSNNL}}_{\mathrm{cj}}\) value is based on the shares of people speaking each relevant language as non-native, with each such share calculated as the difference between the share speaking the language and the share reporting it as a native language. Additionally, because some of the people in countries with the same official language do not speak the same language, we can try to get at the role of the same official language as separate from spoken using the measure below:

$$\begin{aligned} \begin{aligned} I^{\mathrm{col,-s,excl}}_{\mathrm{scti}}=\sum _{j\in C_{\mathrm{NB}}\cap C_{o}; j\ne c; } I_{\mathrm{sjti}}*(1-{\text{CSL}}_{\mathrm{cj}}). \end{aligned} \end{aligned}$$
(15)

Lastly, since we cannot identify people who share the same native but not spoken language (naturally), we instead control for the number of CSL speakers in the same regression equation as CNL speakers to estimate the additional effect of the latter.
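
A minimal Python sketch of Eqs. (14) and (15) follows. It takes the CSNNL values as given rather than deriving them, and the function names, the dictionary layout, the clamping of negative differences to zero, and the numbers are all assumptions made for illustration.

```python
def nonnative_shares(spoken, native):
    """Share speaking each language as non-native: spoken share minus native share.

    Clamping at zero is a defensive assumption, not part of the described method.
    """
    return {k: max(spoken[k] - native.get(k, 0.0), 0.0) for k in spoken}

def linguistic_channel_measures(immigrants, border, col, csl, csnnl, c):
    """Return (Eq. 14 count, Eq. 15 count) for one state, year, and industry cell."""
    spoken_nonnative = 0.0  # Eq. (14): non-bordering, different official language
    official_only = 0.0     # Eq. (15): non-bordering, same official language
    for j, n in immigrants.items():
        if j == c or border[j] == 1:
            continue  # both sums exclude c and bordering countries
        if col[j] == 0:
            spoken_nonnative += n * csnnl[j]
        else:
            official_only += n * (1 - csl[j])
    return spoken_nonnative, official_only

# Made-up values for three origin countries.
immigrants = {"X": 100, "Y": 200, "Z": 40}
border = {"X": 1, "Y": 0, "Z": 0}
col = {"X": 0, "Y": 1, "Z": 0}
csl = {"X": 0.25, "Y": 0.5, "Z": 0.25}
csnnl = {"X": 0.0, "Y": 0.25, "Z": 0.125}
channels = linguistic_channel_measures(immigrants, border, col, csl, csnnl, "C")
shares = nonnative_shares({"English": 0.75, "Hindi": 0.5}, {"Hindi": 0.25})
```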

Cite this article

Firsin, O. Which Immigrants Promote Trade with Third Party Countries? On the Role of Geographic and Linguistic Proximity. Eastern Econ J 48, 1–44 (2022). https://doi.org/10.1057/s41302-021-00208-5
