1 Introduction

A growing number of studies have analysed the importance of controlling cross section dependence (CSD, hereafter) in gravity model of migration flows. In particular, following the seminal study by Anderson and van Wincoop (2003), Bertoli and Fernández-Huertas Moraga (2013) defined multilateral resistance to migration (MRM, hereafter) as the influence that alternative destinations exert on the bilateral migration flows and developed a theoretical framework for the gravity equation through a random utility maximisation (RUM) model and showed that MRM can be controlled by a multifactor error structure. They applied the common correlated effects (CCE) methodology advanced by Pesaran (2006) to analysing international migration flows from 61 origin countries to Spain. The importance of properly taking into account MRM has been emphasised in a recent survey by Beine et al. (2016) such that the regression residuals should be cross-sectionally independent for the estimation to be consistent with the RUM model.

In the literature on gravity models, a number of studies have also highlighted the importance of explicitly developing the multi-dimensional models, e.g. Mitze (2016) and Mátyás (2017). Notice that the existing studies by Bertoli and Fernández-Huertas Moraga (2013) and Piras (2017) have attempted to control MRM through error components including bilateral pair fixed effects and unobserved (global) factors, but their approach is equivalent to the 2-dimensional multifactor model, ignoring the 3-dimentional characteristic of the data. Indeed, the main drawback of this approach lies in the fact that it does not account for the hierarchical factor structure in 3D panels. In this paper, we follow recent advancements in the three-dimensional (3D) panel data models with a hierarchical multifactor structure and relax restrictions on how MRM may affect country/province-pair-specific migration flows. Recently, Kapetanios et al. (2017, (2020) have extended the CCE methodology, mainly developed in the 2D panel data by Pesaran (2006), into the 3D panels. In particular, Kapetanios et al. (2020) propose a 3DCCE estimator, which explicitly accommodates cross section dependence (CSD), by employing the local and global cross section averages of dependent and independent variables as proxies for unobserved global and local factors, respectively. Kapetanios et al. (2020) show that the CCE estimator proposed by Pesaran (2006) is not effective in removing correlations between local factors and regressors within 3D panels; under these circumstances, it leads to biased estimates. In a 3D framework, in addition to including unobserved global factors with pair-specific coefficients, we can also control for origin (destination)-specific factors that have heterogeneous effects on destinations (origins), which can lead to the more flexible specification of MRM than the models employed in the existing empirical literature on migration flows. We apply these 3D estimation methodologies to the interprovincial migration flows in Canada from 1976 to 2014.

In Canada, there were slightly more than 300,000 interprovincial migrants in 2014, representing 0.85% of the Canadian population. Interprovincial migration flows can provide significant economic benefits by reallocating labour from the low-productivity regions with high unemployment to the high-productivity regions with low unemployment. Shannon (2015) highlights that the lack of immigration in some regions in Canada may raise concerns about the economic future of these regions, because little immigration implies declining population inflows as well as an inability to use immigration to deal with regional skill shortages. Indeed, this is an important policy issue, intensely debated in other developed countries due to the worsening of the economic gap among regions within the same country (e.g. the urban–rural or the North–South divide in Italy, see Piras 2017). To deal with this issue, the Canadian government introduced the Provincial Nominees Program, which aims to create a balanced regional distribution of immigrants, see (Pandey and Townsend 2011) and (Baglay 2012).

The structural factors—such as regional differences in earnings, employment prospects and labour productivity—have been highlighted as the main drivers behind interprovincial migration in Canada, e.g. Coulombe (2006). Day and Winer (2006) apply a provincial fixed effect model for net migration flows (across age groups) among ten Canadian provinces and document evidence that interprovincial migration flows are driven mainly by the long-run regional differentials in unemployment rates and labour productivity, as well as the rural/urban differential structure of the provinces. Focusing on the migration flows from 9 provinces to Ontario, Basher and Fachin (2008) address the issue of CSD in examining the long-run determinants of interprovincial migrations over the period 1971–2004. They find a very strong link between migration flows and unemployment differentials and income in the sending province, in line with the earlier studies, e.g. Coulombe (2006) and Day and Winer (2006). The evidence on income effect is mixed, with the level of home income more important than relative income differentials.

In general, there is a strand of literature that examines regional income disparity across Canadian provinces. Convergence in income is likely to weaken the economic motivation for migration while a lack of convergence is likely to trigger migration from poorer to richer provinces. Though a complete convergence for all provinces may be impossible to achieve, there is a general consensus on the existence of a convergence process. The increased efforts by the federal government to establish interprovincial redistribution programmes (i.e. equalisation payments) may have facilitated such convergence process. Brown and Macdonald (2015) show that the episodes of divergence coincide with the largest economic shocks in the twentieth century. In the time span we consider, they explain the first episode of divergence coinciding with the first oil shock from 1973 to the mid-1980s while the second occurring at the beginning in 1996 and continuing through the resource boom of the 2000s.

We follow the pioneering work by Harris and Todaro (1970), who propose to explain the migration flows by the difference in expected income. In this regard, we examine the effects of the conventional determinants such as income and unemployment rate differentials. We also aim at investigating the role of networks (i.e., a cluster of migrants of the same origin residing in the destination) and relative inequality, defined as the Gini coefficient in the origin divided by the Gini coefficient in the destination. Indeed, expat migrant networks play an important role on the migration decisions of potential migrants. Through the informational and financial support provided by the network, newcomers can lower their migration and assimilation costs, see Beine and Parsons (2015) and Beine et al. (2019). Further, as argued by Stark (1984), relative deprivation together with absolute income is an important factor to explain migration decisions. The relative deprivation hypothesis states that an improvement in an agent’s relative income improves her welfare. Unlike most existing studies, which may impose inconvenient restrictions on the type of unobserved heterogeneity that may lead to omitted variable bias, we analyse the determinants of interprovincial migration flows in Canada, using a 3D panel data model with the hierarchical multifactor error structure, that accounts for CSD through unobserved global (country) and local (origin and destination provincial) factors, respectively.

Our main empirical findings are summarised as follows: an increase in income at the destination (at the origin) significantly increases (reduces) migration outflows whereas an increase in the unemployment rate at the destination (at the origin) significantly decreases (increases) migration flows. Further, we find evidence of a nonlinear effect of the relative inequality, consistent with the predictions of Borjas’ (1987) selection model, and a significant effect of networks which relates to the positive effect of diasporas analysed by Beine et al. (2011). Our evidence is broadly consistent with previous studies on interprovincial Canadian migration, see Coulombe (2006) and Day and Winer (2006) for the positive effects of increasing employment opportunities and income differential. Furthermore, by conducting the rolling window-based time-varying estimation over the last 10 years of the sample, 2005–2014, we find that the effect of the unemployment differentials had been quite small but relatively stable, and the effect of the income differentials, after registering a peak in 2007, had steadily declined. By contrast the impacts of both relative inequality and network had displayed an increasing trend. These time varying patterns suggest that the recent rise in the internal migration flows, registered in Canada from 2009 onwards, is more likely to be associated with the relative income inequality and network presence rather than the conventional long-run determinants such as income and unemployment differentials.

Our empirical findings might provide a support for developing an appropriate Government policy such as the Canadian Provincial Nominees Program that aims at reallocating the labour supply across regions or provinces. Indeed, like many other developed countries, Canada faces declining population growths in certain regions. In an attempt to stem this phenomenon, fiscal and income redistribution policies should be designed to influence migration location choices on the basis of significant push and pull determinants.

The paper proceeds as follows: Sect. 2 overviews the literature on the gravity model of migration flows. Section 3 describes the 3D panel data modelling with hierarchical multifactor structure. Section 4 presents main empirical findings for Canadian interprovincial migration flows. Section 5 offers concluding remarks.

2 Overview on the gravity models of migration flows

The dependence across cross section units have been modelled explicitly in the 2D panels by two main approaches: the factor-based approach (e.g. Pesaran 2006; Bai, 2009) and the spatial modelling techniques (e.g. Behrens et al. 2012; Mastromarco et al. 2016). Chudik et al. (2011) show that the factor-based models exhibit the strong CSD, while the spatial-based models can deal with weak CSD. In the existing empirical literature, these approaches have been separately implemented to estimate gravity models of migration flows.

Bertoli and Fernández-Huertas Moraga (2013) follow the seminal study by Anderson and van Wincoop (2003) and control for the multilateral resistance to migration (MRM) through time-varying unobserved common factors. They apply the CCE approach to a gravity model of international migration flows from 61 origin countries to Spain. Migrants are faced with a discrete choice problem when selecting alternative destinations, as domiciles are imperfect substitutes and migration involves frictions. The importance of properly taking into account the MRM has been emphasised in a recent survey by Beine et al. (2016), where they highlight that the regression residuals should be cross-sectionally independent for the estimation to be consistent with the random utility maximisation (RUM) model. Crucially, Bertoli and Fernández-Huertas Moraga (2013) state that MRM introduces omitted variable bias in estimating the impacts of the determinants of bilateral migration flows. In this regard, it is widely acknowledged that the inclusion of destination-time dummies or origin-time dummies as proxies for MRM in receiving and sending countries, as proposed by Ortega and Peri (2013) and Beine and Parsons (2015), is likely to produce biased estimation results in the presence of strong CSD, see also Kapetanios et al. (2017).

Piras (2017) considers a triple-index panel gravity model of internal migration flows with the human capital across Italian regions during 1970–2005. Finding evidence of strong CSD by applying the cross section dependence (CD) test by Pesaran (2015), he applies the CCE estimator together with the mean group estimator by Pesaran and Smith (1995) and the augmented mean group estimator by Eberhardt and Teal (2010). The main results confirm that the macroeconomic variables (per capita GDP and unemployment) are the leading drivers behind migration flows across Italian regions. Furthermore, the human capital has no role at the destination while it works as a restraining factor at the origin. Piras (2017) justifies this finding with the severe territorial differences, which characterise the North-South divide in Italy. However, to control for strong CSD, he imposes that the error components contain the bilateral pair fixed effects and unobserved heterogeneous factors [see Eq. 9], which are essentially the same as the 2D specification.

Beenstock et al. (2015) highlight the importance of modelling the relationship between origin and destination country as well as the relationship between origin/destination and a third country, which might be connected to another destination. Since the migration flows are essentially multilateral, the standard approach using a bilateral specification, that ignores the multilateral or third-country effects, would lead to biased estimation and misleading inference. They suggest to control third country effects by applying the spatial econometric techniques, where the spatial effects are allowed to be different at the origin and at the destination. They apply the double-spatial lag ML estimator developed by Elhorst et al. (2012) to an analysis of the determinants of immigration from European neighbourhoods to the EU over the period 2000–2010. They find that unemployment rates, social spending per head and inequality affect migration. In the trade flows, Behrens et al. (2012) adapt the approach by LeSage and Pace (2008), and derive the spatial connectivity matrices, interpreted as a multilateral linkage. LeSage and Llano (2016) extend the (unilateral) spatial dependence into a multilateral case where the (latent) multilateral spatial effects are estimated for both origin and destination. In particular, they propose to combine the Bayesian hierarchical approach and the Spatial autoregressive (SAR) model to estimate the regional effects. Mitze (2016) aims to introduce the ‘competing destinations’ effect by applying a spatial Durbin model to interregional migration flows in Germany. The basic intuition is to model the human behaviour as a spatial choice process such that an actual migration choice is made through the hierarchical information processing as migrants are supposed to evaluate a limited number of alternatives. Prospective migrants attempt to simplify these alternatives by categorising them into clusters, where the probability of one destination in a certain cluster being selected is related to the other regions in that cluster. This clustering process requires that the spatial proximity of destinations has an influence on the destination choice by migrants from one particular origin.

In the case of Canadian interprovincial migration flows, Day and Winer (2006) apply a triple index model to investigate the impacts of public policy on interprovincial migration in Canada from 1974 to 1996. They take into account provincial fixed effects and analyse the effect of regional variation in federal unemployment insurance, provincial social assistance, federal and provincial personal income taxes and public spending of different types on interprovincial migration. They find that the prime determinants of interprovincial migration are differentials in earnings, employment prospects and moving costs, whereas the impacts of public policies are relatively small. Basher and Fachin (2008) apply the bootstrap-based panel cointegrating gravity model, proposed by Fachin (2007) that are robust to the presence of CSD. They employ a 2D gravity model to analyse migration flows to Ontario from other 9 provinces from 1971 to 2004, with the main aim to analyse the long-term decline of internal migration in Canada. They propose a bootstrap panel cointegration analysis for a model where migration flows depend on income, unemployment and Federal transfers and find that the main determinants of migration are unemployment differentials and income in the sending province, whereas income and federal transfer differentials appear to play only a minor role. They also show that interprovincial migration flows have been significantly reduced by shrinking differentials in the labour market and income growth in the sending provinces.

In sum, we may conclude that income and unemployment are key drivers of the inter-regional migration process. Further, given the multidimensional structure of interprovincial migration data, recent empirical literature has also underlined the importance of properly modelling the general specification of MRM in order to avoid omitted variable bias.

Given the growing availability of dataset which contain information on the multiple dimensions, the recent literature on panel data has focused on extending the two-way error components specification to the multidimensional setting. The triple-index specification has been popularised by Mátyás (2017), where time, origin and destination fixed effects are specified, respectively, as unobservables. Baltagi et al. (2003) propose an extended specification with fixed origin-time, destination-time and country-pair effects. Balazsi et al. (2017b) generalise the 3D within estimator while Balazsi et al. (2017c) consider the random effects approach and propose a sequence of GLS estimators. This multi-dimensional approach is an essential tool for the analysis of complex interconnectedness of the data, which can be applied to the number of bilateral flows between countries or regions such as trade, FDI, capital or migration flows.

In this paper, we focus on controlling strong CSD by following recent developments on the multidimensional panel data models. We extend the microfounded specifications proposed by Bertoli and Fernández-Huertas Moraga (2013) and Piras (2017) to a 3D framework with a hierarchical multifactor structure, through which we can deal with two sources of cross section correlations, one driven by global factors and the other driven by local factors. Allowing for such hierarchical multifactor structure would be extremely relevant in the context of the provincial data where cross section entities within a single country are deeply integrated and interconnected such that interprovincial migration flows might be affected by both global (country) and local (origin and destination provincial) effects.

3 The 3D panel data estimator with hierarchical multifactor structure

Migration choices are often explained by using a random utility maximisation (RUM) model. The utility that individual q, who was located in country i at time \(t-1,\) derives from opting for country j belonging to the choice set D at time t is:

$$\begin{aligned} U_{qijt}=w_{ijt}-c_{ijt}+\epsilon _{qijt} \end{aligned}$$
(1)

where \(w_{ijt}\) is a deterministic component of utility, \(c_{ijt}\) is the cost of moving from i to j, and \(\epsilon _{qijt}\) is an individual-specific stochastic term. Assuming that \(\epsilon _{qijt}\) follows an independent and identically distributed extreme value type 1 distribution (McFadden, 1974), Beine et al. (2016) express the expected gross migration flow from country i to j as

$$\begin{aligned} E(M_{ijt})=\frac{e^{w_{ijt}-c_{ijt}}}{\sum _{l\in D}e^{w_{ilt}-c_{ilt}}} s_{it} \end{aligned}$$
(2)

where \(s_{it}\) is the ability of the origin i to send out migrants. Assuming that the deterministic component of utility does not vary with i, they write (2) as a gravity equation, such that

$$\begin{aligned} E(M_{ijt})=\phi _{ijt}\frac{y_{jt}}{\Omega _{it}}s_{it}. \end{aligned}$$

where \(\Omega _{it}=\sum _{l\in D}e^{w_{ilt}-c_{ilt}}\) represents the exponentiated value of the expected utility of prospective migrants, \( y_{jt}=e^{w_{jt}}\) proxies for the attractiveness of destination j and \( \phi _{ijt}=e^{-c_{ijt}}\) is the accessibility of destination j for potential migrants from i. Taking the ratio between \(E(M_{ijt})\) and \( E(M_{iit})\) and normalising \(\phi _{iit}\) to 1, we obtain:

$$\begin{aligned} \frac{E(M_{ijt})}{E(M_{iit})}=\phi _{ijt}\frac{y_{jt}}{y_{it}}. \end{aligned}$$
(3)

Bertoli and Fernández-Huertas Moraga (2013) stress the importance of allowing correlations in the stochastic component of utility across different alternatives in the migrant choice set. Using a more general distributional assumption on the stochastic component in (1), which does not rely on the independence from irrelevant alternatives property and allows for a correlation in the stochastic component of utility across different alternatives in the choice set, they show that (3) is generalised to

$$\begin{aligned} \frac{E(M_{ijt})}{E(M_{iit})}=\phi _{ijt}^{1/\tau }\frac{y_{jt}^{1/\tau }}{ y_{it}}\frac{\Omega _{iit}}{\Omega _{ijt}} \end{aligned}$$
(4)

where \(\tau \) is a dissimilarity parameter and \(\Omega _{ijt}\) is a resistance term which varies across destinations, see equation (9) in Beine et al. (2016). Specifically, an increase in the attractiveness of an alternative destination that is perceived as a close substitute to destination j, will reduce \(E(M_{ijt})\) more than \(E(M_{iit})\), thus inducing a decline in the ratio in (4).Footnote 1

Taking the logarithm of the odds of migrating to country j over staying in country i,  (3) can be expressed as a linear function of the differential in the deterministic component of utility associated with the two countries

$$\begin{aligned} \ln \left( \frac{M_{ijt}}{M_{iit}}\right) =\frac{\ln y_{jt}}{\tau }-\ln y_{it}+\frac{\ln \phi _{ijt}}{\tau }+\ln \left( \frac{\Omega _{iit}}{\Omega _{ijt}}\right) . \end{aligned}$$
(5)

In the empirical macro analysis of migration flows, the first three terms in (5), which represent attractiveness of destination and/or origin and the accessibility of destination in terms of the time-specific cost of moving, are generally proxied by push and pull variables which might vary over the two indexes, (it) and/or (jt), and over the three indexes, (ijt).Footnote 2 On the contrary, the importance of taking into account of the resistance term \(\Omega _{ijt}\) has been mostly neglected in the empirical literature, with a few exceptions such as Bertoli and Fernández-Huertas Moraga (2013), Ortega and Peri (2013) and Piras (2017), who embody the resistance term in a compound unobserved error term, see the error components specifications (8) and (9) below.

In this paper, we propose to model the migration flows by means of a more general 3D error components specification, which can properly address the important issue of MRM. Consider the 3D heterogeneous panel data model:

$$\begin{aligned}&y_{ijt}=\varvec{\beta }_{1,ij}^{\prime }{\varvec{x}}_{ijt}+\varvec{ \beta }_{2,ij}^{\prime }{\varvec{x}}_{it}+\varvec{\beta } _{3,ij}^{\prime }{\varvec{x}}_{jt}+\varvec{\delta }_{ij}^{\prime } {\varvec{d}}_{t}+e_{ijt}, \ i=1,\ldots ,N_{i},\nonumber \\&\quad \quad \ j=1,\ldots ,N_{j}, \ t=1,\ldots ,T, \end{aligned}$$
(6)

where \(y_{ijt}\) is the dependent variable observed across three indices, the origin i, the destination j at period t, \({\varvec{x}}_{ijt}\) is the \(m_{1}\times 1\) vector of covariates observed across three indices, \( {\varvec{x}}_{it}\) \(\left( {\varvec{x}}_{jt}\right) \) is the \( m_{2}\times 1\) \(\left( m_{3}\times 1\right) \) vector of push (pull) covariates observed across the origin i \(\left( j\right) \) at period t, and \({\varvec{d}}_{t}\) is the \(m_{d}\times 1\) vector of observed common effects including deterministic components such as constant and trend. \( \varvec{\beta }_{1,ij}\), \(\varvec{\beta }_{2,ij}\), \(\varvec{ \beta }_{3,ij}\) and \(\varvec{\delta }_{ij}\) are the \(m_{1}\times 1\), \( m_{2}\times 1\), \(m_{3}\times 1\), and \(m_{d}\times 1\) vectors of corresponding parameters.

Beine and Parsons (2015) attempt to model MRM in receiving countries by imposing the following simple error components specification:

$$\begin{aligned} e_{ijt}=\alpha _{i}+\theta _{jt}+\varepsilon _{ijt} \end{aligned}$$
(7)

where \(\alpha _{i}\) is the origin individual effect and \(\theta _{jt}\) is the destination-time dummy. Alternatively, Ortega and Peri (2013) suggest to capture MRM by including the origin-time dummies \(\theta _{it}\):

$$\begin{aligned} e_{ijt}=\alpha _{i}+\theta _{it}+\varepsilon _{ijt}\text {.} \end{aligned}$$
(8)

Notice that the destination/origin-time dummies have been widely used in the literature as proxies for unmeasured origin-specific push factors and destination-specific pull factors of bilateral migration flows.

To explicitly address the important issue of modelling CSD in migration flows, Bertoli and Fernández-Huertas Moraga (2013) and Piras (2017) propose to model MRM through the following multifactor structure:

$$\begin{aligned} e_{ijt}=\alpha _{ij}+\varvec{\gamma }_{ij}^{\prime }{\varvec{f}} _{t}+\varepsilon _{ijt} \end{aligned}$$
(9)

where \({\varvec{f}}_{t}\) is an \(m_{f}\times 1\) vector of unobserved factors and \(\varvec{\gamma }_{ij}\) is an \(m_{f}\times 1\) vector of heterogeneous loadings. Bertoli and Fernández-Huertas Moraga (2013) define MRM as the confounding influence that the attractiveness of alternative destinations exerts on the determinants of bilateral migration rate and show that the heterogeneous unobserved factors are expected to capture MRM. Notice, however, that the standard CCE estimator developed in 2D panels by Pesaran (2006) can be applied to the model (6) with the error components in (9).

In the context of the 3D models, Baltagi et al. (2003) propose the following triple error components specification:

$$\begin{aligned} e_{ijt}=\alpha _{ij}+\theta _{it}+\theta _{jt}^{*}+\varepsilon _{ijt}, \end{aligned}$$
(10)

where \(\alpha _{ij}\) is the bilateral pair-fixed effects, and \(\theta _{it}\) \(\left( \theta _{jt}^{*}\right) \) are the origin and destination country-time fixed effects (CTFE), respectively. Notice that (10) encompasses (7) and (8) as a special case.Footnote 3 Balazsi et al. (2017b) develop the 3D within estimator and analyse their behaviour in the case of no self-flow data, unbalanced data, and dynamic autoregressive models. The CTFE specification has been popularly applied to control for MRM of exporters and importers in the structural gravity model of trade flows, e.g. Baltagi (2015). However, the main drawback of the CTFE approach lies in the underlying assumption that bilateral flows are independent of what happens to the rest of the world.Footnote 4 Existing studies have neglected the important issue of properly controlling CSD in 3D models. In this regard, Kapetanios et al. (Kapetanios et al. 2017, KMSS) propose to combine unobserved heterogeneous global factors, \(\theta _{t}\) with the CTFE specification by

$$\begin{aligned} e_{ijt}=\alpha _{ij}+\theta _{it}+\theta _{jt}^{*}+\pi _{ij}\theta _{t}+\varepsilon _{ijt} \end{aligned}$$
(11)

KMSS develop the two-step consistent estimation procedure by approximating global factors with the double cross section averages of the dependent variable and regressors and then applying the 3D-within transformation.

The above discussions have at least two implications: (i) it is important to develop more proper specifications for accommodating CSD within multi-dimensional dataset and (ii) such specifications should involve several layers of components to account for effects across different dimensions. Following this research trend, Kapetanios et al. (2020, KSS) have developed the hierarchical multi-factor error components specification:

$$\begin{aligned} e_{ijt}=\varvec{\gamma }_{ij}^{\prime }{\varvec{f}}_{t}+\varvec{ \gamma }_{\circ j}^{\prime }{\varvec{f}}_{i\circ t}+\varvec{\gamma } _{i\circ }^{\prime }{\varvec{f}}_{\circ jt}+\varepsilon _{ijt}, \end{aligned}$$
(12)

where \({\varvec{f}}_{t}\) is an \(m_{f}\times 1\) vectors of global factors, and \({\varvec{f}}_{i\circ t}\) and \({\varvec{f}}_{\circ jt}\) are the \( m_{\bullet \circ }\times 1\) and \(m_{\circ \bullet }\times 1\) vectors of local factors with \(\varvec{\gamma }_{ij}\), \(\varvec{\gamma }_{\circ j}\) and \(\varvec{\gamma }_{i\circ }\) being the \(m_{f}\times 1,\) \( m_{\bullet \circ }\times 1\) and \(m_{\circ \bullet }\times 1\) vectors of corresponding heterogeneous loadings. The specification in (12) is more parsimonious than (10) and (11), and it is more structural by allowing the presence of global and local factors, explicitly. Furthermore, we can also estimate the parameters of push and pull regressors, say \({\varvec{x}}_{it}\) and \({\varvec{x}}_{jt}\), that vary over \(\left( i,t\right) \) and over \(\left( j,t\right) \) in the model, (6), unlike the CTFE and KMSS approaches, which remove \({\varvec{x}} _{it}\) and \({\varvec{x}}_{jt}\) together with \(\alpha _{ij}\), \(\theta _{it}\) and \(\theta _{jt}^{*}\) by applying the 3D within transformation.Footnote 5

\({\varvec{f}}_{t}\) are the global factors such as common demographic shift that affects every migration flows between source i and destination j. For example, if the Canadian population increases substantially, then migration flows are likely to increase between any pair of provinces. The local factors, \(f_{i\circ t}\) and \(f_{\circ jt},\) represent the origin and destination provincial factors, respectively. \(f_{i\circ t}\) are the factors of the ith sending province (say, Newfoundland) that affect the migration flows from Newfoundland to all destination provinces while \( f_{\circ jt}\) are the factors of the jth receiving province (say, British Columbia) that affect all migration flows to British Columbia from all other provinces. If unemployment insurance increases in Newfoundland, migration outflows from Newfoundland to all other countries may decrease, whereas if unemployment insurance in British Columbia decreases, inflows from all other countries may decrease.Footnote 6 This factor structure allows to model the interconnection among cross section units; if Québec and Newfoundland are connected by business cycle synchronisation as they are both specialised in the fishery industry located in the Atlantic coast, a positive shock in Qu ébec will also lead to a positive cycle in Newfoundland.

KSS allow unobserved factors to be correlated with the explanatory variables, \({\varvec{x}}_{ijt}\), and consider the following data generating process for \({\varvec{x}}_{ijt}\):

$$\begin{aligned} {\varvec{x}}_{ijt}={\mathcal {D}}_{ij}{\varvec{d}}_{t}+\varvec{\Gamma } _{ij}{\varvec{f}}_{t}+\varvec{\Gamma }_{\circ j}{\varvec{f}}_{i\circ t}+\varvec{\Gamma }_{i\circ }{\varvec{f}}_{\circ jt}+{\varvec{v}} _{ijt}, \end{aligned}$$
(13)

where \({\mathcal {D}}_{ij}\) is the \(\left( m_{x}\times m_{d}\right) \) parameter matrix on observed common effects, \(\varvec{\Gamma }_{ij}\), \( \varvec{\Gamma }_{\circ j}\) and \(\varvec{\Gamma }_{i\circ }\) are \( \left( m_{x}\times m_{f}\right) \), \(\left( m_{x}\times m_{\bullet \circ }\right) \), \(\left( m_{x}\times m_{\circ \bullet }\right) \) factor loading matrices, and \({\varvec{v}}_{ijt}\) are the idiosyncratic errors. Let \( {\varvec{z}}_{ijt}=\left( y_{ijt},{\varvec{x}}_{ijt}^{\prime }\right) ^{\prime }\) and \({\varvec{f}}_{ijt}=\left( {\varvec{f}}_{t}^{\prime }, {\varvec{f}}_{i\circ t}^{\prime },{\varvec{f}}_{\circ jt}^{\prime }\right) ^{\prime }\), and construct the global and local cross section averages of \({\varvec{z}}_{ijt}\), respectively, by

$$\begin{aligned} \varvec{{\bar{z}}}_{t}=\frac{1}{N^{2}}\sum _{i=1}^{N_{i}}\sum _{j=1}^{N_{j}} {\varvec{z}}_{ijt},\text { }\varvec{{\bar{z}}}_{i\circ t}=\frac{1}{N} \sum _{j=1}^{N_{j}}{\varvec{z}}_{ijt},\text { }\varvec{{\bar{z}}}_{\circ jt}=\frac{1}{N}\sum _{i=1}^{N_{i}}{\varvec{z}}_{ijt}\text {,} \end{aligned}$$
(14)

KSS propose to use \(\left( {\varvec{d}}_{t}^{\prime },\varvec{{\bar{z}}} _{ijt}^{\prime }\right) ^{\prime }\) with \(\varvec{{\bar{z}}}_{ijt}=\left( \varvec{{\bar{z}}}_{t}^{\prime },\varvec{{\bar{z}}}_{i\circ t}^{\prime }, \varvec{{\bar{z}}}_{\circ jt}^{\prime }\right) ^{\prime }\) as observable proxies for \({\varvec{f}}_{ijt}\). Then, we can estimate the individual slope coefficients, \(\varvec{\beta }_{ij}\) and their means \(\varvec{ \beta }\) consistently by augmenting the regression, (6) with \( \left( {\varvec{d}}_{t}^{\prime },\varvec{{\bar{z}}}_{ijt}^{\prime }\right) ^{\prime }\). This estimator is the 3D common correlated effect (3DCCE) estimator, which will be applied in the main empirical applications. This hierarchical multi-dimensional approach is expected to become an essential tool for the analysis of complex interconnectedness of the bilateral (origin-destination) flows such as trade, FDI, capital or migration flows.

We develop a diagnostic tool for detecting the presence of CSD as well as evaluating the goodness of the estimation strategy by jointly applying the modified cross section dependence (CD) test and estimating the CSD exponent. We first develop the CD test, which is a modified counterpart of an existing CD test advanced by Pesaran (2015). For convenience, we represent the vector of residuals \({\hat{\mathbf {e}}}_{ij}=\left( {\hat{e}}_{ij1},..,{\hat{e}} _{ijT}\right) ^{\prime }\) as the \(\left( ij\right) \) pair using the single index \(n=1,\ldots ,N_{1}N_{2}\) - where \(N_{1}\) and \(N_{2}\) are the number of origin and destination countries, respectively—and compute the pair-wise residual correlations between n and \(n^{\prime }\) cross section units by

$$\begin{aligned} {\hat{\rho }}_{nn^{\prime }}\left( ={\hat{\rho }}_{n^{\prime }n}\right) =\frac{ {\hat{\mathbf {e}}}_{n}^{\prime }{\hat{\mathbf {e}}}_{n^{\prime }}}{\sqrt{\left( {\hat{\mathbf {e}}}_{n}^{\prime }{\hat{\mathbf {e}}}_{n}\right) \left( {\hat{\mathbf {e}}}_{n^{\prime }}^{\prime }{\hat{\mathbf {e}}}_{n^{\prime }}\right) }},\ n,n^{\prime }=1,\ldots ,N_{1}N_{2}\text { and }n\not =n^{\prime }. \end{aligned}$$

Then, we construct the CD statistic by

$$\begin{aligned} CD=\sqrt{\frac{2}{N_{1}N_{2}\left( N_{1}N_{2}-1\right) }} \sum _{n=1}^{N_{1}N_{2}-1}\sum _{n^{\prime }=n+1}^{N_{1}N_{2}}\sqrt{T}{\hat{\rho }}_{nn^{\prime }} \end{aligned}$$
(15)

which has the limiting N(0, 1) distribution under the null hypothesis of (pairwise) residuals weak cross section dependence.

Next, we follow Bailey et al. (2016, BKP) and derive the exponent of CSD based on \(\overline{{\hat{e}}}_{..t}=\left( N_{1}N_{2}\right) ^{-1}\sum _{i=1}^{N_{1}}\sum _{j=1}^{N_{2}}{\hat{e}}_{ijt}\). If \({\hat{e}}_{ijt}\) ’s are cross sectionally correlated across \(\left( ij\right) \) pairs, \( Var\left( \overline{{\hat{e}}}_{..t}\right) \) declines at a rate that is a function of \(\alpha \), where \(\alpha \) is defined as

$$\begin{aligned} \underset{N_{1}N_{2}\rightarrow \infty }{\lim }\left( N_{1}N_{2}\right) ^{-\alpha }\lambda _{\max }\left( \varvec{\Sigma }_{{\hat{e}}}\right) \end{aligned}$$

and \(\varvec{\Sigma }_{{\hat{e}}}\) is the \(N_{1}N_{2}\times N_{1}N_{2}\) covariance matrix of \({\hat{\mathbf {e}}}_{t}\) \(=\left( {\hat{e}}_{11t},\ldots ,{\hat{e}}_{N_{1}N_{2}t}\right) ^{\prime }\) with \(\lambda _{\max }\left( \varvec{\Sigma }_{{\hat{e}}}\right) \) denoting the largest eigenvalue. \(Var\left( \overline{{\hat{e}}}_{..t}\right) \) cannot decline at rate faster than \(\left( N_{1}N_{2}\right) ^{-1}\) as well as it cannot decline at rate slower than \( \left( N_{1}N_{2}\right) ^{\alpha -1}\) with \(0\le \alpha \le 1\). Furthermore, for \(1/2<\alpha \le 1\), BKP propose the bias-adjusted estimator:

$$\begin{aligned} \mathring{\alpha }=1+\frac{1}{2}\frac{\ln \left( {\hat{\sigma }}_{\overline{{\hat{e}}}_{..t}}^{2}\right) }{\ln (N_{1}N_{2})}-\frac{\ln \left( {\hat{\mu }}_{\pi }^{2}\right) }{2\ln (N_{1}N_{2})}-\frac{{\hat{c}}_{N_{1}N_{2}}}{2\left[ N_{1}N_{2}\ln \left( N_{1}N_{2}\right) {\hat{\sigma }}_{\overline{{\hat{e}}} .._{t}}^{2}\right] } \end{aligned}$$
(16)

where \({\hat{\sigma }}_{\overline{{\hat{e}}}_{..t}}^{2}=T^{-1}\Sigma _{t=1}^{T} \overline{{\hat{e}}}_{..t}^{2}\), \({\hat{c}}_{N_{1}N_{2}}=\widehat{{\bar{\sigma }} _{N_{1}N_{2}}^{2}}=\left( N_{1}N_{2}\right) ^{-1}\sum _{i=1}^{N_{1}}\sum _{j=1}^{N_{2}}{\hat{\sigma }}_{ij}^{2}\) and \({\hat{\sigma }}_{ij}^{2}=T^{-1}\Sigma _{t=1}^{T}{\hat{\xi }}_{ijt}^{2}\) is the ijth diagonal element of the estimated covariance matrix, \({\hat{\Sigma }}_{{\hat{e}}}\) with \({\hat{\xi }}_{ijt}=\) \({\hat{e}}_{ijt}-{\hat{\delta }}_{ij}\overline{{\hat{e}}} _{..t}\) and \({\hat{\delta }}_{ij}\) is the OLS estimator from the regression of \( {\hat{e}}_{ijt}\) on \(\overline{{\hat{e}}}_{..t}\). We evaluate the confidence band for \(\mathring{\alpha }\) by employing the test statistic in (B47) in Supplementary Appendix VI in BKP.

The CD test statistic is increasingly applied to the residuals of regression models as an ex-post diagnostic tool. However, the CD test fails to reject the null hypothesis of weak error CSD when the factor loadings have zero means, implying that the CD test will display very poor power when it is applied to cross-sectionally demeaned data. Furthermore, the residual-based CD test has been shown to often reject the null hypothesis of no remaining CSD in the case of the CCE estimator (e.g. Mastromarco et al. 2016). Juodis and Reese (2018) prove that the application of the CD test to residuals obtained from interactive effects models introduces a bias term of order \(\sqrt{T}\), rendering an erroneous rejection of the null. Nonetheless, we suggest to use the CD test in conjunction with the estimate of \(\alpha \) as a model-selection tool, where a reduction in the absolute value of the CD test statistic and a moderate range of the estimate of \(\alpha ,\)Footnote 7 would be interpreted as an indication of an improved model specification.

4 The empirical application

4.1 The data

We collect the data over the period 1976–2014 from Statistics Canada, which records annual migration streams by province of origin and destination. We consider interprovincial migration flows for 10 provinces: Prince Edward Island, Nova Scotia, New Brunswick, Québec, Ontario, Manitoba, Saskatchewan, Alberta, British Columbia and Newfoundland and Labrador. Then, we obtain the data for 90 province-pairs (\(N=90\), \(T=38\)). The panel is balanced and interprovincial migration flows are always positive. Table 1 shows descriptive statistics. The richest provinces are Alberta, British Columbia and Ontario in terms of disposable income. They mostly attract migrants from other provinces and register relatively lower unemployment rates and higher Gini coefficients. On the other hand, Prince Edward Island, New Brunswick and Newfoundland and Labrador show the lowest income level and highest unemployment rate with higher migration outflows.

Table 1 Descriptive statistics

Figure 1 displays the time-varying aggregate inflows across provinces and provide an insight of how the pattern of interprovincial migration, as an aggregate, evolves over time. The interprovincial migrants flows have been on a declining trend till the mid 1990s, after which they became rather stable. Such overall declining patterns broadly correspond to the stylised facts observed by Shannon (2015) in Canada and by Molloy et al. (2011) in the US. Interprovincial migration flows in Canada were mostly flat since 2000, though it stepped up slightly in 2009 and remained substantially below the US interstate migration flows. Shannon (2015) explains that the decline in the interprovincial migrant share may be driven by an environment change (i.e. factors creating poorer conditions at destination), which renders the migration less attractive. Molloy et al. 2011) point out that the job opportunities have become more similar across the US regions, implying that fewer potential moves would generate gains large enough to offset the costs of moving, resulting in a decline in internal migration. If this concept were applied to the Canadian case, the decline in interprovincial migration would imply that immigrants are now more evenly spread across less unequal regions than in the past.

Fig. 1
figure 1

Aggregate interprovincial migration flows in Canada

Figure 2 plots net interprovincial migration flows, defined as the sum of the net migrants (inflow minus outflow) for the 10 provinces. Notice that positive net flows have been concentrated in a small number of provinces. For example, in 2014, Alberta accounted for 76% of net positive migration, British Columbia for 23% and Saskatchewan for 1%. Over a longer time horizon (1987–2014), only two provinces had positive net migration; Alberta with 433,851 (57% of total net positive migration) and British Columbia with 331,083 (43%). Net migration flows have varied considerably over time, in terms of the number of net positive migrants and the destination provinces. These variations are closely related to provincial economic performance, but they are also caused by a couple of the major political events. In the early 1990s, most of the net internal migration was recorded in British Columbia. Since 1996, Alberta has been the major destination of interprovincial migrants. From 1997 to 2002, Ontario had the second highest net positive migration while British Columbia experienced net outflows. Since 2003, British Columbia has become a major destination province again. Not surprisingly, net migration flows to Alberta appear to be closely related to the oil industry boom. When the oil price collapsed in 2009, there was a sharp decline in net migration flows to Alberta, while when oil prices soared 2 years later, they increased sharply. The subsequent oil price drop and economic challenge resulted in substantial increase in outflows from Alberta in 2015. Further, as noticed by Day and Winer (2006), extraordinary policy changes could influence interprovincial migration flows. The election of the separatist Parti Qùebecois government in Québec in the second half of the 1970s, and the closing of the cod fishery in Atlantic Canada in 1992 appeared to have substantially altered interprovincial migration in Québec during 1977–1980 and in Newfoundland during 1993–1996, respectively.

Fig. 2
figure 2

Interprovincial migration net flows (inflows–outflows) in Canada

In sum, we observe that interprovincial migration flows have been on a declining trend on average (Fig. 1), but we still find that a few provinces have experienced significant migration activities (Fig. 2). In what follows we aim at identifying the main determinants of these time-varying patterns.

4.2 Empirical results with regressors expressed as differentials

Following the income maximisation approach by Basher and Fachin (2008), Beine and Parsons (2015) and Beine et al. (2019), we first consider the following gravity regression model of interprovincial migration flows:

$$\begin{aligned} m_{ijt}=\beta _{1}y_{ijt}^{d}+\beta _{2}u_{ijt}^{d}+\beta _{3}g_{ijt}+\beta _{4}g_{ijt}^{2}+\beta _{5}n_{ijt}+e_{ijt}, \end{aligned}$$
(17)

with origin \(i=1,\ldots ,N,\) destination \(j=1,\ldots ,N\) and time \(t=1,\ldots ,T\), where \(m_{ijt}\) is bilateral migration flow expressed as \(m_{ijt}=\ln \) \( \frac{M_{ijt}}{pop_{it}}\), \(M_{ijt}\) is the migration flow from i to j at time t, \(pop_{it}\) is the population in the origin i, which proxies for the number of stayers, \(M_{iit}\) in (5) as in most empirical studies. Here we construct the main regressors as differentials: \( y_{ijt}^{d}=y_{it}-y_{jt}\) is the income differential and \( u_{ijt}^{d}=u_{it}-u_{jt}\) is the unemployment rate differential, where y is the log of provincial disposal income per capita (in 2014 constant dollars) and u is the annual provincial unemployment rate (in percentage).Footnote 8 We also investigate the impact of the second moment of the income distribution in origin and destination by introducing a measure of relative income inequality. Following Mayda (2010), we construct the origin province’s relative inequality as a measure of the inequality in the origin relative to the destination (i.e. the Gini coefficient in the origin divided by the Gini coefficient in the destination, \(g_{ijt}=\frac{g_{it}}{g_{jt}}\)) and include both linear and quadratic terms; the total impact of relative inequality is measured at equal inequality degrees for origin and destination. This choice is motivated by Borjas’ (1987) selection model, from which we infer that an increase in the origin’s relative inequality has a nonlinear effect on the emigration rate: given low values of the origin country’s relative inequality, as \(g_{ijt}\) rises, the emigration rate will increase. On the other hand, given high values of \(g_{ijt}\), the emigration rate will decrease.Footnote 9 Further, Beine and Parsons (2015) and Beine et al. (2019) stress the importance of taking into account of networks, i.e. the presence at destination of immigrants from the same origin. We measure networks, \( n_{ijt} \) by the accumulated (over time) net migration flow. The network presence lowers the migration costs of the next groups and the process continues as long as benefits exceed costs of migration, see Beine et al. (2011).Footnote 10

In what follows we apply four estimators to (17); namely the two-way fixed effects estimator (FE) with \(e_{ijt}=\alpha _{ij}+\theta _{t}+\varepsilon _{ijt}\), the 2D-PCCE estimator (\(PCCE_{G}\)) with \( u_{ijt}=\alpha _{ij}+\varvec{\gamma }_{ij}^{\prime }{\varvec{f}} _{t}+\varepsilon _{ijt}\), the triple-way fixed effects estimator (CTFE) with \(e_{ijt}=\alpha _{ij}+\theta _{it}+\theta _{jt}^{*}+\varepsilon _{ijt}\), and the 3D-PCCE estimator (\(PCCE_{GL})\) with \(u_{ijt}=\alpha _{ij}+ \varvec{\gamma }_{\circ j}^{\prime }{\varvec{f}}_{i\circ t}+ \varvec{\gamma }_{i\circ }^{\prime }{\varvec{f}}_{\circ jt}+ \varvec{\gamma }_{ij}^{\prime }{\varvec{f}}_{t}+\varepsilon _{ijt}\).Footnote 11 As an ex-post diagnostic tool, we report the CD test results applied to the residuals and estimates of the CSD exponent (denoted \(\alpha \) ).

Table 2 reports the panel gravity estimation results for the 90 province-pairs among the 10 Canadian provinces over the period 1976-2014 (39 years). As a benchmark, we first describe the results for FE. All the coefficients are statistically significant with expected signs. The impact of the income differentials (\(y_{ijt}^{d}\)) between origin and destination province is significant and negative at − 0.40, implying that the increase in \(y_{ijt}^{d}\) reduces the interprovincial migration flows. On the contrary, the impact of the unemployment differentials (\(u_{ijt}^{d}\)) is significant and slightly positive at 0.02, implying that the increase in \(u_{ijt}^{d}\) tends to boost the interprovincial migration flows. Further, the coefficient on the linear inequality (\(g_{ijt}\)) is positive at 4.65 while the coefficient on the quadratic inequality (\(g_{ijt}^{2}\)) is negative at -2.01. This is consistent with the selection model by Borjas (1987), suggesting that the effect of inequality depends on the size of emigration rates. In particular, an increase in the origin’s relative inequality has a non-monotonic effect: the impact is positive (negative) if there is a positive (negative) selection. The impact of network (\(n_{ijt}\)) is significant and positive at 1.12, showing that the presence of migrants of the same origin tends to encourage new inflows. Overall, these results are qualitatively consistent with a priori expectations, and with the previous studies, e.g. Coulombe (2006), Day and Winer (2006), Basher and Fachin (2008) and Beine and Parsons (2015). However, the CD test being applied to the residuals convincingly rejects the null of weak CSD, while \( \alpha \) is estimated at 1.003, implying the strong CSD. In this regard, we may conclude that the FE estimation results are likely to be less reliable.

Next, we find that the estimation results for \(PCCE_{G}\) are similar to those of FE. All the coefficients display the expected signs and are significant with smaller standard errors. Still, the \(PCCE_{G}\) results suffer from strong CSD in the residuals, though the estimate of \(\alpha \) is 0.832, substantially lower than the FE counterpart. A notable difference is that the impact of the network is now substantially reduced to 0.045 from 1.12 by FE.

Turning to the CTFE estimation results, we find that the coefficients are mostly significant and with expected signs. Surprisingly, the impact of \( g_{ijt}^{2}\) is slightly positive but insignificant. The impact of network is more positive at 1.21. Though the CD test still rejects the null of weak CSD in the residuals, \(\alpha \) is estimated at 0.782 with the confidence band, (0.713,0.852), suggesting the moderate level of CSD.

Table 2 Alternative estimation results for the gravity regression in (17)

Finally, the \(PCCE_{GL}\) estimation results demonstrate that the coefficients are all significant with expected signs. We still reject the null of weak residual CSD, but \(\alpha \) is estimated at 0.823 with the confidence band, (0.762,0.884), suggesting the moderate level of CSD. As described in Sect. 3, the CD test has been used as an ex-post model-selection tool, with a reduction in the value of the CD test statistic typically interpreted as an indication of an improved model specification. On the basis of these results, we select the \(PCCE_{GL}\) as the preferred estimator. There are a few differences between FE and \(PCCE_{GL}\) results. First, the impact of the unemployment differentials is now more than 3 times larger while the impact of relative inequalities are substantially smaller. Next, we notice that the \(PCCE_{GL}\) estimators, that explicitly deal with CSD through unobserved heterogenous factors, tend to produce a much smaller network effect than those in FE and CTFE, that ignore the presence of unobserved factors. This evidence may suggest that the network variable is endogenous due to unobserved MRM. As emphasised by Beine and Parsons (2015) and Özen at al. (2011), endogeneity might arise from the omission of unobserved factors that would be correlated with the network variable.Footnote 12

Fig. 3
figure 3

Time varying coefficients of rolling window-based estimations of model (17) over 2005–2014

Finally, we investigate the time varying patterns of the main coefficients by employing a rolling window-based estimation over the last ten years of the sample, 2005–2014, which are plotted in Fig. 3.Footnote 13 The effect of the unemployment differentials has been relatively stable and quite small over the whole period. The effect of the income differentials shows a peak at 2007 and then steadily declining until 2014. On the contrary, the total impact of \(g_{ijt}\)—computed as \(\beta _{3}+2\beta _{4}\)—has initially declined but started to increase from 2009 onwards, showing evidence in favour of a positive selection in more recent period. Further, the impact of network, albeit small, has shown a slightly increasing trend in the last few years. These time-varying patterns suggest that the rise in the internal migration flows, registered in Canada from 2009 onwards (see Fig. 1), is more likely to be associated with the relative income inequality and network presence rather than the conventional long-run determinants such as income and unemployment differentials.Footnote 14

Evidence on positive selection, in the spirit of Borjas’ (1987) selection model, is broadly in line with the interprovincial skill redistribution hypothesis in Coulombe (2006) and Coulombe and Tremblay (2009), according to which (i) the typical interprovincial migrants in Canada has higher skills than the non-migrants and (ii) the most mobile component of the population is the educated young people who move from the relatively poor and rural provinces to the relatively rich and urban provinces. Also, given that the net interprovincial migration flows are typically oriented towards Ontario, Alberta, and British Columbia, the increasing importance of the network effect shows that migrants are attracted by the big cities where they can benefit from the various externalities of their diaspora (see Beine et al. 2011 and Beine and Coulombe 2018). Hence, migration location choice is no longer shown to be only guided by the highest probability of employment. It is often argued that immigrants tend to cluster together because the presence of established network facilitates assimilation of new arrivals, both in the labour market and in the social environment of the host. Analysing a survey on recent international immigrants to Canada, Goel and Lang (2019) find that relatives and friends, already living in the host country, help recent immigrants to find their first jobs. A similar mechanism might work for interprovincial migration and this issue deserves further investigations.

4.3 Empirical results with asymmetrical push and pull factors

Mayda (2010) and Piras (2017) emphasise that it is rather arbitrary to treat push and pull factors symmetrically, which does not properly disentangle the different structural impacts that the same variables could have as push or pull factor. In this regard, we consider the alternative gravity regression of the migration flows:

$$\begin{aligned} m_{ijt}=\beta _{1}y_{it}+\beta _{2}y_{jt}+\beta _{3}u_{it}+\beta _{4}u_{jt}+\beta _{5}g_{ijt}+\beta _{6}g_{ijt}^{2}+\beta _{7}n_{ijt}+e_{ijt}, \end{aligned}$$
(18)

for \(i,j=1,\ldots ,N\) and \(t=1,\ldots ,T\). In particular, we define the income and unemployment variables separately for origin and destination such that \( y_{it}\) and \(y_{jt}\) are the logged per capita disposable income of the origin and destination while \(u_{it}\) and \(u_{jt}\) are the unemployment rates of the origin and destination.

Table 3 Alternative estimation results for the gravity regression in (18)

Table 3 reports the panel gravity estimation and diagnostic results for the gravity regression in (18) for the 90 province-pairs among the 10 Canadian provinces over the period 1976-2014. The FE estimator suffers from strong CSD while \(\alpha \) is estimated at 1.003. The CD test still rejects the null of weak CSD for both \(PCCE_{G}\) and \(PCCE_{GL}\), but the \(PCCE_{GL}\) estimators display much lower CSD, as supported by the smaller CSD exponent estimate, \({\hat{\alpha }}\) at 0.904 for \(PCCE_{G}\) and at 0.862 for \(PCCE_{GL}\). On the basis of these results, we focus on the \(PCCE_{GL}\) with the lowest CSD.Footnote 15

All the coefficients are statistically significant with expected signs. We find that an income increase in the origin, \(y_{it}\), will reduce the migration flows from i to j while an income increase in the destination, \(y_{jt}\), will raise the migration flows. Next, the higher unemployment in the origin, \(u_{it}\) will increase the migration flows from i to j, whereas the higher unemployment in the destination, \(u_{jt}\) will reduce migration flows. In both cases, the pull impacts are slightly stronger than the push impacts,Footnote 16 in line with the previous studies. Furthermore, as expected, the impact of the linear inequality (\( g_{ijt}\)) is positive, but the impact of the quadratic inequality (\( g_{ijt}^{2}\)) is negative, which is again consistent with the selection hypothesis by Borjas (1987).

The \(PCCE_{GL}\) estimation results are somewhat different from the FE and \( PCCE_{G}\) estimation results. First, the impacts of the income and unemployment at the origin are now larger in magnitude while those impacts at the destination are substantially smaller, such that the differences between them are much narrower. Importantly, we notice that the income coefficients by \(PCCE_{GL}\) suggest that the pull effects no longer dominate the push effects, and they are close to be rather symmetric, in line with the recent view in favour of a convergence process. This evidence might suggest that global factors only do not capture the local MRM, leading to misleading results. Second, the impacts of the inequality measures are larger than the FE estimates, but close to those by \(PCCE_{G}\). Third, both \(PCCE_{G}\) and \(PCCE_{GL}\) produce much smaller network impacts than FE. Overall, these estimation results are consistent with a priori expectations and with the previous evidence on the Canadian case (e.g. Coulombe 2006; Day and Winer 2006) and on the migration flows in general (e.g. Mayda 2010; Beine and Parsons 2015; Piras 2017).

Furthermore, we examine the time-varying slope coefficients through a rolling window-based \(PCCE_{GL}\) estimation over 2005–2014 and plot them in Fig. 4. The time-varying effects of unemployment rate have been rather stable and small throughout the period with the coefficients on \(u_{jt}\) being generally larger than those on \(u_{it}\). The impacts of both income at destination and income at origin have been much larger than those on unemployment. The time-varying effects of income at destination and income at the origin display an opposite sign, and the distance between them (in absolute terms) decreases from 2007 onwards. At the beginning of the period observed, \(y_{jt}\) has a larger impact on migration decision while in more recent years, \(y_{it}\) and \(y_{jt}\) have similar effects, implying that push and pull effects of per capita income rather approach symmetry. Similar to Fig. 3, the total impact of \(g_{ijt}\) initially declined but increased from 2007 onwards. The impact of network, albeit small, shows a slightly increasing pattern in the last few years. This confirms that the location choice of interprovincial migration seems to be no longer only guided by the conventional long-run determinants such as income and unemployment differentials.

Fig. 4
figure 4

Time-varying coefficients of rolling window-based estimations of model (18) over 2005–2014

5 Conclusions

In this paper, we have applied recent 3D panel data techniques to an analysis of the interprovincial migration flows in Canada from 1976 to 2014. We have extended the microfounded specifications proposed by Bertoli and Fernández-Huertas Moraga (2013) and Piras (2017) to a 3D framework with a hierarchical multifactor structure. The specification proposed allows to deal with two sources of cross section correlations, one driven by global factors and the other driven by local (origin/destination) factors. This structure is shown to be particularly relevant in the context of provincial (or regional) data where cross sectional entities within a single country are deeply integrated and interconnected.

Our empirical findings suggest that the interprovincial migration flows in Canada have been mostly affected by average disposal income and relative income inequality, in line with Borjas’ (1987) selection model. We also find that a smaller (albeit significant) effect is attributed to unemployment rate and network.

Further, the time-varying patterns of the slope coefficients derived through a rolling window-based estimation over the last ten years of the sample, 2005–2014, show that the rise in the internal migration flows, registered in Canada from 2009 onwards, is more likely to be associated with positive selection—in light of Borjas’ (1987) selection model—and the existence of networks of migrants of the same origin rather than with the conventional long-run determinants such as income differentials. Evidence on positive selection is broadly in line with the interprovincial skill redistribution described in Coulombe (2006) and Coulombe and Tremblay (2009). Data on migrants’ skills would be useful to further investigate the important nexus between self-selection of migrants and diaspora as discussed in Beine et al. (2011).

We conclude by noting an immediate avenue for future research. First, we might concern the application of the 3D hierarchical multifactor structure to international migration flows. Notably, migration patterns within country and across countries have different characteristics because domestic migration flows are generally less regulated by policy selection criteria or labour mobility. Next, if the data on transit migration were readily available as analysed in Artuc and Ozden (2018), then it is quite desirable to extend the 3D method to the 4D datasets of migration flows though the development of the 4D models with the hierarchical multifactor structure would be more challenging due to appropriately modelling several layers of components to account for effects across the higher dimensions.