The study of human mobility patterns has a long history rooted in census data. It spans the work of Ravenstein, who focused on migration within the UK [1], Zipf’s intercity \(P_{1}P_{2}/D\) movement [2], the intervening opportunities model [3, 4], and urban travel demand and regional modeling [5, 6], all of which contributed to the understanding of economic processes [7, 8], urban planning, traffic engineering [9, 10], and the spreading of infectious diseases [11–14]. At present, the ubiquity of mobile phone usage data and credit card transactions has made human mobility more amenable to mathematical analysis and has therefore led to the discovery of underlying patterns of motion described as random walks, and particularly Lévy flights [15–17]. It has also revealed universal behavior, which was explained by the gravity and the radiation models [18–24], which also describe trade flow [7, 25] among other types of traffic. The first links the flux between origin and destination cities to powers of their initial populations with an inverse dependence on their pairwise distance, given by:

$$ T_{ij} \propto\frac{p_{i}^{\alpha} p_{j}^{\beta}}{d_{ij}^{\gamma}}, $$
(1)

where \(T_{ij}\) is the flux between the source city i and the destination city j, \(p_{i}\) and \(p_{j}\) are their respective initial populations, \(d_{ij}\) is their pairwise distance, and α, β, and γ are the model parameters. The radiation model, on the other hand, is a parameter-free model given by:

$$ T_{ij} = T_{i} \frac{p_{i} p_{j}}{(p_{i} + s_{ij}) (p_{i}+p_{j} + s_{ij})}, $$
(2)

where \(T_{i} = \sum_{j \neq i} T_{ij}\) is the total number of commuters leaving location i, and \(s_{ij}\) is the population density in a circle of radius \(d_{ij}\) centered at i.
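Equations (1) and (2) can be evaluated directly. The following is a minimal sketch, not the code used for the analysis; the function names and the explicit proportionality constant `A` in the gravity model are assumptions introduced for illustration.

```python
def gravity_flux(p_i, p_j, d_ij, alpha=1.0, beta=1.0, gamma=2.0, A=1.0):
    """Eq. (1), with the proportionality written as an explicit constant A.

    A and the default exponents are illustrative assumptions, not fitted values.
    """
    return A * p_i**alpha * p_j**beta / d_ij**gamma


def radiation_flux(T_i, p_i, p_j, s_ij):
    """Eq. (2): parameter-free; s_ij is the quantity defined in the text."""
    return T_i * p_i * p_j / ((p_i + s_ij) * (p_i + p_j + s_ij))


if __name__ == "__main__":
    # Made-up inputs, only to show the call signatures.
    print(gravity_flux(p_i=1e5, p_j=2e5, d_ij=50.0))
    print(radiation_flux(T_i=1000.0, p_i=1e5, p_j=2e5, s_ij=5e4))
```

Note that the gravity model needs the three exponents to be fitted, whereas the radiation model needs the total outflow \(T_{i}\) as an input.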

These two competing models, with variants [26], have been shown to properly predict human mobility, and their relative likelihood has been explored [23, 27]. However, their validity in modeling mobility under stressful conditions, beyond the daily commute or travel, has not been examined. Moreover, there is clear evidence that models based on stationary events break down when explaining behavioral changes in human activity under severe conditions [28–34].

In addition, the understanding of human flows under external perturbations and abrupt changes, such as panic, stampedes and motion beyond the stationarity of daily activity, is pivotal to emergency response and crisis management, and thus the need for models that capture these dynamics is unequivocal. Of particular interest is refugee migration, which is a form of human mobility under life-threatening circumstances. Unlike the long-timescale decision underlying the job selection process, according to which the origin and destination are determined, refugees’ migration is a pressing decision made on a very short timescale to flee an endangering environment. In this paper we are specifically concerned with the displacement of the Syrian population to neighboring countries and particularly to Lebanon, a country highly vulnerable to the crisis spillover because the large influx of refugees strains its infrastructure and further destabilizes its socioeconomic conditions [35–37]. The unavailability of the full migration data out of Syria restricted our data exploration and model fitting to a modification of the gravity model and made it impossible to rule out the explanatory potential of the parameter-free radiation model. We thus present an adjusted gravity model incorporating the peculiarities of refugees’ mobility, using data from the United Nations High Commissioner for Refugees (UNHCR) made available to our lab through an official data sharing request and documenting the influx up to 2014, when the Lebanese–Syrian borders were closed [36]. The data chronicle the cumulative refugee count until May 2014, detailing their origins and destinations.

1 Theory

In econometrics, corrections were introduced to the gravity model because it failed to take into consideration the multilateral resistance terms in trade. The equation was augmented either with an exponential term accounting for the importer–exporter difference, known as the Anderson–van Wincoop gravity equation, or with a stochastic effect [7]. Physically, this is equivalent to shifting from an isolated-cities paradigm to a description that takes the cities’ interactions into account. Analogously, Coulomb’s law, a special case of Eq. (1) for \(\alpha= \beta= 1\) and \(\gamma= 2\), which describes the interaction between two isolated particles with charges \(p_{i}\) and \(p_{j}\), is invalidated when the particles are in an electrolyte or a plasma [38]. More precisely, the surrounding particles screen the potential, and their effect is equivalent to a renormalization of the bare charge, \(p_{i} \rightarrow p_{i} e^{-r/l}\), where r is the distance from the charge and l is the Debye–Hückel length.
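For reference, the screening described above can be written out explicitly. In the notation of the text and up to constant prefactors, the standard Debye–Hückel potential of a point charge \(p_{i}\) reads

$$ \phi_{i}(r) \propto\frac{p_{i}}{r} e^{-r/l}, $$

which reduces to the bare Coulomb potential \(p_{i}/r\) for \(r \ll l\) and is exponentially suppressed for \(r \gg l\). The symbol \(\phi_{i}\) is introduced here only for illustration and is not used elsewhere in the paper.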

Based on the above, two basic contentions underlie our paper. First, when medium- to long-range mobility between non-isolated cities is studied, the interaction between the source cities cannot be neglected, and we suspect that this is why the gravity model fails to explain the migration data. We thus argue that the gravity model is the ideal limiting behavior and that accounting for the presence of multiple nearby source cities, and for their effect in shielding or accentuating the flux, allows the migration to be properly predicted, as is the case for an electron in a plasma, where the generated shielding depends on the configuration of the surrounding charged particles [39]. Our second argument is based on the symmetry-breaking nature of the refugees’ migration. The flux becomes almost unidirectional, as people from the host cities are unlikely to leave for unsafe destinations; that is, \(T_{ij} \gg T_{ji}\) even when the populations \(p_{i}\) and \(p_{j}\) are comparable, where i and j are Syrian and Lebanese cities, respectively. We therefore rewrite the gravity model with the renormalized effect:

$$ T_{ij} = A \frac{p_{i}^{\alpha} p_{j}^{\beta}}{d_{ij}^{\gamma}}e^{\sum _{k \in\mathcal{O}} d_{ik}/\delta_{k}}, $$
(3)

where \(d_{ik}\) is the distance between the origin city i and each of the other origin cities k, and \(\delta_{k}\) is a model parameter. Each origin city is tagged with a corresponding \(\delta_{k}\), which shields the flux when it is negative and accentuates it when it is positive. Therefore, \(|\delta_{k}|\) can be thought of as a characteristic length marking the influence of the origin city k on the migration from i, while the sign of \(\delta_{k}\) determines whether it increases or decreases the migration out of i. More precisely, when \(|d_{ik}/\delta_{k} | \ll1\), the exponential term barely changes the effective population of i. On the contrary, when \(|d_{ik}/\delta_{k} | \gg1\), the migration out of i is strongly affected by city k, either positively or negatively depending on the sign of \(\delta_{k}\).

Moreover, Eq. (3) incorporates the highly non-symmetrical nature of the flux. Explicitly, when i denotes a Lebanese city and j is a Syrian destination the exponential term introduces the possibility of obstructing the flow in this direction for negative values of \(\delta_{k}\).

Finally, it should be noted that, unlike the Debye–Hückel (DH) model, where a single length scale characterizes the system’s particles, in our model each city is characterized by its own \(\delta_{k}\), and consequently the analogy with the DH model is incomplete.
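The role of the sign of \(\delta_{k}\) in Eq. (3) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the code used for the fit: the function name, the dictionary layout keyed by origin city, and all numerical values are hypothetical.

```python
import math


def adjusted_gravity_flux(p_i, p_j, d_ij, d_ik, delta, A, alpha, beta, gamma):
    """Eq. (3): gravity term multiplied by exp(sum_k d_ik / delta_k).

    d_ik and delta are dicts keyed by origin city k (a hypothetical layout).
    """
    exponent = sum(d_ik[k] / delta[k] for k in delta)
    return A * p_i**alpha * p_j**beta / d_ij**gamma * math.exp(exponent)


if __name__ == "__main__":
    # Made-up values: one neighbouring origin city "k" at distance 100.
    base = dict(p_i=1e5, p_j=2e5, d_ij=80.0, d_ik={"k": 100.0},
                A=1.0, alpha=1.0, beta=1.0, gamma=2.0)
    accentuated = adjusted_gravity_flux(delta={"k": 200.0}, **base)
    shielded = adjusted_gravity_flux(delta={"k": -200.0}, **base)
    print(accentuated > shielded)  # positive delta_k accentuates the flux
```

With an empty set of neighbouring origins the exponential factor equals one and the expression reduces to the gravity model of Eq. (1) with prefactor A.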

2 Methods and results

Figure 1 shows the chord diagram depicting the flow from Syria to Lebanon, with the width of the chords characterizing the flux intensity. The diagram reveals preferential migration from certain origin cities to specific destinations, which is explained by our model, as shown in what follows. The diagram was produced using the file Migration data.xlsx.

Figure 1

The chord diagram depicts the flow between Syrian districts and Lebanese governorates

In order to test our model and compare it with both the radiation and the gravity models, we used the Google API to calculate the cities’ pairwise distances, taken as the shortest path over the road network; the Distance.csv file contains the distances between Lebanese and Syrian cities, while Syria-Syria-Distance.csv is the distance matrix between Syrian cities. Additionally, since both Lebanon and Syria lack census data, we retrieved population estimates from the City Population website [40]. Moreover, we obtained the population density from the Gridded Population of the World, Version 4 (GPWv4) as a raster file with 1 km resolution [41]. The latter was then intersected with the shapefiles of Lebanon, Syria, Jordan, Turkey and Iraq to produce the population density raster for this region of interest. Subsequently, \(s_{ij}\) was assigned the mean value of the population density inside the circle centered at the Syrian origin city i with radius \(d_{ij}\), where the latter is the distance from i to a Lebanese city j; the results are tabulated in DistanceMatrixSij.csv. We then performed a regression of the \(T_{ij}\) predicted by the radiation model against the real refugee fluxes. The model’s Multiple R-squared is 0.0152, with an Adjusted R-squared of 0.01235. However, we still cannot rule out the radiation model since \(T_{i}\), the total number of refugees leaving location i, could not be calculated from the UNHCR data, which only provide the number of refugees fleeing to Lebanon and exclude those fleeing to other neighboring countries. Nevertheless, we suspect that the asymmetry in the flux can no longer be explained by the radiation model since its derivation relies on integrating over a single benefits distribution function \(p(z)\), which should be modified to incorporate the effect of war.
Similarly, we performed a regression against the data for the gravity model, whose Multiple R-squared is 0.4533, with an Adjusted R-squared of 0.4486.
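The two goodness-of-fit measures quoted above follow the usual definitions for linear regression. As a minimal sketch (the function names are ours, and the sample size n and number of predictors p are not stated in the text), they can be computed as:

```python
def r_squared(y, y_hat):
    """Multiple R-squared: 1 - RSS / TSS for observations y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

The adjustment penalizes additional predictors, which is why it matters when comparing the gravity model with the larger model of Eq. (3).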

Finally, our model’s regression yields a Multiple R-squared of 0.8082 and an Adjusted R-squared of 0.8, clearly outperforming both models. This was achieved through 10,000 successive samplings, in each of which the model was trained on 80% of the data and tested on the rest; the fit with minimal test error was then chosen. Further, we performed an exhaustive test for model selection using both backward and forward stepwise regressions, with variable selection made according to the Akaike Information Criterion (AIC); its minimum, 32.098, corresponds to including all the variables in Eq. (3), and thus removing variables from our model does not improve the predictions.
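The resampling procedure described above can be sketched as follows. This is one reading of the text, not the original implementation: the helper names, the use of mean squared error as the test error, and the Gaussian form of the AIC (defined only up to an additive constant) are assumptions. The small hand-written solver keeps the example dependency-free; in practice a numerical library would be used.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def predict(row, coef):
    """Linear prediction for one design-matrix row."""
    return sum(v * c for v, c in zip(row, coef))


def best_of_random_splits(X, y, n_rounds=10000, train_frac=0.8, seed=0):
    """Refit on random train/test splits and keep the fit with the lowest test MSE."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_train = int(train_frac * len(y))
    best_mse, best_coef = math.inf, None
    for _ in range(n_rounds):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        coef = ols([X[i] for i in train], [y[i] for i in train])
        mse = sum((y[i] - predict(X[i], coef)) ** 2 for i in test) / len(test)
        if mse < best_mse:
            best_mse, best_coef = mse, coef
    return best_mse, best_coef


def aic_gaussian(rss, n, k):
    """AIC up to an additive constant for Gaussian errors: n ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k
```

Here each row of `X` would hold the predictors of the log-linearized Eq. (3) for one origin–destination pair and `y` the corresponding \(\log{T_{ij}}\); a stepwise search would compare `aic_gaussian` across candidate subsets of columns.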

Moreover, it should be noted that the distance variables are inherently interdependent since they mark inter-city distances; that is, \(d_{ik}\) is related to a combination of \(d_{ij}\) and \(d_{kj}\). When predictors are multicollinear, the variances of their coefficients can be highly sensitive to changes in the model, as is the case when more migration data at a finer spatial resolution are added, which increases the number of origin cities k. The stability of the coefficients’ variances can be guaranteed when their high Variance Inflation Factor (VIF), which is a signature of multicollinearity, is countered by high variability of the regressors. More precisely, the efficiency of extrapolating the fitted model to newly acquired data is not affected by the presence of multicollinearity as long as the new data and the data on which the regression model is built share the same multicollinearity pattern [42], which is the case here since the data for potential new cities would include equally multicollinear distance variables. Equivalently, the scope of the model is restricted to the range of predictors that exhibit the same multicollinearity pattern.
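To make the VIF concrete: for predictor j it is \(1/(1-R_{j}^{2})\), where \(R_{j}^{2}\) is obtained by regressing that predictor on all the others. The sketch below covers only the special case of two predictors, where \(R_{j}^{2}\) equals the squared Pearson correlation; the function names and the example distances are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    # Made-up distances from four origins to two nearby reference cities.
    d_to_k1 = [0.0, 1.0, 2.0, 3.0]
    d_to_k2 = [0.0, 1.0, 2.0, 4.0]
    print(vif_two_predictors(d_to_k1, d_to_k2))  # large value: strong collinearity
```

With more than two predictors, each VIF requires a full auxiliary regression, which this sketch does not implement.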

Figure 2 shows the UNHCR data together with the red curve fitting the logarithm of Eq. (3), that is, \(\log{T_{ij}}\) as a function of \(\log{A} + \alpha\log{p_{i}} + \beta \log{p_{j}} - \gamma\log{d_{ij}} + \sum_{k \in\mathcal{O}} d_{ik}/ \delta_{k}\). Table 1 summarizes the values of \(\delta_{k}\), while Fig. 3 shows the map of Syria with circles centered at the source cities, with radii proportional to \(\delta_{k}\) and color coded by the sign of the corresponding \(\delta_{k}\).
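Because the logarithm of Eq. (3) is linear in \(\log{A}\), α, β, γ and \(1/\delta_{k}\), the fit reduces to an ordinary linear regression. The following sketch shows one possible way to build a predictor row and to map fitted coefficients back to the model parameters; the function names and the ordering of the columns are assumptions, and the \(\delta_{k}\) coefficients in the example are invented.

```python
import math


def design_row(p_i, p_j, d_ij, d_ik):
    """Predictors for one (i, j) pair: intercept, log p_i, log p_j, log d_ij, then d_ik."""
    return [1.0, math.log(p_i), math.log(p_j), math.log(d_ij)] + list(d_ik)


def coefficients_to_parameters(coef):
    """Map fitted linear coefficients back to the parameters of Eq. (3).

    Assumes every coefficient on d_ik is nonzero, so that delta_k is finite.
    """
    log_A, alpha, beta, c_dist = coef[:4]
    return {
        "log_A": log_A,
        "alpha": alpha,
        "beta": beta,
        "gamma": -c_dist,                      # log d_ij enters with coefficient -gamma
        "delta": [1.0 / c for c in coef[4:]],  # d_ik enters with coefficient 1/delta_k
    }


if __name__ == "__main__":
    # The last two coefficients are invented for illustration only.
    print(coefficients_to_parameters([61.0, 1.76, 1.13, -2.21, 0.5, -0.25]))
```

A coefficient close to zero on some \(d_{ik}\) corresponds to a very large \(|\delta_{k}|\), that is, to an origin city with negligible influence.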

Figure 2

Fit of the logarithm of Eq. (3) to the UNHCR data: \(\log{A} = 61 \pm11\), \(\alpha= 1.76 \pm0.25\), \(\beta= 1.13 \pm0.06\), \(\gamma= 2.21 \pm0.20\), with \(\delta_{k}\) given in Table 1; Multiple R-squared: 0.8082

Figure 3

The map shows the districts’ centers, each with a circle whose radius is proportional to \(1/\delta_{k}\) and whose color encodes the sign of \(\delta_{k}\)

Table 1 The table shows the values of \(\delta_{k}\)

3 Discussion

The highly populated Syrian provinces of Aleppo, Homs, Idleb, Rural Damascus, and Hama contributed the most to the influx to Lebanon, as shown in Fig. 1. They also witnessed the major combat. To understand these patterns, we draw on some socioeconomic background [43, 44]. Before the war, Syrians sought job opportunities in Lebanon, particularly the rural population of Idleb, who then hosted relatives and contacts fleeing the war. This explains the scatter of the Idleb refugees across the Lebanese territories. This migration is mainly linked to a network of workers who chose their destinations based on the market’s demand; once the war started, their families followed them to seek refuge.

Refugees from Homs are concentrated in the nearby cities of Akkar, Tripoli, and Koura, all of which are in neighboring Northern Lebanon and share a prevalent Sunni denomination, while those migrating from Aleppo settled mainly in Zahleh and Beirut. Here a distinction should be made between the rural and the urban populations of the province. The Syrian–Armenian refugee community coming from Aleppo settles in Burj Hammoud in Beirut, which is predominantly Lebanese-Armenian, and is drawn to the area through family ties and shared craftsmanship, while city dwellers and businessmen head to downtown Beirut. Conversely, those coming from the rural areas of Aleppo settle in the Bekaa. Additionally, the influx from Soueida to Chouf reveals the family ties connecting the Druze community across borders [45].

Despite these peculiarities and the different socioeconomic factors driving the migration, the results of our model revealed the existence of two types of origin cities depending on the sign of their corresponding \(\delta_{k}\). We suspect that this is the result of the interplay between areas controlled by the central government and areas controlled by its opponents, and is thus equivalent to an internally drawn border constraining migration and defining its routes. To check this hypothesis, we compared our findings with reports delineating areas of control and found that they are in striking agreement with these maps [45]. In the latter, cities under central government control are shown in light purple, and these correspond to the cities with positive \(\delta_{k}\) in our model, which we denote by \(\mathcal{R}\), while the others, held by the regime’s opponents, correspond to cities with negative \(\delta_{k}\), denoted by \(\mathcal{RO}\). In particular, when people are far from a region \(\mathcal{R}\) under government control they are more likely to migrate, while the opposite effect occurs when they are far from \(\mathcal{RO}\). More precisely, when \(d_{ik}/{\delta_{k}} \gg0\), that is, when the origin city i is very far from a city k under government control, the migration out of i intensifies. Conversely, if a city i is distant from a city k under \(\mathcal{RO}\), the likelihood of migration decreases. Thus, the cities with positive \(\delta_{k}\) increase the migration out of the source cities, while those with negative \(\delta_{k}\) have the contrary effect. Therefore, our parameter \(\delta _{k}\) is a proxy for understanding how the local population perceives the safety and security of their origin cities in relation to their remoteness from \(\mathcal{R}\) and \(\mathcal{RO}\).

In conclusion, the Debye–Hückel theory proved to have correspondences with the sociology of human migration under duress. In analogy with other models borrowed from physics, the “microscopic details” of the system under scrutiny turned out to be irrelevant, that is, of little effect on the macroscopic process [46, 47]. In the case of migration, this entails that the individual-level variability of the refugees, their socioeconomic backgrounds and the geopolitics of war seem to be averaged out and aggregated into an exponential term. The model suggests this is achieved through the refugees’ scanning of space and moving towards the destination perceived as safe.

4 Conclusion

In this paper, we have presented a model for refugees’ migration based on the idea of interaction between source cities. Effectively, this resulted in a renormalization of the source cities’ populations, as our model suggested. Consequently, cities were classified according to the sign of their corresponding \(\delta_{k}\). This sign difference was linked to the interplay between different areas of control, leading to a space-dependent migration undergoing varying degrees of friction. Our model thus represents an attempt at predicting human mobility in relation to space and its fragmentation between the fighting parties. The analysis of these patterns should also be complemented with migration data to other neighboring countries, to which we did not have access.