
Impact of Weather Factors on Migration Intention Using Machine Learning Algorithms

  • Original Research
  • Operations Research Forum

Abstract

Growing attention in the empirical literature has been paid to the effect of climate shocks and climate change on migration decisions. Previous studies reach differing conclusions and rely on a multitude of traditional empirical approaches. This paper proposes a tree-based Machine Learning (ML) approach to analyze the role of weather shocks in an individual’s intention to migrate in six agriculture-dependent economies: Burkina Faso, Ivory Coast, Mali, Mauritania, Niger, and Senegal. We apply several tree-based algorithms (e.g., XGB, Random Forest) within a train-validation-test workflow to build robust and noise-resistant models. We then determine the important features and the direction in which they influence migration intention. This ML-based estimation accounts for weather shocks, captured by the Standardized Precipitation-Evapotranspiration Index (SPEI) at different timescales, and various socioeconomic features/covariates. We find that (i) the weather features improve prediction performance, although socioeconomic characteristics have more influence on migration intentions; (ii) country-specific models are necessary; and (iii) the intention to move internationally is influenced more by SPEIs at longer timescales, while the general intention to move (which includes internal moves) is influenced more by shorter timescales.


Data Availability

The Gallup dataset is a proprietary, paid dataset that we cannot make publicly available due to its copyright restrictions.

Notes

  1. Black et al. [5] distinguish migration, displacement, and immobility. Beine and Jeusette [1] refer to the “trapped population”.

  2. The major difference between the logistic regression of the ML approach and the regressions used in [6] is that the ML logistic regression runs a single regression including all features (covariates), while [6] runs multiple regressions (one per feature).

  3. R-squared can be computed using McFadden’s R\(^{2}\) formula [12]. Bertoli et al. [6] use the R-squared measure implemented in STATA [13]: \(1 - L_M/L_0,\) where \(L_M\) is the log-likelihood of the model and \(L_0\) is the log-likelihood of the null model, i.e., a model learned from the target attribute alone, with no predictors.

  4. For more detailed information, see Appendix 1 and [15].

  5. A dummy variable that represents categorical data.

  6. GADM: the Database of Global Administrative Areas.

  7. There are several ways to configure the year variable: (i) use the integer value of each year, (ii) subtract the minimum year from each year to obtain smaller numbers starting at 0, or (iii) treat the year as a categorical variable and apply one-hot encoding. Here, we use the last approach.

  8. We used the R package correlationfunnel, which is fast and offers visualizations that facilitate this work.

  9. The interviews are conducted in different months in different countries, and the interview month may differ from year to year (Fig. 3).

  10. SPEI at 3 months timescale for May 2015 is a function of the sum of the climatic water balance of March, April, and May 2015.

  11. By construction, SPEI has a zero mean and a standard deviation of unity.

  12. To get a SPEI at 12 months timescale with lag 6 for an individual interviewed in May 2015, the SPEI value is the SPEI12 value 6 months ago in November 2014.

  13. We find that the results are similar with permutation feature importance. Refer to Fig. 19 in the Appendix.

  14. For longer timescales (\(\ge \)18 months), see https://climatedataguide.ucar.edu/climate-data/standardized-precipitation-evapotranspiration-index-spei.

  15. The international move question (Q2) asks whether people would like to move permanently to another country.

  16. The economies of the countries considered in this article depend heavily on the agricultural sector. Given that irrigation infrastructure is lacking and that the sector is mainly rainfed, weather conditions contribute greatly to agricultural production and income generation.

References

  1. Beine M, Jeusette L (2018) A meta-analysis of the literature on climate change and migration. J Demogr Econ 1–52. https://doi.org/10.1017/dem.2019.22

  2. Berlemann M, Steinhardt MF (2017) Climate change, natural disasters, and migration-a survey of the empirical evidence. CESifo Econ Stud 63(4):353–385. https://doi.org/10.1093/cesifo/ifx019


  3. Cattaneo C, Beine M, Fröhlich CJ et al (2019) Human migration in the era of climate change. Review of Environmental Economics and Policy 13(2):189–206. https://doi.org/10.1093/reep/rez008


  4. Millock K (2015) Migration and environment. Annu Rev Resour Econ 7(1):35–60. https://doi.org/10.1146/annurev-resource-100814-125031


  5. Black R, Arnell NW, Adger WN et al (2013) Migration, immobility and displacement outcomes following extreme events. Environ Sci Pol 27:32–43. https://doi.org/10.1016/j.envsci.2012.09.001


  6. Bertoli S, Docquier F, Rapoport H et al (2021) Weather shocks and migration intentions in Western Africa: insights from a multilevel analysis. J Econ Geogr https://academic.oup.com/joeg/advance-article-pdf/doi/10.1093/jeg/lbab043/41299221/lbab043.pdf

  7. Gallup (2015) Worldwide research methodology and codebook

  8. Harris I, Osborn TJ, Jones P et al (2020) Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci Data 7(1):1–18. https://doi.org/10.1038/s41597-020-0453-3


  9. Dell M, Jones BF, Olken BA (2014) What do we learn from the weather? The new climate-economy literature. J Econ Lit 52(3):740–98. https://doi.org/10.1257/jel.52.3.740


  10. Tjaden J, Auer D, Laczko F (2019) Linking migration intentions with flows: evidence and potential use. Int Migr 57(1):36–57. https://doi.org/10.1111/imig.12502


  11. Vicente-Serrano SM, Beguería S, López-Moreno JI (2010) A multiscalar drought index sensitive to global warming: the standardized precipitation evapotranspiration index. J Clim 23(7):1696–1718. https://doi.org/10.1175/2009JCLI2909.1


  12. McFadden D et al (1973) Conditional logit analysis of qualitative choice behavior. University of California, Institute of Urban and Regional Development


  13. StataCorp L et al (2007) Stata data analysis and statistical software. Special Edition Release 10:733


  14. Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv 49(2). https://doi.org/10.1145/2907070

  15. Provost F, Fawcett T (2013) Data science for business: what you need to know about data mining and data-analytic thinking. O’Reilly Media, Inc

  16. Athey S, Imbens GW (2019) Machine learning methods that economists should know about. Annu Rev Econom 11(1):685–725. https://doi.org/10.1146/annurev-economics-080217-053433


  17. Mullainathan S, Spiess J (2017) Machine learning: an applied econometric approach. J Econ Perspect 31(2):87–106. https://doi.org/10.1257/jep.31.2.87


  18. Athey S (2018) The impact of machine learning on economics. University of Chicago Press, pp 507–547. https://doi.org/10.7208/chicago/9780226613475.001.0001

  19. Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd Edn. Springer Series in Statistics, Springer. https://doi.org/10.1007/978-0-387-84858-7

  20. Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324


  21. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Krishnapuram B, Shah M, Smola AJ et al (eds) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016. ACM, pp 785–794. https://doi.org/10.1145/2939672.2939785

  22. Snoek J, Rippel O, Swersky K et al (2015) Scalable Bayesian optimization using deep neural networks. In: Bach FR, Blei DM (eds) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015, JMLR Workshop and Conference Proceedings, vol 37. JMLR.org, pp 2171–2180. https://dl.acm.org/doi/10.5555/3045118.3045349

  23. Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010


  24. Swets JA (1988) Measuring the accuracy of diagnostic systems. Science 240(4857):1285–1293. https://doi.org/10.1126/science.3287615


  25. Breiman L, Friedman J, Stone C et al (1984) Classification and regression trees. The Wadsworth and Brooks-Cole statistics-probability series. Taylor & Francis. https://doi.org/10.1201/9781315139470

  26. Cattaneo C, Peri G (2016) The migration response to increasing temperatures. J Dev Econ 122:127–146. https://doi.org/10.1016/j.jdeveco.2016.05.004

  27. Duan L, Street WN, Liu Y et al (2014) Selecting the right correlation measure for binary data. ACM Trans Knowl Discov Data (TKDD) 9(2):1–28. https://doi.org/10.1145/2637484


  28. Vicente-Serrano SM, Beguería S, López-Moreno JI et al (2010) A new global 0.5 gridded dataset (1901–2006) of a multiscalar drought index: comparison with current drought index datasets based on the palmer drought severity index. J Hydrometeorol 11(4):1033–1043. https://doi.org/10.1175/2010JHM1224.1

  29. Eslamian S (2014) Handbook of engineering hydrology: environmental hydrology and water management. CRC Press. https://doi.org/10.1201/b16766


  30. Wilhite DA, Svoboda MD (2000) Drought early warning systems in the context of drought preparedness and mitigation. Early warning systems for drought preparedness and drought management. Geneva: World Meteorological Organization, pp 1–21

  31. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106. https://doi.org/10.1007/BF00116251


  32. Djamba YK (2003) Gender differences in motivations and intentions to move: Ethiopia and South Africa compared. Genus 59(2):93–111


  33. Athey S (2015) Machine learning and causal inference for policy evaluation. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp 5–6. https://doi.org/10.1145/2783258.2785466

  34. Athey S, Tibshirani J, Wager S et al (2019) Generalized random forests. Ann Stat 47(2):1148–1178. https://doi.org/10.1214/18-AOS1709


  35. Mayda A (2010) International migration: a panel data analysis of the determinants of bilateral flows. J Popul Econ 23:1249–1274. https://doi.org/10.1007/s00148-009-0251-x


  36. Fisher A, Rudin C, Dominici F (2018) Model class reliance: variable importance measures for any machine learning model class, from the Rashomon perspective, vol 68. Preprint at http://arxiv.org/abs/1801.01489


Acknowledgements

We thank Professor Frédéric Docquier for the reviews, suggestions, and earlier discussions of the project. The authors thank Dr. Shari De Baets for her feedback and comments.

Funding

This research is supported by the ARC Convention on “New approaches to understanding and modeling global migration trends” (convention 18/23-091). This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 754412. Open access funding provided by the University of Skövde.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to John O. R. Aoga or Juhee Bae.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1 Machine Learning Approaches

1.1 Data Preprocessing

We use the sample dataset in Table 4 as an illustrative example. This dataset has four features: age, household size (“hhsize”), having human network abroad (“mabr”), and the intensity of the drought (“drought”); and one target attribute representing the migration intention (“move”).

Table 4 A sample dataset with individual characteristics, drought index, and migration intention

The first step is data preprocessing, which cleans up the data by handling missing values and scale- or type-related problems. A scale-related issue occurs when variables are expressed on different scales, for example, year (e.g., [2010, 2016]) and age (e.g., [0, 100]). This problem can bias the output of ML models and make implementations inefficient.

There are two types of variables that may need preprocessing: numerical and categorical. Categorical variables contain labels instead of numerical values. Many ML algorithms support only numerical variables, often for the sake of implementation efficiency. Hence, it is recommended to convert categorical variables into numerical ones using one-hot encoding.

Definition

(One-hot encoding) One-hot encoding creates a new binary variable for each unique label of a categorical variable.

It is well known that numerical input variables with different scales bias the model output. We overcome this problem by binarizing these numerical variables.

Definition

(Data binarization) Binarization transforms a numerical variable into several binary variables in two steps: (i) split the numerical variable into intervals and create a categorical variable by labeling each interval; then (ii) apply one-hot encoding to create the binary variables.
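As a minimal, generic sketch of this two-step workflow (pure Python; the cut points and labels below are hypothetical, not the bins used in the paper):

```python
def binarize(values, edges, labels):
    """Data binarization: (i) bucket each value into a labeled interval,
    then (ii) one-hot encode the resulting categorical variable."""
    def bucket(v):
        for upper, label in zip(edges, labels):
            if v <= upper:
                return label
        return labels[-1]  # values above the last edge
    categories = [bucket(v) for v in values]
    return [[1 if c == label else 0 for label in labels] for c in categories]

# Hypothetical cut points: young <= 30 < middle <= 60 < old
rows = binarize([25, 45, 70], edges=[30, 60], labels=["young", "middle", "old"])
# rows -> [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```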

Example

Figures 12 and 13 show examples of binarization and one-hot encoding for the age and SPEI12 variables.

Fig. 12: Example of one-hot encoding and binarization of the variable age

Fig. 13: Example of one-hot encoding and binarization of the 12-month timescale SPEI variable

Generalization is an essential concept in ML: the ability of a model to correctly classify examples it has not seen during training. For this reason, the dataset is split into training and test sets during the data preprocessing step.

Definition

(Training set and test set) The training set is the part of the dataset used to train the model; the test set is the held-out part used to test it. Typically, 60 to 90% of the dataset is assigned to the training set and the rest to the test set.

To obtain a robust, noise-resistant model that generalizes well, the training and test sets are extracted iteratively from the dataset. This resampling procedure is called cross-validation.

Definition

(Cross-validation) Cross-validation randomly splits the dataset into K roughly equal samples \(S_1, S_2, \cdots , S_K\). From these samples, K folds are created, each containing a training and a test set. In the ith fold, the samples other than \(S_i\) are merged into the training set, and \(S_i\) is used as the test set.
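The fold construction can be sketched generically (contiguous folds for clarity; in practice the dataset is shuffled first):

```python
def k_folds(n, K):
    """Split instance indices 0..n-1 into K roughly equal samples and
    yield (training, test) index lists, one pair per fold."""
    sizes = [n // K + (1 if i < n % K else 0) for i in range(K)]
    samples, start = [], 0
    for size in sizes:
        samples.append(list(range(start, start + size)))
        start += size
    for i in range(K):
        train = [idx for k, s in enumerate(samples) if k != i for idx in s]
        yield train, samples[i]

folds = list(k_folds(10, 5))  # 5 folds; each test set holds 2 instances
```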

Example

Figure 14 shows an example of the second fold.

Fig. 14: 10-fold cross-validation

1.2 Tree-Based Approaches

The decision tree method approximates the learning function f using decision trees.

Definition

(Decision tree) A decision tree represents a set of conditions leading to the classification of instances. Each path from the root to a leaf represents a classification rule.

Example

Figure 15 is an example of a decision tree using the sample dataset.

Fig. 15: Example of a decision tree trained with the sample dataset in Table 4. The numbers 1 to 14 are instance numbers from Table 4; capitalized Y and N represent each instance's moving intention

Decision tree algorithms classify instances by routing them from the root to a leaf, which provides the classification. Each internal node represents a test on a feature, and each branch corresponds to a possible value of that feature.

Example

In the tree in Fig. 15, age is the root node. This node has three branches (young, middle, and old) representing the age values. The leftmost leaf of the tree represents all instances of young individuals facing a harsh drought, who have a moving intention (move).

A decision tree is built by selecting, at each node, the variable that gives the best data split. The split is based on an impurity measure (obtained by calculating, for example, the entropy or the Gini index) for each variable; the best variable is the one with the lowest impurity. Typically, this measure favors splits that produce a dominant or (strongly) discriminative label of the target attribute.
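As a small illustration of such an impurity measure, the Gini index of a node with a binary target can be computed from its label counts (a generic sketch, not the paper's code):

```python
def gini(w_yes, w_no):
    """Gini impurity of a node from its 'Yes'/'No' counts: 2*p*(1-p).
    0 means a pure node; 0.5 is the worst case for a binary target."""
    p = w_yes / (w_yes + w_no)
    return 2 * p * (1 - p)

# A pure node has impurity 0; an evenly split node has impurity 0.5
gini(10, 0), gini(5, 5)
```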

A decision tree can be represented as a linear function [17], which is closer to the way social scientists represent a model. To do so, each leaf of the tree becomes a variable (feature) of the linear model, defined as the product of the decisions from the root to that leaf. The model thus contains as many variables as the tree has leaves, and these variables show how decision trees automatically account for the nonlinearity of the problem.

Example

Let \(L_1, L_2,\cdots ,L_5\) be the variables of the linear model. These variables represent the leaves of the tree in Fig. 15 (from left to right of the tree). The leftmost leaf variable \(L_1\) is equal to \(L_1 = 1_{\text {age = young}\wedge \text {drought = harsh}}\). The variables \(L_3\) and \(L_5\) are equal to \(L_3 = 1_{\text {age = old}}\), and \(L_5 = 1_{\text {age = middle} \wedge \text {mabr = no}}\). Accordingly, the outcome (y) follows:

$$\begin{aligned} y = f(L) = \beta _1L_1 + \beta _2L_2 + \beta _3L_3 + \beta _4L_4 + \beta _5L_5 + \epsilon \end{aligned}$$
(A1)

As the example shows, decision trees (DT) are straightforward to build and to explain. In practice, however, they can be inaccurate [19]. Several other tree-based methods have therefore been proposed; Random Forest (RF) [20] and eXtreme Gradient Boosting (XGB) [21] are well known and widely used.

Definition

(Random Forest) A random forest consists of several decision trees operating together as an ensemble, called a forest. Each tree in the forest classifies an instance, and the instance's class label is decided by majority vote. Each tree is built on a sample of the dataset selected at random with replacement and on a random subset of the features.

Example

With the DT example in Fig. 15, instance 1 from our sample dataset is classified as having an intention to move. With an RF containing five trees, we classify this instance with each tree and take the majority class label. Given the predictions {Yes, Yes, Yes, No, No}, the RF classifies this instance as Yes.
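The majority vote itself is a one-liner; this generic sketch assumes the five tree predictions from the example:

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Random-forest style classification: majority vote over tree outputs."""
    return Counter(tree_predictions).most_common(1)[0][0]

label = forest_vote(["Yes", "Yes", "Yes", "No", "No"])  # -> "Yes"
```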

A random forest gives the prediction of each tree the same weight. By contrast, XGB dynamically assigns a weight to each tree and instance: at each step of the forest's construction, a new tree is added to correct the errors made by the existing trees.

When constructing a decision tree, one may wonder how deep it must go to achieve a better classifier; for a forest, how many trees are needed and how many features should be selected. In ML, these parameters are typically determined by trying several sets of parameters, a process called parameter tuning. In this paper, we use Bayesian Hyperparameter Optimization (BHO) [22].

Definition

(Bayesian Hyperparameter Optimization) BHO tests the model on several parameter sets and associates each set with a probability of yielding the best performance. A Bayesian (i.e., probability) model is then used to select the most promising parameters.
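The generic tuning loop can be sketched as exhaustive grid search; BHO replaces the exhaustive enumeration with a probability model that proposes only promising candidates. The parameter grid and scoring function below are hypothetical:

```python
import itertools

def grid_search(param_grid, score_fn):
    """Exhaustive parameter tuning: score every parameter combination and
    keep the best. BHO instead samples candidates via a probability model."""
    best, best_score = None, float("-inf")
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score

# Hypothetical scorer that peaks at depth 6 with many trees
best, _ = grid_search({"depth": [2, 4, 6, 8], "n_trees": [10, 100]},
                      lambda p: -abs(p["depth"] - 6) + p["n_trees"] / 100)
```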

1.3 Performance Evaluation

In supervised learning, models are evaluated through one-to-one comparisons between the predicted outcome (\(\hat{y}\)) and the real outcome (y). This is a benefit of ML over parameter estimation, where estimation usually rests on assumptions about the data-generating process to ensure consistency [17].

For comparison, in ML, we typically build a confusion matrix.

Definition

(Confusion matrix) A confusion matrix compares the predicted values to the ground truth. It contains four values: true positives (actual “Yes,” predicted “Yes”), false positives (actual “No,” predicted “Yes”; false alarms), true negatives (actual “No,” predicted “No”), and false negatives (actual “Yes,” predicted “No”).

Example

Figure 16 shows the predicted move intention using the decision tree (DT) and the confusion matrix comparing these predictions to the observed (actual) move intention.

Fig. 16: Model performance evaluation with precision, recall, accuracy, and AUC based on the confusion matrix values

Based on the confusion matrix, various performance metrics can be computed. The common ones are accuracy, precision, and recall.

Definition

(Accuracy - Precision - Recall) Accuracy is the ratio of correctly predicted observations to the total number of observations. It is an intuitive measure, but it is meaningful only when the numbers of false positives and false negatives are similar. Precision is the ratio of correctly predicted positive observations to all predicted positive observations, while recall is the ratio of correctly predicted positive observations to all actual positive observations. The formulas appear in Fig. 16 with the confusion matrix. These measures take values between 0 and 1 (the higher, the better the performance).
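These three metrics follow directly from the four confusion-matrix counts; the counts below are hypothetical, for illustration only:

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical counts: 8 hits, 2 false alarms, 85 correct rejections, 5 misses
acc, prec, rec = metrics(tp=8, fp=2, tn=85, fn=5)
# acc = 0.93, prec = 0.8, rec = 8/13
```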

Predicted class labels typically involve a user-defined threshold (e.g., 0.5): by convention, a probability less than or equal to the threshold is treated as “No” and otherwise as “Yes.” Different thresholds lead to different predictions. The Area Under the ROC (Receiver Operating Characteristic) curve (AUC) [23, 24], another model performance metric, evaluates performance regardless of the classification threshold.

Definition

(ROC and AUC) A ROC curve is a two-dimensional graph generated by plotting the false-positive fraction (x-axis) against the true-positive fraction (y-axis) for every possible threshold value; it shows how well a model separates binary outcomes. The AUC (Area Under the Curve) is, as its name implies, the area under the ROC curve. It is typically computed when a single value is needed to summarize a model's performance for comparison. The AUC also lies between 0 and 1 (the higher, the better the performance).

Example

Figure 16 illustrates the ROC curve and the AUC of a decision tree (DT). The AUC of this classifier is 0.89, i.e., the classifier performs well.
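A threshold-free AUC can also be computed directly from scores and labels as the probability that a random positive instance outranks a random negative one (a generic sketch, equivalent to the area under the ROC curve):

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive instance
    scores higher than a randomly chosen negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 1])  # two of three positives outrank the negative
```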

In this paper, we mainly use AUC and precision to determine which method to focus on.

1.4 Output Interpretation: Feature Importance and Partial Dependence Plots (PDP)

The features X used to estimate f in the equation \(f(X) = y\) are rarely equally relevant; typically, only a small subset of them is. Hence, after training the model, the Relative Feature Importance (RFI) method, introduced by [25] for tree-based learning methods, is used to determine the relevant features.

Definition

(RFI) RFI consists of (i) computing, for each internal node of a tree T, the contribution of each feature to the prediction, (ii) summing these contributions per feature, and (iii) ranking the features accordingly.

To calculate the importance \(I_j\) of the feature at node j in a decision tree (A2), five elements are needed: the numbers of “Yes” (\(w_j^{Yes}\)) and “No” (\(w_j^{No}\)) instances at node j, the total number of instances there (\(w_j = w_j^{Yes} + w_j^{No}\)), the contribution of node j (\(c_j\), its impurity, e.g., the Gini index), and the importance of node j (\(n_j = w_j c_j - \sum _{k \in \{\text {children of }j\}}w_k c_k \)).

$$\begin{aligned} I_j = \frac{n_j}{\sum _{i \in \{\text {all feature nodes}\}} n_i} \end{aligned}$$
(A2)

Example

Figure 17 shows the five elements needed to compute the importance of the feature age, which equals 0.977 using (A2):

$$\begin{aligned} I_{{age}} = \frac{n_{{age}}}{n_{{age}}+n_{{drought}}+n_{{mabr}}} = \frac{-1108}{-1108-18-8} = 0.977 \end{aligned}$$
(A3)
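The normalization in (A2) is straightforward to reproduce; the node importances below are the ones read off Fig. 17 for the sample tree:

```python
def relative_importance(node_importances):
    """Relative feature importance (A2): each node importance n_j divided
    by the sum over all feature nodes."""
    total = sum(node_importances.values())
    return {feat: n / total for feat, n in node_importances.items()}

# Node importances from Fig. 17
ri = relative_importance({"age": -1108, "drought": -18, "mabr": -8})
# ri["age"] ~ 0.977, matching (A3)
```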

In a single decision tree, it is clear that the most important feature is the feature at the root node. In a forest, (A2) is generalized as follows:

$$\begin{aligned} RI_j = \frac{\sum _{t \in \{\text {forest}\}} n_j^t }{\sum _{t \in \{\text {forest}\}}\sum _{i \in \{\text {all feature nodes of } t\}} n_i^t} \end{aligned}$$
(A4)
Fig. 17: The five elements needed to compute the feature importance in the DT of Fig. 15

RFI has become widespread and is also used with other ML methods. To understand how the important features influence the outcome y, one uses Partial Dependence Plots [19, Chap. 14].

Definition

(Partial Dependence) Assume the features \(X = X_1, X_2,\cdots , X_p\), indexed by \(P = \{1, 2,\cdots ,p\}\). Let S and its complement R be subsets of P, i.e., \(S,R \subset P \wedge S\cup R = P \wedge S\cap R = \emptyset \). Assuming that \(f(X) = f(X_S, X_R)\), the partial dependence of f(X) on the features \(X_S\) is,

$$\begin{aligned} PD_S(X_S) = E_{X_R} f(X_S, X_R) \approx \frac{1}{N} \sum ^N_{i=1} f(X_S, x_{iR}) \end{aligned}$$
(A5)

This is a marginal average of f describing the effect of the chosen feature set S on f. It is approximated as the average, over the N training instances, of the prediction obtained by combining \(X_S\) with each instance's observed values (\(x_{iR}\)) of the complementary features \(X_R\).

Computing (A5) requires a pass over the data for each set of joint values of \(X_S\). This can be computationally intensive, so partial dependence is usually not calculated for more than three features at a time. Fortunately, partial dependence on a single feature is often informative enough, and the calculation simplifies for a discrete feature: for a discrete feature with the two class labels “yes” and “no,” we only compute \(PD_{S}(X_S = yes)\) and \(PD_{S}(X_S = no)\).
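The approximation in (A5) for a single discrete feature can be sketched generically; the model and data below are toy stand-ins, not the paper's trained model:

```python
def partial_dependence(model, X, feature, value):
    """PD_S(x_S) per (A5): average model prediction over the training
    instances with feature S forced to the chosen value."""
    preds = [model({**row, feature: value}) for row in X]
    return sum(preds) / len(preds)

# Toy stand-in model: move intention is higher with a network abroad
model = lambda row: 0.7 if row["mabr"] == "yes" else 0.2
X = [{"mabr": "yes", "age": "young"}, {"mabr": "no", "age": "old"}]
pd_yes = partial_dependence(model, X, "mabr", "yes")  # 0.7
pd_no = partial_dependence(model, X, "mabr", "no")    # 0.2
```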

Example

Figure 18 shows how we compute the partial dependence in DT (Fig. 15) on a feature “mabr” (human network abroad).

Fig. 18: Partial Dependence computation for the "mabr" feature (connections abroad)

From the values used to calculate the partial dependence, we can draw a chart with the tested feature values on the x-axis against the partial dependence output on the y-axis. The plot shows in which direction (towards the label “Yes” or “No”) each feature value drives the outcome y, visualizing the effect of a feature relative to the average effects of the other features.

Appendix 2 Additional Figures

Figure 19 shows male and age as the top influencing features according to permutation feature importance, in line with the results from the Relative Feature Importance (RFI) method. We also observe that the international move is more affected by longer-timescale SPEIs (e.g., 18, 24 months) while the general move is affected by shorter ones (e.g., 2, 3, 12 months), which aligns with our previous findings. The darker box plots show the uncertainty across permutations. Permutation feature importance measures the increase in a model's prediction error after a feature's values are permuted; the permutation breaks the relationship between the feature and the true outcome. A feature is considered “important” if permuting its values increases the model error, since this means the model relies on that feature for prediction. Fisher et al. [36] proposed “model reliance” measures and a model-agnostic permutation feature importance algorithm.

Fig. 19: Male and age appear as top features based on the permutation feature importance
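The permutation procedure described above can be sketched generically (the model, data, and error function below are toy stand-ins):

```python
import random

def permutation_importance(model, X, y, feature, error_fn, seed=0):
    """Increase in prediction error after shuffling one feature's column,
    which breaks the link between the feature and the true outcome."""
    base = error_fn([model(r) for r in X], y)
    column = [r[feature] for r in X]
    random.Random(seed).shuffle(column)
    permuted = [{**r, feature: v} for r, v in zip(X, column)]
    return error_fn([model(r) for r in permuted], y) - base

# Mean absolute error; the toy model predicts from feature "a" only,
# so permuting the unused feature "b" cannot change the error.
mae = lambda preds, y: sum(abs(p - t) for p, t in zip(preds, y)) / len(y)
model = lambda r: r["a"]
X, y = [{"a": 0, "b": 1}, {"a": 1, "b": 0}], [0, 1]
unused = permutation_importance(model, X, y, "b", mae)  # 0.0
```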

Fig. 20: Feature importance (dot size) based on different SPEI timescales and lags, with the distribution by each lag (top) and each SPEI timescale (right)

Figure 20 shows the feature importance distributions of the six countries, targeting the international move, over the seven SPEI timescales (1, 2, 3, 6, 12, 18, and 24 months) and 49 lags (0–48).

Appendix 3 Terminology Comparison

Table 5 compares the common terminology used in social sciences and machine learning.

Table 5 The mapping of the terminology used in social science and machine learning

Appendix 4 GWP Questions

Table 6 describes the World Poll questions used to measure the opinions of the interviewees.

Table 6 GWP questions


Cite this article

Aoga, J.O.R., Bae, J., Veljanoska, S. et al. Impact of Weather Factors on Migration Intention Using Machine Learning Algorithms. Oper. Res. Forum 5, 8 (2024). https://doi.org/10.1007/s43069-023-00271-y
