School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach

  • Research Article
Journal of Computational Social Science

Abstract

Designing early warning systems through machine learning (ML) models to identify students at risk of dropout can improve targeting mechanisms and lead to efficient social policy interventions in education. School dropout is a culmination of various factors that drive children to leave school, and timely policy responses are most needed to address these underlying factors and improve school retention of children over time. However, applying ML approaches to school dropout prediction is an important challenge, especially in low-income countries, where data collection and management systems are relatively more prone to financial and technical constraints. For this reason, this study suggests using already collected household panel data to predict the probability of school dropout and explore feature importance for primary school children in Malawi through ML models. A rich set of variables is obtained in this study from the household data and used to build Random Forest (RF), least absolute shrinkage and selection operator (LASSO), Ridge and multilayer neural network (MNN) models. The study further explores how performance metrics differ when we embed the training samples' weights representing frequency in sampling design into the cost function of these ML models to discuss the implications of using household data in computational social science. LASSO and MNN models trained with sample weights become more prominent due to their higher recall rates of 80.6% and 78.8%. Compared to the baseline model trained with sample weights, the recall rate gained is roughly 56 percentage points using LASSO and 54 percentage points using MNN. Also, comparing LASSO and MNN trained with and without sample weights reveals that training models with sample weights increases the recall rate by roughly 11 percentage points for LASSO and 12 percentage points for MNN. Lastly, the paper provides a comprehensive and unified approach to better interpret the models using a game-theoretic approach – SHapley Additive exPlanations (SHAP) – to quantify feature importance. As a result, socio-economic characteristics of children, such as working in household farming and father's education level, are among the most important features contributing to the probability of school dropout in ML models. This study argues that the weighted sample structure of household data and its wide range of variables explored through the SHAP method for feature importance can enrich the literature and yield valuable results to harness data science for society.
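
To make the idea of embedding sample weights into the cost function concrete, the sketch below scales each observation's log-loss by a survey weight and computes recall with and without weights. It is a minimal illustration, not the authors' implementation: the function names `weighted_log_loss` and `recall`, the toy labels, predicted probabilities, weights and the 0.5 threshold are all assumptions made for this example.

```python
import math


def weighted_log_loss(y_true, p_pred, weights):
    """Binary cross-entropy in which each observation's loss is scaled by its weight."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


def recall(y_true, y_pred, weights=None):
    """Share of actual dropouts (y = 1) that are flagged, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(y_true)
    tp = sum(w for y, yhat, w in zip(y_true, y_pred, weights) if y == 1 and yhat == 1)
    fn = sum(w for y, yhat, w in zip(y_true, y_pred, weights) if y == 1 and yhat == 0)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Toy data (invented for illustration): 1 = dropped out, 0 = stayed in school.
y_true = [1, 1, 0, 0, 1]
p_pred = [0.9, 0.3, 0.2, 0.6, 0.7]   # hypothetical predicted dropout probabilities
weights = [1.0, 3.0, 1.0, 1.0, 1.0]  # hypothetical survey weights
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]  # assumed 0.5 decision threshold

unweighted_loss = weighted_log_loss(y_true, p_pred, [1.0] * len(y_true))
survey_weighted_loss = weighted_log_loss(y_true, p_pred, weights)
unweighted_recall = recall(y_true, y_pred)
survey_weighted_recall = recall(y_true, y_pred, weights)
```

In this toy case the missed dropout carries the largest weight, so the weighted loss penalises that error more heavily than the unweighted loss does. In scikit-learn, the same idea is usually expressed by passing a `sample_weight` argument to an estimator's `fit` method rather than by hand-coding the loss.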

Figs. 1–3

Fig. 4. Source: Authors' own visualisation based on the Integrated Household Panel Survey Rounds 2016 and 2019

Fig. 5. Source: Authors' own visualisation based on the Integrated Household Panel Survey Round 2016

Figs. 6–13

Data availability statement

The dataset analysed during the current study is not publicly available as it contains proprietary information that the authors acquired through a license. Nevertheless, the data can be requested from the World Bank Microdata Library Central Data Catalogue. Information on how to obtain it and reproduce the analysis is available from the corresponding author on request.

Notes

  1. In this adult equivalence scale, the first household member receives a weight of 1, each additional adult a weight of 0.7 and each child a weight of 0.5. Please see [63] for further details.

  2. For further information about the method of construction for FCS and the standardised food group weights, please see the International Dietary Expansion Project [64].

  3. ‘Ganyu’ is used in Malawi to describe rural labour relationships that are mostly informal and short-term.
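
The adult equivalence scale described in note 1 can be sketched as follows. This is an illustration under stated assumptions: the function name `adult_equivalents` is hypothetical, and the handling of a household with no adults (the first child is treated as the first member) is an assumption that the note does not spell out.

```python
def adult_equivalents(n_adults, n_children):
    """Household size in adult equivalents: 1 for the first member,
    0.7 for each additional adult and 0.5 for each child."""
    if n_adults < 0 or n_children < 0:
        raise ValueError("member counts must be non-negative")
    if n_adults + n_children == 0:
        return 0.0
    if n_adults >= 1:
        # The first member is taken to be an adult whenever one is present.
        return 1.0 + 0.7 * (n_adults - 1) + 0.5 * n_children
    # Assumption: with no adults, the first child counts as the first member.
    return 1.0 + 0.5 * (n_children - 1)


# Two adults and three children: 1 + 0.7 + 3 * 0.5, about 3.2 adult equivalents.
example_size = adult_equivalents(2, 3)
```

Dividing a household-level total, such as consumption, by this value gives a per-adult-equivalent figure.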

References

  1. UNESCO Institute for Statistics (2019). Out-of-school children, adolescents and youth: Global status and trends. Fact Sheet no. 56. UIS/2019/ED/FS/56. Retrieved September 7, 2022 from: http://uis.unesco.org/sites/default/files/documents/new-methodology-shows-258-million-children-adolescents-and-youth-are-out-school.pdf

  2. Huisman, J., & Smits, J. (2015). Keeping children in school: Effects of household and context characteristics on school dropout in 363 districts of 30 developing countries. SAGE Open. https://doi.org/10.1177/2158244015609666

  3. Breton, T. R. (2004). Can institutions or education explain world poverty? An augmented Solow model provides some insights. Journal of Socio-Economics, 33, 45–69. https://doi.org/10.1016/j.socec.2003.12.004

  4. World Bank. (2020). The human capital index 2020 update: Human capital in the time of COVID-19. Washington, DC: World Bank. Retrieved September 10, 2021 from https://openknowledge.worldbank.org/handle/10986/34432

  5. Backman, O. (2017). High school dropout, resource attainment, and criminal convictions. Journal of Research in Crime and Delinquency, 54(5), 715–749. https://doi.org/10.1177/0022427817697441

  6. Bjerk, D. (2011). Re-examining the impact of dropping out on criminal and labor outcomes in early adulthood. (No. 5995). Bonn: IZA – Institute of Labor Economics. Retrieved September 10, 2021 from: https://www.iza.org/en/publications/dp/5995/re-examining-the-impact-of-dropping-out-on-criminal-and-labor-outcomes-in-early-adulthood

  7. Dragone, D., Migali, G., & Zucchelli, E. (2021). High school dropout and the intergenerational transmission of crime. (No. 14129). Bonn: IZA Institute of Labour Economics. Retrieved September 10, 2021 from: https://docs.iza.org/dp14129.pdf

  8. Campolieti, M., Fang, T., & Gunderson, M. (2010). Labour market outcomes and skill acquisition of high-school dropouts. Journal of Labour Research, 31, 39–52. https://doi.org/10.1007/s12122-009-9074-5

  9. Catterall, J. S. (2011). The societal benefits and costs of school dropout recovery. Education Research International. https://doi.org/10.1155/2011/957303

  10. Mussida, C., Sciulli, D., & Signorelli, M. (2019). Secondary school dropout and work outcomes in ten developing countries. Journal of Policy Modeling, 41, 547–567. https://doi.org/10.1016/j.jpolmod.2018.06.005

  11. Kabeer, N., & Mahmud, S. (2009). Imagining the future: Children, education and intergenerational transmission of poverty in urban Bangladesh. IDS Bulletin, 40(1), 10–21. https://doi.org/10.1111/j.1759-5436.2009.00003.x

  12. Bird, K., Higgins, K., & McKay, A. (2010). Conflict, education and the intergenerational transmission of poverty in Northern Uganda. Journal of International Development, 22(8), 1183–1196. https://doi.org/10.1002/jid.1754

  13. Rose, P., & Dyer, C. (2008). Chronic poverty and education: a review of the literature. Chronic Poverty Research Centre Working Paper No. 131. https://doi.org/10.2139/ssrn.1537105

  14. Moses, E. (2011). Quality of education and the labour market: A conceptual and literature overview. Stellenbosch Economic Working Papers: 07/11. Matieland: South Africa.

  15. Boccanfuso, D., Larouche, A., & Trandafirc, M. (2015). Quality of higher education and the labor market in developing countries: Evidence from an education reform in Senegal. World Development, 74, 412–424. https://doi.org/10.1016/j.worlddev.2015.05.007

  16. Haimovich, F., Vazquez, E., & Adelman, M. (2021). Scalable early warning systems for school dropout prevention: Evidence from a 4.000-school randomized controlled trial, Documento de Trabajo, No. 285, Universidad Nacional de La Plata, Centro de Estudios Distributivos, Laborales y Sociales (CEDLAS), La Plata.

  17. Wilson, S. J., & Tanner-Smith, E. E. (2013). Dropout prevention and intervention Programs for improving school completion among school-aged children and youth: A systematic review. Journal of the Society for Social Work and Research, 4(4), 357–372. https://doi.org/10.5243/jsswr.2013.22

  18. Christenson, S. L., & Thurlow, M. L. (2004). School dropouts. Current Directions in Psychological Science, 13(1), 36–39. https://doi.org/10.1111/j.0963-7214.2004.01301010.x

  19. Chung, J. Y., & Lee, S. (2018). Dropout early warning systems for high school students using machine learning. Children and Youth Services Review, 96, 346–353. https://doi.org/10.1016/j.childyouth.2018.11.030

  20. Kemper, L., Vorhoff, G., & Wigger, B. U. (2020). Predicting student dropout: A machine learning approach. European Journal of Higher Education, 10(1), 28–47. https://doi.org/10.1080/21568235.2020.1718520

  21. Huo, H., Cui, J., Hein, S., Padgett, Z., Ossolinski, M., Raim, R., & Zhang, J. (2020). Predicting dropout for nontraditional undergraduate students: A machine learning approach. Journal of College Student Retention: Research, Theory & Practice. https://doi.org/10.1177/1521025120963821

  22. World Health Organization (WHO). (2020). Population below international poverty line. Retrieved September 15, 2021 from: http://uis.unesco.org/sites/default/files/documents/new-methodology-shows-258-million-children-adolescents-and-youth-are-out-school.pdf

  23. Malawi National Statistical Office. (2020). The third integrated household panel survey 2019 report. Zomba, Malawi: Malawi National Statistical Office.

  24. Mduma, N., Kalegele, K., & Machuve, D. (2019). A survey of machine learning approaches and techniques for student dropout prediction. Data Science Journal, 18(14), 1–10. https://doi.org/10.5334/dsj-2019-014

  25. Lundberg, S. M., & Lee, S. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 4768–4777.

  26. Bollen, K. A., Biemer, P. P., Karr, A. F., Tueller, S., & Berzofsky, M. E. (2016). Are survey weights needed? A review of diagnostic tests in regression analysis. Annual Review of Statistics and Its Application, 3, 375–392. https://doi.org/10.1146/annurev-statistics-011516-012958

  27. Freeman, J., & Simonsen, B. (2015). Examining the impact of policy and practice interventions on high school dropout and school completion rates: A systematic review of the literature. Review of Educational Research, 85(2), 205–248. https://doi.org/10.3102/0034654314554431

  28. Khan, I. M., Ahmad, A. R., Jabeur, N., & Mahdi, M. N. (2021). A conceptual framework to aid attribute selection in machine learning student performance prediction models. International Journal of Interactive Mobile Technologies (iJIM), 15(15), 4–19. https://doi.org/10.3991/ijim.v15i15.20019

  29. Sekine, K., & Hodgkin, M. E. (2017). Effect of child marriage on girls’ school dropout in Nepal: Analysis of data from the Multiple Indicator Cluster Survey 2014. PLoS ONE. https://doi.org/10.1371/journal.pone.0180176.t002

  30. Zahra, F. (2020). High hopes, low dropout: gender differences in aspirations for education and marriage, and educational outcomes in rural Malawi. Comparative Education Review, 64(4), 670–702.

  31. Orooji, M., & Che, J. (2019). Predicting Louisiana public high school dropout through imbalanced learning techniques. arXiv:1910.13018 [cs.LG]. https://doi.org/10.48550/arXiv.1910.13018

  32. Sansone, D. (2019). Beyond early warning indicators: High school dropout and machine learning. Oxford Bulletin of Economics and Statistics. https://doi.org/10.1111/obes.12277

  33. DHS (2022). Measures DHS. Retrieved September 13, 2022 from: http://www.measuredhs.com/Measure [Accessed Date:

  34. World Bank. (20201). Integrated household panel survey 2010–2013–2016–2019 (Long-term panel,102 EAs). Retrieved June 12, 2022 from: https://microdata.worldbank.org/index.php/catalog/3819

  35. Yang, S., & Kim, J. K. (2020). Statistical data integration in survey sampling: A review. Japanese Journal of Statistics and Data Science, 3, 625–650. https://doi.org/10.1007/s42081-020-00093-w

  36. Pfeffermann, D. (1996). The use of sampling weights for survey data analysis. Statistical Methods in Medical Research, 5(3), 239–261. https://doi.org/10.1177/096228029600500303

  37. Smith, T. M. F. (1976). The foundations of survey sampling: A Review. Journal of the Royal Statistical Society, 139(2), 183–195. https://doi.org/10.2307/2345174

  38. Nguyen, N. D., & Murphy, P. (2015). To weight or not to weight? A statistical analysis of how weights affect the reliability of the quarterly national household survey for immigration research in Ireland. The Economic and Social Review, 46(4), 567–603.

  39. Bertolet, M. (2008). To weight or not to weight? Incorporating sampling designs into model-based analyses. [PhD thesis, Carnegie Mellon University]. Carnegie Mellon University ProQuest Dissertations Publishing. Retrieved September 14, 2021 from: To weight or not to weight? Incorporating sampling designs into model-based analyses – ProQuest

  40. Gao, C., Fei, C., McCarl, B., & Leatham, D. (2020). Identifying vulnerable households using machine-learning. Sustainability. https://doi.org/10.3390/su12156002

  41. Walpole, M. (2003). Socio-economic status and college: How SES affects college experiences and outcomes. The Review of Higher Education, 27(1), 45–73. https://doi.org/10.1353/rhe.2003.0044

  42. Benner, A. D., Boyle, A. E., & Sadler, S. (2016). Parental involvement and adolescents’ educational success: The roles of prior achievement and socioeconomic status. Journal of Youth Adolescence, 45, 1053–1064. https://doi.org/10.1007/s10964-016-0431-4

  43. Molnar, C. (2022). Interpretable machine learning: A guide for making black box models explainable (2nd ed.). Retrieved June 18, 2022 from: https://christophm.github.io/interpretable-ml-book/

  44. Komatsu, M., Takada, C., Neshi, C., Unoki, T., & Shikida, M. (2020). Feature extraction with SHAP value analysis for student performance evaluation in remote collaboration. Conference Presentation at the 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP). Bangkok, Thailand.

  45. Ramaswami, G., Susnjak, T., & Mathrani, A. (2022). On developing generic models for predicting student outcomes in educational data mining. Big Data and Cognitive Computing. https://doi.org/10.3390/bdcc6010006

  46. Sahlaoui, H., Alaoui, E. A. A., Nayyar, A., Agoujil, S., & Jaber, M. M. (2021). Predicting and interpreting student performance using ensemble models and Shapley Additive Explanations. IEEE Access, 9, 152688–152703.

  47. Aulck, L., Velagapudi, N., Blumenstock, J., & West, J. (2016). Predicting student dropout in higher education. arXiv preprint arXiv:1606.06364. Retrieved June 18, 2022 from: https://arxiv.org/abs/1606.06364

  48. Solis, M., Moreira, T., Gonzalez, R., Fernandez, T., & Hernandez, M. (2018). Perspectives to predict dropout in university students with machine learning. IEEE International Work Conference on Bioinspired Intelligence (IWOBI), 2018, 1–6. https://doi.org/10.1109/IWOBI.2018.8464191

  49. Niyogisubizo, J., Liao, L., Nziyumva, E., Murwanashyaka, E., & Nshimyumukiza, P. C. (2022). Predicting student’s dropout in university classes using two-layer ensemble machine learning approach: A novel stacked generalisation. Computers and Education: Artificial Intelligence. https://doi.org/10.1016/j.caeai.2022.100066

  50. Baranyi, M., Nagy, M., & Molontay, R. (2020). Interpretable deep learning for university dropout prediction. In Proceedings of the 21st Annual Conference on Information Technology Education. Odesa, Ukraine, 13–19.

  51. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning with applications in R (2nd ed.). New York: Springer Nature.

  52. Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems. American Economic Review: Papers & Proceedings, 105(5), 491–495. https://doi.org/10.1257/aer.p20151023

  53. Mullainathan, S., & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic Perspectives, 31(2), 87–106. https://doi.org/10.1257/jep.31.2.87

  54. Crespo, R. C. (2019). Two become one: improving the targeting of conditional cash transfers with a predictive model of school dropout. London: London School of Economics and Political Science. Retrieved October 8, 2021 from: http://eprints.lse.ac.uk/101013/1/05_19_Cristian_Crespo.pdf

  55. Raschka, S., & Mirjalili, V. (2017). Python machine learning: Machine learning and deep learning with Python, scikit-learn, and TensorFlow (2nd ed.). Packt Publishing Ltd.

  56. Hastie, T., Tibshirani, R., & Friedman, J. (2017). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.

  57. Rastrollo-Guerrero, J. L., Gómez-Pulido, J. A., & Durán-Domínguez, A. (2020). Analysing and predicting students’ performance by means of machine learning: A review. Applied Sciences. https://doi.org/10.3390/app10031042

  58. Malawi National Statistical Office. (2010). Third integrated household survey (IHS3) 2010–2011 basic information document. Zomba, Malawi: Malawi National Statistical Office.

  59. Hashemi, M., & Karimi, H. A. (2018). Weighted machine learning. Statistics, Optimisation and Information Computing, 6, 497–525. https://doi.org/10.19139/soic.v6i4.479

  60. Malawi National Statistical Office. (2020). The third integrated household panel survey basic information document 2019. Zomba, Malawi: Malawi National Statistical Office.

  61. International Monetary Fund. (2021). IMF macroeconomic and financial data. Retrieved September 19, 2021 from https://data.imf.org/?sk=4FFB52B2-3653-409A-B471-D47B46D904B5&sId=1485878855236

  62. World Bank. (2021). Inflation, consumer prices (annual %) - Malawi. World Bank Open Data. Retrieved September 19, 2021 from https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG?locations=MW

  63. OECD. (2012). What are equivalence scales? OECD project on income distribution and poverty. Retrieved September 19, 2021 from https://www.oecd.org/economy/growth/OECD-Note-EquivalenceScales.pdf

  64. INDDEX Project. (2018). Data4Diets: Building blocks for diet-related food security analysis. Tufts University. Retrieved September 20, 2021 from https://inddex.nutrition.tufts.edu/data4diets

  65. Nyangasa, M. A., Buck, C., Kelm, S., Sheikh, M., & Hebestreit, A. (2014). Exploring food access and sociodemographic correlates of food consumption and food insecurity in Zanzibari households. International Journal of Environmental Research and Public Health, 16(9), 1029–1049. https://doi.org/10.3390/ijerph16091557

  66. Vyas, S., & Kumaranayake, L. (2006). Constructing socio-economic status indices: How to use principal components analysis. Health Policy and Planning, 21(6), 459–468. https://doi.org/10.1093/heapol/czl029

  67. Houweling, T. A. J., Kunst, A. E., & Mackenbach, J. P. (2003). Measuring health inequality among children in developing countries: does the choice of the indicator of economic status matter? International Journal for Equity in Health. https://doi.org/10.1186/1475-9276-2-8

  68. Naveed, T. A., Gordon, D., Ullah, S., & Zhang, M. (2021). The construction of an asset index at household level and measurement of economic disparities in Punjab (Pakistan) by using MICS-Micro Data. Social Indicators Research, 155, 73–95. https://doi.org/10.1007/s11205-020-02594-3

  69. Thompson, C. G., Kim, R. S., Aloe, A. M., & Becker, B. J. (2017). Extracting the variance inflation factor and other multicollinearity diagnostics from typical regression results. Basic and Applied Social Psychology. https://doi.org/10.1080/01973533.2016.1277529

  70. Franke G. R. (2010). “Multicollinearity,” in Wiley International Encyclopedia of Marketing. eds. Sheth J. N., Malhotra N. K. (New Jersey, USA: John Wiley & Sons Ltd.).

  71. Brownlee, J. (February 2021). A gentle introduction to threshold-moving for imbalanced classification. Machine Learning Mastery. Retrieved October 23, 2021 from https://machinelearningmastery.com/threshold-moving-for-imbalanced-classification/

  72. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. https://doi.org/10.5555/1953048.2078195

  73. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958. https://doi.org/10.5555/2627435.2670313

  74. StataCorp. (2015). Stata Statistical Software: Release 14. StataCorp LP.

  75. Van Rossum, G., & Drake Jr, F. L. (1995). Python reference manual. Centrum voor Wiskunde en Informatica Amsterdam.

  76. Chollet, F., & Others. (2015). Keras. Retrieved November 15, 2021 from https://github.com/fchollet/keras.

  77. Abadi, M., & Others. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Retrieved November 21, 2021 from https://www.tensorflow.org/

  78. Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). Array programming with NumPy. Nature, 585, 357–362. https://doi.org/10.1038/s41586-020-2649-2

  79. McKinney, W., & Others. (2010). Data structures for statistical computing in python. Paper presented at the 9th Python in Science Conference.

  80. Waskom, M. L. (2021). Seaborn: statistical data visualisation. Journal of Open Source Software. https://doi.org/10.21105/joss.03021

  81. Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55

  82. Overseas Development Institute (2000). Ganyu labour in Malawi and its implications for livelihood security interventions - an analysis of recent literature and implications for poverty alleviation. Retrieved November 25, 2021 from: https://odi.org/en/publications/ganyu-labour-in-malawi-and-its-implications-for-livelihood-security-interventions-an-analysis-of-recent-literature-and-implications-for-poverty-alleviation

  83. Wydick, B. (1999). The effect of microenterprise lending on child schooling in Guatemala. Economic Development and Cultural Change, 47(4), 853–869. https://doi.org/10.1086/452435

  84. Shimamura, Y., & Lastarria-Cornhiel, S. (2010). Credit program participation and child schooling in Rural Malawi. World Development, 38(4), 567–580. https://doi.org/10.1016/j.worlddev.2009.11.005

  85. Alam, T. M., Mushtaq, M., Shaukat, K., Hameed, I. A., Umer Sarwar, M., & Luo, S. A. (2021). Novel method for performance measurement of public educational institutions using machine learning models. Applied Sciences, 11, 9296. https://doi.org/10.3390/app11199296

  86. Sunny, B. S., Elze, M., Chihana, M., Gondwe, L., Crampin, A. C., Munkhondya, M., Kondowe, S., & Glynn, J. R. (2017). Failing to progress or progressing to fail? Age-for-grade heterogeneity and grade repetition in primary schools in Karonga District, Northern Malawi. International Journal of Educational Development, 52(January), 68–80. https://doi.org/10.1016/j.ijedudev.2016.10.004

  87. Chikhungu, L., Kadzamira, E., Chiwaula, L., & Meke, E. (2020). Tackling girls dropping out of school in Malawi: Is improving household socio-economic status the solution? International Journal of Educational Research. https://doi.org/10.1016/j.ijer.2020.101578

  88. Cannistrà, M., Masci, C., Ieva, F., Agasisti, T., & Paganoni, A. M. (2020). Not the magic algorithm: modelling and early-predicting students dropout through machine learning and multilevel approach. Milano, Italy: Dipartimento di Matematica, Politecnico di Milano

  89. Sorensen, L. C. (2019). “Big Data” in educational administration: An application for predicting school dropout risk. Educational Administration Quarterly, 55(3), 404–446. https://doi.org/10.1177/0013161X18799439

  90. Cannistrà, M., Masci, C., Ieva, F., Agasisti, T., & Paganoni, A. M. (2020). Not the magic algorithm: modelling and early-predicting students dropout through machine learning and multilevel approach. Milano, Italy: Dipartimento di Matematica, Politecnico di Milano.

  91. Adelman, M., Haimovich, F., Ham, A., & Vazquez, E. (2018). Predicting school dropout with administrative data: New evidence from Guatemala and Honduras. Education Economics. https://doi.org/10.1080/09645292.2018.1433127

  92. Delen, D. (2011). Predicting student attrition with data mining methods. Journal of College Student Retention, 13(1), 17–35. https://doi.org/10.2190/CS.13.1.b

Acknowledgements

The SHAP-based feature importance analysis was run on an Esv3-series virtual machine with Intel Xeon processors, accessed through the Microsoft Azure cloud computing platform of Development Analytics.

Author information

Corresponding author

Correspondence to Hazal Colak Oz.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author declares that they have no known conflict of interests or personal relationships that could have appeared to influence the work reported in this paper. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

See Table 9 below.

Table 9 Comparison of ML Models for school dropout prediction in the literature

Appendix B. List of IHPS questionnaire modules

Appendix B.1

See Table 10 below.

Table 10 List of IHPS household questionnaire modules

Appendix B.2

See Table 11 below.

Table 11 List of IHPS community questionnaire modules

Appendix C

See Table 12 below.

Table 12 Hyperparameter Tuning for machine learning models

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Colak Oz, H., Güven, Ç. & Nápoles, G. School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach. J Comput Soc Sc 6, 245–287 (2023). https://doi.org/10.1007/s42001-022-00195-3

  • DOI: https://doi.org/10.1007/s42001-022-00195-3
