Comparing nine machine learning classifiers for school-dropouts using a revised performance measure

Rezk, Sahar Saeed; Selim, Kamal Samy

doi:10.1007/s42001-024-00281-8


Research Article
Published: 20 May 2024


Journal of Computational Social Science



Abstract

Addressing the pervasive issue of school dropout in Egypt is imperative for advancing the country's educational system and fostering its social and economic progress. Recently, there has been growing interest in leveraging Machine Learning techniques as proactive tools for identifying students at risk of dropping out, so that timely interventions can be carried out. This study implements nine supervised Machine Learning algorithms, namely Decision Trees, K-Nearest Neighbours, Logistic Regression, Naïve Bayes, Support Vector Machines, AdaBoost, Bagging, Random Forest, and Stacking, and compares their results to identify the best-performing one for classifying at-risk students in Egyptian compulsory schools. Using a dataset from a nationally representative sample survey, 52 classification experiments combining classifiers with resampling techniques are conducted. For the classifiers that admit hyper-parameter optimization, 32 initial parameter settings, each entailing a parameter-space search with the GridSearch algorithm, are tried to determine the best-performing model configurations. Rather than relying on disparate performance measures, such as accuracy and F-score, to compare the resulting classifications, this research proposes the weighted harmonic mean of several performance measures as a unified evaluation criterion. Using this single criterion for comparison, the Support Vector Machines classifier, combined with Random Under-Sampling and the Synthetic Minority Over-sampling Technique (SMOTE) to treat class imbalance, is evaluated as the best-performing classification model. Because it can provide classification rules in explicit functional form, Support Vector Machines allows the embedded features to be interpreted in a way similar to the Logistic Regression classifier.
Consequently, the best results obtained could guide the development of an early-warning prediction system to support efforts to eradicate the problem of school dropout that has persisted in Egypt over time.
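The abstract proposes the weighted harmonic mean of several performance measures as a single comparison criterion, but it does not state which measures or weights enter it. The following Python sketch shows only the general formula, sum(w_i) / sum(w_i / m_i), under loudly stated assumptions: the four measures (accuracy, precision, recall, specificity), the helper names `confusion_measures` and `weighted_harmonic_mean`, and the equal weights in the usage lines are illustrative choices, not the authors' configuration.

```python
from typing import Dict, Mapping


def _ratio(num: int, den: int) -> float:
    """Safe division: report 0.0 when a measure is undefined (empty denominator)."""
    return num / den if den else 0.0


def confusion_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Hypothetical measure set from binary confusion-matrix counts (positive class = dropout)."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),       # share of actual dropouts that are flagged
        "specificity": _ratio(tn, tn + fp),  # share of non-dropouts correctly left unflagged
    }


def weighted_harmonic_mean(measures: Mapping[str, float],
                           weights: Mapping[str, float]) -> float:
    """Combine measures in [0, 1] into one score: sum(w_i) / sum(w_i / m_i)."""
    if set(measures) != set(weights):
        raise ValueError("measures and weights must name the same measures")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    if any(not 0.0 <= m <= 1.0 for m in measures.values()):
        raise ValueError("measures must lie in [0, 1]")
    if any(m == 0.0 for m in measures.values()):
        return 0.0  # limiting value: a single zero measure drives the mean to zero
    return sum(weights.values()) / sum(weights[k] / measures[k] for k in measures)


if __name__ == "__main__":
    # Illustrative counts only; they are not results from the study.
    m = confusion_measures(tp=40, fp=20, fn=10, tn=130)
    equal = {name: 1.0 for name in m}  # hypothetical equal weights
    print(round(weighted_harmonic_mean(m, equal), 4))
```

Because a harmonic mean is pulled toward its smallest term, a classifier that reaches high accuracy by labelling almost every student as a non-dropout (and therefore has low recall) is penalized, which is the usual motivation for preferring it to an arithmetic average on imbalanced data. Unequal weights would let an analyst emphasize, for example, recall over precision.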


Data availability

The original dataset of this study is available upon request from Harvard Dataverse at https://doi.org/10.7910/DVN/89Y8YC.


Acknowledgements

We wish to express our sincere gratitude to the anonymous reviewers for the time and effort they devoted to reviewing the manuscript. Their valuable comments and recommendations significantly helped the authors improve the quality of this work.

Author information

Authors and Affiliations

Department of Socio-Computing, Faculty of Economics and Political Science, Cairo University, Giza, Egypt
Sahar Saeed Rezk & Kamal Samy Selim


Contributions

The authors confirm their contributions to the paper as follows. Both authors jointly contributed to the study’s conceptualization, design, methodology, and implementation framework. The computational work and manuscript writing were carried out mainly by the first author, while the second author was responsible for guidance and revisions.

Corresponding author

Correspondence to Sahar Saeed Rezk.

Ethics declarations

Conflict of interest

The authors declare no relevant financial or nonfinancial competing interests.

Ethical approval

Access to the dataset underpinning the research findings was granted to the corresponding author by Harvard Dataverse, for research purposes, through online communication.

Informed consent

Accordingly, the study’s results and conclusions may be published, since the raw data are made publicly available, with permission, for research purposes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 14 and 15.

Table 14 Hyper-parameter tuning setups for experiments of resampling techniques


Table 15 Hyper-parameter tuning setups for experiments of cost-sensitive learning

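Table 14 covers the resampling experiments, and the abstract reports that Support Vector Machines combined with Random Under-Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE) performed best. The table contents, resampling ratios, and the order in which the two techniques were applied are not reproduced here. As an illustration only, the following pure-Python sketch under-samples each majority class to a hypothetical `target` size and then applies SMOTE-style interpolation to raise the minority class to that same size. The function names and the `target`, `k`, and `seed` parameters are hypothetical; in practice a maintained library such as imbalanced-learn (`RandomUnderSampler`, `SMOTE`) would more likely be used.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = List[float]


def random_under_sample(rows: Sequence[Row], target: int, rng: random.Random) -> List[Row]:
    """Random Under-Sampling: keep `target` rows drawn uniformly without replacement."""
    if target > len(rows):
        raise ValueError("target exceeds the number of available rows")
    return [list(r) for r in rng.sample(list(rows), target)]


def smote(minority: Sequence[Row], n_new: int, k: int, rng: random.Random) -> List[Row]:
    """SMOTE-style oversampling: each synthetic row lies on the segment between a
    random minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    k = min(k, len(minority) - 1)
    synthetic: List[Row] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Other minority rows, ordered by Euclidean distance to x.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform draw in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def rus_then_smote(X: Sequence[Row], y: Sequence[int], minority_label: int,
                   target: int, k: int = 5, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Hypothetical RUS + SMOTE combination: shrink each majority class to at most
    `target` rows, then grow the minority class to `target` rows with SMOTE."""
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    X_out: List[Row] = []
    y_out: List[int] = []
    for label in sorted(set(y) - {minority_label}):
        rows = [list(x) for x, lab in zip(X, y) if lab == label]
        kept = random_under_sample(rows, min(target, len(rows)), rng)
        X_out += kept
        y_out += [label] * len(kept)
    synthetic = smote(minority, max(0, target - len(minority)), k, rng)
    X_out += minority + synthetic
    y_out += [minority_label] * (len(minority) + len(synthetic))
    return X_out, y_out


if __name__ == "__main__":
    # Toy data only: 4 "dropout" rows (label 1) and 16 "stayer" rows (label 0).
    X = [[float(i), float(i % 3)] for i in range(20)]
    y = [1 if i < 4 else 0 for i in range(20)]
    X_bal, y_bal = rus_then_smote(X, y, minority_label=1, target=8, k=2, seed=42)
    print(y_bal.count(0), y_bal.count(1))  # prints: 8 8
```

Resampling of this kind is normally applied only to the training portion of each cross-validation split, so that synthetic rows never leak into the data used to evaluate the classifier.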

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Rezk, S.S., Selim, K.S. Comparing nine machine learning classifiers for school-dropouts using a revised performance measure. J Comput Soc Sc (2024). https://doi.org/10.1007/s42001-024-00281-8


Received: 26 July 2023
Accepted: 09 April 2024
Published: 20 May 2024
DOI: https://doi.org/10.1007/s42001-024-00281-8
