Abstract
This work aims to help reduce academic failure in higher education by using machine learning techniques to identify students at risk of failure at an early stage of their academic path, so that strategies to support them can be put in place. A dataset from a higher education institution is used to build classification models that predict students' academic performance. The dataset includes information known at the time of the student's enrollment: academic path, demographics, and socioeconomic factors. The problem is formulated as a three-category classification task with a strong imbalance towards one of the classes. Algorithms that promote class balancing through synthetic oversampling are tested, and classification models are trained and evaluated with both standard machine learning algorithms and state-of-the-art boosting algorithms. Our results show that boosting algorithms respond better to this classification task than standard methods. However, even these state-of-the-art algorithms fail to correctly identify the majority of cases in one of the minority classes. Future directions of this study include adding information on students' first-year performance, such as grades from the first academic semesters.
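The synthetic oversampling step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function name `smote_like_oversample`, the toy feature vectors, and the class labels "A", "B" and "C" are all hypothetical, and the interpolation rule is a simplified SMOTE-style scheme rather than a faithful reimplementation of any library.

```python
import math
import random
from collections import Counter


def smote_like_oversample(X, y, k=3, seed=0):
    """Return copies of X and y in which every class is grown to the size
    of the largest class by interpolating between same-class neighbours.
    Simplified SMOTE-style sketch (hypothetical helper, not a library API)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        points = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(points) < 2:
            continue  # already the largest class, or nothing to interpolate with
        for _ in range(target - n):
            base = rng.choice(points)
            # k nearest same-class neighbours of the chosen base point
            neighbours = sorted(
                (p for p in points if p is not base),
                key=lambda p: math.dist(base, p),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy, made-up data: one majority class and two minority classes.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [1.0, 1.0], [1.1, 0.9], [1.2, 1.1],
     [2.0, 0.0], [2.1, 0.2]]
y = ["A"] * 6 + ["B"] * 3 + ["C"] * 2

X_bal, y_bal = smote_like_oversample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.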
Acknowledgments
This research is supported by program SATDAP - Capacitação da Administração Pública under grant POCI-05-5762-FSE-000191.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Martins, M.V., Tolledo, D., Machado, J., Baptista, L.M.T., Realinho, V. (2021). Early Prediction of student’s Performance in Higher Education: A Case Study. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1365. Springer, Cham. https://doi.org/10.1007/978-3-030-72657-7_16
DOI: https://doi.org/10.1007/978-3-030-72657-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72656-0
Online ISBN: 978-3-030-72657-7
eBook Packages: Intelligent Technologies and Robotics (R0)