
Early Prediction of student’s Performance in Higher Education: A Case Study

  • Conference paper
  • First Online:
Trends and Applications in Information Systems and Technologies (WorldCIST 2021)

Abstract

This work aims to contribute to the reduction of academic failure in higher education by using machine learning techniques to identify students at risk of failure at an early stage of their academic path, so that support strategies can be put into place. A dataset from a higher education institution is used to build classification models that predict students' academic performance. The dataset includes information known at the time of enrollment: academic path, demographics, and socioeconomic factors. The problem is formulated as a three-class classification task with a strong imbalance towards one of the classes. Algorithms that promote class balancing through synthetic oversampling are tested, and classification models are trained and evaluated with both standard machine learning algorithms and state-of-the-art boosting algorithms. Our results show that boosting algorithms respond better to this classification task than standard methods. However, even these state-of-the-art algorithms fall short of correctly identifying the majority of cases in one of the minority classes. Future directions of this study include adding information on students' first-year performance, such as grades from the first academic semesters.
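As a rough illustration of the pipeline described in the abstract, the sketch below pairs synthetic oversampling with a boosted tree model on a deliberately imbalanced three-class problem. It is a minimal sketch, assuming scikit-learn and imbalanced-learn: the synthetic data, class weights, and the choice of SMOTE plus GradientBoostingClassifier are illustrative stand-ins, not the study's actual dataset or algorithm selection.

```python
# Minimal sketch (assumptions): a three-class, imbalanced tabular problem,
# balanced with SMOTE on the training data only and modelled with a
# gradient boosting classifier. Feature counts, class weights and
# hyperparameters are illustrative, not those of the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Synthetic stand-in for an enrollment-time dataset: three classes with a
# strong imbalance towards one of them.
X, y = make_classification(
    n_samples=4000,
    n_features=20,
    n_informative=10,
    n_classes=3,
    weights=[0.15, 0.15, 0.70],  # majority class dominates
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# imblearn's Pipeline applies the sampler only during fit, so the
# held-out test set is never resampled.
model = Pipeline(
    steps=[
        ("oversample", SMOTE(random_state=42)),
        ("boosting", GradientBoostingClassifier(random_state=42)),
    ]
)
model.fit(X_train, y_train)

# Per-class precision and recall make the minority-class behaviour visible,
# which is the main evaluation concern raised in the abstract.
print(classification_report(y_test, model.predict(X_test)))
```

Keeping the oversampling inside the pipeline's fit step means evaluation on the held-out set reflects the original class distribution, which is the usual way to avoid an optimistic estimate of minority-class recall.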

Acknowledgments

This research is supported by the SATDAP program (Capacitação da Administração Pública) under grant POCI-05-5762-FSE-000191.

Author information

Corresponding author

Correspondence to Mónica V. Martins.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martins, M.V., Tolledo, D., Machado, J., Baptista, L.M.T., Realinho, V. (2021). Early Prediction of student’s Performance in Higher Education: A Case Study. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1365. Springer, Cham. https://doi.org/10.1007/978-3-030-72657-7_16
