
Early Prediction of student’s Performance in Higher Education: A Case Study

  • Conference paper
  • First Online:
Trends and Applications in Information Systems and Technologies (WorldCIST 2021)

Abstract

This work aims to contribute to the reduction of academic failure in higher education by using machine learning techniques to identify students at risk of failure at an early stage of their academic path, so that support strategies can be put into place. A dataset from a higher education institution is used to build classification models that predict students' academic performance. The dataset includes information known at the time of enrollment: academic path, demographics, and socioeconomic factors. The problem is formulated as a three-class classification task with a strong imbalance towards one of the classes. Algorithms that promote class balancing through synthetic oversampling are tested, and classification models are trained and evaluated with both standard machine learning algorithms and state-of-the-art boosting algorithms. Our results show that boosting algorithms respond better to this classification task than standard methods. However, even these state-of-the-art algorithms fall short of correctly identifying the majority of cases in one of the minority classes. Future directions of this study include adding information on students' first-year performance, such as grades from the first academic semesters.
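As a rough illustration of the pipeline described in the abstract, the sketch below pairs synthetic oversampling with a boosted tree model on a deliberately imbalanced three-class problem. It is a minimal sketch, assuming scikit-learn and imbalanced-learn: the synthetic data, class weights, and the choice of SMOTE plus GradientBoostingClassifier are illustrative stand-ins, not the study's actual dataset or algorithm selection.

```python
# Minimal sketch (assumptions): a three-class, imbalanced tabular problem,
# balanced with SMOTE on the training data only and modelled with a
# gradient boosting classifier. Feature counts, class weights and
# hyperparameters are illustrative, not those of the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Synthetic stand-in for an enrollment-time dataset: three classes with a
# strong imbalance towards one of them.
X, y = make_classification(
    n_samples=4000,
    n_features=20,
    n_informative=10,
    n_classes=3,
    weights=[0.15, 0.15, 0.70],  # majority class dominates
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# imblearn's Pipeline applies the sampler only during fit, so the
# held-out test set is never resampled.
model = Pipeline(
    steps=[
        ("oversample", SMOTE(random_state=42)),
        ("boosting", GradientBoostingClassifier(random_state=42)),
    ]
)
model.fit(X_train, y_train)

# Per-class precision and recall make the minority-class behaviour visible,
# which is the main evaluation concern raised in the abstract.
print(classification_report(y_test, model.predict(X_test)))
```

Keeping the oversampling inside the pipeline's fit step means evaluation on the held-out set reflects the original class distribution, which is the usual way to avoid an optimistic estimate of minority-class recall.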

Acknowledgments

This research is supported by the SATDAP program (Capacitação da Administração Pública) under grant POCI-05-5762-FSE-000191.

Author information

Corresponding author

Correspondence to Mónica V. Martins.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martins, M.V., Tolledo, D., Machado, J., Baptista, L.M.T., Realinho, V. (2021). Early Prediction of student’s Performance in Higher Education: A Case Study. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1365. Springer, Cham. https://doi.org/10.1007/978-3-030-72657-7_16
