A comparative study on student performance prediction using machine learning

Education and Information Technologies

Abstract

As the storage and processing capacity of modern technology has grown, the volume of educational data has increased sharply, and educational researchers find it difficult to derive useful information from such large amounts of data. Educational data mining techniques are therefore important to the development of modern education. Recent research has demonstrated that machine learning, an important tool for data mining, shows promising performance in educational applications, especially in student performance prediction. However, few studies comprehensively compare existing machine learning methods on educational data, and most current studies focus on only a single type of educational data for student performance prediction. In this paper, three different types of task-oriented educational data are employed to investigate the performance of machine learning methods in different application scenarios. Specifically, seven parameter-optimized machine learning methods are implemented to study multiple types of performance prediction, including binary and multi-class classification tasks. In the experimental section, four evaluation metrics and visualizations are presented for a comparative study of the methods on the three tasks, followed by a detailed discussion of the experimental results. The results demonstrate that Random Forest achieves superior generality across all selected datasets. In addition, the performance of the Decision Tree and Artificial Neural Network models on the selected datasets indicates that they are also promising candidates for student performance prediction tasks.
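The abstract does not name the four evaluation metrics, so the following is only a minimal sketch under the assumption that they resemble the common set of accuracy, precision, recall, and F1 score, with macro averaging so that the same function covers both binary and multi-class tasks. The function name `classification_metrics`, the grade labels, and the predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: these four metrics stand in for the paper's unnamed
    "four evaluation metrics"; the paper may use a different set.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical three-class grade labels and one model's predictions.
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "low", "mid", "high", "high", "high"]

for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

Macro averaging weights every class equally, which matters when grade categories are imbalanced; a comparison across several methods would call the same function once per model on the same held-out labels.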

Data Availability

Links to the data used are provided in the paper.

Acknowledgements

All errors remain the authors' own. The authors thank the reviewers for their comments, recommendations, and suggestions.

Funding

The research contained in this paper was supported by the Key Research and Development Program of Shandong Province, China, under Grant 2017GGX10142.

Author information

Corresponding author

Correspondence to Linbo Zhai.

Ethics declarations

Conflict of Interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Y., Zhai, L. A comparative study on student performance prediction using machine learning. Educ Inf Technol 28, 12039–12057 (2023). https://doi.org/10.1007/s10639-023-11672-1

  • DOI: https://doi.org/10.1007/s10639-023-11672-1
