Feature optimization and machine learning for predicting students’ academic performance in higher education institutions

Education and Information Technologies

Abstract

Developing tools to support students, educators, institutions, and governments in the educational environment has become an important task for improving the quality of education and learning outcomes. Educational institutions have adopted information and communication technology (ICT); one example is video interaction in flipped teaching. ICT-based learning generates a large amount of data that can be used to better understand student behavior and improve student learning. Predicting students’ academic performance is essential for taking proactive measures to improve learning and reduce the risk of dropout and failure. This study uses video learning analysis and data mining approaches to predict student academic achievement and identify the factors affecting performance. For this purpose, a dataset of 326 students from a higher education institution (HEI) is used, combining records from the student information system (SIS), Moodle, and eDify. The study advocates the use of a balanced dataset and an optimized feature set to improve the prediction of students’ academic performance. Several machine learning and deep learning models are evaluated on the original dataset, the balanced dataset, and the balanced dataset with the optimized feature set. Experimental results show that the decision tree classifier performs best, achieving an accuracy of 99.06% on the balanced dataset with the optimized feature set. Further analysis indicates that the video interaction feature has a strong impact on student performance.
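The abstract does not say which balancing method the authors applied, so the following is only a minimal sketch of one common option, random oversampling, in which minority-class rows are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the toy feature tuples, and the `pass`/`fail` labels are hypothetical and are not taken from the study's dataset.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: (videos_watched, quiz_score) per student.
rows = [(12, 80), (10, 75), (11, 90), (9, 70), (2, 40), (1, 35)]
labels = ["pass", "pass", "pass", "pass", "fail", "fail"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class counts before balancing
print(Counter(balanced_labels))  # class counts after balancing
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated rows do not also appear in the test set and inflate the measured accuracy.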

Funding

This research was funded by the European University of the Atlantic.

Author information

Contributions

AP conceived the idea, performed data analysis, and wrote the original draft. QS conceived the idea, performed data curation, and wrote the original draft. RS performed data curation and formal analysis and designed the methodology. FR designed the methodology, handled software, and performed visualization. MGV acquired funding for the research and performed visualization and initial investigation. ESA handled software, carried out project administration, and performed visualization. IDLTD performed investigation, supervised the study, and performed validation. IA supervised the study, performed validation, and reviewed and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Imran Ashraf.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Perkash, A., Shaheen, Q., Saleem, R. et al. Feature optimization and machine learning for predicting students’ academic performance in higher education institutions. Educ Inf Technol (2024). https://doi.org/10.1007/s10639-024-12698-9

  • DOI: https://doi.org/10.1007/s10639-024-12698-9
