
An automatic prediction of students’ performance to support the university education system: a deep learning approach

Multimedia Tools and Applications

Abstract

Predicting student performance is a critical aspect of educational systems. Although forecasting a student's future performance is essential in many applications, it is challenging because performance depends on many interacting factors. Previous research in this area has mainly compared machine learning methods for automating student evaluation and predicting final performance, while few studies have thoroughly explored the problem of class imbalance using a deep learning approach. This study presents a convolution-based deep learning model and a comprehensive exploration of oversampling and undersampling methods for handling imbalanced classes. The paper investigates the features and characteristics of undergraduate students at the University of Jordan, using a large dataset collected from the university's registration unit; its size makes it well suited to in-depth analysis and increases the likelihood of obtaining accurate results. These features include demographic information; attributes related to students' majors, faculties, and registrations; courses taken (passed, repeated, and completed); high school averages; and performance in the first four semesters. The results show that the model performs exceptionally well in terms of G-mean (the geometric mean of per-class recall) when predicting student excellence. The findings offer valuable insights to the research community and to higher education managers, supporting the development of strategies to improve educational performance. Future researchers can adopt the data preprocessing methods and class-balancing strategies demonstrated in this paper.
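The abstract does not name the specific oversampling and undersampling methods that were compared, so the following is only a minimal sketch of the two simplest variants (random oversampling and random undersampling) together with the G-mean metric used to report results. It assumes a hypothetical target with made-up labels `"regular"` and `"excellent"` and toy feature rows; the function names are illustrative and this is not the authors' pipeline. Libraries such as imbalanced-learn provide `RandomOverSampler`, `RandomUnderSampler`, and `geometric_mean_score` for the same purposes.

```python
import math
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Group feature rows by their class label (insertion order kept)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_oversample(X, y, seed=0):
    """Random oversampling: duplicate randomly chosen rows of the smaller
    classes until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Random undersampling: keep a random subset of each larger class so
    that every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall. It drops to 0 whenever one class
    is never recognized, so ignoring the minority class cannot score well."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    # Hypothetical toy data: 8 "regular" students vs 2 "excellent" ones.
    X = [[i] for i in range(10)]
    y = ["regular"] * 8 + ["excellent"] * 2
    _, y_over = random_oversample(X, y)
    _, y_under = random_undersample(X, y)
    print(Counter(y_over), Counter(y_under))
    # Always predicting the majority class: 80% accurate, but G-mean 0.
    print(g_mean(y, ["regular"] * 10))
```

The last line shows why G-mean suits imbalanced data: a classifier that always predicts the majority class reaches 80% accuracy on these toy labels but scores a G-mean of 0. Resampling of this kind should be applied to the training split only, so that evaluation data keep the real class distribution.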


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8


Data Availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was funded by The University of Jordan (Deanship of Scientific Research).

Author information


Corresponding author

Correspondence to Yazn Alshamaila.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Alshamaila, Y., Alsawalqah, H., Aljarah, I. et al. An automatic prediction of students’ performance to support the university education system: a deep learning approach. Multimed Tools Appl 83, 46369–46396 (2024). https://doi.org/10.1007/s11042-024-18262-4


  • DOI: https://doi.org/10.1007/s11042-024-18262-4
