Abstract
Educational data mining (EDM) provides valuable educational information by applying data mining tools and techniques to data collected at educational institutions. In this paper, tree-based machine learning algorithms are used to predict students’ overall academic performance in their bachelor’s program. Transcript data were collected for students in a single department of a Chinese university. All courses in the bachelor’s program were divided into six typical categories, and the mean GPA of each category was taken as a primary input feature for prediction. Three tree-based machine learning models were established: decision tree (DT), gradient boosting decision tree (GBDT), and random forest (RF). Results show that the RF model can identify more than 80% of the students at risk of low performance by the end of the second semester. This is meaningful because the department can then take targeted measures in time, based on the model's predictions, to improve its overall quality of teaching and learning. Feature importance and the structure of the decision tree were also analyzed to extract knowledge that is valuable for both students and teachers. The results of this case study can serve as a reference for other engineering departments in China.
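The feature construction and the headline metric described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the course names, the category labels, the unweighted mean (no credit weighting), and the reading of "more than 80% of at-risk students identified" as recall on the at-risk class are all assumptions introduced here. The tree-based models themselves are not fitted in this sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical course-to-category mapping. The paper uses six categories,
# but the abstract does not name them, so these labels are placeholders.
COURSE_CATEGORY = {
    "Calculus I": "cat_A",
    "Linear Algebra": "cat_A",
    "College English": "cat_B",
    "Mechanics": "cat_C",
}


def category_mean_gpas(transcript, course_category):
    """Return {category: mean grade points} for one student's transcript.

    transcript: iterable of (course_name, grade_points) pairs.
    Courses missing from the mapping are skipped.
    Assumption: a plain (unweighted) mean; credit weighting is not applied.
    """
    by_category = defaultdict(list)
    for course, points in transcript:
        category = course_category.get(course)
        if category is not None:
            by_category[category].append(points)
    return {cat: mean(vals) for cat, vals in by_category.items()}


def recall_at_risk(y_true, y_pred, positive=1):
    """Fraction of truly at-risk students that a model flagged as at risk."""
    flagged = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    actual = sum(1 for t in y_true if t == positive)
    return flagged / actual if actual else 0.0
```

Each student's per-category means would form one row of the input matrix for a tree-based classifier, and `recall_at_risk` would be evaluated on held-out students to check how many at-risk cases are caught.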
Acknowledgments
This work is funded by the 2021 Project of Higher Education Teaching Quality and Teaching Reform sponsored by the Department of Education of Guangdong Province (Grant No. 2021-8.2).
Cite this article
Zhang, W., Wang, Y. & Wang, S. Predicting academic performance using tree-based machine learning models: A case study of bachelor students in an engineering department in China. Educ Inf Technol 27, 13051–13066 (2022). https://doi.org/10.1007/s10639-022-11170-w
DOI: https://doi.org/10.1007/s10639-022-11170-w