Practical early prediction of students’ performance using machine learning and eXplainable AI

Abstract

Predicting students’ performance in advance can support the learning process: if “at-risk” students are identified early, educators can provide them with the necessary educational support. Despite this potential advantage, performance-prediction technology has not been widely used in education because of practical limitations. We propose a practical method for predicting students’ performance in educational settings using machine learning and explainable artificial intelligence (XAI) techniques. To ascertain the perspectives of educational stakeholders, we conducted qualitative research: twelve people, including educators, parents of K-12 students, and policymakers, participated in a focus group interview. An initial set of practical features was chosen based on the participants’ responses, and the final set was then selected through correlation analysis. To verify whether at-risk students could be distinguished using the selected features, we experimented with various machine learning algorithms: Logistic Regression, Decision Tree, Random Forest, Multi-Layer Perceptron, Support Vector Machine, XGBoost, LightGBM, VTC, and STC. In these experiments, Logistic Regression showed the best overall performance. Finally, we used an XAI technique to present, in visual form, information intended to help each student.
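The pipeline outlined in the abstract (correlation-based feature selection, a logistic regression classifier, and a per-student explanation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the feature names (`attendance`, `assignments`, `locker_number`), the 0.3 correlation threshold, and the helper functions are hypothetical and are not taken from the study. The explanation step uses a simple additive attribution (weight times standardized feature value), which is not necessarily the XAI technique the authors applied.

```python
import math

# Hypothetical toy records; feature names and values are invented for illustration.
FEATURES = {
    "attendance": [95, 90, 85, 80, 60, 55, 50, 40],
    "assignments": [10, 9, 9, 8, 4, 5, 3, 2],
    "locker_number": [1, 2, 1, 2, 1, 2, 1, 2],  # deliberately uninformative
}
AT_RISK = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = at-risk student


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_features(features, labels, threshold=0.3):
    """Keep features whose absolute correlation with the label meets the threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, labels)) >= threshold]


def standardize(col):
    """Rescale a column to zero mean and unit variance."""
    n = len(col)
    mean = sum(col) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / std for v in col]


def score(weights, bias, row):
    """Log-odds of being at risk under the linear model."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient ascent on the log-likelihood."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residual (label minus predicted probability) for every student.
        errs = [y - 1 / (1 + math.exp(-score(weights, bias, r)))
                for r, y in zip(rows, labels)]
        weights = [w + lr * sum(e * r[j] for e, r in zip(errs, rows)) / n
                   for j, w in enumerate(weights)]
        bias += lr * sum(errs) / n
    return weights, bias


selected = select_features(FEATURES, AT_RISK)
rows = list(zip(*(standardize(FEATURES[name]) for name in selected)))
weights, bias = fit_logistic(rows, AT_RISK)


def predict(row):
    """Classify a standardized row as at-risk (1) or not (0)."""
    return int(score(weights, bias, row) > 0)


def explain(row):
    """Additive contribution of each selected feature to the log-odds."""
    return {name: w * x for name, w, x in zip(selected, weights, row)}


if __name__ == "__main__":
    print("selected features:", selected)
    print("predictions:", [predict(r) for r in rows])
    print("explanation for last student:", explain(rows[-1]))
```

In this sketch a positive contribution pushes a student toward the at-risk class, so an educator could read the `explain` output as a ranked list of the factors behind one prediction. A real study would evaluate on held-out data rather than on the training records used here.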

References

  • Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI). IEEE Access, 6, 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052

    Article  Google Scholar 

  • Adejo, O. W., & Connolly, T. (2018). Predicting student academic performance using multi-model heterogeneous ensemble approach. Journal of Applied Research in Higher Education.

  • Aggarwal, D., Mittal, S., & Bali, V. (2021). Significance of non-academic parameters for predicting student performance using ensemble learning techniques. International Journal of System Dynamics Applications (IJSDA), 10(3), 38–49.

    Article  Google Scholar 

  • Agudo-Peregrina, Á. F., Iglesias-Pradas, S., Conde-González, M. Á., & Hernández-García, Á. (2014). Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. Computers in Human Behavior, 31(1), 542–550. https://doi.org/10.1016/j.chb.2013.05.031

    Article  Google Scholar 

  • Ahmed, N. S., & Hikmat Sadiq, M. (2018). Clarify of the Random Forest Algorithm in an Educational Field. ICOASE 2018 - International Conference on Advanced Science and Engineering, 179–184. https://doi.org/10.1109/ICOASE.2018.8548804

  • Ahmed, S., Paul, R., & Hoque, A. S. M. L. (2003). Knowledge discovery from academic data using association rule mining. 2014 17th International Conference on Computer and Information Technology, ICCIT 2014, 314–319. https://doi.org/10.1109/ICCITechn.2014.7073107

  • Ajibade, S. S. M., Ahmad, N. B. B., & Shamsuddin, S. M. (2019). Educational data mining: enhancement of student performance model using ensemble methods. In IOP Conference Series: Materials Science and Engineering (vol. 551, no. 1, p. 012061). IOP Publishing.

  • Al-Barrak, M. A., & Al-Razgan, M. (2016). Predicting students final GPA using decision trees: A case study. International Journal of Information and Education Technology, 6(7), 528–533. https://doi.org/10.7763/ijiet.2016.v6.745

    Article  Google Scholar 

  • Al-Obeidat, F., Tubaishat, A., Dillon, A., & Shah, B. (2017). Analyzing students’ performance using multi-criteria classification. Cluster Computing, 21(1), 623–632. https://doi.org/10.1007/s10586-017-0967-4

    Article  Google Scholar 

  • Albreiki, B., Zaki, N., & Alashwal, H. (2021). A systematic literature review of student’ performance prediction using machine learning techniques. Education Sciences, 11(9). https://doi.org/10.3390/educsci11090552

  • Amro, F., & Borup, J. (2019). Exploring blended teacher roles and obstacles to success when using personalized learning software. Journal of Online Learning Research, 5(3), 229–250.

    Google Scholar 

  • Arbaugh, J. B. (2014). System, scholar or students? Which most influences online MBA course effectiveness? Journal of Computer Assisted Learning, 30(4), 349–362. https://doi.org/10.1111/jcal.12048

    Article  Google Scholar 

  • Atherton, M., Shah, M., Vazquez, J., Griffiths, Z., Jackson, B., & Burgess, C. (2017). Using learning analytics to assess student engagement and academic outcomes in open access enabling programmes. Open Learning: The Journal of Open, Distance and e-Learning, 32(2), 119–136.

    Article  Google Scholar 

  • Asan, O., Bayrak, A. E., & Choudhury, A. (2020). Artificial intelligence and human trust in healthcare: Focus on clinicians. Journal of Medical Internet Research, 22(6), 1–7. https://doi.org/10.2196/15154

    Article  Google Scholar 

  • Aydoğdu, Ş. (2020). Predicting student final performance using artificial neural networks in online learning environments. Education and Information Technologies, 25(3), 1913–1927. https://doi.org/10.1007/s10639-019-10053-x

    Article  Google Scholar 

  • Beer, C., Zlotkowski, E., & Hollander, E. L. (2011). Indicators of engagement. Higher Education and Democracy: Essays on Service-Learning and Civic Engagement, 9781439900, 285–302. https://doi.org/10.1007/978-1-4615-0885-4_3

    Article  Google Scholar 

  • Belgiu, M., & Drăgu, L. (2016). Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011

    Article  Google Scholar 

  • Bendikson, L., Hattie, J., & Robinson, V. (2011). Identifying the comparative academic performance of secondary schools. Journal of Educational Administration, 49(4), 433–449. https://doi.org/10.1108/09578231111146498

    Article  Google Scholar 

  • Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2

    Article  Google Scholar 

  • Cai, L., Ren, X., Fu, X., Peng, L., Gao, M., & Zeng, X. (2021). iEnhancer-XG: Interpretable sequence-based enhancers and their strength predictor. Bioinformatics, 37(8), 1060–1067.

    Article  Google Scholar 

  • Car, Z., Baressi Šegota, S., Anđelić, N., Lorencin, I., & Mrzljak, V. (2020). Modeling the Spread of COVID-19 Infection Using a Multilayer Perceptron. Computational and Mathematical Methods in Medicine, 2020. https://doi.org/10.1155/2020/5714714

  • Carvalho, D. V., Pereira, E. M., & Cardoso, J. S. (2019). Machine learning interpretability: A survey on methods and metrics. Electronics (Switzerland), 8(8), 1–34. https://doi.org/10.3390/electronics8080832

    Article  Google Scholar 

  • Cen, L., Ruta, D., Powell, L., Hirsch, B., & Ng, J. (2016). Quantitative approach to collaborative learning: Performance prediction, individual assessment, and group composition. International Journal of Computer-Supported Collaborative Learning, 11(2), 187–225. https://doi.org/10.1007/s11412-016-9234-6

    Article  Google Scholar 

  • Cerezo, R., Sánchez-Santillán, M., Paule-Ruiz, M. P., & Núñez, J. C. (2016). Students’ LMS interaction patterns and their relationship with achievement: A case study in higher education. Computers and Education, 96, 42–54. https://doi.org/10.1016/j.compedu.2016.02.006

    Article  Google Scholar 

  • Chalvatza, F., Karkalas, S., & Mavrikis, M. (2019). Communicating learning analytics: Stakeholder participation and early stage requirement analysis. CSEDU 2019 - Proceedings of the 11th International Conference on Computer Supported Education, 2(Csedu), 339–346. https://doi.org/10.5220/0007716503390346

  • Chaturvedi, R., & Ezeife, C. I. (2017). Predicting Student Performance in an ITS Using Task-Driven Features. IEEE CIT 2017 - 17th IEEE International Conference on Computer and Information Technology, 168–175. https://doi.org/10.1109/CIT.2017.34

  • Chaudhury, P., & Tripaty, H. K. (2017). An empirical study on attribute selection of student performance prediction model. International Journal of Learning Technology, 12(3), 241–252. https://doi.org/10.1504/IJLT.2017.088407

    Article  Google Scholar 

  • Chen, T., & He, T. (2015). Higgs boson discovery with boosted trees. In NIPS 2014 workshop on high-energy physics and machine learning (pp. 69–80). PMLR.

  • Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (pp. 785–794).

  • Chen, W., Brinton, C. G., Cao, D., Mason-Singh, A., Lu, C., & Chiang, M. (2019). Early detection prediction of learning outcomes in online short-courses via learning behaviors. IEEE Transactions on Learning Technologies, 12(1), 44–58. https://doi.org/10.1109/TLT.2018.2793193

    Article  Google Scholar 

  • Chitti, M., Chitti, P., & Jayabalan, M. (2020). Need for Interpretable Student Performance Prediction. Proceedings - International Conference on Developments in ESystems Engineering, DeSE, 2020-Decem, 269–272. https://doi.org/10.1109/DeSE51703.2020.9450735

  • Choi, S., Jang, Y., & Kim, H. (2022). Influence of pedagogical beliefs and perceived trust on teachers’ acceptance of educational artificial intelligence tools. International Journal of Human–Computer Interaction, 1–13.

  • Chou, C., Peng, H., & Chang, C. Y. (2010). The technical framework of interactive functions for course-management systems: Students’ perceptions, uses, and evaluations. Computers and Education, 55(3), 1004–1017. https://doi.org/10.1016/j.compedu.2010.04.011

    Article  Google Scholar 

  • Chounta, I. A., Bardone, E., Raudsep, A., & Pedaste, M. (2021). Exploring teachers’ perceptions of artificial intelligence as a tool to support their practice in Estonian K-12 education. International Journal of Artificial Intelligence in Education. https://doi.org/10.1007/s40593-021-00243-5

    Article  Google Scholar 

  • Clark, R., Kaw, A., Lou, Y., Scott, A., & Besterfield-Sacre, M. (2018). Evaluating blended and flipped instruction in numerical methods at multiple engineering schools. International Journal for the Scholarship of Teaching and Learning, 12(1), 1–16. https://doi.org/10.20429/ijsotl.2018.120111

    Article  Google Scholar 

  • Clow, D. (2013). An overview of learning analytics. Teaching in Higher Education, 18(6), 683–695. https://doi.org/10.1080/13562517.2013.827653

    Article  Google Scholar 

  • Conijn, R., Snijders, C., Kleingeld, A., & Matzat, U. (2017). Predicting student performance from LMS data: A comparison of 17 blended courses using moodle LMS. IEEE Transactions on Learning Technologies, 10(1), 17–29. https://doi.org/10.1109/TLT.2016.2616312

    Article  Google Scholar 

  • Cortez, P., & Silva, A. (2008). Using data mining to predict secondary school student performance. 15th European Concurrent Engineering Conference 2008, ECEC 2008 - 5th Future Business Technology Conference, FUBUTEC 2008, 2003(2000), 5–12.

  • Costa, E. B., Fonseca, B., Santana, M. A., de Araújo, F. F., & Rego, J. (2017). Evaluating the effectiveness of educational data mining techniques for early prediction of students’ academic failure in introductory programming courses. Computers in Human Behavior, 73, 247–256. https://doi.org/10.1016/j.chb.2017.01.047

    Article  Google Scholar 

  • Das, A., & Rad, P. (2020). Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey. 1–24. http://arxiv.org/abs/2006.11371. Accessed 29 May 2022.

  • Dawson, S. P., Mcwilliam, E., & Tan, J. P. (2008). Teaching smarter: How mining ICT data can inform and improve learning and teaching practice. 221–230.

  • Dietz-Uhler, B., & Hurn, J. E. (2013). Using learning analytics to predict (and improve) student success: A faculty perspective. Journal of Interactive Online Learning, 12(1), 17–26.

    Google Scholar 

  • Dinesh Kumar, A., Pandi Selvam, R., & Sathesh Kumar, K. (2018). Review on prediction algorithms in educational data mining. International Journal of Pure and Applied Mathematics, 118(Special Issue 8), 531–537.

    Google Scholar 

  • Dogan, A., & Birant, D. (2019). A weighted majority voting ensemble approach for classification. In 2019 4th International Conference on Computer Science and Engineering (UBMK) (pp. 1–6). IEEE.

  • Dollinger, S. J., Matyja, A. M., & Huber, J. L. (2008). Which factors best account for academic success: Those which college students can control or those they cannot? Journal of Research in Personality, 42(4), 872–885. https://doi.org/10.1016/j.jrp.2007.11.007

    Article  Google Scholar 

  • Dong, X., Yu, Z., Cao, W., Shi, Y., & Ma, Q. (2020). A survey on ensemble learning. Frontiers of Computer Science, 14(2), 241–258.

    Article  Google Scholar 

  • Downing, K. J., Lam, T., Kwong, T., Downing, W., & Chan, S. (2007). Creating interaction in online learning: A case study. Alt-J, 15(3), 201–215. https://doi.org/10.1080/09687760701673592

    Article  Google Scholar 

  • Duffy, T., & Cunningham, D. (1996). Constructivism: Implications for the design and delivery of instruction. Handbook of Research on Educational Communications and Technology, 171(4), 1–31.

    Google Scholar 

  • Dvorak, T., & Jia, M. (2016). Do the Timeliness, Regularity, and Intensity of Online Work Habits Predict Academic Performance? Journal of Learning Analytics, 3(3), 318–330. https://learning-analytics.info/index.php/JLA/article/view/4676. Accessed 29 May 2022.

  • El Aissaoui, O., El Alami El Madani, Y., Oughdir, L., Dakkak, A., & El Allioui, Y. (2020). A Multiple Linear Regression-Based Approach to Predict Student Performance. In Advances in Intelligent Systems and Computing: Vol. 1102 AISC (Issue January). Springer International Publishing. https://doi.org/10.1007/978-3-030-36653-7_2

  • Felisoni, D. D., & Godoi, A. S. (2018). Cell phone usage and academic performance: An experiment. Computers and Education, 117(March 2017), 175–187. https://doi.org/10.1016/j.compedu.2017.10.006

    Article  Google Scholar 

  • Ferguson, R., Brasher, A., Clow, D., Cooper, A., Hillaire, G., Mittelmeier, J., Rienties, B., Ullmann, T., & Vuorikari, R. (2016). Research Evidence on the Use of Learning Analytics - Implications for Education Policy. In A European Framework for Action on Learning Analytics (Issue 2016). https://doi.org/10.2791/955210

  • Gašević, D., Dawson, S., Rogers, T., & Gasevic, D. (2016). Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success. Internet and Higher Education, 28, 68–84. https://doi.org/10.1016/j.iheduc.2015.10.002

    Article  Google Scholar 

  • Gowda, S. M., Baker, R. S., Corbett, A. T., & Rossi, L. M. (2013). Towards automatically detecting whether student learning is shallow. International Journal of Artificial Intelligence in Education, 23(1–4), 50–70. https://doi.org/10.1007/s40593-013-0006-4

    Article  Google Scholar 

  • Grivokostopoulou, F., Perikos, I., & Hatzilygeroudis, I. (2015). Utilizing semantic web technologies and data mining techniques to analyze students learning and predict final performance. Proceedings of IEEE International Conference on Teaching, Assessment and Learning for Engineering: Learning for the Future Now, TALE 2014, December, 488–494. https://doi.org/10.1109/TALE.2014.7062571

  • Han, M., Tong, M., Chen, M., Liu, J., & Liu, C. (2017, July). Application of ensemble algorithm in students' performance prediction. In 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI) (pp. 735–740). IEEE.

  • Haridas, M., Gutjahr, G., Raman, R., Ramaraju, R., & Nedungadi, P. (2020). Predicting school performance and early risk of failure from an intelligent tutoring system. Education and Information Technologies. https://doi.org/10.1007/s10639-020-10144-0

    Article  Google Scholar 

  • Hasan, M. M., Schaduangrat, N., Basith, S., Lee, G., Shoombuatong, W., & Manavalan, B. (2020). HLPpred-Fuse: Improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics, 36(11), 3350–3356.

    Article  Google Scholar 

  • Hasan, R., & Chu, C. (2022). Noise in Datasets: What Are the Impacts on Classification Performance?[Noise in Datasets: What Are the Impacts on Classification Performance?]. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods.

  • Hassan, H., Ahmad, N. B., & Anuar, S. (2020). Improved students’ performance prediction for multi-class imbalanced problems using hybrid and ensemble approach in educational data mining. In Journal of Physics: Conference Series (vol. 1529, no. 5, p. 052041). IOP Publishing.

  • Helle, L., Nivala, M., Kronqvist, P., Ericsson, K. A., & Lehtinen, E. (2010). Do prior knowledge, personality and visual perceptual ability predict student performance in microscopic pathology? Medical Education, 44(6), 621–629. https://doi.org/10.1111/j.1365-2923.2010.03625.x

    Article  Google Scholar 

  • Hossain, S., Bushra, J., Sarma, D., Sen, S., & Taher, M. (2019). Student Performance under Uncertainty. December, 18–20.

  • Hu, Y. H., Lo, C. L., & Shih, S. P. (2014). Developing early warning systems to predict students’ online learning performance. Computers in Human Behavior, 36, 469–478. https://doi.org/10.1016/j.chb.2014.04.002

    Article  Google Scholar 

  • Imran, M., Latif, S., Mehmood, D., & Shah, M. S. (2019). Student Academic Performance Prediction using Supervised Learning Techniques. International Journal of Emerging Technologies in Learning, 14(14).

  • Ingale, N. V., Sivakkumar, M., & Namdeo, V. (2021). Survey on prediction system for student academic performance using educational data. Mining Turkish Journal of Computer and Mathematics Education, 12(13), 363–369.

    Google Scholar 

  • Jayaprakash, S. M., Moody, E. W., Lauría, E. J. M., Regan, J. R., & Baron, J. D. (2014). Early Alert of academically at-risk students: An open source analytics initiative. Journal of Learning Analytics, 1(1), 6–47. https://doi.org/10.18608/jla.2014.11.3

    Article  Google Scholar 

  • Jin, D., Lu, Y., Qin, J., Cheng, Z., & Mao, Z. (2020). SwiftIDS: Real-time intrusion detection system based on LightGBM and parallel intrusion detection mechanism. Computers & Security, 97, 101984.

    Article  Google Scholar 

  • Jishan, S. T., Rashu, R. I., Haque, N., & Rahman, R. M. (2015). Improving accuracy of students’ final grade prediction model using optimal equal width binning and synthetic minority over-sampling technique. Decision Analytics, 2(1), 1–25. https://doi.org/10.1186/s40165-014-0010-2

    Article  Google Scholar 

  • Joksimović, S., Gašević, D., Loughin, T. M., Kovanović, V., & Hatala, M. (2015). Learning at distance: Effects of interaction traces on academic achievement. Computers and Education, 87, 204–217. https://doi.org/10.1016/j.compedu.2015.07.002

    Article  Google Scholar 

  • Kadoic, N., & Oreski, D. (2018). Analysis of student behavior and success based on logs in Moodle. 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2018 - Proceedings, 654–659. https://doi.org/10.23919/MIPRO.2018.8400123

  • Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., ... & Liu, T. Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.

  • Kim, B., Khanna, R., & Koyejo, O. O. (2016). Examples are not enough, learn to criticize! criticism for interpretability. Advances in neural information processing systems, 29.

  • Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Ijcai (Vol. 14, No. 2, pp. 1137–1145).

  • Kondo, N., Okubo, M., & Hatanaka, T. (2017). Early Detection of At-Risk Students Using Machine Learning Based on LMS Log Data. Proceedings - 2017 6th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2017, 198–201. https://doi.org/10.1109/IIAI-AAI.2017.51

  • Kotsiantis, S., Pierrakeas, C., & Pintelas, P. (2004). Predicting students’ performance in distance learning using machine learning techniques. Applied Artificial Intelligence, 18(5), 411–426. https://doi.org/10.1080/08839510490442058

    Article  Google Scholar 

  • Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. Internet and Higher Education, 27, 74–89. https://doi.org/10.1016/j.iheduc.2015.06.002

    Article  Google Scholar 

  • Krueger, R. A. (1994). Focus Groups: A Practical Guide For Applied Research Description: Title: Focus Groups: A Practical Guide for Applied Research.

  • Kumari, P., Jain, P. K., & Pamula, R. (2018). An efficient use of ensemble methods to predict students academic performance. In 2018 4th International Conference on Recent Advances in Information Technology (RAIT) (pp. 1–6). IEEE.

  • Lauría, E. J. M., Baron, J. D., Devireddy, M., Sundararaju, V., & Jayaprakash, S. M. (2012). Mining academic data to improve college student retention. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge - LAK ’12, May, 139. http://dl.acm.org/citation.cfm?doid=2330601.2330637. Accessed 29 May 2022.

  • Lemay, D. J., & Doleck, T. (2020). Grade prediction of weekly assignments in MOOCS: Mining video-viewing behavior. Education and Information Technologies, 25(2), 1333–1342. https://doi.org/10.1007/s10639-019-10022-4

    Article  Google Scholar 

  • Liu, P., Chen, P., Yuan, Y., Zhang, W., & He, X. (2020). A teaching assistant system for big data analysis. Journal of Physics: Conference Series, 1678(1). https://doi.org/10.1088/1742-6596/1678/1/012090

  • Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 32(2), 4768–4777). https://doi.org/10.1016/j.inffus.2019.12.012%0A10.1016/j.ophtha.2018.11.016

  • Lundberg, S. M., Nair, B., Vavilala, M. S., Horibe, M., Eisses, M. J., Adams, T., Liston, D. E., Low, D. K. W., Newman, S. F., Kim, J., & Lee, S. I. (2018). Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, 2(10), 749–760. https://doi.org/10.1038/s41551-018-0304-0

    Article  Google Scholar 

  • Macfadyen, L. P., & Dawson, S. (2010). Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers and Education, 54(2), 588–599. https://doi.org/10.1016/j.compedu.2009.09.008

    Article  Google Scholar 

  • Mandrekar, J. N. (2010). Receiver operating characteristic curve in diagnostic test assessment. Journal of Thoracic Oncology, 5(9), 1315–1316. https://doi.org/10.1097/JTO.0b013e3181ec173d

    Article  Google Scholar 

  • Marbouti, F., Diefes-Dux, H. A., & Madhavan, K. (2016). Models for early prediction of at-risk students in a course using standards-based grading. Computers and Education, 103, 1–15. https://doi.org/10.1016/j.compedu.2016.09.005

    Article  Google Scholar 

  • Mengash, H. A. (2020). Using data mining techniques to predict student performance to support decision making in university admission systems. IEEE Access, 8, 55462–55470. https://doi.org/10.1109/ACCESS.2020.2981905

    Article  Google Scholar 

  • Meyer, J., & Land, R. (2005). Overcoming barriers to student understanding. Taylor & Francis Limited.

    Google Scholar 

  • Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. https://doi.org/10.1016/j.artint.2018.07.007

    MathSciNet  Article  MATH  Google Scholar 

  • Moghaddam, D. D., Rahmati, O., Panahi, M., Tiefenbacher, J., Darabi, H., Haghizadeh, A., ..., & Bui, D. T. (2020). The effect of sample size on different machine learning models for groundwater potential mapping in mountain bedrock aquifers. Catena, 187, 104421.

  • Moore, M. G. (1989). Editorial: Three types of interaction. American Journal of Distance Education, 3(2), 1–7. https://doi.org/10.1080/08923648909526659

    Article  Google Scholar 

  • Morris, L. V., Finnegan, C., & Wu, S. S. (2005). Tracking student behavior, persistence, and achievement in online courses. Internet and Higher Education, 8(3), 221–231. https://doi.org/10.1016/j.iheduc.2005.06.009

    Article  Google Scholar 

  • Motlagh, M. N., Fehresti, S., Talebi, Z., & Hesari, M. (2013). The study of the teacher’s role and student interaction in e-learning process. 4th International Conference on E-Learning and e-Teaching, ICELET 2013, 130–134. https://doi.org/10.1109/ICELET.2013.6681659

  • Muñoz-Organero, M., Muñoz-Merino, P. J., & Kloos, C. D. (2010). Student behavior and interaction patterns with an lms as motivation predictors in e-learning settings. IEEE Transactions on Education, 53(3), 463–470. https://doi.org/10.1109/TE.2009.2027433

    Article  Google Scholar 

  • Nandi, D., Hamilton, M., Harland, J., & Warburton, G. (2011). How active are students in online discussion forums? Conferences in Research and Practice in Information Technology Series, 114, 125–133.

    Google Scholar 

  • Nikian, S., Nor, F. M., & Aziz, M. A. (2013). Malaysian teachers’ perception of applying technology in the classroom. Procedia - Social and Behavioral Sciences, 103, 621–627. https://doi.org/10.1016/j.sbspro.2013.10.380

    Article  Google Scholar 

  • O’Connell, K. A., Wostl, E., Crosslin, M., Berry, T. L., & Grover, J. P. (2018). Student ability best predicts final grade in a college algebra course. Journal of Learning Analytics, 5(3), 167–181. https://doi.org/10.18608/jla.2018.53.11

    Article  Google Scholar 

  • Onwuegbuzie, A. J., Dickinson, W. B., Leech, N. L., & Zoran, A. G. (2009). A qualitative framework for collecting and analyzing data in focus group research. International Journal of Qualitative Methods, 8(3), 1–21. https://doi.org/10.1177/160940690900800301

    Article  Google Scholar 

  • Pal, M., & Foody, G. M. (2010). Feature selection for classification of hyperspectral data by SVM. IEEE Transactions on Geoscience and Remote Sensing, 48(5), 2297–2307. https://doi.org/10.1109/TGRS.2009.2039484

    Article  Google Scholar 

  • Pal, S., & Chaurasia, V. (2017). Is alcohol affect higher education students performance: searching and predicting pattern using data mining algorithms. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2991214

    Article  Google Scholar 

  • Pandey, M., & Taruna, S. (2014). A comparative study of ensemble methods for students' performance modeling. International Journal of Computer Applications, 103(8).

  • Ping, T. A. (2011). Students’ interaction in the online learning management systems: A comparative study of undergraduate and postgraduate courses. Asian Association of Open Universities Journal, 6(1), 59–73. https://doi.org/10.1108/aaouj-06-01-2011-b007

    MathSciNet  Article  Google Scholar 

  • Qin, F., Li, K., & Yan, J. (2020). Understanding user trust in artificial intelligence-based educational systems: Evidence from China. British Journal of Educational Technology, 51(5), 1693–1710. https://doi.org/10.1111/bjet.12994

    Article  Google Scholar 

  • Rabiee, F. (2004). Focus-group interview and data analysis. Proceedings of the Nutrition Society, 63(4), 655–660. https://doi.org/10.1079/pns2004399

    Article  Google Scholar 

  • Rafaeli, S., Ravid, G., Keren, O., Ben-Hanoch, R., Yarchi-Cohen, A., Goshen, Y., Shabtai, I., & Bar-, T. (n.d.). OnLine, Web Based Learning Environment for an Information Systems course: Access logs, Linearity and Performance.

  • Ragab, M., Abdel Aal, A. M., Jifri, A. O., & Omran, N. F. (2021). Enhancement of predicting students performance model using ensemble approaches and educational data mining techniques. Wireless Communications and Mobile Computing, 2021.

  • Ramesh, V., Parkavi, P., & Ramar, K. (2013). Predicting student performance: A statistical and data mining approach. International Journal of Computer Applications, 63(8), 35–39. https://doi.org/10.5120/10489-5242

    Article  Google Scholar 

  • Rienties, B., Toetenel, L., & Bryan, A. (2015). “Scaling up” learning design: Impact of learning design activities on LMS behavior and performance. ACM International Conference Proceeding Series, 1620-Marc, 315–319. https://doi.org/10.1145/2723576.2723600

  • Riestra-González, M., Paule-Ruíz, M. del P., & Ortin, F. (2021). Massive LMS log data analysis for the early prediction of course-agnostic student performance. Computers and Education, 163(December 2020). https://doi.org/10.1016/j.compedu.2020.104108

  • Romero, C., & Ventura, S. (2010). Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 40(6), 601–618. https://doi.org/10.1109/TSMCC.2010.2053532

    Article  Google Scholar 

  • Saadatmand, M., Uhlin, L., Hedberg, M., Åbjörnsson, L., & Kvarnström, M. (2017). Examining Learners’ interaction in an open online course through the community of inquiry framework. European Journal of Open, Distance and E-Learning, 20(1), 61–79. https://doi.org/10.1515/eurodl-2017-0004

    Article  Google Scholar 

  • Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249.

    Google Scholar 

  • Sathe, M. T., & Adamuthe, A. C. (2021). Comparative study of supervised algorithms for prediction of students' performance. International Journal of Modern Education & Computer Science, 13(1).

  • Schell, J., Lukoff, B., & Alvarado, C. (2014). Using early warning signs to predict academic risk in interactive, blended teaching environments. Internet Learning, 3(2). https://doi.org/10.18278/il.3.2.5

  • Shin, D. (2021). The effects of explainability and causability on perception, trust, and acceptance: Implications for explainable AI. International Journal of Human Computer Studies, 146(April 2020), 102551. https://doi.org/10.1016/j.ijhcs.2020.102551

    Article  Google Scholar 

  • Shum, S. J. B., & Luckin, R. (2019). Learning analytics and ai: Politics, pedagogy and practices. British Journal of Educational Technology, 50(6), 2785–2793.

    Article  Google Scholar 

  • Singh, B. K., Verma, K., & Thoke, A. S. (2015). Investigations on impact of feature normalization techniques on classifier's performance in breast tumor classification. International Journal of Computer Applications, 116(19).

  • Song, Y. Y., & Lu, Y. (2015). Decision tree methods: applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2), 130–135. https://doi.org/10.11919/j.issn.1002-0829.215044

    Article  Google Scholar 

  • Sperandei, S. (2014). Understanding logistic regression analysis. Biochemia Medica, 24(1), 12–18. https://doi.org/10.11613/BM.2014.003

    Article  Google Scholar 

  • Stapel, M., Zheng, Z., & Pinkwart, N. (2016). An Ensemble Method to Predict Student Performance in an Online Math Learning Environment. International Educational Data Mining Society.

  • Stemler, S. (2001). An overview of content analysis. Practical Assessment, Research and Evaluation, 7(17), 2000–2001. https://doi.org/10.1362/146934703771910080

  • Stojić, A., Stanić, N., Vuković, G., Stanišić, S., Perišić, M., Šoštarić, A., & Lazić, L. (2019). Explainable extreme gradient boosting tree-based prediction of toluene, ethylbenzene and xylene wet deposition. Science of the Total Environment, 653, 140–147. https://doi.org/10.1016/j.scitotenv.2018.10.368

  • Tanner, T., & Toivonen, H. (2010). Predicting and preventing student failure – using the k-nearest neighbour method to predict student performance in an online course environment. International Journal of Learning Technology, 5(4), 356. https://doi.org/10.1504/ijlt.2010.038772

  • Tawfik, A. A., Reeves, T. D., Stich, A. E., Gill, A., Hong, C., McDade, J., Pillutla, V. S., Zhou, X., & Giabbanelli, P. J. (2017). The nature and level of learner–learner interaction in a chemistry massive open online course (MOOC). Journal of Computing in Higher Education, 29(3), 411–431. https://doi.org/10.1007/s12528-017-9135-3

  • Tempelaar, D. T., Rienties, B., & Giesbers, B. (2015). In search for the most informative data for feedback generation: Learning analytics in a data-rich context. Computers in Human Behavior, 47, 157–167. https://doi.org/10.1016/j.chb.2014.05.038

  • Turabieh, H. (2019). Hybrid machine learning classifiers to predict student performance. 2019 2nd International Conference on New Trends in Computing Sciences, ICTCS 2019 - Proceedings. https://doi.org/10.1109/ICTCS.2019.8923093

  • Umer, R., Mathrani, A., Susnjak, T., & Lim, S. (2019). Mining activity log data to predict student's outcome in a course. In proceedings of the 2019 international conference on big data and education (pp. 52–58).

  • Vij, M. (2017). Teacher as an Agent or Barrier to Integrated Technology. Research Review International Journal of Multidisciplinary, 3085(04), 42–46.

  • Vonkova, H., Papajoanu, O., Stipek, J., & Kralova, K. (2021). Identifying the accuracy of and exaggeration in self-reports of ICT knowledge among different groups of students: The use of the overclaiming technique. Computers and Education, 164(May 2020), 104112. https://doi.org/10.1016/j.compedu.2020.104112

  • Wang, Y., Pan, Q., Liu, X., & Ding, Y. (2022). ET-MSF: A model stacking framework to identify electron transport proteins. Frontiers in Bioscience (Landmark Edition), 27(1), 12–12.

  • Widyahastuti, F., & Tjhin, V. U. (2017). Predicting students performance in final examination using linear regression and multilayer perceptron. Proceedings - 2017 10th International Conference on Human System Interactions, HSI 2017, 188–192. https://doi.org/10.1109/HSI.2017.8005026

  • Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.

  • Xia, J. C., Fielder, J., & Siragusa, L. (2013). Achieving better peer interaction in online discussion forums: A reflective practitioner case study. Issues in Educational Research, 23(1), 97–113.

  • Yağci, A., & Çevik, M. (2019). Prediction of academic achievements of vocational and technical high school (VTS) students in science courses through artificial neural networks (comparison of Turkey and Malaysia). Education and Information Technologies, 24(5), 2741–2761. https://doi.org/10.1007/s10639-019-09885-4

  • Yan, L., & Liu, Y. (2020). An ensemble prediction model for potential student recommendation using machine learning. Symmetry, 12(5), 728.

  • Yousafzai, B. K., Hayat, M., & Afzal, S. (2020). Application of machine learning and data mining in predicting the performance of intermediate and secondary education level student. Education and Information Technologies, 25(6), 4677–4697. https://doi.org/10.1007/s10639-020-10189-1

  • Yu, L. C., Lee, C. W., Pan, H. I., Chou, C. Y., Chao, P. Y., Chen, Z. H., Tseng, S. F., Chan, C. L., & Lai, K. R. (2018). Improving early prediction of academic failure using sentiment analysis on self-evaluated comments. Journal of Computer Assisted Learning, 34(4), 358–365. https://doi.org/10.1111/jcal.12247

  • Yu, T., & Jo, I. H. (2014). Educational technology approach toward learning analytics: Relationship between student online behavior and learning performance in higher education. ACM International Conference Proceeding Series, 269–270. https://doi.org/10.1145/2567574.2567594

  • Yu, R., Li, Q., Fischer, C., Doroudi, S., & Xu, D. (2020a). Towards accurate and fair prediction of college success: Evaluating different sources of student data. Proceedings of the 13th International Conference on Educational Data Mining (EDM 2020), 292–301.

  • Yu, X., Zhou, J., Zhao, M., Yi, C., Duan, Q., Zhou, W., & Li, J. (2020b). Exploiting XG boost for predicting enhancer-promoter interactions. Current Bioinformatics, 15(9), 1036–1045.

  • Zacharis, N. Z. (2015). A multivariate approach to predicting student outcomes in web-enabled blended learning courses. Internet and Higher Education, 27, 44–53. https://doi.org/10.1016/j.iheduc.2015.05.002

  • Zhang, Y., Wang, Y., Gao, M., Ma, Q., Zhao, J., Zhang, R., ..., & Huang, L. (2019). A predictive data feature exploration-based air quality prediction approach. IEEE Access, 7, 30732–30743.

  • Zydney, J. M., deNoyelles, A., & Seo, K. K.-J. (2012). Creating a community of inquiry in online environments: An exploratory study on the effect of a protocol on interactions within asynchronous discussions. Computers and Education, 58(1), 77–87. https://doi.org/10.1016/j.compedu.2011.07.009

Acknowledgements

This work was supported by the National Research Foundation (NRF), Korea, under the project BK21 FOUR.

Author information

Corresponding author

Correspondence to Hyeoncheol Kim.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Features and related literature

Table 8 lists the features that affect student performance, together with the studies in which each feature is used. Some abbreviated feature names were expanded to clarify their meaning (for example, “Medu” was changed to “MotherEducation”). Where studies used different names for features with the same meaning, the names were merged under one (for example, “low income” and “income” were merged under “income”).

Table 8 Features that may have an impact on a student’s performance and the related literature
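
The renaming and merging of feature names described for Table 8 can be illustrated with a minimal Python sketch. Only the two example pairs (“Medu” to “MotherEducation”, and “low income” with “income”) come from the text; the alias table layout, the function names, and the study labels are hypothetical and are not the authors' actual procedure.

```python
# Hypothetical alias table. Only the "Medu" and "low income"/"income"
# pairs are taken from the appendix text; everything else is illustrative.
FEATURE_ALIASES = {
    "Medu": "MotherEducation",
    "low income": "income",
    "income": "income",
}


def canonical_name(raw_name):
    """Return the merged feature name, or the cleaned input if no alias is known."""
    cleaned = raw_name.strip()
    return FEATURE_ALIASES.get(cleaned, cleaned)


def merge_feature_usage(usage_by_study):
    """Map each canonical feature name to the sorted list of studies that use it."""
    merged = {}
    for study, features in usage_by_study.items():
        for raw in features:
            merged.setdefault(canonical_name(raw), set()).add(study)
    return {name: sorted(studies) for name, studies in merged.items()}


# Placeholder study labels, not real references.
example = {
    "Study A": ["Medu", "low income"],
    "Study B": ["income"],
}
merged = merge_feature_usage(example)
```

With this placeholder input, “income” is attributed to both studies, which mirrors how differently named features with the same meaning end up in a single table row.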

About this article

Cite this article

Jang, Y., Choi, S., Jung, H. et al. Practical early prediction of students’ performance using machine learning and eXplainable AI. Educ Inf Technol (2022). https://doi.org/10.1007/s10639-022-11120-6

  • DOI: https://doi.org/10.1007/s10639-022-11120-6

Keywords

  • Learning performance prediction
  • Early prediction
  • Artificial intelligence in education
  • Educational data mining
  • Explainable AI in education