Abstract
Predicting students’ academic performance is one of the most important applications of Educational Data Mining (EDM) and helps to improve the quality of the education process. In an Outcome-based Education (OBE) system, measuring the attainment of student outcomes provides valuable feedback for applying corrective measures to the learning process. Furthermore, the explosive growth of e-learning platforms generates a large volume of data, which demands up-to-date techniques for extracting useful information. With this in mind, and to examine the impact of various features on student outcomes during online classes, we analyzed two datasets: the Kalboard 360 dataset (the larger of the two), which contains academic, demographic, and behavioral features observed and recorded during classes, and a local Institute dataset, which does not include behavioral features. We selected several machine learning algorithms, namely Decision Tree (J48), Naïve Bayes (NB), Random Forest (RF), and Multilayer Perceptron (MLP), to classify the students, and applied filter-based feature selection methods (information gain, gain ratio, and correlation-based feature selection) to select the key attributes. Finally, we fine-tuned the learning parameters of the MLP to obtain an optimized model, called “Opt-MLP”, and compared it with the other classification models. Our experimental results show that Opt-MLP outperforms the other classification models, achieving an accuracy of 87.14% without feature selection (WOFS) and 90.74% with feature selection (WFS) on dataset 1, and an accuracy of 79.37% without feature selection and 97.08% with feature selection on dataset 2. However, when the students’ behavioral features are considered along with the other features, the RF model achieves 100% accuracy, indicating that students’ behavior during class hours has a great impact on the attainment of student outcomes.
Data availability
1. The datasets generated and analysed during the current study are available in the Kaggle repository. https://www.kaggle.com/datasets/shaikvaheed91/kalboard360
2. The datasets generated and analysed during the current study are available in the Kaggle repository under the second author's name. https://www.kaggle.com/datasets/shaikvaheed91/griet2021
3. The datasets generated and analysed during the current study are available in the GitHub repository under the second author's name. https://github.com/vaheed4274/Student-Perforamance-Analyzer
Ethics declarations
Conflict of interest
None.
Appendix
- Confusion Matrix: It summarizes the overall performance of the model in matrix form by tabulating the actual values against the predicted values, as shown in Table 1.
Example of a Confusion Matrix
  - True positive (TP): the predicted value equals the actual value, and both are positive.
  - True negative (TN): the predicted value equals the actual value, and both are negative.
  - False positive (FP): the model predicts a positive value, but the actual value is negative.
  - False negative (FN): the model predicts a negative value, but the actual value is positive.
- Accuracy: The ratio of the number of correctly predicted instances to the total number of instances. It is given by Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.
- Precision: The ratio of correctly classified positive instances to the total number of instances predicted as positive, i.e. TP / (TP + FP).
- Recall: The ratio of correctly predicted positive instances to all instances that are actually positive, i.e. TP / (TP + FN).
- Root mean square error (RMSE): The square root of the mean of the squared differences between the actual values and the predicted values.
- ROC area: The Receiver Operating Characteristic (ROC) curve is one of the most widely used tools for evaluating an ML model. It plots the true positive rate (TPR) against the false-positive rate (FPR). The area under this curve is called the ROC area.
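The measures defined above can be illustrated with a minimal Python sketch for the binary case. This is not the code used in the study: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. RMSE is applied here to predicted class probabilities, which is also an assumption, and the ROC area is computed through its standard rank interpretation (the fraction of positive/negative pairs in which the positive instance receives the higher score, with ties counted as one half).

```python
import math


def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correctly predicted instances divided by all instances."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Correct positive predictions divided by all positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Correct positive predictions divided by all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def rmse(actual, predicted):
    """Square root of the mean squared difference between two sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def roc_area(actual, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted probabilities of class 1.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(actual, predicted)
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("accuracy :", accuracy(tp, tn, fp, fn))
print("precision:", precision(tp, fp))
print("recall   :", recall(tp, fn))
print("RMSE     :", rmse(actual, scores))
print("ROC area :", roc_area(actual, scores))
```

Note that accuracy, precision, and recall depend on the chosen threshold, whereas the ROC area is computed from the scores themselves and so summarizes performance across all thresholds.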
About this article
Cite this article
Nayak, P., Vaheed, S., Gupta, S. et al. Predicting students’ academic performance by mining the educational data through machine learning-based classification model. Educ Inf Technol 28, 14611–14637 (2023). https://doi.org/10.1007/s10639-023-11706-8
DOI: https://doi.org/10.1007/s10639-023-11706-8