A Class-Cluster k-Nearest Neighbors Method for Temporal In-Trouble Student Identification
Temporal in-trouble student identification is a program-level classification task that predicts the final study status of a current student at the end of his/her study time, using data gathered from past students. The task focuses on correctly predicting the in-trouble students, i.e., those whose predicted labels fall at the lowest performance level. Educational datasets for this task exhibit several challenging characteristics, such as multiple classes, class overlap, and class imbalance. Handling all of these characteristics simultaneously has not yet been investigated in educational data mining, and existing general-purpose methods are not straightforwardly applicable to educational datasets. Therefore, this paper proposes a novel method as an effective solution to the previously defined task. Combining the traditional k-nearest neighbors algorithm with clustering ensembles, our method introduces three new features: it relaxes the number k of nearest neighbors, uses a set of cluster-based neighbors newly generated by partitioning the subspace of each class, and applies four new criteria, rather than the majority-voting scheme, to decide the final class label. The result is a new lazy learning method able to correctly predict more instances belonging to a positive minority class. In an empirical evaluation on our two real educational datasets and the benchmark "Iris" dataset, higher Accuracy, Recall, and F-measure confirmed the effectiveness of our method compared to several popular methods.
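The core idea of the cluster-based neighbors can be sketched as follows. This is a minimal illustration, not the authors' implementation: each class's subspace is partitioned with a simple k-means (in the spirit of MacQueen's algorithm), and a test instance is then labeled by its nearest class-cluster centroid. The abstract does not detail the four decision criteria, so the nearest-centroid rule below is a hypothetical stand-in for them; the function names and parameters are likewise illustrative.

```python
import numpy as np

def fit_class_clusters(X, y, n_clusters=3, n_iter=50, seed=0):
    """Partition each class's subspace with a basic k-means,
    producing per-class centroids that act as cluster-based neighbors."""
    rng = np.random.default_rng(seed)
    centroids = {}
    for label in np.unique(y):
        Xc = X[y == label]
        k = min(n_clusters, len(Xc))
        # initialize centroids from randomly chosen class members
        C = Xc[rng.choice(len(Xc), size=k, replace=False)]
        for _ in range(n_iter):
            # assign each point of this class to its nearest centroid
            d = np.linalg.norm(Xc[:, None, :] - C[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            # recompute centroids; keep old one if a cluster is empty
            C = np.array([Xc[assign == j].mean(axis=0) if np.any(assign == j)
                          else C[j] for j in range(k)])
        centroids[label] = C
    return centroids

def predict(x, centroids):
    """Label x by its nearest class-cluster centroid (a simple stand-in
    for the paper's four decision criteria)."""
    best_label, best_dist = None, np.inf
    for label, C in centroids.items():
        d = np.linalg.norm(C - x, axis=1).min()
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Because the per-class clustering adapts to each class separately, a minority class keeps its own local representatives instead of being outvoted by majority-class neighbors, which is one plausible reason such a scheme can improve Recall on the positive minority class.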
Keywords: Student classification, Educational data mining, k-nearest neighbors, Clustering ensemble, Fisher's discriminant ratio, Data imbalance
This research is funded by Vietnam National University Ho Chi Minh City, Vietnam, under grant number C2017-20-18.