A Class-Cluster k-Nearest Neighbors Method for Temporal In-Trouble Student Identification

  • Chau Vo
  • Hua Phung Nguyen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11431)


Temporal in-trouble student identification is a program-level classification task that predicts the final study status of a current student at the end of his or her study time, using data gathered from past students. The task emphasizes correct predictions for in-trouble students, that is, those whose labels fall at the lowest performance level. Educational datasets in this task have several challenging characteristics, such as multiple classes, class overlap, and class imbalance. Handling these characteristics simultaneously has not yet been investigated in educational data mining, and existing general-purpose methods are not straightforwardly applicable to such educational datasets. Therefore, in this paper, a novel method is proposed as an effective solution to this task. Combining traditional k-nearest neighbors with clustering ensemble methods, our method has three new features: it relaxes the number k of nearest neighbors, it uses a set of cluster-based neighbors newly generated by partitioning the subspace of each class, and it applies four new criteria to decide the final class label in place of the majority voting scheme. The result is a new lazy learning method that correctly predicts more instances belonging to the positive minority class. In an empirical evaluation on our two real educational datasets and the benchmark “Iris” dataset, higher Accuracy, Recall, and F-measure confirmed the effectiveness of our method compared with several popular methods.
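The idea of cluster-based neighbors can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the four decision criteria or how k is relaxed, so this sketch substitutes a plain nearest-centroid rule. The function names (`kmeans`, `fit`, `predict`), the toy data, and the parameter `clusters_per_class` are all assumptions made for illustration. Each class subspace is partitioned with a simple k-means, and a query point takes the label of the closest cluster centroid.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd's k-means with a deterministic start (evenly spaced samples)."""
    n_clusters = min(n_clusters, len(points))
    step = len(points) // n_clusters
    centroids = [points[i * step] for i in range(n_clusters)]
    for _ in range(n_iter):
        groups = defaultdict(list)
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the previous centroid if a cluster receives no points.
        centroids = [mean(groups[c]) if groups[c] else centroids[c]
                     for c in range(n_clusters)]
    return centroids


def fit(X, y, clusters_per_class=2):
    """Partition each class separately; return (centroid, label) pairs."""
    by_class = defaultdict(list)
    for point, label in zip(X, y):
        by_class[label].append(tuple(point))
    return [(centroid, label)
            for label, pts in by_class.items()
            for centroid in kmeans(pts, clusters_per_class)]


def predict(model, point):
    """Assumed decision rule: label of the single nearest cluster centroid."""
    _, label = min(model, key=lambda pair: dist(point, pair[0]))
    return label


# Hypothetical toy data: six majority-class students, two minority-class ones.
X = [(8.0, 8.5), (8.5, 8.0), (7.5, 9.0), (6.0, 6.5), (6.5, 6.0), (6.2, 6.8),
     (3.0, 2.5), (2.5, 3.5)]
y = ["ok"] * 6 + ["in_trouble"] * 2

model = fit(X, y)
print(predict(model, (2.8, 3.0)))
print(predict(model, (7.9, 8.2)))
```

Because every class contributes its own centroids regardless of size, a small minority class is not outvoted by a large majority class in the way it can be under plain majority voting over the k nearest instances. That is only the motivation this sketch shares with the paper, not a reproduction of its results.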


Student classification; Educational data mining; k-nearest neighbors; Clustering ensemble; Fisher’s discriminant ratio; Data imbalance



This research is funded by Vietnam National University Ho Chi Minh City, Vietnam, under grant number C2017-20-18.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Ho Chi Minh City University of Technology, Vietnam National University, Ho Chi Minh City, Vietnam
