A Prognosis of Junior High School Students’ Performance Based on Active Learning Methods

  • Georgios Kostopoulos
  • Sotiris Kotsiantis
  • Vassilios S. Verykios
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10512)


In recent years, there has been growing research interest in applying data mining techniques to education. Educational Data Mining has become an efficient tool for teachers and educational institutions seeking to analyze the academic behavior of students and predict their progress and performance. The main objective of this study is to classify junior high school students’ performance in the final examinations of the “Geography” module into a set of five pre-defined classes using active learning. The key idea of the active learning framework is to exploit a small set of labeled examples together with a large set of unlabeled ones to build efficient classifiers. To the best of our knowledge, no study exists that applies active learning methods to predicting students’ performance. The dataset consists of several assessment attributes related to students’ grades in homework assignments, oral assessment, short tests and semester exams. A number of experiments are carried out, demonstrating the advantage of active learning over familiar supervised methods such as the Naïve Bayes classifier.
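The pool-based setting with uncertainty sampling named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny Gaussian Naive Bayes learner, the least-confidence query rule, the three grade bands, the 0 to 20 grading scale, and the synthetic scores are all assumptions chosen for brevity, and every function name below is hypothetical.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Fit class priors and per-feature Gaussian parameters (illustrative learner)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Variance floor keeps densities finite for tiny or constant samples.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-2)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(y), means, variances)
    return model

def predict_proba(model, x):
    """Return class posteriors for one example as a dict."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[label] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

def least_confident(model, pool):
    """Index of the pool example whose most probable class has the lowest posterior."""
    return min(range(len(pool)),
               key=lambda i: max(predict_proba(model, pool[i]).values()))

def active_learning(X_lab, y_lab, pool, oracle, n_queries):
    """Pool-based loop: retrain, query the most uncertain example, ask the oracle."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(pool)
    for _ in range(n_queries):
        if not pool:
            break
        model = fit_gaussian_nb(X_lab, y_lab)
        x = pool.pop(least_confident(model, pool))
        X_lab.append(x)
        y_lab.append(oracle(x))  # in practice, a teacher supplies this label
    return fit_gaussian_nb(X_lab, y_lab), X_lab, y_lab

def true_band(x):
    """Hypothetical oracle: grade band from the mean of two assessment scores."""
    mean = sum(x) / len(x)
    return "low" if mean < 10 else "mid" if mean < 15 else "high"

# Synthetic data: (homework grade, short-test grade) on an assumed 0-20 scale.
rng = random.Random(0)
data = [(rng.uniform(0, 20), rng.uniform(0, 20)) for _ in range(200)]
seed, pool = data[:10], data[10:]
model, X_lab, y_lab = active_learning(
    seed, [true_band(x) for x in seed], pool, true_band, n_queries=20)
```

Starting from 10 labeled examples, the loop requests 20 further labels, each for the pool example the current model is least sure about, so labeling effort goes to the examples the model finds hardest to classify rather than to a random sample.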


Keywords: Pool-based active learning; Uncertainty sampling strategy; Prediction; Student performance; Junior high school





Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Georgios Kostopoulos (1)
  • Sotiris Kotsiantis (1)
  • Vassilios S. Verykios (2)
  1. Educational Software Development Laboratory (ESDLab), Department of Mathematics, University of Patras, Patras, Greece
  2. Hellenic Open University, Patras, Greece
