Predicting Student Performance in Distance Higher Education Using Semi-supervised Techniques

  • Georgios Kostopoulos
  • Sotiris Kotsiantis
  • Panagiotis Pintelas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9344)


Students’ performance prediction in distance higher education has been widely researched over the past decades. Machine learning techniques, and supervised learning in particular, have been used in numerous studies to identify in time students who are likely to fail the final exams. Identifying students at risk of failure as early as possible could help academic staff develop learning strategies aimed at improving students’ overall performance. In this paper, we investigate the effectiveness of semi-supervised techniques in predicting students’ performance in distance higher education. Our study comprises several experiments comparing the accuracy of well-known semi-supervised algorithms. As far as we are aware, various studies deal with students’ performance prediction in distance learning using machine learning techniques, especially supervised methods, but none of them investigates the effectiveness of semi-supervised algorithms. Our results confirm the advantage of semi-supervised methods and, in particular, the satisfactory performance of the Tri-Training algorithm.


Keywords: Distance higher education · Performance prediction · Semi-supervised learning · Tri-training · C4.5 decision tree
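To illustrate the core idea behind the tri-training algorithm highlighted in the abstract, the following is a minimal sketch: three classifiers are trained on bootstrap samples of the labeled data, and each one receives pseudo-labels for the unlabeled points on which the other two agree. This is not the authors' implementation — the paper uses C4.5 decision trees as base learners, whereas this sketch substitutes a trivial 1-nearest-neighbour learner on hypothetical one-dimensional toy data (weekly study hours mapped to pass/fail) so that it is self-contained.

```python
import random
from collections import Counter

def nn_classifier(train):
    """1-nearest-neighbour classifier over (feature, label) pairs."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def tri_training(labeled, unlabeled, rounds=3, seed=0):
    """Sketch of tri-training: three learners pseudo-label unlabeled
    points for each other whenever the other two agree."""
    rng = random.Random(seed)
    # Oversampled bootstraps keep both classes represented in this tiny demo.
    boots = [[rng.choice(labeled) for _ in range(2 * len(labeled))]
             for _ in range(3)]
    learners = [nn_classifier(b) for b in boots]
    for _ in range(rounds):
        augmented = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Points on which the two *other* learners agree become
            # pseudo-labeled training data for learner i.
            pseudo = [(x, learners[j](x)) for x in unlabeled
                      if learners[j](x) == learners[k](x)]
            augmented.append(boots[i] + pseudo)
        learners = [nn_classifier(s) for s in augmented]
    # Final prediction: majority vote of the three learners.
    def vote(x):
        return Counter(h(x) for h in learners).most_common(1)[0][0]
    return vote

# Hypothetical toy data: weekly study hours -> pass/fail on the final exam.
labeled = [(1, "fail"), (2, "fail"), (3, "fail"),
           (7, "pass"), (8, "pass"), (9, "pass")]
unlabeled = [1.5, 2.5, 6.5, 8.5]
predict = tri_training(labeled, unlabeled)
```

The full algorithm additionally tracks each round's estimated classification error and only accepts pseudo-labels when the error bound improves; that safeguard is omitted here for brevity.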



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Georgios Kostopoulos
  • Sotiris Kotsiantis
  • Panagiotis Pintelas

  1. Educational Software Development Laboratory (ESDLab), Department of Mathematics, University of Patras, Patras, Greece
