Predicting Student Performance in Distance Higher Education Using Active Learning

  • Georgios Kostopoulos
  • Anastasia-Dimitra Lipitakis
  • Sotiris Kotsiantis
  • George Gravvanis
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 744)


Students’ performance prediction in higher education has been identified as one of the most important research problems in machine learning. Educational data mining is an important branch of machine learning that seeks to analyze students’ academic behavior effectively and to predict their performance. Over recent years, several machine learning methods, especially supervised classification methods, have been used in the educational field with remarkable results. The early identification of students at risk of failing is of utmost importance for the academic staff and the universities. In this paper, we investigate the effectiveness of active learning methodologies in predicting students’ performance in distance higher education. To the best of our knowledge, no study has yet dealt with the implementation of active learning methodologies in the educational field. Our experiments compare the accuracy measures of familiar active learners and demonstrate their efficiency in exploiting a small labeled dataset together with a large pool of unlabeled data.


Keywords: Distance higher education; Performance prediction; Unlabeled data; Pool-based active learning; Uncertainty sampling query
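
To make the idea of pool-based active learning with uncertainty sampling more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple nearest-centroid classifier as a stand-in for the learners compared in the paper, synthetic two-feature data in place of real student records, and a hypothetical `oracle` callable that plays the role of the human annotator. All function names (`train_centroids`, `predict_proba`, `entropy`, `active_learning`, `demo`) are illustrative.

```python
import math
import random


def train_centroids(labeled):
    """Fit a nearest-centroid model: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c))
              for y, c in centroids.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability dictionary."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def active_learning(labeled, pool, oracle, n_queries):
    """Repeatedly query the label of the most uncertain pool instance."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_queries):
        if not pool:
            break
        model = train_centroids(labeled)
        # Uncertainty sampling: pick the instance with maximum entropy.
        idx = max(range(len(pool)),
                  key=lambda i: entropy(predict_proba(model, pool[i])))
        x = pool.pop(idx)
        labeled.append((x, oracle(x)))  # hypothetical annotator call
    return train_centroids(labeled), labeled, pool


def demo(seed=0, n_per_class=50, n_queries=10):
    """Run the loop on synthetic data (assumption: two numeric features)."""
    rng = random.Random(seed)
    pairs = []
    for label, centre in (("fail", 3.0), ("pass", 7.0)):
        for _ in range(n_per_class):
            pairs.append(((rng.gauss(centre, 1.5), rng.gauss(centre, 1.5)), label))
    truth = dict(pairs)
    # Small labeled seed set: one instance per class; the rest is the pool.
    seed_idx = (0, n_per_class)
    labeled = [pairs[i] for i in seed_idx]
    pool = [x for i, (x, _) in enumerate(pairs) if i not in seed_idx]
    return active_learning(labeled, pool, truth.__getitem__, n_queries)
```

Calling `demo()` starts from two labeled instances and a pool of 98 unlabeled ones, then asks the oracle for ten labels. Entropy is only one possible uncertainty measure; least-confidence or margin-based scores could be swapped in by changing the `key` function.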



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Georgios Kostopoulos (1)
  • Anastasia-Dimitra Lipitakis (3)
  • Sotiris Kotsiantis (1, 4)
  • George Gravvanis (2, 4)

  1. Educational Software Development Laboratory (ESDLab), Department of Mathematics, University of Patras, Patras, Greece
  2. Department of Electrical and Computer Engineering, School of Engineering, Democritus University of Thrace, Xanthi, Greece
  3. Department of Informatics and Telematics, Harokopio University of Athens, Kallithea, Greece
  4. Hellenic Open University, Patras, Greece
