Mexican International Conference on Artificial Intelligence

Advances in Artificial Intelligence and Its Applications, pp. 208–219

Applying Data Mining Techniques to Identify Success Factors in Students Enrolled in Distance Learning: A Case Study

  • José Gerardo Moreno Salinas
  • Christopher R. Stephens
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9414)


Distance learning is now a key component of higher education. Given high dropout rates and the substantial investment in distance learning, it is essential to identify the factors most critical to student success and failure. In this article we mine enrollment profiles, educational background, and other student data from the Open University System and Distance Learning of the National Autonomous University of Mexico (UNAM) to determine the key factors that drive success and failure, and we build a predictive model using a Naive Bayes classifier. We find that the number of subjects passed and the average grade in the first semester are among the most informative predictors of student success.


Keywords: Distance learning · Keys to success · Data mining · Naive Bayes classifier
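The abstract names a Naive Bayes classifier as the predictive model. As a rough illustration of that general technique for a binary success/failure outcome over categorical predictors, a minimal sketch follows; it is not the authors' implementation. The feature names (`subjects_passed_sem1`, `avg_grade_sem1`), the low/mid/high bins, the Laplace smoothing parameter `alpha`, and the six toy records are all hypothetical stand-ins, not UNAM data. Because the Naive Bayes log-odds is a sum of per-feature terms, the same fitted counts also give one simple way to rank which feature values push a prediction toward success.

```python
"""Categorical Naive Bayes sketch for a binary student outcome.

Hypothetical: the feature names, the low/mid/high bins and the toy records
are illustrative stand-ins, not the UNAM data or the authors' model.
"""
import math
from collections import Counter, defaultdict


def train(records, labels, alpha=1.0):
    """Count class frequencies and per-class feature values (alpha = Laplace smoothing)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # feat -> cls -> value -> n
    values = defaultdict(set)  # distinct values observed for each feature
    for record, cls in zip(records, labels):
        for feat, val in record.items():
            value_counts[feat][cls][val] += 1
            values[feat].add(val)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "values": values, "alpha": alpha, "n": len(labels)}


def log_likelihood(model, feat, val, cls):
    """Smoothed log P(feat = val | cls)."""
    alpha = model["alpha"]
    count = model["value_counts"][feat][cls][val]
    k = len(model["values"][feat])
    return math.log((count + alpha) / (model["class_counts"][cls] + alpha * k))


def predict_proba(model, record):
    """Posterior P(cls | record), assuming features are independent given the class."""
    scores = {}
    for cls, n_cls in model["class_counts"].items():
        score = math.log(n_cls / model["n"])  # log prior
        for feat, val in record.items():
            if val in model["values"].get(feat, ()):  # skip values never seen in training
                score += log_likelihood(model, feat, val, cls)
        scores[cls] = score
    top = max(scores.values())  # shift before exponentiating for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {cls: math.exp(s - top) / z for cls, s in scores.items()}


def rank_predictors(model, pos, neg):
    """Sort (log-likelihood ratio, feature, value) triples, most pro-`pos` first.

    Each ratio is the additive contribution of that value to the Naive Bayes log-odds.
    """
    triples = []
    for feat, vals in model["values"].items():
        for val in vals:
            llr = log_likelihood(model, feat, val, pos) - log_likelihood(model, feat, val, neg)
            triples.append((llr, feat, val))
    return sorted(triples, reverse=True)


# Hypothetical toy data: binned first-semester indicators and a final outcome.
records = [
    {"subjects_passed_sem1": "high", "avg_grade_sem1": "high"},
    {"subjects_passed_sem1": "high", "avg_grade_sem1": "mid"},
    {"subjects_passed_sem1": "mid", "avg_grade_sem1": "high"},
    {"subjects_passed_sem1": "low", "avg_grade_sem1": "low"},
    {"subjects_passed_sem1": "low", "avg_grade_sem1": "mid"},
    {"subjects_passed_sem1": "mid", "avg_grade_sem1": "low"},
]
labels = ["success", "success", "success", "failure", "failure", "failure"]

model = train(records, labels)
posterior = predict_proba(model, {"subjects_passed_sem1": "high", "avg_grade_sem1": "high"})
print({cls: round(p, 2) for cls, p in posterior.items()})  # success ≈ 0.9 on this toy data
for llr, feat, val in rank_predictors(model, "success", "failure")[:2]:
    print(f"{feat}={val}: {llr:+.2f}")
```

Working in log space keeps the product of many per-feature likelihoods from underflowing, and smoothing keeps a value that never co-occurs with one class from forcing that class's probability to zero. With real enrollment data, the binning, the smoothing strength, and the validation procedure would all need to be chosen deliberately.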

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • José Gerardo Moreno Salinas (1)
  • Christopher R. Stephens (2)
  1. Coordinación de Universidad Abierta y Educación a Distancia (CUAED), UNAM, Coyoacán, Mexico
  2. Centro de Ciencias de la Complejidad (C3) and Instituto de Ciencias Nucleares, UNAM, Mexico City, Mexico
