Early Prediction and Variable Importance of Certificate Accomplishment in a MOOC

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10254)


The emergence of MOOCs (Massive Open Online Courses) has made large amounts of data available about students’ interactions with online educational platforms. This makes it possible to predict students’ future learning outcomes from these interactions. Predicting certificate accomplishment can enable the early detection of students at risk, so that interventions can be carried out before it is too late. This study applies different machine learning techniques to predict, at different timeframes of the course, which students will earn a certificate. The purpose is to analyze how the quality metrics change as more data becomes available to the models. Of the four machine learning techniques applied, we finally chose a boosted trees model, which provides stable predictions over the weeks with good quality metrics. We also determine which variables are most important for the prediction and how their importance changes over the weeks of the course.
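The abstract does not state the features, library, or hyperparameters behind the boosted trees model. As a rough illustration only, the Python sketch below trains a gradient-boosted ensemble of decision stumps (depth-1 trees, logistic loss) once per weekly timeframe on cumulative activity counts and reports a split-gain variable importance for each week. The feature names (`videos_watched`, `exercises_attempted`, `forum_posts`), the toy data, the use of stumps instead of deeper trees, and the hyperparameters are all hypothetical assumptions, not the authors' setup.

```python
import math
from collections import defaultdict

# Hypothetical weekly features (cumulative counts); not the paper's variables.
FEATURES = ["videos_watched", "exercises_attempted", "forum_posts"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Best single-feature threshold split for a least-squares fit to the residuals.

    Returns (feature_index, threshold, left_value, right_value, gain) or None."""
    n, total = len(residuals), sum(residuals)
    best = None
    for j in range(len(X[0])):
        pairs = sorted(zip((row[j] for row in X), residuals))
        left_sum = 0.0
        for i in range(n - 1):
            left_sum += pairs[i][1]
            if pairs[i][0] == pairs[i + 1][0]:
                continue  # no threshold separates equal feature values
            n_left, right_sum = i + 1, total - left_sum
            # Reduction in squared error obtained by splitting here.
            gain = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left) - total ** 2 / n
            if best is None or gain > best[4]:
                threshold = (pairs[i][0] + pairs[i + 1][0]) / 2
                best = (j, threshold, left_sum / n_left, right_sum / (n - n_left), gain)
    return best


def train_boosted_stumps(X, y, names, n_rounds=30, learning_rate=0.3):
    """Gradient boosting with logistic loss; each weak learner is a depth-1 tree."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # start from the base-rate log-odds
    F = [f0] * len(y)
    stumps, importance = [], defaultdict(float)
    for _ in range(n_rounds):
        # The negative gradient of the log-loss w.r.t. the log-odds is y - p.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # every feature is constant: nothing to split on
        j, thr, left, right, gain = stump
        stumps.append((j, thr, left, right))
        importance[j] += gain  # split-gain importance accumulated over rounds
        F = [fi + learning_rate * (left if row[j] <= thr else right) for fi, row in zip(F, X)]
    total = sum(importance.values()) or 1.0
    relative = {names[j]: g / total for j, g in importance.items()}
    return {"f0": f0, "lr": learning_rate, "stumps": stumps, "importance": relative}


def predict_proba(model, row):
    """Probability of earning a certificate for one student's feature row."""
    z = model["f0"] + sum(model["lr"] * (lv if row[j] <= thr else rv)
                          for j, thr, lv, rv in model["stumps"])
    return sigmoid(z)


# Toy cumulative counts per student at the end of week 2; label 1 = certificate.
WEEK2 = [[2, 0, 1], [5, 1, 0], [8, 2, 3], [3, 4, 0],
         [9, 6, 2], [4, 7, 0], [7, 8, 1], [6, 9, 4]]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
WEEK1 = [[v // 2 for v in row] for row in WEEK2]  # fewer interactions seen so far

models = {}
for week, X in [(1, WEEK1), (2, WEEK2)]:
    models[week] = train_boosted_stumps(X, LABELS, FEATURES)
    hits = sum((predict_proba(models[week], row) > 0.5) == (yi == 1)
               for row, yi in zip(X, LABELS))
    # In-sample accuracy, for illustration only.
    print(f"week {week}: accuracy={hits / len(LABELS):.2f}, "
          f"importance={models[week]['importance']}")
```

Accuracy here is computed on the training rows purely to keep the sketch short; an actual early-prediction study would evaluate each weekly model on held-out students and compare quality metrics across weeks. Split-gain importance is only one of several importance measures (permutation importance is a common alternative), and the sketch does not claim to reproduce the measure used in the paper.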


Keywords: Educational data mining, Learning analytics, Prediction, Machine learning, MOOCs



Work partially funded by the Madrid Regional Government with grant No. S2013/ICE-2715, the Spanish Ministry of Economy and Competitiveness projects RESET (TIN2014-53199-C3-1-R) and Flexor (TIN2014-52129-R) and the European Erasmus+ projects MOOC Maker (561533-EPP-1-2015-1-ES-EPPKA2-CBHE-JP) and SHEILA (562080-EPP-1-2015-BE-EPPKA3-PI-FORWARD). This research work was made possible thanks to Universidad Autónoma de Madrid, which provided us with the dataset, and to Prof. Pedro García, who was the instructor of the selected MOOC.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Universidad Carlos III de Madrid, Leganés, Spain
  2. Universidad Autónoma de Madrid, Madrid, Spain
  3. IMDEA Networks Institute, Leganés, Spain
