A supervised learning framework: using assessment to identify students at risk of dropping out of a MOOC


Both educational data mining and learning analytics aim to understand learners and optimise learning processes in educational settings such as Moodle, a learning management system (LMS). Analytics in an LMS covers many aspects, such as finding students at risk of abandoning a course or identifying students who are struggling before assessments take place. Thus, there are multiple prediction models that can be explored. Prediction models can also target the course itself. For instance, will this activity assessment engage learners? To ease the evaluation and usage of prediction models in Moodle, we abstract out the most relevant elements of prediction models and develop an analytics framework for Moodle. Apart from the software framework, we also present a case study model which uses variables based on assessments to predict students at risk of dropping out of a massive open online course that was offered eight times from 2013 to 2018, with a total of 46,895 students. A neural network is trained with data from past courses and the framework generates insights about students at risk in ongoing courses. Predictions are then generated after the first, the second, and the third quarters of the course. The average accuracy achieved is 88.81%, with an F1 score of 0.9337 and an area under the ROC curve of 73.12%.
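For readers less familiar with the three metrics reported above, the following is a minimal sketch of how accuracy, F1 score, and area under the ROC curve can be computed from binary labels and predicted scores. It is not taken from the framework described here: the function names, the toy data, the 0.5 decision threshold, and the choice of which class is labelled 1 are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the class labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute the AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 marks the positive class, scores are model outputs.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.4f} f1={f1:.4f} auc={auc:.4f}")
```

Accuracy and F1 depend on the chosen threshold, whereas the area under the ROC curve is computed from the ranking of the scores alone, so the three values can differ noticeably on the same set of predictions.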



  1. https://www.coursera.org/, https://www.edx.org/, https://www.udacity.com/.






This research project was funded by Moodle Pty Ltd, and by the Australian government and The University of Western Australia through the Research Training Program (RTP). We thank Moodle HQ for providing the dataset used in this study. Special thanks to Helen Foster and Mary Cooch for setting up the MOOC and for running regular versions of the course. We also thank all Moodle HQ staff and members of the Moodle community who participated in the project by doing code reviews, testing the framework, and helping design the user interface of the tool.

Author information



Corresponding author

Correspondence to David Monllaó Olivé.


Cite this article

Monllaó Olivé, D., Huynh, D.Q., Reynolds, M. et al. A supervised learning framework: using assessment to identify students at risk of dropping out of a MOOC. J Comput High Educ 32, 9–26 (2020). https://doi.org/10.1007/s12528-019-09230-1



  • Assessment
  • Learning management systems
  • Moodle
  • Learning analytics
  • Educational data mining
  • Machine learning
  • Neural networks