Dropout Prediction in MOOCs: A Comparison Between Process and Sequence Mining

  • Galina Deeva
  • Johannes De Smedt
  • Pieter De Koninck
  • Jochen De Weerdt
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 308)


Massive Open Online Courses (MOOCs) have developed rapidly in recent years. However, one of the major issues of online education is the high dropout rate among participants. Many studies have explored this issue using quantitative and qualitative methods of student attrition analysis. Nevertheless, there is a lack of studies that (1) predict the actual moment of dropout, which creates opportunities to improve student retention in MOOCs through timely interventions, and (2) compare the performance of such prediction algorithms. In this paper, we aim to predict student dropout in MOOCs using process and sequence mining techniques and provide a comparative analysis of these techniques. We perform a case study based on data from the KU Leuven online course “Trends in e-Psychology”, available on the edX platform. The results reveal that, while process mining is better suited to descriptive analysis, sequence mining techniques provide better features for predictive purposes.


Keywords: Dropout prediction, Process mining, Sequence classification, Massive Open Online Course, Educational data mining
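
The abstract contrasts process mining (descriptive) with sequence mining (predictive features). The following is not the authors' pipeline. It is a minimal, hypothetical sketch, assuming a toy event log (`LOG`) that holds one activity sequence per student plus a dropout label; the activity names and data are invented for illustration and do not come from the edX course. `directly_follows` counts the directly-follows relation that many process discovery algorithms start from, while `frequent_patterns` and `to_features` turn frequent order-preserving subsequences into binary per-student features, in the spirit of pattern-based sequence classification. All function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy event log (assumption, not the paper's data):
# student id -> (ordered activity sequence, dropped_out flag; True = dropped out)
LOG = {
    "s1": (["video", "video", "quiz", "forum", "quiz"], False),
    "s2": (["video", "forum", "video"], True),
    "s3": (["video", "quiz", "video", "quiz"], False),
    "s4": (["video", "video"], True),
}


def directly_follows(sequences):
    """Count how often activity a is immediately followed by activity b.

    This relation is the raw material of many process discovery
    algorithms, i.e. the descriptive, process mining view.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


def subsequences(seq, max_len=2):
    """All order-preserving (not necessarily contiguous) patterns up to max_len."""
    patterns = set()
    for k in range(1, max_len + 1):
        # combinations yields index tuples in increasing order, so order is kept
        for idx in combinations(range(len(seq)), k):
            patterns.add(tuple(seq[i] for i in idx))
    return patterns


def frequent_patterns(sequences, min_support, max_len=2):
    """Patterns contained in at least min_support sequences.

    Support is counted once per sequence, because subsequences() returns a set.
    """
    support = Counter()
    for seq in sequences:
        support.update(subsequences(seq, max_len))
    return sorted(p for p, c in support.items() if c >= min_support)


def to_features(seq, patterns, max_len=2):
    """Binary feature vector: 1 if the student's sequence contains the pattern."""
    present = subsequences(seq, max_len)
    return [int(p in present) for p in patterns]


seqs = [s for s, _ in LOG.values()]
labels = [int(dropped) for _, dropped in LOG.values()]

dfg = directly_follows(seqs)                        # descriptive view
patterns = frequent_patterns(seqs, min_support=2)   # candidate features
X = [to_features(s, patterns) for s in seqs]        # feature matrix, one row per student

print(dfg.most_common(3))
print(patterns)
print(X, labels)
```

The feature matrix `X` and the labels can be passed to any standard classifier. Enumerating every subsequence is only viable for short patterns (`max_len=2` here); dedicated sequential pattern mining algorithms such as GSP or SPADE prune the search space instead of enumerating it exhaustively.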

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Galina Deeva (1)
  • Johannes De Smedt (2)
  • Pieter De Koninck (1)
  • Jochen De Weerdt (1)
  1. Department of Decision Sciences and Information Management, Faculty of Economics and Business, KU Leuven, Leuven, Belgium
  2. Management Science and Business Economics Group, Business School, University of Edinburgh, Edinburgh, Scotland
