Communication at Scale in a MOOC Using Predictive Engagement Analytics

Part of the Lecture Notes in Computer Science book series (LNAI, volume 10947)


When teaching at scale, whether in a physical classroom or in the online classroom of a MOOC, personal communication from the instructor becomes a scarce resource, and its scarcity differentiates the quality of the learning experience from that available in smaller classrooms. In this paper, through real-time predictive modeling of engagement analytics, we augment a MOOC platform with personalized communication affordances, allowing the instructional staff to direct communication to learners based on individual predictions of three engagement analytics. The three modeled analytics are the current probability of earning a certificate, of submitting enough materials to pass the class, and of leaving the class and not returning. We engineer an interactive analytics interface in edX that is populated with real-time predictive analytics from a backend API service. The instructor can target messages to, for example, all learners who are predicted to complete all materials but not pass the class. Our approach uses state-of-the-art recurrent neural network classification, evaluated on a MOOC dataset of 20 courses and deployed in one. We evaluate on these courses, comparing a manual feature engineering approach to an automatic feature learning approach using neural networks. Our provided front-end and back-end code allows any instructional team to add this personalized communication dashboard to their edX course, provided they have access to the historical clickstream data from a previous offering of the course, the daily log data provided for their course, and an external machine to run the model service API.
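
The targeting example in the abstract (learners predicted to complete all materials but not pass the class) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `LearnerPrediction` record, its field names, the `select_recipients` helper, and the single 0.5 threshold are hypothetical and are not taken from the paper's code or from any edX API. The sketch also assumes that "complete" corresponds to the predicted probability of submitting enough materials and that "pass" corresponds to the predicted probability of earning a certificate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerPrediction:
    """Hypothetical per-learner output of a prediction service."""
    learner_id: str
    p_certify: float   # probability of earning a certificate
    p_complete: float  # probability of submitting enough materials to pass
    p_attrit: float    # probability of leaving the class and not returning


def select_recipients(predictions, threshold=0.5):
    """Return IDs of learners predicted to complete but not to certify.

    The shared threshold is an illustrative assumption, not a value
    reported in the paper.
    """
    return [
        p.learner_id
        for p in predictions
        if p.p_complete >= threshold and p.p_certify < threshold
    ]


# Made-up predictions for three learners.
predictions = [
    LearnerPrediction("a", p_certify=0.90, p_complete=0.95, p_attrit=0.05),
    LearnerPrediction("b", p_certify=0.20, p_complete=0.80, p_attrit=0.10),
    LearnerPrediction("c", p_certify=0.10, p_complete=0.15, p_attrit=0.85),
]
recipients = select_recipients(predictions)
print(recipients)
```

With these made-up values only learner "b" is selected, since that learner is likely to submit enough materials but unlikely to earn a certificate. A real deployment would presumably let the instructor adjust the thresholds for each analytic separately.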


  • Representation learning
  • MOOCs
  • Learning analytics
  • Engagement
  • Drop-out prediction
  • Instructor communication
  • edX
  • User-interface

  • DOI: 10.1007/978-3-319-93843-1_18
  • Chapter length: 14 pages


  1. These data were provided by way of the edX partners’ Research Data Exchange (RDX). All data have been anonymized before being received and are restricted in use by MOU.

  2. A student gained certification if the “status” column in the edX provided certificates_generatedcertificate-prod-analytics.sql file was set to “downloadable”.

  3. All implemented using Python’s scikit-learn machine learning library.

  4. The longest event streams were in EPFLx “Plasma Physics and Applications”.
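
The certification rule in note 2 can be sketched as a simple labeling step. This is only an illustration under stated assumptions: the notes do not describe the file layout, so the tab-separated format, the `user_id` column name, the sample rows, and the `certified_ids` helper are all hypothetical. Only the rule itself (a "status" value of "downloadable" means certified) comes from the note.

```python
import csv
import io

# Hypothetical tab-separated rows standing in for the certificates file;
# the real export is assumed to contain more columns than shown here.
SAMPLE = (
    "user_id\tstatus\n"
    "101\tdownloadable\n"
    "102\tnotpassing\n"
    "103\tdownloadable\n"
)


def certified_ids(handle):
    """Return the set of user IDs whose status is 'downloadable'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["user_id"] for row in reader if row["status"] == "downloadable"}


certified = certified_ids(io.StringIO(SAMPLE))
print(sorted(certified))
```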




These multi-institution analyses were made possible by anonymized data from the edX partners’ Research Data Exchange (RDX) program. This work was supported in part by a grant from the National Science Foundation (Award #1446641).

Author information

Corresponding author

Correspondence to Zachary A. Pardos.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Le, C.V., Pardos, Z.A., Meyer, S.D., Thorp, R. (2018). Communication at Scale in a MOOC Using Predictive Engagement Analytics. In: Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science, vol 10947. Springer, Cham.

  • DOI: 10.1007/978-3-319-93843-1_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93842-4

  • Online ISBN: 978-3-319-93843-1

  • eBook Packages: Computer Science, Computer Science (R0)