
Classifying MOOC forum posts using corpora semantic similarities: a study on transferability across different courses

A Correction to this article was published on 22 April 2021

This article has been updated


Information overload in MOOC discussion forums is a major problem that hinders course staff from facilitating learners effectively. To address this issue, supervised classification models have been developed to help course facilitators detect forum discussions that call for their intervention. A key issue in the literature is the transferability of these models to domains other than the one in which they were trained. These models typically employ domain-dependent features and therefore fail to transfer to other subject matters. In this study, we propose and evaluate an alternative way of building supervised models in this context: the training features are the semantic similarities between the forum transcripts and corpora created dynamically from the MOOC environment. Specifically, we analyze the case of two MOOCs, for which we built models that classify forum discussions into three categories: course logistics, content-related, and no action required. Furthermore, we evaluate the transferability of the derived models and interpret which features can be effectively transferred to unseen courses. The findings reveal the main benefits and trade-offs of the proposed approach and give MOOC developers insights into the main issues that inhibit the transferability of these models.
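The idea of using similarity to course corpora as domain-independent features can be illustrated with a minimal sketch. It is not the method of the study: it swaps in a plain bag-of-words cosine similarity, which is lexical rather than truly semantic, and the corpora, the post, and the function names (`bag_of_words`, `cosine`, `similarity_features`) are all hypothetical. Each post is mapped to one score per corpus, so the feature vector has the same meaning regardless of the course subject.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_features(post, corpora):
    """Return one similarity score per corpus, ordered by sorted corpus name."""
    post_vec = bag_of_words(post)
    return [cosine(post_vec, bag_of_words(corpora[name])) for name in sorted(corpora)]


# Hypothetical corpora standing in for text gathered from a MOOC environment.
corpora = {
    "course_logistics": "the deadline for the quiz and the certificate is announced each week",
    "course_material": "a list stores items and a loop iterates over the list items",
}

post = "When is the deadline for the final quiz?"
features = similarity_features(post, corpora)
print(dict(zip(sorted(corpora), features)))
```

In this toy example the post scores higher against the logistics corpus than against the material corpus. A supervised classifier would then be trained on such vectors rather than on course-specific vocabulary; which classifier and which similarity measure to use are left open here.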






This research was performed in the frame of a collaboration between the University of Patras and the online platform Mathesis. The supply of MOOC data by Mathesis is gratefully acknowledged. The doctoral scholarship “Strengthening Human Resources Research Potential via Doctorate Research – 2nd Cycle” (MIS-5000432), implemented by the State Scholarships Foundation (IKY), is also gratefully acknowledged. This research has also been partially funded by the Spanish State Research Agency (AEI) under project Grants TIN2014-53199-C3-2-R and TIN2017-85179-C3-2-R, the Regional Government of Castilla y León Grant VA082U16, and the EC Grant 588438-EPP-1-2017-1-EL-EPPKA2-KA.

Author information



Corresponding author

Correspondence to Anastasios Ntourmas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: Due to open choice cancellation.


About this article


Cite this article

Ntourmas, A., Daskalaki, S., Dimitriadis, Y. et al. Classifying MOOC forum posts using corpora semantic similarities: a study on transferability across different courses. Neural Comput & Applic (2021).



Keywords

  • Massive Open Online Courses
  • Supervised modeling
  • Transferability
  • Forum discussion classification