Predicting MOOCs Dropout Using Only Two Easily Obtainable Features from the First Week’s Activities

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 11528)

Abstract

While Massive Open Online Course (MOOC) platforms provide knowledge in a new and unique way, their very high dropout rate is a significant drawback. Several features are considered to contribute to learner attrition or lack of interest, which may lead to disengagement or total dropout. The jury is still out on which factors are the most appropriate predictors. However, the literature agrees that early prediction is vital to allow for timely intervention. Whilst feature-rich predictors may have the best chance of high accuracy, they may be unwieldy. This study aims to predict learner dropout early on, from the first week, by comparing several machine-learning approaches, including Random Forest, Adaptive Boost, XGBoost and GradientBoost classifiers. The results show promising accuracies (82%–94%) using as few as two features. We show that the accuracies obtained outperform state-of-the-art approaches, even when the latter deploy several features.
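To make the idea of boosting on just two features concrete, here is a minimal sketch of discrete AdaBoost with decision stumps, written in plain Python. It is not the authors' pipeline: the data are synthetic, the two feature names (`hours` and `accesses` in the first week) are hypothetical placeholders, and the distributions and number of boosting rounds are assumptions chosen only for illustration.

```python
import math
import random


def make_synthetic_learners(n, seed=0):
    """Generate (features, label) pairs; label +1 = dropout, -1 = completer."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        dropout = rng.random() < 0.5
        # Hypothetical first-week features (assumed, not taken from the paper).
        hours = rng.gauss(2.0 if dropout else 5.0, 1.0)
        accesses = rng.gauss(5.0 if dropout else 12.0, 2.0)
        data.append(((hours, accesses), 1 if dropout else -1))
    return data


def stump_predict(stump, x):
    """A stump predicts `polarity` when the chosen feature is <= threshold."""
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def fit_stump(data, weights):
    """Return the decision stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(2):
        for threshold in sorted({x[feature] for x, _ in data}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for (x, y), w in zip(data, weights)
                          if stump_predict(stump, x) != y)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def fit_adaboost(data, rounds=10):
    """Discrete AdaBoost: reweight learners after each weak classifier."""
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified learners, then renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps; ties go to the dropout class."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, data):
    return sum(predict(ensemble, x) == y for x, y in data) / len(data)


learners = make_synthetic_learners(300)
train, test = learners[:200], learners[200:]
model = fit_adaboost(train)
test_accuracy = accuracy(model, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The accuracy printed here reflects only how separable the synthetic classes were made, so it says nothing about the figures reported in the abstract. In practice one would more likely reach for library implementations of the four classifier families and evaluate them with cross-validation on real first-week logs.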

Keywords

  • Educational data mining
  • Learning analytics
  • Dropout prediction
  • Machine learning
  • MOOCs


Notes

  1. https://www.mooclab.club/resources/mooclab-report-the-global-mooc-landscape-2017.214/

Acknowledgment

We would like to thank FAPEAM (Foundation for the State of Amazonas Research), through Edital 009/2017, for partially funding this research.

Author information

Corresponding author

Correspondence to Alexandra Cristea.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Alamri, A. et al. (2019). Predicting MOOCs Dropout Using Only Two Easily Obtainable Features from the First Week’s Activities. In: Coy, A., Hayashi, Y., Chang, M. (eds) Intelligent Tutoring Systems. ITS 2019. Lecture Notes in Computer Science, vol 11528. Springer, Cham. https://doi.org/10.1007/978-3-030-22244-4_20

  • DOI: https://doi.org/10.1007/978-3-030-22244-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22243-7

  • Online ISBN: 978-3-030-22244-4

  • eBook Packages: Computer Science, Computer Science (R0)