Abstract
Massive open online courses (MOOCs) have become one of the most popular ways of learning in recent years because of their flexibility and convenience. However, their high dropout rate has become a prominent problem that hinders their further development, so predicting student dropout is key to improving MOOC platforms. Traditional dropout prediction models based on machine learning struggle to deliver reliable predictions because they mine feature information insufficiently and do not consider the influence of time series. To address this problem, we propose the learning behavior feature fused deep learning network model (LBDL) for MOOC dropout prediction. The core of the model lies in modeling different types of information separately and then incorporating them into an overall framework. In the data processing stage, the LBDL model divides the data features into video learning behavior features, which contain time series information, and general information features. For the video learning behavior features, the model uses Bi-LSTM and attention mechanisms to mine the time series information; for the general information features, it uses an embedding layer and a fully connected layer. Together, these two modeling approaches yield a hidden vector containing both types of feature information. This hidden vector is then combined with the original feature information to train the gradient boosting framework LightGBM. Experiments on the MOOCCube video dataset show that the AUC and F1-Score of our model reach 82.39% and 74.89%, respectively, which are higher than those of the baseline models. This indicates that the proposed LBDL model performs better on the dropout prediction problem.
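The two reported metrics can be illustrated with a minimal Python sketch. It uses the standard definitions of AUC (the probability that a randomly chosen positive example is scored above a randomly chosen negative one) and F1 (the harmonic mean of precision and recall). The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not taken from the paper's experiments and do not reproduce its evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A positive scored above a negative is a win; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 of the positive (dropout) class after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = dropped out, 0 = continued.
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(f"AUC = {roc_auc(toy_labels, toy_scores):.3f}")  # AUC = 0.778
    print(f"F1  = {f1_score(toy_labels, toy_scores):.3f}")  # F1  = 0.667
```

AUC depends only on how the scores rank the learners, whereas F1 depends on the chosen threshold, which is one reason the two are usually reported together.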
Data availability
Data sets used during the current study are available from the appropriate authors upon reasonable request.
Acknowledgements
All authors would like to thank the participants who devoted their time and energy to taking part in the experiments. We also thank the associate editor and the reviewers for their useful feedback, which improved this paper.
Funding
This work is supported by the National Natural Science Foundation of China (Grant Nos. 62071379, 62071378, 61901365 and 62106196), the Natural Science Basic Research Plan in Shaanxi Province of China (Grant Nos. 2021JM-461 and 2020JM-299), the Fundamental Research Funds for the Central Universities (Grant No. GK202103085), and New Star Team of Xi'an University of Posts & Telecommunications (Grant No. xyt2016-01).
Author information
Contributions
Xiao Chen designed the model, carried out the data processing and experiments, and wrote the original draft. Hanqiang Liu contributed suggestions and ideas on the model, participated in revising the manuscript, and approved the final manuscript. Feng Zhao provided suggestions on the experimental part of the model and participated in revising the manuscript.
Ethics declarations
Ethics approval
The participants were protected by hiding their personal information during the research process. They knew that their participation was voluntary and they could withdraw from the study at any time.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liu, H., Chen, X. & Zhao, F. Learning behavior feature fused deep learning network model for MOOC dropout prediction. Educ Inf Technol 29, 3257–3278 (2024). https://doi.org/10.1007/s10639-023-11960-w
DOI: https://doi.org/10.1007/s10639-023-11960-w