Big data analytics for MOOC video watching behavior based on Spark

  • Hui Hu
  • Guofeng Zhang
  • Wanlin Gao
  • Minjuan Wang
Multi-Source Data Understanding (MSDU)


This study measures the effectiveness of courses delivered through MOOCs at China Agricultural University. Video watching is considered the most important way to disseminate knowledge in a Massive Open Online Course (MOOC). The aim is to understand the degree of students’ learning engagement and to provide suggestions to teachers for constructing courses. This paper proposes methods for analyzing students’ video watching behavior on MOOC platforms and verifies them with data from the cauX platform. First, a detailed statistical analysis of video watching data and behavior was performed. Next, data preprocessing algorithms based on the Spark platform were developed and used to count video watching behaviors per hour and per minute. Then, the entropy weight method was used to calculate the weights of the pause video, seek video, and speed change video behaviors. Finally, the experimental results are analyzed and discussed. The results show that the proposed Spark-based method can quickly and accurately analyze the characteristics of video watching behavior.
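
The two computational steps named in the abstract, counting behaviors per hour and weighting behaviors by entropy, can be illustrated with a minimal sketch. It rests on several assumptions: it uses plain Python rather than Spark, the function names `count_by_hour` and `entropy_weights` and the `(timestamp, event_type)` record layout are hypothetical, and the entropy weight computation is the common textbook formulation (column proportions, Shannon entropy scaled by 1/ln n, weights proportional to 1 minus entropy), which may differ in detail from the procedure used in the paper.

```python
import math
from collections import Counter
from datetime import datetime


def count_by_hour(events):
    """Count events per hour of day (0-23).

    `events` is an iterable of (iso_timestamp, event_type) pairs.
    This record layout is an assumption made for the example.
    """
    return Counter(datetime.fromisoformat(ts).hour for ts, _ in events)


def entropy_weights(rows):
    """Return one entropy weight per column of a non-negative count matrix.

    Each row is one sample (for example, one video) and each column is
    one behavior (for example, pause, seek, speed change). This is the
    textbook entropy weight method, not necessarily the paper's variant.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows to compute entropy")
    m = len(rows[0])
    k = 1.0 / math.log(n)  # scales entropy into [0, 1]

    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total == 0:
            # A behavior that never occurs carries no information.
            divergences.append(0.0)
            continue
        # Shannon entropy of the column proportions; 0 * ln(0) is taken as 0.
        entropy = -k * sum(
            (x / total) * math.log(x / total) for x in column if x > 0
        )
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    spread = sum(divergences)
    if spread == 0:
        # All columns are uniform: fall back to equal weights.
        return [1.0 / m] * m
    return [d / spread for d in divergences]
```

In this formulation a behavior whose counts are nearly identical across samples has entropy close to 1 and receives a weight close to 0, while a behavior whose counts vary strongly across samples receives a larger weight. The weights always sum to 1.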


Keywords: MOOC, Big data, Video watching behavior, Spark



The authors would like to thank their colleagues for their support of this work. The detailed comments from the anonymous reviewers are gratefully acknowledged. This work was supported by the Beijing higher education and teaching reform project in 2014 (No. 2014-ms044).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflicts of interest related to this work.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. College of Information and Electrical Engineering, China Agricultural University, Beijing, China
  2. Key Laboratory of Agricultural Informatization Standardization, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing, China
