Automatic Semantic Segmentation and Annotation of MOOC Lecture Videos

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11853)


Locating a desired topic inside a long MOOC lecture video is tedious and time-consuming unless the video metadata annotates the topics taught in it. In this work, we propose a system that performs topic-wise semantic segmentation and annotation of MOOC lecture videos to reduce user effort. As input, our system takes a lecture video, its associated speech transcript, and a list of topic names (available in the form of a course syllabus), and it produces an annotation for each semantically coherent topic taught in that video. This work makes two major contributions. First, we apply a state-of-the-art neural-network-based text segmentation technique to the textual information to obtain topic boundaries. Second, for annotation, we use a combination of visual and textual information to assign an accurate topic name to each segment. We tested our method on 100 lecture videos hosted on NPTEL [1] and obtained, on average, about 72% segmentation accuracy, which is better than the existing approaches in the literature.
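
To make the input/output contract of the annotation step concrete, the following is a minimal sketch only. It is not the method proposed in this paper, which relies on neural text segmentation and on combined visual and textual cues. The sketch assumes the transcript has already been split into segments and simply labels each segment with the syllabus topic name whose bag-of-words vector is most similar under cosine similarity. All function names (`bag_of_words`, `cosine`, `annotate_segments`) and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate_segments(segments, topic_names):
    """Label each transcript segment with its most similar topic name.

    Illustrative stand-in for the annotation step: ties (including the
    case where nothing matches) resolve to the earliest topic in the list.
    """
    topic_vectors = [(name, bag_of_words(name)) for name in topic_names]
    labels = []
    for segment in segments:
        seg_vec = bag_of_words(segment)
        best = max(topic_vectors, key=lambda item: cosine(seg_vec, item[1]))
        labels.append(best[0])
    return labels


# Hypothetical pre-segmented transcript and syllabus topic list.
segments = [
    "A binary search tree keeps keys in sorted order so search is fast.",
    "Quick sort picks a pivot and partitions the array around it.",
]
topic_names = ["Binary search tree", "Quick sort", "Hash table"]
labels = annotate_segments(segments, topic_names)
print(labels)
```

Plain token overlap is a deliberately simple choice for illustration. It fails whenever a segment discusses a topic without using the words of its syllabus title, which is one reason a real system would draw on richer textual and visual evidence.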


Keywords: e-learning · Lecture video annotation · Lecture video content retrieval · Multimedia application · Semantic segmentation


References

  1. National program on technology enhanced learning (2019).
  2. Tesseract open source OCR engine (2019).
  3. Wikipedia (2019).
  4. Badjatiya, P., Kurisinkel, L.J., Gupta, M., Varma, V.: Attention-based neural text segmentation. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 180–193. Springer, Cham (2018)
  5. Baidya, E., Goel, S.: Lecturekhoj: automatic tagging and semantic segmentation of online lecture videos. In: 2014 Seventh International Conference on Contemporary Computing (IC3), pp. 37–43. IEEE (2014)
  6. Bhatt, C.A., et al.: Multi-factor segmentation for topic visualization and recommendation: the must-vis system. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 365–368. ACM (2013)
  7. Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326 (2015)
  8. Che, X., Yang, H., Meinel, C.: Lecture video segmentation by automatically analyzing the synchronized slides. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 345–348. ACM (2013)
  9. Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364 (2017)
  10. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
  11. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2963–2970. IEEE (2010)
  12. Galanopoulos, D., Mezaris, V.: Temporal lecture video fragmentation using word embeddings. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MMM 2019. LNCS, vol. 11296, pp. 254–265. Springer, Cham (2019)
  13. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
  14. Hearst, M.A.: Texttiling: segmenting text into multi-paragraph subtopic passages. Comput. Linguist. 23(1), 33–64 (1997)
  15. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  16. Kusner, M., Sun, Y., Kolkin, N., Weinberger, K.: From word embeddings to document distances. In: International Conference on Machine Learning, pp. 957–966 (2015)
  17. Lin, M., Nunamaker Jr, J.F., Chau, M., Chen, H.: Segmentation of lecture videos based on text: a method combining multiple linguistic features. In: Null, p. 10003c. IEEE (2004)
  18. Shah, R.R., Yu, Y., Shaikh, A.D., Tang, S., Zimmermann, R.: Atlas: automatic temporal segmentation and annotation of lecture videos based on modelling transition time. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 209–212. ACM (2014)
  19. Shah, R.R., Yu, Y., Shaikh, A.D., Zimmermann, R.: Trace: linguistic-based approach for automatic lecture video segmentation leveraging Wikipedia texts. In: 2015 IEEE International Symposium on Multimedia (ISM), pp. 217–220. IEEE (2015)
  20. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
  21. Yang, H., Meinel, C.: Content based lecture video retrieval using speech and video text information. IEEE Trans. Learn. Technol. 1(2), 142–154 (2014)
  22. Yang, H., Siebert, M., Luhne, P., Sack, H., Meinel, C.: Automatic lecture video indexing using video OCR technology. In: 2011 IEEE International Symposium on Multimedia (ISM), pp. 111–116. IEEE (2011)
  23. Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Indian Institute of Technology, Kharagpur, India
