
A multimodal approach for extracting content descriptive metadata from lecture videos

Journal of Intelligent Information Systems

Abstract

The rapidly increasing availability of e-learning content and lecture videos over the Internet has created a pressing need for effective content-based retrieval systems. Comprehensive metadata extraction and support for topic-level search within videos are key factors in developing such systems. In this paper, we propose a multimodal metadata extraction system that extracts an optimal set of keyphrases and topic-based segments that effectively summarize the content of a lecture video. The extraction process uses features from both the audio transcripts and the slide content in the video stream. A hybrid approach combining a Naive Bayes classifier and a rule-based refiner is used to extract the metadata of a lecture effectively. The proposed content-descriptive metadata extraction technique has been evaluated on actual lecture videos from different sources, and our results show that the multimodal approach is effective in summarizing the lecture’s content, potentially improving the user experience during retrieval and browsing.
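The abstract names a hybrid of a Naive Bayes classifier and a rule-based refiner but does not specify its features or rules here. The following is a minimal Python sketch under stated assumptions, not the authors' method: it scores candidate phrases with a categorical Naive Bayes model in the spirit of KEA (Witten et al., 1999), using two transcript features (a TF-IDF bin and a first-occurrence bin) plus one hypothetical multimodal flag, `on_slide`, meaning the phrase also appears in the OCR'd slide text. The feature bins, the `Candidate` and `NaiveBayesKeyphrase` classes, the `refine` function, its threshold, and its three rules are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass

# Assumed number of discrete values per feature (tfidf_bin, position_bin, on_slide).
FEATURE_CARDINALITY = (3, 3, 2)


@dataclass(frozen=True)
class Candidate:
    """A candidate phrase with hypothetical, already-discretized features."""
    phrase: str
    tfidf_bin: int     # TF-IDF of the phrase in the transcript: 0 = low .. 2 = high
    position_bin: int  # relative first occurrence in the lecture: 0 = early .. 2 = late
    on_slide: bool     # hypothetical multimodal cue: phrase also occurs in OCR'd slide text

    def features(self):
        return (self.tfidf_bin, self.position_bin, int(self.on_slide))


class NaiveBayesKeyphrase:
    """Categorical Naive Bayes with add-one smoothing; assumes both classes occur in training."""

    def fit(self, candidates, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        for cand, label in zip(candidates, labels):
            for i, value in enumerate(cand.features()):
                self.value_counts[(label, i)][value] += 1
        return self

    def _log_joint(self, cand, label):
        total = sum(self.class_counts.values())
        log_p = math.log(self.class_counts[label] / total)  # log prior
        for i, value in enumerate(cand.features()):
            # Add-one smoothing so an unseen feature value never zeroes out a class.
            num = self.value_counts[(label, i)][value] + 1
            den = self.class_counts[label] + FEATURE_CARDINALITY[i]
            log_p += math.log(num / den)
        return log_p

    def prob_keyphrase(self, cand):
        # Normalize the two log-joints into P(keyphrase | features) without underflow.
        pos, neg = self._log_joint(cand, True), self._log_joint(cand, False)
        m = max(pos, neg)
        return math.exp(pos - m) / (math.exp(pos - m) + math.exp(neg - m))


def refine(scored, threshold=0.5, k=5):
    """Hypothetical rule-based refiner over (phrase, probability) pairs."""
    kept = []
    # Highest probability first; ties go to the longer (more specific) phrase.
    ranked = sorted(scored, key=lambda pair: (pair[1], len(pair[0].split())), reverse=True)
    for phrase, prob in ranked:
        if prob < threshold:  # Rule 1: drop low-confidence candidates
            continue
        # Rule 2: drop a phrase nested (as whole words) inside an already kept keyphrase.
        if any(f" {phrase} " in f" {other} " for other in kept):
            continue
        kept.append(phrase)
        if len(kept) == k:  # Rule 3: keep at most k keyphrases
            break
    return kept


# Toy usage with made-up training labels.
train = [
    (Candidate("binary search tree", 2, 0, True), True),
    (Candidate("hash table", 2, 1, True), True),
    (Candidate("linked list", 1, 0, True), True),
    (Candidate("so basically", 0, 2, False), False),
    (Candidate("next week", 0, 1, False), False),
    (Candidate("this example", 1, 2, False), False),
]
model = NaiveBayesKeyphrase().fit([c for c, _ in train], [y for _, y in train])
tests = [Candidate("search tree", 2, 0, True), Candidate("binary search tree", 2, 0, True),
         Candidate("you know", 0, 2, False)]
scored = [(c.phrase, model.prob_keyphrase(c)) for c in tests]
print(refine(scored))  # → ['binary search tree']
```

Working in log space keeps the product of many small likelihoods from underflowing. In a real system the bins would be learned from data (KEA, for instance, discretizes its features with the Fayyad and Irani (1993) method), and the refiner would likely run per topic segment rather than over a whole lecture.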



References

  • Academic earth (2013). http://academicearth.org/.

  • Adcock, J., Cooper, M., Denoue, L., Pirsiavash, H., Rowe, L.A. (2010). Talkminer: a lecture webcast search engine. In: Proceedings of the international conference on Multimedia, MM ’10, pp. 241–250. ACM, New York, NY, USA. doi:10.1145/1873951.1873986.

  • Akiba, T., Aikawa, K., Itoh, Y., Kawahara, T., Nanjo, H., Nishizaki, H., Yasuda, N., Yamashita, Y., Itou, K. (2009). Construction of a test collection for spoken document retrieval from lecture audio data. JIP, 17, 82–94.


  • Balagopalan, A., Balasubramanian, L.L., Balasubramanian, V., Chandrasekharan, N., Damodar, A. (2012). Automatic keyphrase extraction and segmentation of video lectures. In: Technology Enhanced Education (ICTEE), 2012 IEEE International Conference on, pp. 1–10. doi:10.1109/ICTEE.2012.6208622.

  • Berkeley webcasts (2013). http://webcast.berkeley.edu/.

  • Böhm, K., & Rakow, T.C. (1994). Metadata for multimedia documents. ACM Sigmod Record, 23(4), 21–26.


  • Chen, Y.N., Huang, Y., Kong, S.Y., Lee, L.S. (2010). Automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features. In: Spoken Language Technology Workshop (SLT), 2010 IEEE, pp. 265–270. doi:10.1109/SLT.2010.5700862.

  • Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning.

  • Frantzi, K.T., & Ananiadou, S. (1996). Extracting nested collocations. In: Proceedings of the 16th conference on Computational linguistics - Volume 1, COLING ’96, pp. 41–46. Association for Computational Linguistics, Stroudsburg, PA, USA. doi:10.3115/992628.992639.

  • Gocr (2013). http://jocr.sourceforge.net/.

  • Haubold, A. (2004). Analysis and visualization of index words from audio transcripts of instructional videos. In: Multimedia Software Engineering, 2004. Proceedings. IEEE Sixth International Symposium on, pp. 570–573. IEEE.

  • Haubold, A., & Kender, J.R. (2005). Augmented segmentation and visualization for presentation videos. In: Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA ’05, pp. 51–60. ACM, New York, NY, USA. doi:10.1145/1101149.1101158.

  • Haubold, A., & Kender, J.R. (2007). VAST MM: multimedia browser for presentation video. In: Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR ’07, pp. 41–48. ACM, New York, NY, USA. doi:10.1145/1282280.1282286.

  • Hearst, M.A. (1997). TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 33–64. http://dl.acm.org/citation.cfm?id=972684.972687.


  • Hulth, A. (2003). Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 conference on Empirical methods in natural language processing, EMNLP ’03, pp. 216–223. Association for Computational Linguistics, Stroudsburg, PA, USA. doi:10.3115/1119355.1119383.

  • Hunter, J., & Little, S. (2001). Building and indexing a distributed multimedia presentation archive using SMIL. In: ECDL’01, pp. 415–428.

  • Kim, S.N., & Kan, M.Y. (2009). Re-examining automatic keyphrase extraction approaches in scientific articles. In: Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, MWE ’09, pp. 9–16. Association for Computational Linguistics, Stroudsburg, PA, USA. http://dl.acm.org/citation.cfm?id=1698239.1698242.

  • Liu, F., Liu, F., Liu, Y. (2008). Automatic keyword extraction for the meeting corpus using supervised approach and bigram expansion. In: Spoken Language Technology Workshop, 2008. SLT 2008. IEEE, pp. 181–184. doi:10.1109/SLT.2008.4777870.

  • Liu, F., Pennell, D., Liu, F., Liu, Y. (2009). Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL ’09, pp. 620–628. Association for Computational Linguistics, Stroudsburg, PA, USA. http://dl.acm.org/citation.cfm?id=1620754.1620845.

  • Liu, T., & Kender, J.R. (2004). Lecture videos for e-learning: Current research and challenges. In: Proceedings of the IEEE Sixth International Symposium on Multimedia Software Engineering, 2004, pp. 574–578. IEEE.

  • Manning, C.D., Raghavan, P., Schütze, H. (2008). Introduction to information retrieval. New York, NY, USA: Cambridge University Press.


  • MIT OCW - MIT OpenCourseWare (2013). http://ocw.mit.edu/.

  • Medelyan, O., & Witten, I.H. (2006). Thesaurus based automatic keyphrase indexing. In: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, JCDL ’06, pp. 296–297. ACM, New York, NY, USA. doi:10.1145/1141753.1141819.

  • Mukhopadhyay, S., & Smith, B. (1999). Passive capture and structuring of lectures. In: Proceedings of the seventh ACM international conference on Multimedia (Part 1), MULTIMEDIA ’99, pp. 477–487. ACM, New York, NY, USA. doi:10.1145/319463.319690.

  • NPTEL - National Programme on Technology Enhanced Education (2013). http://nptel.iitm.ac.in/.

  • Open Yale Courses (OYC) (2013). http://oyc.yale.edu/.

  • Tesseract OCR (2013). https://code.google.com/p/tesseract-ocr/.

  • VideoLectures.NET (2013). http://videolectures.net/.

  • VideoLectures.Net Challenge (2014). http://acmmm.org/2014/docs/mm/_gc/mediamixer.pdf.

  • Viertl, R. (2008). Fuzzy models for precision measurements. Mathematics and Computers in Simulation, 79(4), 874–878.


  • Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G. (1999). KEA: practical automatic keyphrase extraction. In: Proceedings of the fourth ACM conference on Digital libraries, DL ’99, pp. 254–255. ACM, New York, NY, USA. doi:10.1145/313238.313437.

  • Ziółko, B., Manandhar, S., Wilson, R.C. (2007). Fuzzy recall and precision for speech segmentation evaluation. In: Proceedings of 3rd Language & Technology Conference, Poznan, Poland.


Acknowledgments

We would like to acknowledge the support of the E-Learning Research Center, Amrita Vishwa Vidyapeetham (Coimbatore), which funded this work in part. We thank NPTEL (2013) for providing their dataset for research purposes. We are immensely grateful to Venkatapathy S. Iyer, our research associate, who helped implement our lecture browser and search system and also helped proofread this paper. Finally, we thank all the other researchers in our lab and the students who helped generate the ground truth dataset.

Author information

Correspondence to Vidhya Balasubramanian.


Cite this article

Balasubramanian, V., Doraisamy, S.G. & Kanakarajan, N.K. A multimodal approach for extracting content descriptive metadata from lecture videos. J Intell Inf Syst 46, 121–145 (2016). https://doi.org/10.1007/s10844-015-0356-5


  • DOI: https://doi.org/10.1007/s10844-015-0356-5
