Video Indexing System Based on Multimodal Information Extraction Using Combination of ASR and OCR

Varma, Sandeep; Pandey, Arunanshu; Shivam; Das, Soham; Roy, Soumya Deep

doi:10.1007/978-3-030-96600-3_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13167)

Included in the following conference series:

International Conference on Big Data Analytics

Abstract

With internet penetration increasing across the world, the volume of content on the World Wide Web has surged, and video has proven to be one of the most popular media. The COVID-19 pandemic has pushed this trend further by forcing learners to turn to e-learning platforms. When such videos lack relevant descriptions, metadata must be generated from the content of the video itself. This paper attempts to index videos based on their visual and audio content. The visual content is extracted by applying Optical Character Recognition (OCR) to the stack of frames obtained from a video, while the audio content is transcribed using Automatic Speech Recognition (ASR). The OCR-generated and ASR-generated texts are combined to obtain the final description of each video. The dataset contains 400 videos spread across 4 genres. To quantify the accuracy of the descriptions, clustering is performed on the video descriptions to discern between the genres of the videos.
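
The pipeline described in the abstract (combine OCR and ASR text into one description per video, then cluster the descriptions and compare the clusters with the genres) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which text features, clustering algorithm, or accuracy measure were used, so TF-IDF vectors, spherical k-means, and cluster purity are assumptions chosen for brevity. All function names are hypothetical, the OCR and ASR outputs are taken as ready-made strings, and the two toy genres are invented for the example.

```python
import math
import re
from collections import Counter


def merge_frame_text(frame_texts):
    """Join per-frame OCR strings, skipping blanks and consecutive repeats."""
    merged = []
    for text in frame_texts:
        text = text.strip()
        if text and (not merged or merged[-1] != text):
            merged.append(text)
    return " ".join(merged)


def build_description(frame_texts, transcript):
    """Concatenate the OCR text and the ASR transcript of one video."""
    return f"{merge_frame_text(frame_texts)} {transcript.strip()}".strip()


def normalise(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf(descriptions):
    """Return one unit-length sparse TF-IDF vector per description."""
    docs = [Counter(re.findall(r"[a-z]+", d.lower())) for d in descriptions]
    df = Counter(term for doc in docs for term in doc)
    n = len(docs)
    return [
        normalise({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in doc.items()})
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cluster(vectors, k, iterations=10):
    """Spherical k-means with deterministic farthest-first seeding (assumed method)."""
    seeds = [0]
    while len(seeds) < k:
        # Next seed: the vector least similar to the seeds chosen so far.
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)  # adds the weights term by term
            if total:
                centroids[c] = normalise(total)
    return labels


def purity(cluster_ids, genres):
    """Fraction of videos whose genre is the majority genre of their cluster."""
    by_cluster = {}
    for cid, genre in zip(cluster_ids, genres):
        by_cluster.setdefault(cid, Counter())[genre] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(genres)


videos = [  # (OCR text per frame, ASR transcript, genre): all made up
    (["Gradient descent", "Gradient descent", "Learning rate"],
     "we minimise the loss with gradient descent", "ml"),
    (["Loss function"], "the gradient of the loss guides learning", "ml"),
    (["Sourdough recipe", ""], "knead the dough and bake the bread", "cooking"),
    (["Bread baking"], "bake the dough until the bread is golden", "cooking"),
]
descriptions = [build_description(frames, speech) for frames, speech, _ in videos]
labels = cluster(tfidf(descriptions), k=2)
score = purity(labels, [genre for _, _, genre in videos])
print(labels, score)
```

Dropping consecutive duplicate frames matters because a slide that stays on screen for many frames would otherwise dominate the term counts. With the 4 genres mentioned in the abstract, the same sketch would be called with k=4.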

Author information

Authors and Affiliations

ZS Associates, Pune, India
Sandeep Varma, Arunanshu Pandey & Shivam
Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata, India
Soham Das & Soumya Deep Roy

Corresponding authors

Correspondence to Sandeep Varma, Arunanshu Pandey, Shivam, Soham Das or Soumya Deep Roy.

Editor information

Editors and Affiliations

National Institute of Technology, Delhi, India
Shelly Sachdeva
University of Aizu, Aizu-wakamatsu city, Fukushima, Japan
Yutaka Watanobe
University of Aizu, Aizu-wakamatsu city, Fukushima, Japan
Subhash Bhalla

About this paper

Cite this paper

Varma, S., Pandey, A., Shivam, Das, S., Roy, S.D. (2022). Video Indexing System Based on Multimodal Information Extraction Using Combination of ASR and OCR. In: Sachdeva, S., Watanobe, Y., Bhalla, S. (eds) Big-Data-Analytics in Astronomy, Science, and Engineering. BDA 2021. Lecture Notes in Computer Science, vol 13167. Springer, Cham. https://doi.org/10.1007/978-3-030-96600-3_14

DOI: https://doi.org/10.1007/978-3-030-96600-3_14
Published: 18 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96599-0
Online ISBN: 978-3-030-96600-3
eBook Packages: Computer Science, Computer Science (R0)
