Abstract
With internet penetration increasing across the world, the volume of content on the World Wide Web has surged, and video has proven to be one of the most popular media. The COVID-19 pandemic accelerated this trend by forcing learners to turn to e-learning platforms. When videos lack relevant descriptions, it becomes imperative to generate metadata from the content of the video itself. In this paper, we attempt to index videos based on their visual and audio content. The visual content is extracted by applying Optical Character Recognition (OCR) to the stack of frames obtained from a video, while the audio content is transcribed using Automatic Speech Recognition (ASR). The OCR- and ASR-generated texts are combined to obtain the final description of each video. The dataset contains 400 videos spread across 4 genres. To quantify the accuracy of our descriptions, clustering is performed on the video descriptions to discern between the genres.
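The combine-then-cluster step described above can be illustrated with a minimal sketch. The abstract does not say which text features or clustering algorithm the authors use, so the TF-IDF weighting and the simple cosine-based k-means loop below are assumptions chosen for illustration. The OCR and ASR outputs are hypothetical strings standing in for real engine output, and the helper names (`build_description`, `tfidf_vectors`, `cluster`) are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_description(ocr_frames, asr_transcript):
    """Merge per-frame OCR text with the ASR transcript.

    Consecutive frames often show the same slide, so repeated OCR lines
    are dropped (case-insensitively) before the transcript is appended.
    """
    seen, unique_lines = set(), []
    for frame_text in ocr_frames:
        line = " ".join(frame_text.split())
        if line and line.lower() not in seen:
            seen.add(line.lower())
            unique_lines.append(line)
    return " ".join(unique_lines + [asr_transcript.strip()])


def tfidf_vectors(descriptions):
    """Return one sparse TF-IDF vector (a dict) per description."""
    docs = [Counter(tokenize(d)) for d in descriptions]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in doc.items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, k, iterations=10):
    """Assumed clustering step: k-means on cosine similarity.

    Seeds are picked farthest-first so the result is deterministic.
    """
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)  # adds the weights term by term
            if total:
                centroids[j] = dict(total)
    return labels


# Hypothetical (OCR frames, ASR transcript) pairs for four toy videos.
videos = [
    (["Newton's laws", "Newton's laws", "force = mass x acceleration"],
     "today we discuss force mass and acceleration"),
    (["Kinetic energy", "force and motion"],
     "force acting upon mass changes motion and energy"),
    (["Binary search tree", "Binary search tree"],
     "insert nodes into the tree then search it"),
    (["Sorting algorithms", "search in a sorted array"],
     "a binary search runs on a sorted array"),
]

descriptions = [build_description(ocr, asr) for ocr, asr in videos]
labels = cluster(tfidf_vectors(descriptions), k=2)
```

In this toy setup the two physics videos and the two computer science videos share vocabulary only within their own genre, so comparing `labels` against the known genres gives a rough check of description quality, in the spirit of the evaluation the abstract outlines.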
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Varma, S., Pandey, A., Shivam, Das, S., Roy, S.D. (2022). Video Indexing System Based on Multimodal Information Extraction Using Combination of ASR and OCR. In: Sachdeva, S., Watanobe, Y., Bhalla, S. (eds) Big-Data-Analytics in Astronomy, Science, and Engineering. BDA 2021. Lecture Notes in Computer Science, vol 13167. Springer, Cham. https://doi.org/10.1007/978-3-030-96600-3_14
DOI: https://doi.org/10.1007/978-3-030-96600-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96599-0
Online ISBN: 978-3-030-96600-3
eBook Packages: Computer Science, Computer Science (R0)