Video Caption Detection

Lu, Tong; Palaiahnakote, Shivakumara; Tan, Chew Lim; Liu, Wenyin

doi:10.1007/978-1-4471-6515-6_3

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Video contains two types of text. The first is caption text: edited or graphics text that is artificially superimposed on the video and is relevant to its content. The second is scene text, which occurs naturally in the scene, usually embedded in objects in the video. This chapter focuses on state-of-the-art methods for caption text detection in video. According to the literature, current methods fall into two broad categories: feature-based methods and machine learning-based methods. The feature-based methods described in this chapter detect text using image edges (obtained by means of gradients and filters), textures (by combining a variety of image texture features), connected components (by analyzing skeletons obtained from the image), and frequency-domain features (by performing the Fourier transform). The machine learning-based methods presented in this chapter use classifiers such as support vector machines, neural networks, and Bayesian classifiers.
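As a rough illustration of the edge-based family of features mentioned above, the following is a minimal sketch and not any specific method from the chapter. It assumes a grayscale frame held in a NumPy array and marks pixels whose horizontal neighbourhood has high gradient energy, on the assumption that caption strokes produce dense vertical edges. The function name `gradient_text_mask` and the parameters `window` and `rel_threshold` are hypothetical, chosen only for this example.

```python
import numpy as np


def gradient_text_mask(gray, window=5, rel_threshold=0.5):
    """Return a boolean mask of candidate text pixels (illustrative only).

    gray          : 2-D array of intensities (one video frame).
    window        : odd width of the horizontal summation window (assumed).
    rel_threshold : fraction of the maximum energy used as the cut-off (assumed).
    """
    gray = np.asarray(gray, dtype=np.float64)

    # Absolute horizontal gradient; the first column is compared with itself,
    # so it contributes zero.
    grad = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))

    # Sum the gradient over a 1 x window neighbourhood using cumulative sums.
    pad = window // 2
    padded = np.pad(grad, ((0, 0), (pad, pad)), mode="constant")
    csum = np.cumsum(padded, axis=1)
    csum = np.concatenate([np.zeros((gray.shape[0], 1)), csum], axis=1)
    energy = csum[:, window:] - csum[:, :-window]

    # A completely flat frame has no edges and therefore no candidates.
    peak = energy.max()
    if peak == 0:
        return np.zeros(gray.shape, dtype=bool)

    # Keep pixels whose local edge energy is a large fraction of the peak.
    return energy >= rel_threshold * peak
```

Calling `gradient_text_mask(frame)` on a frame with a band of closely spaced vertical strokes should flag that band and leave flat background unmarked. A practical detector would add further steps, such as grouping flagged pixels into text lines and rejecting false positives, which this sketch omits.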



Author information

Authors and Affiliations

Department of Computer Science and Technology, Nanjing University, Nanjing, China
Tong Lu
Faculty of CSIT, University of Malaya, Kuala Lumpur, Malaysia
Shivakumara Palaiahnakote
National University of Singapore, Singapore, Singapore
Chew Lim Tan
Multimedia Software Engineering Research Center, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Wenyin Liu

About this chapter

Cite this chapter

Lu, T., Palaiahnakote, S., Tan, C.L., Liu, W. (2014). Video Caption Detection. In: Video Text Detection. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6515-6_3

DOI: https://doi.org/10.1007/978-1-4471-6515-6_3
Published: 30 June 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6514-9
Online ISBN: 978-1-4471-6515-6
eBook Packages: Computer Science (R0)
