Text Detection from Video Scenes

Lu, Tong; Palaiahnakote, Shivakumara; Tan, Chew Lim; Liu, Wenyin

doi:10.1007/978-1-4471-6515-6_4

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Text in video carries valuable information and is exploited in many content-based video applications. However, scene text detection has not been systematically explored, even though many optical character recognition (OCR) techniques have been developed over the past decades. This chapter introduces recent progress in scene text detection, especially over the past several years. It begins by discussing the visual saliency of scene text to describe the characteristics of text in natural scene images. It then reviews recent developments in scene text detection from video or images, roughly categorized into bottom-up, top-down, statistical and learning-based, temporal or motion analysis, and hybrid approaches. Scene character recognition methods are introduced accordingly. Finally, several typical scene text datasets adopted in different applications are introduced for performance evaluation.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Nanjing University, Nanjing, China
Tong Lu
Faculty of CSIT, University of Malaya, Kuala Lumpur, Malaysia
Shivakumara Palaiahnakote
National University of Singapore, Singapore, Singapore
Chew Lim Tan
Multimedia Software Engineering Research Center, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Wenyin Liu

About this chapter

Cite this chapter

Lu, T., Palaiahnakote, S., Tan, C.L., Liu, W. (2014). Text Detection from Video Scenes. In: Video Text Detection. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6515-6_4

DOI: https://doi.org/10.1007/978-1-4471-6515-6_4
Published: 30 June 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6514-9
Online ISBN: 978-1-4471-6515-6
eBook Packages: Computer Science, Computer Science (R0)
