Abstract
Text in video carries valuable information and is exploited in many content-based video applications. However, scene text detection has not been systematically explored, even though many optical character recognition (OCR) techniques have been developed over the past decades. This chapter introduces current progress in scene text detection, especially over the past several years. It begins by discussing the visual saliency of scene text to describe the characteristics of text in natural scene images. It then discusses recent developments in scene text detection from video or images, roughly categorized into bottom-up, top-down, statistical and learning-based, temporal or motion analysis, and hybrid approaches. Scene character recognition methods are introduced accordingly. Finally, several typical scene text datasets adopted in different applications are introduced for performance evaluation.
Copyright information
© 2014 Springer-Verlag London
About this chapter
Cite this chapter
Lu, T., Palaiahnakote, S., Tan, C.L., Liu, W. (2014). Text Detection from Video Scenes. In: Video Text Detection. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6515-6_4
DOI: https://doi.org/10.1007/978-1-4471-6515-6_4
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6514-9
Online ISBN: 978-1-4471-6515-6
eBook Packages: Computer Science (R0)