Text Detection from Video Scenes

Video Text Detection

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Text in video carries valuable information and is exploited in many content-based video applications. However, scene text detection has not been systematically explored, even though many optical character recognition (OCR) techniques have been developed over the past decades. This chapter introduces the current progress in scene text detection, especially over the past several years. It begins by discussing the visual saliency of scene text to describe the characteristics of text in natural scene images. It then discusses recent developments in scene text detection from video or images, roughly categorized into bottom-up, top-down, statistical and learning-based, temporal or motion analysis, and hybrid approaches. Scene character recognition methods are introduced accordingly. Finally, several typical scene text datasets adopted in different applications are introduced for performance evaluation.

Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Lu, T., Palaiahnakote, S., Tan, C.L., Liu, W. (2014). Text Detection from Video Scenes. In: Video Text Detection. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6515-6_4

  • DOI: https://doi.org/10.1007/978-1-4471-6515-6_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6514-9

  • Online ISBN: 978-1-4471-6515-6

  • eBook Packages: Computer Science, Computer Science (R0)
