Detection of artificial and scene text in images and video frames

  • Industrial and Commercial Application

Pattern Analysis and Applications

Abstract

Textual information in images and video frames constitutes a valuable source of high-level semantics for multimedia indexing and retrieval systems. Text detection is the most crucial step in a multimedia text extraction system, and although it has been studied extensively over the past decade, there is still no generic architecture that works for both artificial and scene text in multimedia content. In this paper we propose a system for detecting both artificial and scene text in images and video frames. The system is based on a machine learning stage that uses a Random Forest classifier and a highly discriminative feature set produced by a new texture operator called Multilevel Adaptive Color edge Local Binary Pattern (MACeLBP). MACeLBP describes the spatial distribution of color edges at multiple adaptive levels of contrast. A gradient-based algorithm is then applied to separate individual text lines and refine their localization. The whole algorithm operates within a multiresolution framework to make text line detection invariant to scale. Finally, an optional connected-component step segments text lines into words based on the distances between the resulting components. Experimental results obtained with a concise evaluation methodology demonstrate the superior performance of the proposed system for artificial and scene text in images and video frames.
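
MACeLBP is, as its name indicates, a variant of the Local Binary Pattern (LBP) texture operator. Its exact multilevel, adaptive, color-edge definition is not given in the abstract, so the following is only a minimal sketch, assuming a single-channel input map (for example a precomputed edge-magnitude image) stored as nested Python lists. It computes the classic 8-neighbour LBP code and its histogram. The contrast threshold `t` and the helper `multilevel_lbp_features`, which concatenates histograms computed at several fixed thresholds, are hypothetical stand-ins for the "multiple adaptive levels of contrast" and are not the authors' method.

```python
from typing import List, Sequence

# (dy, dx) offsets of the 8 neighbours, clockwise from the top-left pixel;
# neighbour p contributes bit p of the LBP code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Sequence[Sequence[float]], y: int, x: int, t: float = 0.0) -> int:
    """LBP code of interior pixel (y, x): bit p is set when neighbour p >= centre + t.

    With t = 0 this is the classic LBP operator; t > 0 is a hypothetical
    contrast level that suppresses weak differences.
    """
    centre = img[y][x]
    code = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= centre + t:
            code |= 1 << p
    return code


def lbp_histogram(img: Sequence[Sequence[float]], t: float = 0.0) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels (borders skipped)."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x, t)] += 1
    return hist


def multilevel_lbp_features(img: Sequence[Sequence[float]], levels: Sequence[float]) -> List[int]:
    """Hypothetical multilevel descriptor: one histogram per contrast level, concatenated."""
    features: List[int] = []
    for t in levels:
        features.extend(lbp_histogram(img, t))
    return features


if __name__ == "__main__":
    window = [
        [9, 1, 1],
        [1, 5, 1],
        [1, 1, 1],
    ]
    # prints 1: only the top-left neighbour (bit 0) is >= the centre value
    print(lbp_code(window, 1, 1))
```

In a detector of this kind, such a histogram computed over a sliding window would form the feature vector passed to the classifier (a Random Forest in the proposed system); a practical implementation would vectorize the loops with an array library.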

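The optional word-segmentation step groups the connected components of a detected text line according to the horizontal distances between them. The abstract does not state the exact splitting rule, so the sketch below assumes a hypothetical one: components are sorted left to right, and a new word starts whenever the gap to the previous component exceeds `gap_factor` times the line height. Components are represented as bounding boxes `(x_min, y_min, x_max, y_max)` in inclusive pixel coordinates; the names `split_line_into_words`, `word_boxes` and `gap_factor` are illustrative only.

```python
from typing import List, Sequence, Tuple

# Bounding box of a connected component: (x_min, y_min, x_max, y_max), inclusive.
Box = Tuple[int, int, int, int]


def split_line_into_words(components: Sequence[Box], gap_factor: float = 0.5) -> List[List[Box]]:
    """Group a text line's components into words using horizontal gaps.

    Hypothetical rule: a gap larger than gap_factor * line height starts a new word.
    """
    if not components:
        return []
    boxes = sorted(components, key=lambda b: b[0])  # left-to-right order
    line_height = max(b[3] for b in boxes) - min(b[1] for b in boxes) + 1
    threshold = gap_factor * line_height

    words: List[List[Box]] = [[boxes[0]]]
    right_edge = boxes[0][2]  # rightmost pixel column covered so far
    for box in boxes[1:]:
        # Number of empty columns between components (<= 0 if they overlap).
        gap = box[0] - right_edge - 1
        if gap > threshold:
            words.append([box])
        else:
            words[-1].append(box)
        right_edge = max(right_edge, box[2])
    return words


def word_boxes(words: Sequence[Sequence[Box]]) -> List[Box]:
    """Bounding box enclosing all components of each word."""
    return [
        (min(b[0] for b in w), min(b[1] for b in w), max(b[2] for b in w), max(b[3] for b in w))
        for w in words
    ]


if __name__ == "__main__":
    line = [(0, 0, 9, 19), (12, 0, 21, 19), (40, 0, 49, 19), (52, 0, 61, 19)]
    # prints [(0, 0, 21, 19), (40, 0, 61, 19)]
    print(word_boxes(split_line_into_words(line)))
```

Tying the threshold to the line height keeps the rule independent of text size; a data-driven threshold, for example one derived from the distribution of gaps within the line, would be a natural alternative.
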
Figs. 1–14

Author information

Correspondence to Marios Anthimopoulos.

About this article

Cite this article

Anthimopoulos, M., Gatos, B. & Pratikakis, I. Detection of artificial and scene text in images and video frames. Pattern Anal Applic 16, 431–446 (2013). https://doi.org/10.1007/s10044-011-0237-7

  • DOI: https://doi.org/10.1007/s10044-011-0237-7