Multimedia Tools and Applications, Volume 74, Issue 13, pp 4891–4906

An effective graph-cut scene text localization with embedded text segmentation

  • Xiaoqian Liu
  • Weiqiang Wang

Abstract

This paper presents an effective and efficient approach to extracting scene text from images. The approach first extracts edge information with a local maximum difference filter (LMDF) and, at the same time, decomposes the input image into a group of image layers by color clustering. Candidate text image layers are then identified by combining the geometric structure and spatial distribution of scene text with the edge map. At the character level, candidate text connected components are identified using a set of heuristic rules. Finally, graph-cut computation is used to identify and localize text lines of arbitrary orientation. In the proposed approach, the segmentation of text pixels is embedded efficiently into the text localization computation. Comprehensive evaluation experiments on four challenging datasets (ICDAR 2003, ICDAR 2011, MSRA-TD500, and Street View Text (SVT)) validate the approach. In comparisons with many state-of-the-art methods, the results demonstrate that the approach effectively handles scene text with diverse fonts, sizes, colors, and languages, as well as arbitrary orientations, and that it is robust to illumination changes.
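
To make the edge-extraction step more concrete, the following minimal Python sketch computes a local max-minus-min response over a small window and thresholds it into a binary edge map. The abstract does not define the LMDF, so the max-minus-min formulation, the function names, the window radius, and the threshold value are all assumptions chosen for illustration and may differ from the filter used in the paper.

```python
def local_max_difference(gray, radius=1):
    """Return, for each pixel, max minus min over a (2*radius+1)^2 window.

    Assumption: this max-minus-min definition is illustrative only;
    the abstract does not spell out how the LMDF is computed.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipping the window at the border.
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            out[y][x] = max(window) - min(window)
    return out


def edge_map(gray, radius=1, threshold=50):
    """Binarize the filter response; the threshold is a hypothetical value."""
    response = local_max_difference(gray, radius)
    return [[1 if v >= threshold else 0 for v in row] for row in response]


# Toy 4x5 grayscale image: dark background with a bright stroke in column 2.
image = [
    [10, 10, 200, 10, 10],
    [10, 10, 200, 10, 10],
    [10, 10, 200, 10, 10],
    [10, 10, 200, 10, 10],
]
edges = edge_map(image)
```

On this toy image the response is high only where a window straddles the stroke boundary, which is the kind of high-contrast transition an edge map for text candidates is meant to capture.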

Keywords

Scene text; Text localization; Text segmentation; Graph-cut

Notes

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants No. 61232013, No. 61271434, and No. 61175115.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Chinese Academy of Sciences, Beijing, China