Journal of Computer Science and Technology, Volume 30, Issue 2, pp 353–363

Robust Video Text Detection with Morphological Filtering Enhanced MSER

  • Yun-Zhi Zhuge
  • Hu-Chuan Lu (corresponding author)
Regular Paper

Abstract

Video text detection is challenging because video backgrounds are generally complex, and subtitles often suffer from color bleeding, fuzzy boundaries, and low contrast caused by lossy compression and low resolution. In this paper, we propose a robust framework to address these problems. First, we use a gradient amplitude map (GAM) to enhance the edges of an input image, which mitigates color bleeding and fuzzy boundaries. Second, we develop a two-direction morphological filter to suppress background noise and enhance the contrast between background and text. Third, we apply maximally stable extremal regions (MSER) to detect text regions with two extreme colors; to obtain optimal segmentations, we use the mean intensities of these regions as the label set for graph cuts and the Euclidean distance over the three channels of the HSI color space as the graph cuts smoothness term. Finally, we group the segmented regions into text lines using the geometric characteristics of text, and then use corner detection, multi-frame verification, and heuristic rules to eliminate non-text regions. We test our scheme on several challenging videos, and the results show that our text detection framework is more robust than previous methods.
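
The first two stages can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, since the abstract does not define either operator precisely. The sketch assumes that the gradient amplitude is a central-difference gradient magnitude, and that "two-direction" filtering means a white top-hat (image minus its opening) taken with horizontal and vertical line elements and combined by a pixel-wise maximum. The function names and the element length `k` are hypothetical.

```python
import math

def gradient_amplitude(img):
    """Central-difference gradient magnitude (assumed form of the GAM)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbours are clamped to the image, i.e. edge replication.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = math.hypot(gx, gy)
    return out

def _line_filter(img, k, horizontal, reduce_fn):
    """Erosion (reduce_fn=min) or dilation (reduce_fn=max) with a 1-D line of length k."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if horizontal:
                vals = [img[y][i] for i in range(max(x - r, 0), min(x + r, w - 1) + 1)]
            else:
                vals = [img[i][x] for i in range(max(y - r, 0), min(y + r, h - 1) + 1)]
            out[y][x] = reduce_fn(vals)
    return out

def top_hat(img, k, horizontal):
    """White top-hat: the image minus its opening (erosion followed by dilation)."""
    eroded = _line_filter(img, k, horizontal, min)
    opened = _line_filter(eroded, k, horizontal, max)
    return [[p - o for p, o in zip(row, orow)] for row, orow in zip(img, opened)]

def two_direction_top_hat(img, k=5):
    """Pixel-wise maximum of the horizontal and vertical top-hats (an assumption)."""
    th_h = top_hat(img, k, horizontal=True)
    th_v = top_hat(img, k, horizontal=False)
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(th_h, th_v)]

if __name__ == "__main__":
    # Toy frame: dark background with a one-pixel-wide bright vertical stroke.
    frame = [[10] * 7 for _ in range(7)]
    for row in frame:
        row[3] = 200
    print(gradient_amplitude(frame)[3])
    print(two_direction_top_hat(frame)[3])
```

Under these assumptions, a structure narrower than `k` pixels in either direction survives the filter, while a bright area wider than `k` in both directions is flattened to zero. That is one plausible way to read "filter background noise and enhance the contrast between background and text".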

Keywords

text detection; gradient amplitude map; morphological filtering; maximally stable extremal region; graph cuts



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. College of Electronic, Communication and Physics, Shandong University of Science and Technology, Qingdao, China
  2. School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
