Robust Video Text Detection with Morphological Filtering Enhanced MSER

Zhuge, Yun-Zhi; Lu, Hu-Chuan

doi:10.1007/s11390-015-1528-z

Regular Paper
Published: 13 March 2015

Journal of Computer Science and Technology, Volume 30, pages 353–363 (2015)

Yun-Zhi Zhuge¹ &
Hu-Chuan Lu²

Abstract

Video text detection is a challenging problem because video backgrounds are generally complex and subtitles often suffer from color bleeding, fuzzy boundaries, and low contrast caused by lossy video compression and low resolution. In this paper, we propose a robust framework to address these problems. First, we use a gradient amplitude map (GAM) to enhance the edges of the input image, which mitigates color bleeding and fuzzy boundaries. Second, we develop a two-direction morphological filter to suppress background noise and enhance the contrast between background and text. Third, we apply maximally stable extremal regions (MSER) to detect text regions with two extreme colors; to obtain optimal segmentations, we use the mean intensity of the regions as the label set for graph cuts and the Euclidean distance over the three channels of the HSI color space as the graph cuts smoothness term. Finally, we group the segmented regions into text lines using the geometric characteristics of the text, and then use corner detection, multi-frame verification, and some heuristic rules to eliminate non-text regions. We test our scheme on some challenging videos, and the results indicate that our text detection framework is more robust than previous methods.
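The first two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the gradient operator or the exact form of the two-direction morphological filter, so the code below rests on assumptions: the gradient amplitude is computed with a Sobel operator, and "two-direction" is read as a white top-hat (bright text on dark background) paired with a black top-hat (dark text on bright background). All function names and the square window size `k` are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Iterable, List

Image = List[List[float]]


def gradient_amplitude(img: Image) -> Image:
    """Sobel gradient magnitude (assumed operator); borders are replicated."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        # Clamp coordinates so that border pixels are replicated.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (px(y - 1, x + 1) + 2 * px(y, x + 1) + px(y + 1, x + 1)
                  - px(y - 1, x - 1) - 2 * px(y, x - 1) - px(y + 1, x - 1))
            gy = (px(y + 1, x - 1) + 2 * px(y + 1, x) + px(y + 1, x + 1)
                  - px(y - 1, x - 1) - 2 * px(y - 1, x) - px(y - 1, x + 1))
            out[y][x] = math.hypot(gx, gy)
    return out


def _window_filter(img: Image, k: int,
                   reduce_fn: Callable[[Iterable[float]], float]) -> Image:
    """Apply min (erosion) or max (dilation) over a k x k window clipped at the border."""
    h, w = len(img), len(img[0])
    r = k // 2
    return [[reduce_fn(img[yy][xx]
                       for yy in range(max(0, y - r), min(h, y + r + 1))
                       for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def opening(img: Image, k: int) -> Image:
    """Erosion followed by dilation: removes bright details smaller than the window."""
    return _window_filter(_window_filter(img, k, min), k, max)


def closing(img: Image, k: int) -> Image:
    """Dilation followed by erosion: removes dark details smaller than the window."""
    return _window_filter(_window_filter(img, k, max), k, min)


def white_top_hat(img: Image, k: int) -> Image:
    """Image minus its opening: keeps small bright structures."""
    opened = opening(img, k)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(img, opened)]


def black_top_hat(img: Image, k: int) -> Image:
    """Closing minus the image: keeps small dark structures."""
    closed = closing(img, k)
    return [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(img, closed)]


if __name__ == "__main__":
    # A single bright pixel on a dark background survives the white top-hat only.
    frame = [[0.0] * 5 for _ in range(5)]
    frame[2][2] = 9.0
    print(white_top_hat(frame, 3)[2][2], black_top_hat(frame, 3)[2][2])
```

In this reading, the window size would be chosen slightly larger than the expected stroke width, so that text strokes are removed by the opening or closing and therefore retained in the top-hat response, while larger background structures are suppressed.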

Author information

Authors and Affiliations

College of Electronic, Communication and Physics, Shandong University of Science and Technology, Qingdao, 266510, China
Yun-Zhi Zhuge
School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116023, China
Hu-Chuan Lu

Corresponding author

Correspondence to Hu-Chuan Lu.

Additional information

Special Section on Object Recognition

About this article

Cite this article

Zhuge, YZ., Lu, HC. Robust Video Text Detection with Morphological Filtering Enhanced MSER. J. Comput. Sci. Technol. 30, 353–363 (2015). https://doi.org/10.1007/s11390-015-1528-z

Received: 23 December 2014
Revised: 20 January 2015
Published: 13 March 2015
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11390-015-1528-z
