Abstract
Video text detection is challenging because video backgrounds are generally complex and subtitles often suffer from color bleeding, fuzzy boundaries, and low contrast caused by lossy video compression and low resolution. In this paper, we propose a robust framework to address these problems. First, we use a gradient amplitude map (GAM) to enhance the edges of an input image, which overcomes the problems of color bleeding and fuzzy boundaries. Second, we develop a two-direction morphological filter to suppress background noise and enhance the contrast between background and text. Third, we apply maximally stable extremal regions (MSER) to detect text regions with two extreme colors; to obtain optimal segmentations, we use the mean intensities of these regions as the label set for graph cuts and the Euclidean distance over the three channels of the HSI color space as the graph cuts smoothness term. Finally, we group the segmented regions into text lines using the geometric characteristics of text, and then use corner detection, multi-frame verification, and several heuristic rules to eliminate non-text regions. We test our scheme on several challenging videos, and the results show that our text detection framework is more robust than previous methods.
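As an illustration of the first step only, the following minimal sketch computes a gradient amplitude map as the per-pixel gradient magnitude of a grayscale image. The abstract does not specify the gradient operator or border handling, so the central differences and clamped borders used here are assumptions, and the function name `gradient_amplitude_map` is hypothetical rather than taken from the paper.

```python
import math


def gradient_amplitude_map(img):
    """Return the per-pixel gradient magnitude of a 2-D grayscale image.

    img is a list of equal-length rows of numbers. Borders are handled by
    clamping indices (replicating edge pixels); this is an assumption of
    the sketch, not a detail given in the abstract.
    """
    h, w = len(img), len(img[0])
    gam = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences with clamped neighbours (assumed operator).
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            gam[y][x] = math.hypot(gx, gy)
    return gam


# Toy input: a vertical edge between a dark left half and a bright right half.
image = [[0, 0, 100, 100] for _ in range(3)]
gam = gradient_amplitude_map(image)
```

On this toy input the amplitude is nonzero only in the two columns adjacent to the edge, which is the sense in which such a map emphasizes character boundaries over flat regions.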
Special Section on Object Recognition
Cite this article
Zhuge, YZ., Lu, HC. Robust Video Text Detection with Morphological Filtering Enhanced MSER. J. Comput. Sci. Technol. 30, 353–363 (2015). https://doi.org/10.1007/s11390-015-1528-z
DOI: https://doi.org/10.1007/s11390-015-1528-z