Abstract
Typography and layout lead to the hierarchical organization of text into words, text lines, and paragraphs. This inherent structure is a key property of text in any script and language, yet it has been only minimally leveraged by existing scene text detection methods. This paper addresses the problem of text segmentation in natural scenes from a hierarchical perspective. Contrary to existing methods, we make explicit use of text structure, aiming directly at detecting region groupings that correspond to text within a hierarchy produced by agglomerative similarity clustering of individual regions. We propose an optimal way to construct such a hierarchy by introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule that combines a discriminative classifier with a probabilistic measure of group meaningfulness based on perceptual organization. Results on four standard datasets, covering text in various orientations and languages, demonstrate that our algorithm, although trained on a single mixed dataset, outperforms state-of-the-art methods in unconstrained scenarios.
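The general pattern described above (build a hierarchy of region groups by agglomerative clustering, then decide which nodes are text with a classifier plus a meaningfulness test) can be illustrated with a minimal Python sketch. It is not the authors' method. The feature space (hypothetical normalized (x, y) region centroids), the single-linkage merge criterion, the uniform-background bounding-box model behind the a contrario score, the `min_side` floor, and the `toy_is_text` rule standing in for a trained discriminative classifier are all illustrative assumptions. The meaningfulness score uses the generic a contrario form: number of tests times a binomial tail probability, with a group accepted when it falls below `eps`.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    """One node of the dendrogram: a group of region indices."""
    members: List[int]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def single_linkage_dendrogram(points: List[Point]) -> Node:
    """Naive O(n^3) single-linkage agglomerative clustering; returns the root."""
    clusters = [Node([i]) for i in range(len(points))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a].members
                        for j in clusters[b].members)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = Node(clusters[a].members + clusters[b].members,
                      clusters[a], clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


def binomial_tail(n: int, k: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def nfa(points: List[Point], members: List[int], n_tests: int,
        min_side: float = 0.05) -> float:
    """Hypothetical a contrario score: n_tests times the probability that at
    least len(members) of the n regions, drawn i.i.d. uniformly in [0, 1]^d,
    fall inside the group's bounding box (each side floored at min_side)."""
    p = 1.0
    for dim in range(len(points[0])):
        vals = [points[i][dim] for i in members]
        p *= min(1.0, max(max(vals) - min(vals), min_side))
    return n_tests * binomial_tail(len(points), len(members), p)


def select_groups(root: Node, points: List[Point],
                  is_text: Callable[[List[int]], bool],
                  eps: float = 1.0) -> List[List[int]]:
    """Top-down traversal: keep the largest nodes that the classifier accepts
    and that are eps-meaningful; otherwise descend into the children."""
    n_tests = 2 * len(points) - 1  # number of nodes in a binary dendrogram
    groups: List[List[int]] = []

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        if (len(node.members) > 1 and is_text(node.members)
                and nfa(points, node.members, n_tests) < eps):
            groups.append(sorted(node.members))
            return
        visit(node.left)
        visit(node.right)

    visit(root)
    return groups


# Toy usage: five aligned "characters" plus two isolated regions,
# described by hypothetical normalized (x, y) centroids.
points = [(0.10, 0.50), (0.12, 0.50), (0.14, 0.51), (0.16, 0.50),
          (0.18, 0.50), (0.80, 0.10), (0.40, 0.90)]


def toy_is_text(members: List[int]) -> bool:
    # Hypothetical stand-in for a trained classifier:
    # at least 3 regions lying on a near-horizontal line.
    ys = [points[i][1] for i in members]
    return len(members) >= 3 and max(ys) - min(ys) < 0.05


root = single_linkage_dendrogram(points)
groups = select_groups(root, points, toy_is_text)
print(groups)  # [[0, 1, 2, 3, 4]]
```

The top-down traversal returns the largest accepted groups and stops descending once a node is accepted, so a text line is not also reported as its own fragments. The naive clustering loop is written out for clarity only; for realistic region counts a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be the usual choice.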
Acknowledgments
This project was supported by the Spanish project TIN2011-24631, the fellowship RYC-2009-05031, and the Catalan government scholarship 2013FI1126.
Cite this article
Gomez, L., Karatzas, D. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction. IJDAR 19, 335–349 (2016). https://doi.org/10.1007/s10032-016-0274-2
DOI: https://doi.org/10.1007/s10032-016-0274-2
Keywords
- Scene text
- Segmentation
- Detection
- Hierarchical grouping
- Perceptual organization