Abstract
Text recognition in natural scene images is a challenging task that has recently attracted increasing research attention. In this paper, we propose a method for recognizing text that exploits the layout consistency of a text string. We first estimate the layout (the four lines of a text string) from the results of an initial character extraction and recognition pass. Relying on the consistency of the layout across a word, we then perform character extraction and recognition again using the estimated four lines, which is more accurate than the first pass. Our layout estimation method differs from previous methods in that it exploits character recognition results and uses a class-conditional layout model. This yields more accurate and robust layout estimates, which can in turn be used to refine character extraction and recognition. We call this two-way process (from extraction and recognition to layout, and from layout to extraction and recognition) “bidirectional” to distinguish it from previous feedback refinement approaches. Experimental results demonstrate that our bidirectional process improves word recognition performance.
Cite this article
Hinami, R., Liu, X., Chiba, N. et al. Bidirectional extraction and recognition of scene text with layout consistency. IJDAR 19, 83–98 (2016). https://doi.org/10.1007/s10032-016-0261-7
DOI: https://doi.org/10.1007/s10032-016-0261-7