Bidirectional extraction and recognition of scene text with layout consistency

Hinami, Ryota; Liu, Xinhao; Chiba, Naoki; Satoh, Shin’ichi

doi:10.1007/s10032-016-0261-7

Original Paper
Published: 23 February 2016

Volume 19, pages 83–98 (2016)

International Journal on Document Analysis and Recognition (IJDAR)

Ryota Hinami ORCID: orcid.org/0000-0003-1542-2612¹,
Xinhao Liu²,
Naoki Chiba³ &
Shin’ichi Satoh⁴


Abstract

Text recognition in natural scene images is a challenging task that has recently attracted increasing research attention. In this paper, we propose a method for recognizing text that exploits the layout consistency of a text string. We estimate the layout (the four lines of a text string) from the results of an initial character extraction and recognition pass. On the basis of layout consistency across a word, we then perform character extraction and recognition again using the four lines, which is more accurate than the first pass. Our layout estimation method differs from previous methods in that it exploits character recognition results and uses a class-conditional layout model. This yields more accurate and robust layout estimates, which can be used to refine character extraction and recognition. We call this two-way process (from extraction and recognition to layout, and from layout to extraction and recognition) “bidirectional” to distinguish it from previous feedback refinement approaches. Experimental results demonstrate that our bidirectional process improves word recognition performance.



Author information

Authors and Affiliations

The University of Tokyo, Tokyo, Japan
Ryota Hinami
Tokyo Institute of Technology, Tokyo, Japan
Xinhao Liu
Rakuten, Inc., Tokyo, Japan
Naoki Chiba
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh

Corresponding author

Correspondence to Ryota Hinami.

About this article

Cite this article

Hinami, R., Liu, X., Chiba, N. et al. Bidirectional extraction and recognition of scene text with layout consistency. IJDAR 19, 83–98 (2016). https://doi.org/10.1007/s10032-016-0261-7

Received: 04 June 2015
Revised: 09 December 2015
Accepted: 05 February 2016
Published: 23 February 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10032-016-0261-7
