Bidirectional extraction and recognition of scene text with layout consistency

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Text recognition in natural scene images is a challenging task that has recently attracted increasing research attention. In this paper, we propose a method for recognizing text that exploits the layout consistency of a text string. We first estimate the layout (the four lines of a text string) from the results of an initial character extraction and recognition pass. Then, relying on the consistency of the layout across a word, we perform character extraction and recognition again using the four lines, which is more accurate than the first pass. Our layout estimation method differs from previous methods in that it exploits character recognition results and uses a class-conditional layout model. This yields more accurate and robust layout estimates, which can in turn be used to refine character extraction and recognition. We call this two-way process (from extraction and recognition to layout, and from layout to extraction and recognition) “bidirectional” to distinguish it from previous feedback refinement approaches. Experimental results demonstrate that the bidirectional process improves word recognition performance.
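The two directions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the four lines or specify the class-conditional model, so the code assumes the usual typographic ascender line, mean line, baseline, and descender line, a hypothetical lookup table saying which lines each character class touches, and a plain least-squares fit of four parallel lines. All function names and the toy coordinates are invented for illustration.

```python
# Hypothetical class-conditional layout table (an assumption, not the paper's model):
# which of the four lines the top and bottom of a character class touch.
ASCENDERS = set("bdfhijklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") | set("0123456789")
DESCENDERS = set("gjpqy")


def touched_lines(label):
    """Return (top_line, bottom_line) for a recognized character label."""
    top = "ascender" if label in ASCENDERS else "mean"
    bottom = "descender" if label in DESCENDERS else "baseline"
    return top, bottom


def fit_layout(chars):
    """Recognition -> layout: fit four parallel lines y = slope * x + intercept.

    chars is a list of (x_center, top_y, bottom_y, label) in image coordinates
    (y grows downward). Lines with no supporting character are left out.
    """
    groups = {}
    for x, top_y, bottom_y, label in chars:
        top_line, bottom_line = touched_lines(label)
        groups.setdefault(top_line, []).append((x, top_y))
        groups.setdefault(bottom_line, []).append((x, bottom_y))

    num = den = 0.0
    means = {}
    for name, pts in groups.items():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        means[name] = (mx, my)
        # Pooled least squares: all lines share one slope.
        num += sum((x - mx) * (y - my) for x, y in pts)
        den += sum((x - mx) ** 2 for x, _ in pts)

    slope = num / den if den > 0 else 0.0
    intercepts = {name: my - slope * mx for name, (mx, my) in means.items()}
    return slope, intercepts


def layout_cost(box, label, slope, intercepts):
    """Distance between a box's top/bottom and the lines its label should touch."""
    x, top_y, bottom_y = box
    cost = 0.0
    for line, y in zip(touched_lines(label), (top_y, bottom_y)):
        if line in intercepts:  # unobserved lines contribute nothing in this sketch
            cost += abs(y - (slope * x + intercepts[line]))
    return cost


def rerank(box, candidates, slope, intercepts):
    """Layout -> recognition: pick the candidate label most consistent with the layout."""
    return min(candidates, key=lambda c: layout_cost(box, c, slope, intercepts))


# First pass: three confidently recognized characters of a word.
first_pass = [(0, 0, 20, "h"), (15, 8, 20, "a"), (30, 8, 27, "p")]
slope, intercepts = fit_layout(first_pass)

# Second pass: a box whose shape alone cannot separate "o" from "O".
print(rerank((45, 8, 20), ["O", "o"], slope, intercepts))  # prints: o
```

In this toy case the box spans the mean line to the baseline, so the lowercase reading fits the estimated layout while the uppercase reading would require the top to reach the ascender line. A real system would also need to weight the layout cost against the classifier score and handle lines that no first-pass character supports.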

Author information

Corresponding author

Correspondence to Ryota Hinami.

About this article

Cite this article

Hinami, R., Liu, X., Chiba, N. et al. Bidirectional extraction and recognition of scene text with layout consistency. IJDAR 19, 83–98 (2016). https://doi.org/10.1007/s10032-016-0261-7

  • DOI: https://doi.org/10.1007/s10032-016-0261-7
