Large-Lexicon Attribute-Consistent Text Recognition in Natural Images

  • Tatiana Novikova
  • Olga Barinova
  • Pushmeet Kohli
  • Victor Lempitsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7577)


This paper proposes a new model for word recognition in natural images that jointly captures the visual and lexicon consistency of words within a single probabilistic framework. Our approach combines local likelihood and pairwise positional consistency priors with higher-order priors that enforce consistency of characters (lexicon) and their attributes (font and colour). Unlike traditional stage-based methods, word recognition in our framework is performed by estimating the maximum a posteriori (MAP) solution under the joint posterior distribution of the model. MAP inference is carried out using weighted finite-state transducers (WFSTs). We show how the efficiency of certain WFST operations can be exploited to find the most likely word under the model efficiently. We evaluate our method on a range of challenging datasets (ICDAR’03, SVT, ICDAR’11). Experimental results demonstrate that our method outperforms state-of-the-art methods for cropped word recognition.
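To make the idea of lexicon-constrained MAP inference concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed character segmentation, uses only per-position character costs (negative log-likelihoods), and omits the pairwise positional and attribute (font, colour) terms entirely. A plain prefix tree stands in for the lexicon acceptor, and an exhaustive walk over it stands in for WFST composition followed by a shortest-path search. The names `build_trie`, `map_word` and `char_costs` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

END = None  # hypothetical end-of-word marker; never collides with a character key


def build_trie(lexicon: List[str]) -> dict:
    """Build a prefix tree that plays the role of a deterministic lexicon acceptor."""
    root: dict = {}
    for word in lexicon:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # mark this node as a final state
    return root


def map_word(
    char_costs: List[Dict[str, float]], lexicon: List[str]
) -> Optional[Tuple[str, float]]:
    """Return the lexicon word with the lowest total cost, or None if no word fits.

    char_costs[i][c] is the assumed negative log-likelihood of character c
    at position i; characters missing from the dict are treated as impossible.
    """
    # Each frontier entry is (accumulated cost, prefix so far, trie node).
    frontier = [(0.0, "", build_trie(lexicon))]
    for costs in char_costs:
        next_frontier = []
        for cost, prefix, node in frontier:
            for ch, child in node.items():
                if ch is END or ch not in costs:
                    continue
                next_frontier.append((cost + costs[ch], prefix + ch, child))
        frontier = next_frontier
    # Keep only paths that end in a final state, i.e. complete lexicon words.
    finals = [(cost, prefix) for cost, prefix, node in frontier if END in node]
    if not finals:
        return None
    best_cost, best_word = min(finals)
    return best_word, best_cost


# Toy input: the unconstrained per-position minimum would read "cxt",
# which is not a word; the lexicon constraint changes the answer.
char_costs = [
    {"c": 0.2, "e": 1.5},
    {"a": 0.9, "o": 0.4, "x": 0.1},
    {"t": 0.3, "l": 0.5},
]
lexicon = ["cat", "cot", "eat", "col", "dog"]
result = map_word(char_costs, lexicon)
```

In this toy input the lexicon overrides the locally cheapest character at the second position, which is the effect a lexicon prior is meant to have. A real WFST toolkit would additionally support weighted lexicons, variable-length segmentations and lazy composition, none of which this sketch attempts.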





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tatiana Novikova (1)
  • Olga Barinova (1)
  • Pushmeet Kohli (2)
  • Victor Lempitsky (3)
  1. Lomonosov Moscow State University, Russia
  2. Microsoft Research Cambridge, UK
  3. Yandex, Russia
