Scene Text Recognition and Retrieval for Large Lexicons

  • Udit Roy
  • Anand Mishra
  • Karteek Alahari
  • C. V. Jawahar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9003)


In this paper, we propose a framework for recognition and retrieval tasks in the context of scene text images. In contrast to many recent works, we focus on the case where an image-specific list of words (the small lexicon setting) is unavailable. We present a conditional random field model defined on potential character locations and the interactions between them. Observing that the interaction potentials computed in the large lexicon setting are less effective than those computed with a small lexicon, we propose an iterative method that alternates between finding the most likely solution and refining the interaction potentials. We evaluate our method on public datasets and show that it improves over baseline and state-of-the-art approaches. For example, on the IIIT-5K word dataset, with a large lexicon containing 0.5 million words, we obtain nearly 15% improvement over baseline methods in recognition accuracy and in precision for our retrieval task.
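The abstract does not give implementation details, so the following is only a toy sketch of the alternation it describes, not the authors' method. It assumes a chain-structured model over a fixed number of character locations, pairwise costs taken from smoothed bigram statistics of the lexicon, Viterbi inference, and lexicon pruning by edit distance to the current solution. The function names and the parameters `smoothing`, `keep`, and `max_iters` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def bigram_costs(lexicon, smoothing=0.1):
    """Pairwise cost: negative log of the smoothed bigram probability in the lexicon."""
    counts = Counter()
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            counts[a, b] += 1
    total = sum(counts.values()) + smoothing * len(ALPHABET) ** 2
    return {(a, b): -math.log((counts[a, b] + smoothing) / total)
            for a in ALPHABET for b in ALPHABET}


def viterbi(unary, pairwise):
    """Minimum-cost labelling of a chain (sum of unary and pairwise costs)."""
    best = dict(unary[0])  # cost of the best path ending in each label
    back = []              # back-pointers, one dict per transition
    for costs in unary[1:]:
        new_best, pointers = {}, {}
        for label, cost in costs.items():
            prev = min(best, key=lambda p: best[p] + pairwise[p, label])
            new_best[label] = best[prev] + pairwise[prev, label] + cost
            pointers[label] = prev
        best = new_best
        back.append(pointers)
    label = min(best, key=best.get)
    word = [label]
    for pointers in reversed(back):
        label = pointers[label]
        word.append(label)
    return "".join(reversed(word))


def edit_distance(s, t):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def recognize(unary, lexicon, keep=0.5, max_iters=5):
    """Alternate between inference and refining the pairwise costs."""
    current = list(lexicon)
    word = None
    for _ in range(max_iters):
        new_word = viterbi(unary, bigram_costs(current))
        if new_word == word:  # the solution stopped changing
            break
        word = new_word
        # Assumed refinement step: keep the lexicon words closest to the
        # current solution, so the next bigram costs come from a smaller,
        # more relevant lexicon.
        current.sort(key=lambda w: edit_distance(w, word))
        current = current[:max(1, int(len(current) * keep))]
    return word
```

Here `unary` is a list with one dictionary per character location, mapping each candidate letter to a cost (lower is better), such as negated classifier scores. The sketch returns the labelling found in the last iteration; whether and how the real system prunes its lexicon is not stated in the abstract.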


Large Dictionary, Scene Text Recognition, Small Dictionary, Dictionary Set, Lexicon Reduction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was partially supported by the Ministry of Communications and Information Technology, Government of India, New Delhi. Anand Mishra was supported by Microsoft Corporation and Microsoft Research India under the Microsoft Research India PhD fellowship award.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Udit Roy (1)
  • Anand Mishra (1)
  • Karteek Alahari (2)
  • C. V. Jawahar (1)

  1. CVIT, IIIT Hyderabad, Hyderabad, India
  2. Inria, LEAR team, Inria Grenoble Rhône-Alpes, Laboratoire Jean Kuntzmann, CNRS, Univ. Grenoble Alpes, Saint-Martin-d’Hères, France
