Scene Text Recognition and Retrieval for Large Lexicons

Roy, Udit; Mishra, Anand; Alahari, Karteek; Jawahar, C. V.

doi:10.1007/978-3-319-16865-4_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

In this paper, we propose a framework for recognition and retrieval tasks in the context of scene text images. In contrast to many recent works, we focus on the case where an image-specific list of words (the small lexicon setting) is unavailable. We present a conditional random field model defined on potential character locations and the interactions between them. Observing that the interaction potentials computed in the large lexicon setting are less effective than those computed with a small lexicon, we propose an iterative method that alternates between finding the most likely solution and refining the interaction potentials. We evaluate our method on public datasets and show that it improves over baseline and state-of-the-art approaches. For example, on the IIIT 5K-word dataset with a large lexicon containing 0.5 million words, we obtain nearly 15% improvement over baseline methods in both recognition accuracy and retrieval precision.
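
The alternation described in the abstract can be illustrated with a small sketch. The abstract does not say how the interaction potentials are refined, so everything below is an assumption made for illustration: the model is simplified to a chain with one node per character (rather than potential character locations), the pairwise costs are smoothed bigram frequencies computed from the lexicon, and refinement keeps only the lexicon words closest in edit distance to the current solution. The names `bigram_costs`, `viterbi`, `edit_distance`, `recognise`, and `make_unary` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def bigram_costs(lexicon, smoothing=1.0):
    """Pairwise costs: negative log of the smoothed bigram frequency in the lexicon."""
    counts = Counter(w[i:i + 2] for w in lexicon for i in range(len(w) - 1))
    total = sum(counts.values()) + smoothing * len(ALPHABET) ** 2
    return {a + b: -math.log((counts[a + b] + smoothing) / total)
            for a in ALPHABET for b in ALPHABET}

def viterbi(unary, pairwise, weight=1.0):
    """Minimum-energy labelling of a chain; unary[i] maps character -> cost."""
    best = dict(unary[0])  # best[c]: lowest energy of a prefix ending in c
    back = []
    for costs in unary[1:]:
        new_best, pointers = {}, {}
        for c, u in costs.items():
            prev = min(best, key=lambda p: best[p] + weight * pairwise[p + c])
            new_best[c] = best[prev] + weight * pairwise[prev + c] + u
            pointers[c] = prev
        best = new_best
        back.append(pointers)
    chars = [min(best, key=best.get)]
    for pointers in reversed(back):  # follow back-pointers from the last node
        chars.append(pointers[chars[-1]])
    return "".join(reversed(chars))

def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (ca != cb))
    return row[len(b)]

def recognise(unary, lexicon, keep=0.5, max_iters=5):
    """Alternate between inference and refining the pairwise costs."""
    word = None
    for _ in range(max_iters):
        new_word = viterbi(unary, bigram_costs(lexicon))
        if new_word == word:  # solution is stable, stop iterating
            break
        word = new_word
        # Assumed refinement: shrink the lexicon towards the current solution,
        # so the next set of bigram costs is more specific to this image.
        lexicon = sorted(lexicon, key=lambda w: edit_distance(w, word))
        lexicon = lexicon[:max(1, int(len(lexicon) * keep))]
    return word

def make_unary(costs_per_position, default=10.0):
    """Hypothetical character-classifier costs; unlisted characters get `default`."""
    return [{c: pos.get(c, default) for c in ALPHABET} for pos in costs_per_position]

# Toy input: the third character is ambiguous and the classifier slightly prefers 'c'.
unary = make_unary([{"o": 0.0}, {"p": 0.0}, {"c": 0.5, "e": 1.0}, {"n": 0.0}])
print(recognise(unary, ["open", "pen", "oven"]))  # prints "open"
```

In this toy input the character costs alone would give "opcn"; the bigram costs derived from the lexicon pull the third character towards "e". With a lexicon of 0.5 million words the bigram statistics are far less specific to any one image, which is the situation that motivates refining them between inference passes.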

Notes

1. It should also be noted that [7] follows an open vocabulary lexicon, i.e., it does not assume that the ground truth is present in the lexicon. We find that around 75% of the ground truth words from the IIIT 5K-word dataset are present in the large lexicon by default. The remaining ground truth words are language-specific words and proper nouns, such as city and shop names.

References

1. Weinman, J., Butler, Z., Knoll, D., Feild, J.: Toward integrated scene text reading. TPAMI 36, 375–387 (2014)
2. Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: PhotoOCR: reading text in uncontrolled conditions. In: ICCV (2013)
3. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV (2011)
4. Mishra, A., Alahari, K., Jawahar, C.V.: Top-down and bottom-up cues for scene text recognition. In: CVPR (2012)
5. Goel, V., Mishra, A., Alahari, K., Jawahar, C.V.: Whole is greater than sum of parts: recognizing scene text words. In: ICDAR (2013)
6. Rodriguez, J., Perronnin, F.: Label embedding for text recognition. In: BMVC (2013)
7. Mishra, A., Alahari, K., Jawahar, C.V.: Scene text recognition using higher order language priors. In: BMVC (2012)
8. Weinman, J.J., Learned-Miller, E., Hanson, A.R.: Scene text recognition using similarity and a lexicon with sparse belief propagation. TPAMI 31, 1733–1746 (2009)
9. Novikova, T., Barinova, O., Kohli, P., Lempitsky, V.: Large-lexicon attribute-consistent text recognition in natural images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VI. LNCS, vol. 7577, pp. 752–765. Springer, Heidelberg (2012)
10. Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S., Zhang, Z.: Scene text recognition using part-based tree-structured character detection. In: CVPR (2013)
11. Batra, D., Yadollahpour, P., Guzman-Rivera, A., Shakhnarovich, G.: Diverse M-best solutions in Markov random fields. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 1–16. Springer, Heidelberg (2012)
12. Sheshadri, K., Divvala, S.K.: Exemplar driven character recognition in the wild. In: BMVC (2012)
13. Tian, S., Lu, S., Su, B., Tan, C.L.: Scene text recognition using co-occurrence of histogram of oriented gradients. In: ICDAR (2013)
14. Tarjan, R.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1, 146–160 (1972)
15. Kolmogorov, V.: Convergent tree-reweighted message passing for energy minimization. TPAMI 28, 1568–1583 (2006)
16. ICDAR 2003 datasets. http://algoval.essex.ac.uk/icdar
17. Wang, K., Belongie, S.: Word spotting in the wild. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 591–604. Springer, Heidelberg (2010)
18. Street View Text dataset. http://vision.ucsd.edu/~kai/svt
19. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979)
20. Mishra, A., Alahari, K., Jawahar, C.V.: Image retrieval using textual cues. In: ICCV (2013)
21. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
22. Alsharif, O., Pineau, J.: End-to-end text recognition with hybrid HMM maxout models. arXiv preprint arXiv:1310.1811 (2013)

Acknowledgements

This work was partially supported by the Ministry of Communications and Information Technology, Government of India, New Delhi. Anand Mishra was supported by Microsoft Corporation and Microsoft Research India under the Microsoft Research India PhD fellowship award.

Author information

Authors and Affiliations

CVIT, IIIT Hyderabad, Hyderabad, India
Udit Roy, Anand Mishra & C. V. Jawahar
Inria, LEAR team, Inria Grenoble Rhône-Alpes, Laboratoire Jean Kuntzmann, CNRS, Univ. Grenoble Alpes, Saint-Martin-d’Hères, France
Karteek Alahari

Corresponding author

Correspondence to Udit Roy.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

Cite this paper

Roy, U., Mishra, A., Alahari, K., Jawahar, C.V. (2015). Scene Text Recognition and Retrieval for Large Lexicons. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds) Computer Vision – ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_32

DOI: https://doi.org/10.1007/978-3-319-16865-4_32
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer Science, Computer Science (R0)
