International Journal of Computer Vision, Volume 113, Issue 3, pp 193–207

Label Embedding: A Frugal Baseline for Text Recognition

  • Jose A. Rodriguez-Serrano
  • Albert Gordo
  • Florent Perronnin

DOI: 10.1007/s11263-014-0793-6

Cite this article as:
Rodriguez-Serrano, J.A., Gordo, A. & Perronnin, F. Int J Comput Vis (2015) 113: 193. doi:10.1007/s11263-014-0793-6

Abstract

The standard approach to recognizing text in images consists of first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields. This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents several advantages: it does not require ad-hoc or costly pre-/post-processing operations; it can build on top of any state-of-the-art image descriptor (Fisher vectors in our case); it allows for the recognition of never-seen-before words (zero-shot recognition); and the recognition process is simple and efficient, as it amounts to a nearest neighbor search. Experiments are performed on challenging datasets of license plates and scene text. The main conclusion of the paper is that with such a frugal approach it is possible to obtain results that are competitive with standard bottom-up approaches, thus establishing label embedding as an interesting and simple-to-compute baseline for text recognition.
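The retrieval view described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`nearest_label`, `ranking_hinge`), the margin value, and the hand-picked 2-D vectors are all assumptions made for illustration. In the paper the image side builds on Fisher vectors and the common space is learned with a Structured SVM; here the embeddings are simply given, and the hinge function only shows the kind of "matching pair closer than non-matching pair" constraint the abstract mentions.

```python
import numpy as np

def nearest_label(image_vec, label_vecs, lexicon):
    """Return the lexicon word whose embedding is closest (Euclidean) to image_vec."""
    dists = np.linalg.norm(label_vecs - image_vec, axis=1)
    return lexicon[int(np.argmin(dists))]

def ranking_hinge(image_vec, pos_label_vec, neg_label_vec, margin=1.0):
    """Hinge penalty that is zero once the matching label is closer than the
    non-matching one by at least `margin` (squared Euclidean distances).
    The margin value is an illustrative assumption."""
    d_pos = np.sum((image_vec - pos_label_vec) ** 2)
    d_neg = np.sum((image_vec - neg_label_vec) ** 2)
    return max(0.0, margin + d_pos - d_neg)

# Toy data: hand-picked 2-D embeddings, purely illustrative.
lexicon = ["cat", "car", "dog"]
label_vecs = np.array([[0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
image_vec = np.array([0.9, 1.2])  # hypothetical embedded word image

print(nearest_label(image_vec, label_vecs, lexicon))
```

Under these assumptions, recognition reduces to one distance computation per lexicon entry followed by an argmin. A label that never appeared in training can be recognized as long as its embedding can be computed and added to `label_vecs`, which is the zero-shot property the abstract refers to.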

Keywords

Label embedding, Scene text recognition, Structured learning

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Jose A. Rodriguez-Serrano (1)
  • Albert Gordo (2)
  • Florent Perronnin (2)

  1. Machine Learning for Services Area, Xerox Research Centre Europe, Meylan, France
  2. Computer Vision Group, Xerox Research Centre Europe, Meylan, France
