ACCV 2014: Computer Vision - ACCV 2014 Workshops pp 157-168
Scene Text Recognition: No Country for Old Men?
Abstract
It is generally accepted that off-the-shelf OCR engines do not perform well in unconstrained scenarios such as natural scene imagery, where text appears among the clutter of the scene. However, recent research demonstrates that a conventional shape-based OCR engine can produce competitive results in the end-to-end scene text recognition task when provided with a conveniently preprocessed image. In this paper we confirm this finding with a set of experiments in which two off-the-shelf OCR engines are combined with an open implementation of a state-of-the-art scene text detection framework. The results demonstrate that, in such a pipeline, conventional OCR solutions still perform competitively compared with solutions specifically designed for scene text recognition.
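To make the shape of such a pipeline concrete, the following is a minimal sketch, not the authors' implementation. It assumes the detector and the OCR engine are supplied as plain callables (`detect` and `recognize` are hypothetical names), and it uses a fixed-threshold binarization purely as a stand-in for whatever preprocessing the detection stage actually provides. The stub detector and recognizer at the end exist only to show the data flow.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of 0-255 values
Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected text region


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region described by box out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def binarize(region: Image, threshold: int = 128) -> Image:
    """Map dark pixels to 0 and light pixels to 255 (assumed preprocessing)."""
    return [[0 if px < threshold else 255 for px in row] for row in region]


def read_scene_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
    threshold: int = 128,
) -> List[Tuple[Box, str]]:
    """Detect text regions, clean each one, and pass it to the OCR callable."""
    results = []
    for box in detect(image):
        clean = binarize(crop(image, box), threshold)
        text = recognize(clean).strip()
        if text:  # drop regions where the OCR engine returned nothing
            results.append((box, text))
    return results


# Hypothetical stand-ins for a real detector and a real OCR engine.
def fake_detect(image: Image) -> List[Box]:
    return [(1, 1, 2, 2)]


def fake_recognize(region: Image) -> str:
    return "A " if region[0][0] == 0 else ""


image = [
    [200, 200, 200, 200],
    [200, 30, 40, 200],
    [200, 35, 220, 200],
]
print(read_scene_text(image, fake_detect, fake_recognize))
```

Keeping the detector and recognizer as parameters mirrors the idea in the abstract that different off-the-shelf OCR engines can be swapped behind the same detection front end.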
Keywords
Text Line, Character Classifier, Beam Search, Text Detection, Scene Text
Notes
Acknowledgement
This project was supported by the Spanish project TIN2011-24631, the fellowship RYC-2009-05031, and the Catalan government scholarship 2013 FI1126. The authors also want to thank Google Inc. for the support received through the GSoC project, as well as the OpenCV community, especially Stefano Fabri and Vadim Pisarevsky, for their help in the implementation of the scene text detection module evaluated in this paper.