End-to-End Interpretation of the French Street Name Signs Dataset

  • Raymond Smith
  • Chunhui Gu
  • Dar-Shyang Lee
  • Huiyi Hu
  • Ranjith Unnikrishnan
  • Julian Ibarz
  • Sacha Arnoud
  • Sophia Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9913)


Abstract

We introduce the French Street Name Signs (FSNS) dataset, consisting of more than a million images of street name signs cropped from Google Street View images of France. Each image contains several views of the same street name sign. Every image has normalized, title-case-folded ground-truth text as it would appear on a map. We believe that the FSNS dataset is large and complex enough to train a deep network of significant complexity to solve the street name extraction problem "end-to-end", or to explore the design trade-offs between a single complex engineered network and multiple sub-networks designed and trained to solve sub-problems. We present such an "end-to-end" network/graph for TensorFlow and its results on the FSNS dataset.
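The abstract notes that each ground-truth transcription is normalized and title-case folded "as it would appear on a map". As a rough illustration only (the paper's exact folding rules are not given here, and this simple word-by-word capitalization is an assumption), such a normalization might look like:

```python
def title_case_fold(raw: str) -> str:
    """Illustrative title-case folding: lowercase the raw sign text,
    then capitalize each whitespace-separated word. This is a sketch,
    not the FSNS dataset's actual normalization procedure."""
    return " ".join(word.capitalize() for word in raw.lower().split())

# A sign photographed in all capitals is folded to map-style casing.
print(title_case_fold("RUE DE LA PAIX"))  # -> Rue De La Paix
```

A real pipeline would also need rules for hyphenated names, apostrophes, and accented characters common in French street names; those details are beyond this sketch.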


Keywords: Deep networks · End-to-end networks · Image dataset · Multiview dataset



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Raymond Smith (1), corresponding author
  • Chunhui Gu (1)
  • Dar-Shyang Lee (1)
  • Huiyi Hu (1)
  • Ranjith Unnikrishnan (1)
  • Julian Ibarz (1)
  • Sacha Arnoud (1)
  • Sophia Lin (1)

  1. Google Inc., Mountain View, USA
