
E2E-MLT - An Unconstrained End-to-End Method for Multi-language Scene Text

Conference paper

Computer Vision – ACCV 2018 Workshops (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11367)

Abstract

An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks.

E2E-MLT is the first published multi-language OCR for scene text. While trained in a multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem. Code and trained models are released publicly at https://github.com/MichalBusta/E2E-MLT.
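
As a rough illustration of the shared-backbone design described in the abstract, the sketch below shows one convolutional trunk feeding both a dense text-localization head and a CTC-style recognition head. This is a minimal PyTorch-style sketch only: the class name SharedFCNTextSpotter, the channel sizes, the num_chars parameter, and the pooling strategy are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch (not the authors' code) of a single FCN with shared layers
# serving both text localization and multi-language recognition.
import torch
import torch.nn as nn


class SharedFCNTextSpotter(nn.Module):
    def __init__(self, num_chars: int = 7500):  # num_chars: size of the multi-script character set (assumed)
        super().__init__()
        # Shared convolutional trunk used by both tasks.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Localization head: per-pixel text/no-text score plus box geometry channels.
        self.detect_head = nn.Conv2d(128, 1 + 5, 1)
        # Recognition head: per-column logits over the character set, decoded with CTC.
        self.recog_head = nn.Conv2d(128, num_chars, 1)

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)               # (N, 128, H/8, W/8), shared by both heads
        det = self.detect_head(feats)               # dense detection maps
        # Pool the height dimension so each feature column becomes one CTC time step.
        recog = self.recog_head(feats).mean(dim=2)  # (N, num_chars, W/8)
        logits = recog.permute(2, 0, 1)             # (T, N, C), the layout expected by nn.CTCLoss
        return det, logits


if __name__ == "__main__":
    model = SharedFCNTextSpotter()
    det_maps, char_logits = model(torch.randn(1, 3, 256, 512))
    print(det_maps.shape, char_logits.shape)
```

Because the trunk is fully convolutional and both heads are differentiable, detection and recognition losses can be summed and backpropagated through the shared layers in a single training step, which is what "end-to-end trainable" refers to here.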



Acknowledgement

The research was supported by the Czech Science Foundation Project GACR P103/12/G084.

Author information

Authors: Michal Bušta, Yash Patel, Jiří Matas

Corresponding author: Yash Patel.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Bušta, M., Patel, Y., Matas, J. (2019). E2E-MLT - An Unconstrained End-to-End Method for Multi-language Scene Text. In: Carneiro, G., You, S. (eds) Computer Vision – ACCV 2018 Workshops. ACCV 2018. Lecture Notes in Computer Science(), vol 11367. Springer, Cham. https://doi.org/10.1007/978-3-030-21074-8_11


  • DOI: https://doi.org/10.1007/978-3-030-21074-8_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21073-1

  • Online ISBN: 978-3-030-21074-8

  • eBook Packages: Computer Science, Computer Science (R0)
