Skip to main content

End-to-End Scene Text Recognition with Character Centroid Prediction

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10636))

Abstract

Scene text recognition tries to extract text information from natural images, being widely applied in computer vision and intelligent information processing. In this paper, we propose a novel end-to-end approach to scene text recognition with a specially trained fully convolutional network for predicting the centroid and pixel cluster of each character. With the help of this new information, we can solve the character instance segmentation problem effectively and then combine the recognized characters into words to accomplish the text recognition task. It is demonstrated by the experimental results on ICDAR2013 dataset that our proposed method with character centroid prediction can get a promising result on scene text recognition.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Ye, Q., Doermann, D.: Text detection and recognition in imagery: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 37(7), 1480–1500 (2015)

    Article  Google Scholar 

  2. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28, pp. 91–99 (2015)

    Google Scholar 

  3. Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 56–72. Springer, Cham (2016). doi:10.1007/978-3-319-46484-8_4

    Chapter  Google Scholar 

  4. Zhu, S., Zanibbi, R.: A text detection system for natural scenes with convolutional feature learning and cascaded classification. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 625–632 (2015)

    Google Scholar 

  5. Zhang, Z., Zhang, C., Shen, W., et al.: Multi-oriented text detection with fully convolutional networks. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 4159–4167 (2016)

    Google Scholar 

  6. Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 1 (2016)

    Google Scholar 

  7. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). doi:10.1007/978-3-319-10602-1_48

    Google Scholar 

  8. Zhang, Z., Fidler, S., Urtasun, R.: Instance-level segmentation for autonomous driving with deep densely connected mrfs. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 669–677 (2016)

    Google Scholar 

  9. Dai, J., He, K., Sun, J.: Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 3150–3158 (2016)

    Google Scholar 

  10. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 3431–3440 (2015)

    Google Scholar 

  11. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of 2016 IEEE conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 770–778 (2016)

    Google Scholar 

  12. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 2315–2324 (2016)

    Google Scholar 

  13. Zamberletti, A., Noce, L., Gallo, I.: Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions. In: Jawahar, C.V., Shan, S. (eds.) ACCV 2014. LNCS, vol. 9009, pp. 91–105. Springer, Cham (2015). doi:10.1007/978-3-319-16631-5_7

    Google Scholar 

  14. Zhang, Z., Shen, W., Yao, C., et al.: Symmetry-based text line detection in natural scenes. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2558–2567 (2015)

    Google Scholar 

  15. Karatzas, D., Shafait, F., Uchida, S., et al.: ICDAR 2013 robust reading competition. In: Proceedings of 2013 International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013)

    Google Scholar 

  16. Liao, M., Shi, B., Bai, X., et al.: TextBoxes: a fast text detector with a single deep neural network. In: Prooceedings of AAAI 2017, pp. 4161–4167 (2017)

    Google Scholar 

  17. Jaderberg, M., Simonyan, K., Vedaldi, A., et al.: Reading text in the wild with convolutional neural networks. arXiv preprint arXiv:1412.1842 (2014)

  18. Jaderberg, M., Simonyan, K., Vedaldi, A., et al.: Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227 (2014)

  19. Yin, X.C., Yin, X., Huang, K., et al.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 970–983 (2014)

    Article  Google Scholar 

Download references

Acknowledgments

This work was supported by the Natural Science Foundation of China under Grant 61171138.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jinwen Ma .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhao, W., Ma, J. (2017). End-to-End Scene Text Recognition with Character Centroid Prediction. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science(), vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-70090-8_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70089-2

  • Online ISBN: 978-3-319-70090-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics