End-to-End Scene Text Recognition with Character Centroid Prediction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10636)

Abstract

Scene text recognition aims to extract textual information from natural images and is widely applied in computer vision and intelligent information processing. In this paper, we propose a novel end-to-end approach to scene text recognition that uses a specially trained fully convolutional network to predict the centroid and pixel cluster of each character. With this additional information, we can solve the character instance segmentation problem effectively and then combine the recognized characters into words to complete the text recognition task. Experimental results on the ICDAR 2013 dataset demonstrate that the proposed method with character centroid prediction achieves promising scene text recognition performance.
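
The abstract does not spell out how the centroid and pixel-cluster predictions are turned into character instances, so the following Python sketch only illustrates one plausible post-processing step under stated assumptions: the FCN outputs a per-pixel character-foreground probability map and a per-pixel centroid prediction, pixels are grouped by the centroid they vote for (here with DBSCAN, an assumed clustering choice), and the resulting character masks are read left to right by a hypothetical character classifier `classify_char`. None of the names, thresholds, or the clustering method are taken from the paper.

```python
# Illustrative sketch (not the authors' code): recover character instances from
# an FCN's per-pixel foreground scores and per-pixel centroid predictions, then
# assemble the recognized characters into a word. All parameter values and the
# DBSCAN clustering step are assumptions made for this example only.
import numpy as np
from sklearn.cluster import DBSCAN


def segment_characters(fg_prob, centroid_xy, fg_thresh=0.5, eps=3.0, min_pixels=10):
    """Group foreground pixels into character instance masks.

    fg_prob     : (H, W) array, probability that a pixel belongs to a character.
    centroid_xy : (H, W, 2) array, predicted (x, y) character centroid per pixel.
    Returns a list of boolean masks, one per character instance.
    """
    ys, xs = np.nonzero(fg_prob > fg_thresh)            # candidate character pixels
    if len(xs) == 0:
        return []
    votes = centroid_xy[ys, xs]                          # each pixel votes for a centroid
    labels = DBSCAN(eps=eps, min_samples=min_pixels).fit_predict(votes)
    masks = []
    for lab in sorted(set(labels) - {-1}):               # label -1 is DBSCAN noise
        mask = np.zeros(fg_prob.shape, dtype=bool)
        mask[ys[labels == lab], xs[labels == lab]] = True
        masks.append(mask)
    return masks


def masks_to_word(masks, classify_char):
    """Order character masks left to right and read them with a classifier."""
    order = np.argsort([np.nonzero(m)[1].mean() for m in masks])  # mean column per mask
    return "".join(classify_char(masks[i]) for i in order)
```

In this sketch, clustering on the predicted centroids rather than on raw pixel coordinates is what separates touching characters: two adjacent glyphs may share connected foreground pixels, but their pixels vote for different centroids and therefore fall into different clusters.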

Keywords

Scene text recognition · Character centroid prediction · Fully convolutional networks · Character instance segmentation

Acknowledgments

This work was supported by the Natural Science Foundation of China under Grant 61171138.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

Department of Information Science, School of Mathematical Sciences and LMAM, Peking University, Beijing, China