End-to-End Scene Text Recognition with Character Centroid Prediction

Zhao, Wei; Ma, Jinwen

doi:10.1007/978-3-319-70090-8_30


Wei Zhao & Jinwen Ma

Conference paper
First Online: 28 October 2017


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10636)

Abstract

Scene text recognition aims to extract textual information from natural images and is widely applied in computer vision and intelligent information processing. In this paper, we propose a novel end-to-end approach to scene text recognition that uses a specially trained fully convolutional network to predict the centroid and pixel cluster of each character. With this additional information, we can solve the character instance segmentation problem effectively and then combine the recognized characters into words to accomplish the text recognition task. Experimental results on the ICDAR2013 dataset demonstrate that the proposed method with character centroid prediction achieves promising results on scene text recognition.
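The abstract does not specify how predicted centroids become character instances or how characters are chained into words. Below is a minimal pure-Python sketch of one plausible pipeline, assuming the network outputs (a) a binary character-foreground mask and (b) a centroid heatmap whose local maxima mark character centres. Every foreground pixel is assigned to its nearest peak, and the resulting characters are read left to right and split into words at large horizontal gaps. The function names, the 0.5 threshold, the nearest-centroid rule, the `max_gap` heuristic and the `recognised` lookup are all hypothetical illustrations, not the authors' published procedure.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]  # (row, col)


def find_centroids(heatmap: Sequence[Sequence[float]], threshold: float = 0.5) -> List[Point]:
    """Return pixels that are >= threshold and >= all 8 neighbours (3x3 non-maximum suppression).

    Ties on a plateau yield several peaks; a real system would merge them.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks: List[Point] = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [heatmap[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


def assign_pixels(mask: Sequence[Sequence[int]], centroids: List[Point]) -> List[List[int]]:
    """Label each foreground pixel with the index of its nearest centroid; -1 marks background."""
    labels: List[List[int]] = []
    for r, row in enumerate(mask):
        out: List[int] = []
        for c, fg in enumerate(row):
            if not fg or not centroids:
                out.append(-1)
                continue
            # Squared Euclidean distance is enough for choosing the nearest centre.
            out.append(min(range(len(centroids)),
                           key=lambda k: (centroids[k][0] - r) ** 2 + (centroids[k][1] - c) ** 2))
        labels.append(out)
    return labels


def column_extents(labels: List[List[int]]) -> Dict[int, Tuple[int, int]]:
    """Horizontal extent (min_col, max_col) of every character instance that received pixels."""
    ext: Dict[int, Tuple[int, int]] = {}
    for row in labels:
        for c, k in enumerate(row):
            if k < 0:
                continue
            lo, hi = ext.get(k, (c, c))
            ext[k] = (min(lo, c), max(hi, c))
    return ext


def group_words(chars: List[Tuple[str, int, int]], max_gap: int = 2) -> List[str]:
    """Read characters left to right; start a new word when more than max_gap empty columns separate them."""
    words: List[str] = []
    prev_hi: Optional[int] = None
    for ch, lo, hi in sorted(chars, key=lambda t: t[1]):
        if prev_hi is None or lo - prev_hi - 1 > max_gap:
            words.append(ch)
        else:
            words[-1] += ch
        prev_hi = hi
    return words


def demo() -> Tuple[List[Point], List[List[int]], List[str]]:
    # Toy 3x12 image with three character blobs at columns 0-1, 3-4 and 9-10.
    mask = [[1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0] for _ in range(3)]
    heatmap = [[0.0] * 12 for _ in range(3)]
    for col in (1, 4, 10):
        heatmap[1][col] = 0.9  # one predicted centroid per blob
    centroids = find_centroids(heatmap)
    labels = assign_pixels(mask, centroids)
    recognised = {0: "G", 1: "O", 2: "X"}  # hypothetical stand-in for a character classifier
    chars = [(recognised[k], lo, hi) for k, (lo, hi) in column_extents(labels).items()]
    return centroids, labels, group_words(chars)


if __name__ == "__main__":
    print(demo()[2])  # ['GO', 'X']
```

Nearest-centroid assignment is used here only because it is the simplest rule that turns centroid predictions into instance labels; it can mislabel pixels where touching characters differ greatly in width, and gap-based word grouping assumes roughly horizontal text.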



Acknowledgments

This work was supported by the Natural Science Foundation of China under Grant 61171138.

Author information

Authors and Affiliations

Department of Information Science, School of Mathematical Sciences and LMAM, Peking University, Beijing, 100871, China
Wei Zhao & Jinwen Ma


Corresponding author

Correspondence to Jinwen Ma.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy


About this paper

Cite this paper

Zhao, W., Ma, J. (2017). End-to-End Scene Text Recognition with Character Centroid Prediction. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_30


DOI: https://doi.org/10.1007/978-3-319-70090-8_30
Published: 28 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70089-2
Online ISBN: 978-3-319-70090-8
eBook Packages: Computer Science, Computer Science (R0)
