Abstract
Text is an important medium of human communication, and recognizing text in scene images is increasingly important. In this paper, we propose a residual convolutional recurrent neural network for scene text recognition. A convolutional recurrent neural network (CRNN) combines a convolutional neural network (CNN) with a recurrent neural network (RNN): the CNN extracts features, and the RNN encodes and decodes the resulting feature sequences. To improve the recognition accuracy of CRNN-based scene text recognition, we explore deeper CNN architectures as feature extractors and analyze the corresponding recognition results. Specifically, VGG- and ResNet-style backbones are introduced to train models of different depths and obtain the encoded representation of images. Experimental results on public datasets demonstrate the effectiveness of our method.
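In CRNN-style pipelines the RNN emits a per-frame label distribution over the character set plus a blank symbol, and the final transcription is recovered by collapsing this sequence (the abstract does not spell out the decoding step; the use of CTC-style decoding here is an assumption based on the standard CRNN formulation). A minimal greedy sketch of that decoding step, in plain Python with a hypothetical `ctc_greedy_decode` helper:

```python
# Minimal greedy CTC-style decoder: take the argmax label at each time
# step, collapse consecutive repeats, then drop the blank symbol.
BLANK = 0  # index reserved for the blank label

def ctc_greedy_decode(frame_probs, alphabet):
    """frame_probs: list of per-frame probability vectors over
    [blank] + alphabet; returns the decoded string."""
    # Best label per frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    prev = None
    for label in best:
        if label != prev and label != BLANK:
            decoded.append(alphabet[label - 1])  # shift past blank index
        prev = label
    return "".join(decoded)

# Example: frames predicting 'c', 'c', blank, 'a', 't' decode to "cat":
# the repeated 'c' is collapsed and the blank separates distinct labels.
alphabet = "abcdefghijklmnopqrstuvwxyz"
frames = [
    [0.1, 0.0, 0.0, 0.9] + [0.0] * 23,               # 'c'
    [0.1, 0.0, 0.0, 0.9] + [0.0] * 23,               # 'c' (repeat)
    [0.9, 0.0, 0.0, 0.1] + [0.0] * 23,               # blank
    [0.1, 0.8, 0.0, 0.1] + [0.0] * 23,               # 'a'
    [0.1] + [0.0] * 19 + [0.8] + [0.0] * 6,          # 't'
]
print(ctc_greedy_decode(frames, alphabet))  # -> cat
```

The collapse-then-drop-blank order matters: removing blanks first would merge the genuinely repeated characters that the blank is there to separate.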
Acknowledgements
This work was supported in part by the Beijing Natural Science Foundation under Grant 4182056 and by the Specialized Fund for the Joint Building Program of the Beijing Municipal Education Commission.
Cite this article
Lei, Z., Zhao, S., Song, H. et al. Scene text recognition using residual convolutional recurrent neural network. Machine Vision and Applications 29, 861–871 (2018). https://doi.org/10.1007/s00138-018-0942-y