
MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Recognition methods for Chinese text lines, an important component of optical character recognition, have been widely applied in many specific tasks. However, several challenges remain: (1) the lack of open Chinese text recognition datasets; (2) challenges caused by the characteristics of Chinese characters, e.g., diverse types, complex structures and various sizes; (3) difficulties introduced by text images from different scenes, e.g., blur, illumination and distortion. To address these challenges, we propose an end-to-end recognition method based on convolutional recurrent neural networks (CRNNs), i.e., the multi-scale attention CRNN (MA-CRNN), which adds three components to the basic CRNN: asymmetric convolution, a feature reuse network and an attention mechanism. The proposed model is mainly aimed at scene text recognition involving Chinese characters. The model is trained and tested on two Chinese text recognition datasets: the open dataset MTWI and our own large-scale Chinese text line dataset collected from various scenes. The experimental results demonstrate that the proposed method achieves better performance than other methods.
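To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of a CRNN-style recognizer augmented with the three named components: asymmetric convolution, feature reuse and an attention mechanism, followed by a per-time-step classifier suitable for a CTC loss. The layer sizes, the additive attention form and the names (AsymmetricConvBlock, MACRNNSketch) are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of an MA-CRNN-style recognizer (illustrative, not the authors' code).
import torch
import torch.nn as nn


class AsymmetricConvBlock(nn.Module):
    """Approximates a k x k convolution with stacked k x 1 and 1 x k convolutions."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)


class MACRNNSketch(nn.Module):
    """CNN backbone with feature reuse -> BiLSTM -> attention re-weighting -> per-step logits."""

    def __init__(self, num_classes, in_ch=1, hidden=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),                      # H/2, W/2
        )
        self.block1 = AsymmetricConvBlock(64, 128)
        self.block2 = nn.Sequential(nn.MaxPool2d(2, 2), AsymmetricConvBlock(128, 256))
        self.reuse_pool = nn.MaxPool2d(2, 2)         # matches block1 output to block2 resolution
        self.collapse = nn.AdaptiveAvgPool2d((1, None))  # collapse height, keep width as time axis
        self.rnn = nn.LSTM(256 + 128, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.attn = nn.Sequential(                   # additive attention over time steps
            nn.Linear(2 * hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                            # x: (B, C, H, W)
        f1 = self.block1(self.stem(x))               # (B, 128, H/2, W/2)
        f2 = self.block2(f1)                         # (B, 256, H/4, W/4)
        fused = torch.cat([f2, self.reuse_pool(f1)], dim=1)      # feature reuse by concatenation
        seq = self.collapse(fused).squeeze(2).permute(0, 2, 1)   # (B, T=W/4, 384)
        out, _ = self.rnn(seq)                       # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(out), dim=1)           # (B, T, 1)
        out = out * (1.0 + weights)                  # emphasize informative time steps
        return self.classifier(out)                  # (B, T, num_classes), e.g. for nn.CTCLoss


if __name__ == "__main__":
    model = MACRNNSketch(num_classes=5000)           # e.g. a large Chinese character set
    logits = model(torch.randn(2, 1, 32, 256))       # two grayscale 32 x 256 text-line images
    print(logits.shape)                              # torch.Size([2, 64, 5000])

In this sketch, asymmetric convolution follows the common factorization of a k x k kernel into k x 1 and 1 x k kernels, and feature reuse is realized by concatenating a downsampled earlier feature map with a later one before the recurrent layers; how closely this matches the paper's exact design is an assumption.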



Funding

This research was funded by the National Natural Science Foundation of China (No. 61175031), the National High Technology Research and Development Program of China (863 Program) (No. 2012AA041402) and the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2015BAF13B00-5).

Author information

Contributions

G.F. Tong, Y. Li, H.S. Gao and H.R. Chen contributed to conceptualization; Y. Li, H.S. Gao and H.R. Chen provided methodology; H.R. Chen, H.S. Gao and H. Wang contributed to software; G.F. Tong, H.S. Gao and H.R. Chen performed validation; G.F. Tong and H.R. Chen were involved in data curation; Y. Li, H.S. Gao, H.R. Chen and X. Yang helped in writing and original draft preparation; G.F. Tong supervised the study; G.F. Tong was involved in project administration; G.F. Tong contributed to funding acquisition.

Corresponding authors

Correspondence to Yong Li or Huashuai Gao.

Ethics declarations

Conflict of interest

The authors have no relevant financial interests in the manuscript and no other potential conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Tong, G., Li, Y., Gao, H. et al. MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes. IJDAR 23, 103–114 (2020). https://doi.org/10.1007/s10032-019-00348-7
