
MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Recognition methods for Chinese text lines, an important component of optical character recognition, have been widely applied in many specific tasks. However, several challenges remain: (1) the lack of open Chinese text recognition datasets; (2) challenges caused by the characteristics of Chinese characters, e.g., diverse types, complex structures and various sizes; (3) difficulties introduced by text images from different scenes, e.g., blur, illumination and distortion. To address these challenges, we propose an end-to-end recognition method based on convolutional recurrent neural networks (CRNNs), i.e., the multi-scale attention CRNN (MA-CRNN), which adds three components to the basic CRNN: asymmetric convolution, a feature reuse network and an attention mechanism. The proposed model is mainly aimed at scene text recognition involving Chinese characters. The model is trained and tested on two Chinese text recognition datasets: the open dataset MTWI and our own large-scale Chinese text line dataset collected from various scenes. The experimental results demonstrate that the proposed method achieves better performance than other methods.
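To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of a CRNN-style recognizer augmented with the three named components: asymmetric convolution, feature reuse and an attention mechanism, followed by a per-time-step classifier suitable for a CTC loss. The layer sizes, the additive attention form and the names (AsymmetricConvBlock, MACRNNSketch) are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of an MA-CRNN-style recognizer (illustrative, not the authors' code).
import torch
import torch.nn as nn


class AsymmetricConvBlock(nn.Module):
    """Approximates a k x k convolution with stacked k x 1 and 1 x k convolutions."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)


class MACRNNSketch(nn.Module):
    """CNN backbone with feature reuse -> BiLSTM -> attention re-weighting -> per-step logits."""

    def __init__(self, num_classes, in_ch=1, hidden=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),                      # H/2, W/2
        )
        self.block1 = AsymmetricConvBlock(64, 128)
        self.block2 = nn.Sequential(nn.MaxPool2d(2, 2), AsymmetricConvBlock(128, 256))
        self.reuse_pool = nn.MaxPool2d(2, 2)         # matches block1 output to block2 resolution
        self.collapse = nn.AdaptiveAvgPool2d((1, None))  # collapse height, keep width as time axis
        self.rnn = nn.LSTM(256 + 128, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.attn = nn.Sequential(                   # additive attention over time steps
            nn.Linear(2 * hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                            # x: (B, C, H, W)
        f1 = self.block1(self.stem(x))               # (B, 128, H/2, W/2)
        f2 = self.block2(f1)                         # (B, 256, H/4, W/4)
        fused = torch.cat([f2, self.reuse_pool(f1)], dim=1)      # feature reuse by concatenation
        seq = self.collapse(fused).squeeze(2).permute(0, 2, 1)   # (B, T=W/4, 384)
        out, _ = self.rnn(seq)                       # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(out), dim=1)           # (B, T, 1)
        out = out * (1.0 + weights)                  # emphasize informative time steps
        return self.classifier(out)                  # (B, T, num_classes), e.g. for nn.CTCLoss


if __name__ == "__main__":
    model = MACRNNSketch(num_classes=5000)           # e.g. a large Chinese character set
    logits = model(torch.randn(2, 1, 32, 256))       # two grayscale 32 x 256 text-line images
    print(logits.shape)                              # torch.Size([2, 64, 5000])

In this sketch, asymmetric convolution follows the common factorization of a k x k kernel into k x 1 and 1 x k kernels, and feature reuse is realized by concatenating a downsampled earlier feature map with a later one before the recurrent layers; how closely this matches the paper's exact design is an assumption.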



Funding

This research was funded by the National Natural Science Foundation of China (No. 61175031), the National High Technology Research and Development Program of China (863 Program) (No. 2012AA041402) and the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2015BAF13B00-5).

Author information

Contributions

G.F. Tong, Y. Li, H.S. Gao and H.R. Chen contributed to conceptualization; Y. Li, H.S. Gao and H.R. Chen provided methodology; H.R. Chen, H.S. Gao and H. Wang contributed to software; G.F. Tong, H.S. Gao and H.R. Chen performed validation; G.F. Tong and H.R. Chen were involved in data curation; Y. Li, H.S. Gao, H.R. Chen and X. Yang helped in writing and original draft preparation; G.F. Tong supervised the study; G.F. Tong was involved in project administration; G.F. Tong contributed to funding acquisition.

Corresponding authors

Correspondence to Yong Li or Huashuai Gao.

Ethics declarations

Conflict of interest

The authors have no relevant financial interests in the manuscript and no other potential conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Tong, G., Li, Y., Gao, H. et al. MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes. IJDAR 23, 103–114 (2020). https://doi.org/10.1007/s10032-019-00348-7
