MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes

  • Guofeng Tong
  • Yong Li
  • Huashuai Gao
  • Huairong Chen
  • Hao Wang
  • Xiang Yang
Original Paper

Abstract

Recognition methods for Chinese text lines, an important component of optical character recognition, have been widely applied in many specific tasks. However, several challenges remain: (1) the lack of open Chinese text recognition datasets; (2) challenges caused by the characteristics of Chinese characters, e.g., diverse types, complex structure and varied sizes; (3) difficulties caused by text images captured in different scenes, e.g., blur, illumination and distortion. To address these challenges, we propose an end-to-end recognition method based on convolutional recurrent neural networks (CRNNs), i.e., a multi-scale attention CRNN, which extends a CRNN with three components: asymmetric convolution, a feature reuse network and an attention mechanism. The proposed model mainly targets scene text recognition including Chinese characters. The model is trained and tested on two Chinese text recognition datasets, i.e., the open dataset MTWI and our large-scale Chinese text line dataset collected from various scenes. The experimental results demonstrate that the proposed method achieves better performance than other methods.
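The paper does not include code here, but the pipeline the abstract describes can be sketched in PyTorch: a convolutional backbone built from asymmetric (1×3 / 3×1) convolutions, a bidirectional LSTM over the resulting column features, and a simple attention gate before per-timestep classification for CTC-style decoding. All layer names, channel counts and the gating form of the attention below are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class AsymmetricConvBlock(nn.Module):
    """Approximates a 3x3 convolution with stacked 1x3 and 3x1 convolutions,
    which reduces parameters while keeping the receptive field."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)


class MACRNNSketch(nn.Module):
    """Hypothetical MA-CRNN-style model: CNN features -> BiLSTM -> attention
    gate -> per-timestep logits over the character set (for CTC decoding)."""

    def __init__(self, num_classes, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            AsymmetricConvBlock(1, 32), nn.MaxPool2d(2, 2),      # H/2, W/2
            AsymmetricConvBlock(32, 64), nn.MaxPool2d(2, 2),     # H/4, W/4
            AsymmetricConvBlock(64, 128), nn.MaxPool2d((2, 1)),  # H/8, W/4
        )
        # Input images are assumed 32 px high, so the feature map is 4 px high.
        self.rnn = nn.LSTM(128 * 4, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)      # per-timestep attention score
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                          # x: (B, 1, 32, W)
        f = self.backbone(x)                       # (B, 128, 4, W/4)
        b, c, h, w = f.shape
        # One feature vector per image column, left to right.
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)
        out, _ = self.rnn(seq)                     # (B, W/4, 2*hidden)
        gate = torch.sigmoid(self.attn(out))       # emphasize informative steps
        return self.fc(out * gate)                 # (B, W/4, num_classes)


model = MACRNNSketch(num_classes=100)
logits = model(torch.randn(2, 1, 32, 128))
print(logits.shape)  # torch.Size([2, 32, 100])
```

In training, the per-timestep logits would be fed to a CTC loss with the blank label, so that no explicit character segmentation is needed; the attention gate here stands in for the paper's attention mechanism in the simplest workable form.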

Keywords

Chinese text line recognition · Multiple scales · Attention mechanism · Chinese text line dataset (CTLD)

Notes

Author contributions

G.F. Tong, Y. Li, H.S. Gao and H.R. Chen contributed to conceptualization; Y. Li, H.S. Gao and H.R. Chen provided methodology; H.R. Chen, H.S. Gao and H. Wang contributed to software; G.F. Tong, H.S. Gao and H.R. Chen performed validation; G.F. Tong and H.R. Chen were involved in data curation; Y. Li, H.S. Gao, H.R. Chen and X. Yang helped in writing and original draft preparation; G.F. Tong supervised the study; G.F. Tong was involved in project administration; G.F. Tong contributed to funding acquisition.

Funding

This research was funded by the National Natural Science Foundation of China (No. 61175031), the National High Technology Research and Development Program of China (863 Program) (No. 2012AA041402) and the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2015BAF13B00-5).

Compliance with ethical standards

Conflict of interest

The authors have no relevant financial interests in the manuscript and no other potential conflicts of interest to disclose.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. College of Information Science and Engineering, Northeastern University, Shenyang, China
