N-FTRN: Neighborhoods based fully convolutional network for Chinese text line recognition

  • Hongzhu Li
  • Weiqiang Wang
  • Ke Lv


The convolutional recurrent neural network is one of the most popular text recognition methods. Recurrent structures can capture long-term dependencies, but they are computationally more expensive than convolutional structures. We argue that Chinese text line recognition can rely on neighboring context rather than the context of the entire line, and that the information extracted from neighborhoods should only supplement the information extracted from the character regions themselves. We therefore propose a novel neighborhoods based fully convolutional text recognition network (N-FTRN). It first extracts character-level feature sequences from text lines, then uses residual blocks in place of the recurrent structure to exploit contextual information. A reshape layer enables the network to recognize both vertical and horizontal text lines. Extensive experiments validate the efficiency and effectiveness of the proposed network. On the Chinese scene text competition dataset (TRW) of ICDAR 2015, it achieves recognition performance comparable to state-of-the-art methods with much more compact models.
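
The idea of using residual blocks over neighborhoods, rather than a recurrent pass over the whole line, can be illustrated with a minimal sketch. This is not the N-FTRN implementation: the function name `neighborhood_residual`, the scalar per-offset weights, and the zero padding at the borders are all assumptions made for illustration. Each step of a character-level feature sequence keeps its own features on an identity path and adds a weighted contribution from nearby steps only, so the context acts as a supplement and its reach is bounded by the kernel size.

```python
from typing import List

FeatureSequence = List[List[float]]  # T steps, each a feature vector of C channels


def neighborhood_residual(features: FeatureSequence,
                          weights: List[float]) -> FeatureSequence:
    """Return y[t] = x[t] + sum_k weights[k] * x[t + k - radius].

    Hypothetical simplification: one scalar weight per neighbor offset
    (a real residual block would use learned convolution kernels).
    Steps outside the sequence are treated as zeros.
    """
    radius = len(weights) // 2
    steps = len(features)
    output: FeatureSequence = []
    for t in range(steps):
        mixed = list(features[t])  # identity path: the character's own features
        for k, weight in enumerate(weights):
            j = t + k - radius  # index of the neighbor being read
            if 0 <= j < steps:
                for c, value in enumerate(features[j]):
                    mixed[c] += weight * value  # supplement from the neighbor
        output.append(mixed)
    return output


# Four steps with one channel each; only the left and right neighbors contribute.
demo = neighborhood_residual([[1.0], [2.0], [3.0], [4.0]], [0.5, 0.0, 0.5])
print(demo)  # [[2.0], [4.0], [6.0], [5.5]]
```

Because each output step reads only a fixed window, all steps can be computed independently and in parallel, which is the usual efficiency argument for convolutional context over a recurrent one.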


Chinese text recognition; Fully convolutional network (FCN); Connectionist temporal classification (CTC)



This work is supported by the National Key R&D Program of China under contract No. 2017YFB1002203, and by the NSFC Key Projects of International (Regional) Cooperation and Exchanges under Grant 61860206004.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Chinese Academy of Sciences, Beijing, China
