DenseNet with Up-Sampling block for recognizing texts in images

  • Zeming Tang
  • Weiming Jiang
  • Zhao Zhang (Email author)
  • Mingbo Zhao
  • Li Zhang
  • Meng Wang
Original Article


Convolutional Recurrent Neural Networks (CRNN) have achieved great success in OCR. However, existing deep models usually apply down-sampling in the pooling operation, which reduces the size of the feature maps by discarding some feature information and may cause characters that occupy only a small part of the image to be missed. Moreover, all hidden units in the recurrent module must be connected in the recurrent layer, which may impose a heavy computational burden. In this paper, we explore improving the results by replacing the convolutional network of the CRNN with a Dense Convolutional Network (DenseNet), which connects and combines multiple features. We also use an up-sampling function to construct an Up-Sampling block that reduces the negative effects of down-sampling in the pooling stage and restores the lost information to a certain extent, so that informative features can still be extracted with a deeper structure. In addition, we directly use the output of the inner convolutional parts to describe the label distribution of each frame, which makes the process more efficient. Finally, we propose a new OCR framework for Chinese recognition, termed DenseNet with Up-Sampling block joint with connectionist temporal classification. Results on a Chinese string dataset show that our model delivers improved performance compared with several popular deep frameworks.
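As a rough illustration of two ideas named in the abstract, the sketch below shows (a) how pooling shrinks a feature map and how an up-sampling step can bring it back to the original size, and (b) how per-frame label distributions can be turned into a label sequence with best-path connectionist temporal classification (CTC) decoding. It is not the authors' implementation: the 2x2 max pooling, the nearest-neighbour up-sampling, the toy feature map, and the helper names `max_pool_2x2`, `upsample_nearest_2x` and `ctc_greedy_decode` are all assumptions made for this example.

```python
from itertools import groupby


def max_pool_2x2(fmap):
    """Down-sample a 2-D feature map (list of rows) with 2x2 max pooling, stride 2."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def upsample_nearest_2x(fmap):
    """Up-sample by repeating every value twice along both axes (assumed scheme)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    return [label for label, _ in groupby(best) if label != blank]


# Toy 4x4 feature map: pooling keeps one value per 2x2 window.
fmap = [
    [1, 2, 0, 0],
    [3, 4, 0, 9],
    [0, 0, 5, 6],
    [0, 0, 7, 8],
]
pooled = max_pool_2x2(fmap)             # spatial size 2x2
restored = upsample_nearest_2x(pooled)  # spatial size back to 4x4

# Toy per-frame label scores over 3 classes; class 0 is the CTC blank.
frames = [
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.20, 0.70],
    [0.10, 0.10, 0.80],
]
decoded = ctc_greedy_decode(frames)
```

Up-sampling restores the spatial size but not the exact discarded values, which matches the abstract's wording that lost information is restored only "to a certain extent". In the decoding step, the blank frame between the two runs of class 1 is what lets a repeated label survive the merge.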


Chinese recognition · DenseNet · Up-Sampling block · Feature information restoration



The authors would like to express their sincere thanks to the reviewers for their insightful comments, which helped raise the standard of our manuscript. This work is partially supported by the National Natural Science Foundation of China (61672365, 61732008, 61725203, 61622305, 61871444 and 61572339), the Natural Science Foundation of Jiangsu Province of China (BK20160040), the High-Level Talent of “Six Talent Peak” Project of Jiangsu Province of China (XYDXX-055) and the Fundamental Research Funds for the Central Universities of China (JZ2019HGPA0102).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest in this work.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Zeming Tang (1)
  • Weiming Jiang (1)
  • Zhao Zhang (1, 3), Email author
  • Mingbo Zhao (2)
  • Li Zhang (1)
  • Meng Wang (3)
  1. School of Computer Science and Technology, Soochow University, Suzhou, China
  2. Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
  3. School of Computer Science and School of Artificial Intelligence, Hefei University of Technology, Hefei, China
