Deep learning based text detection using resnet for feature extraction

Multimedia Tools and Applications

Abstract

Popular deep learning models for text segmentation include CTPN, EAST, and PixelLink. However, they do not cope well with images that contain densely distributed characters, particularly when those characters are connected. To address these problems, ResNet, which is highly sensitive in feature extraction, is used to replace the convolutional neural networks embedded in the main structures of CTPN and EAST. The experimental results show that a better feature extraction network can significantly improve the precision of text localization. Notably, the results indicate that accuracy is highest for the modified EAST with ResNet101, that is, with a deeper and wider ResNet. The accuracy of text segmentation on ICDAR 2015 is 83.4%, which is 7% higher than the original PVANET-EAST. The text detection accuracy is 83.9% on scanned documents that were not used in training. The model also achieved an accuracy of 86.3% when applied to self-collected Chinese calligraphy. These results demonstrate that using ResNet for feature extraction improves text detection for OCR applications.
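The abstract reports detection accuracy and precision but does not say how they are computed. As a rough illustration only, text detectors are commonly scored by matching predicted boxes to ground-truth boxes at an intersection-over-union (IoU) threshold. The sketch below assumes axis-aligned boxes, a threshold of 0.5, and greedy one-to-one matching; the names `iou` and `score_detections` are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

# Assumption: boxes are axis-aligned (x_min, y_min, x_max, y_max).
# Scene-text benchmarks often use quadrilaterals; this is a simplification.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_detections(
    preds: List[Box], truths: List[Box], thresh: float = 0.5
) -> Tuple[float, float, float]:
    """Greedily match each prediction to at most one unmatched ground-truth
    box with IoU >= thresh, then return (precision, recall, f_measure)."""
    unmatched = list(range(len(truths)))
    hits = 0
    for p in preds:
        best, best_iou = None, thresh
        for t in unmatched:
            v = iou(p, truths[t])
            if v >= best_iou:
                best, best_iou = t, v
        if best is not None:
            unmatched.remove(best)
            hits += 1
    precision = hits / len(preds) if preds else 0.0
    recall = hits / len(truths) if truths else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
    print(score_detections(preds, truths))
```

In this toy input, two of the three predictions overlap a ground-truth box closely enough to count as matches, and the third is a false positive. A stronger feature extractor would be expected to show up in such a score as fewer missed or spurious boxes.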

Data availability

The datasets generated and/or used during the current study are available from the corresponding author on reasonable request.

Notes

  1. https://github.com/eragonruan/text-detection-ctpn

  2. https://github.com/argman/EAST

  3. https://github.com/GitYCC/crnn-pytorch

Funding

This research received no external funding.

Author information

Corresponding author

Correspondence to Chen-Chiung Hsieh.

Ethics declarations

Conflict of interests

The authors, whose names are listed on this paper, certify that they have no conflicts of interest to disclose.

About this article

Cite this article

Huang, LK., Tseng, HT., Hsieh, CC. et al. Deep learning based text detection using resnet for feature extraction. Multimed Tools Appl 82, 46871–46903 (2023). https://doi.org/10.1007/s11042-023-15449-z

  • DOI: https://doi.org/10.1007/s11042-023-15449-z
