Deep learning based text detection using resnet for feature extraction

Huang, Li-Kun; Tseng, Hsiao-Ting; Hsieh, Chen-Chiung; Yang, Chih-Sin

doi:10.1007/s11042-023-15449-z


Published: 03 May 2023

Multimedia Tools and Applications, volume 82, pages 46871–46903 (2023)

Li-Kun Huang¹, Hsiao-Ting Tseng², Chen-Chiung Hsieh³ (ORCID: orcid.org/0000-0002-7716-7306) & Chih-Sin Yang³


Abstract

Popular deep learning models for text segmentation include CTPN, EAST, and PixelLink. However, they handle images with densely distributed characters poorly, especially when those characters are connected. To address these problems, ResNet, which is highly sensitive in feature extraction, replaces the embedded convolutional neural networks in the main structures of CTPN and EAST. The experimental results show that a better feature extraction network can significantly improve the precision of text localization. Notably, the modified EAST achieves its highest accuracy with ResNet101, that is, with a deeper and wider ResNet. The accuracy of text segmentation on ICDAR 2015 is 83.4%, which is 7% higher than the original PVANET-EAST. On scanned documents not seen during training, the text detection accuracy is 83.9%, and on self-collected Chinese calligraphy it reaches 86.3%. These results demonstrate that using ResNet for text detection is a worthwhile improvement for OCR applications.
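The abstract does not spell out what makes ResNet different from the convolutional networks it replaces. As a reminder, the following is a minimal sketch of a residual (skip) connection, the building block that gives ResNet its name. It is illustrative only: plain Python lists stand in for feature maps, small dense layers stand in for convolutions, and the function names and weights are assumptions made for this example, not the authors' network.

```python
# Minimal sketch of a residual block: output = relu(F(x) + x).
# Assumptions: dense layers replace convolutions, and all names and
# weights here are hypothetical, chosen only to illustrate the idea.

def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(values, weights, bias):
    """Dense layer: one output per weight row."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, w1, b1, w2, b2):
    """Apply two layers F(x), then add the input back before the final ReLU."""
    hidden = relu(linear(x, w1, b1))
    f_x = linear(hidden, w2, b2)
    # The skip connection: the block only has to learn the residual F(x).
    return relu([f + xi for f, xi in zip(f_x, x)])


def zero_layer(n):
    """Weights and bias of an n-by-n layer that outputs all zeros."""
    return [[0.0] * n for _ in range(n)], [0.0] * n


def identity_layer(n):
    """Weights and bias of an n-by-n layer that copies its input."""
    weights = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return weights, [0.0] * n
```

With all-zero weights the block passes a non-negative input through unchanged, because only the skip path contributes. This is the property that makes very deep stacks such as ResNet101 easier to train than plain stacks of the same depth.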


Data availability

The datasets generated and/or used during the current study are available from the corresponding author on reasonable request.


Funding

This research received no external funding.

Author information

Authors and Affiliations

Institute of Bioinformatics and Structural Biology, College of Life Science, National Tsing Hua University, No. 101, Sec. 2, Kuangfu Rd., East Dist., Hsinchu City, 300, Taiwan, ROC
Li-Kun Huang
Department of Information Management, National Central University, No. 300, Zhongda Rd, Taoyuan City, 320, Zhongli District, Taiwan, ROC
Hsiao-Ting Tseng
Department of Computer Science and Engineering, Tatung University, No. 40, Sec. 3, Jhongshan N. Rd, Jhongshan District, Taipei City, 104, Taiwan, ROC
Chen-Chiung Hsieh & Chih-Sin Yang


Corresponding author

Correspondence to Chen-Chiung Hsieh.

Ethics declarations

Conflict of interests

The authors, whose names are listed on this paper, certify that they have no conflicts of interest to disclose.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Huang, LK., Tseng, HT., Hsieh, CC. et al. Deep learning based text detection using resnet for feature extraction. Multimed Tools Appl 82, 46871–46903 (2023). https://doi.org/10.1007/s11042-023-15449-z


Received: 20 April 2022
Revised: 02 August 2022
Accepted: 18 April 2023
Published: 03 May 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11042-023-15449-z
