Abstract
Popular deep learning models for text segmentation include CTPN, EAST, and PixelLink. However, they handle images with densely distributed characters poorly, particularly when those characters are connected. To address these problems, ResNet, which is highly sensitive in feature extraction, is used to replace the convolutional neural networks embedded in the main structures of CTPN and EAST. The experimental results show that a better feature extraction network can significantly improve the precision of text localization. Notably, the results indicate that the modified EAST with ResNet101 achieves the highest accuracy, benefiting from the greater depth and width of the ResNet. The accuracy of text segmentation on ICDAR 2015 is 83.4%, which is 7% higher than that of the original PVANET-EAST. The text detection accuracy is 83.9% on scanned documents that were not used in training. The model also achieved an accuracy of 86.3% when applied to self-collected Chinese calligraphy. These results demonstrate that using ResNet for feature extraction improves text detection for OCR applications.
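The abstract reports accuracy figures without stating how a single detection is scored. As a minimal sketch, assuming the intersection-over-union (IoU) matching commonly used for detection benchmarks, and using axis-aligned boxes rather than the quadrilaterals annotated in ICDAR 2015, precision, recall, and F-measure could be computed as follows. The function names, the 0.5 threshold, and the greedy matching are illustrative assumptions, not the authors' evaluation protocol.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); a simplifying assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(preds: List[Box], gts: List[Box],
             thresh: float = 0.5) -> Tuple[float, float, float]:
    """Greedily match each prediction to its best unmatched ground truth.

    Returns (precision, recall, f_measure). The threshold of 0.5 is an
    assumed value, not one taken from the paper.
    """
    matched = set()
    true_pos = 0
    for p in preds:
        best_j, best_score = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            score = iou(p, g)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= thresh:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical boxes for illustration only.
    predictions = [(0, 0, 10, 10), (20, 20, 30, 30)]
    ground_truth = [(0, 0, 10, 10), (50, 50, 60, 60), (100, 100, 110, 110)]
    print(evaluate(predictions, ground_truth))
```

Greedy matching keeps the sketch short; an evaluation script for a real benchmark would typically follow that benchmark's own matching rules and polygon geometry.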
Data availability
The datasets generated and/or used during the current study are available from the corresponding author on reasonable request.
Funding
This research received no external funding.
Ethics declarations
Conflict of interests
The authors, whose names are listed on this paper, certify that they have no conflicts of interest to disclose.
About this article
Cite this article
Huang, LK., Tseng, HT., Hsieh, CC. et al. Deep learning based text detection using resnet for feature extraction. Multimed Tools Appl 82, 46871–46903 (2023). https://doi.org/10.1007/s11042-023-15449-z
DOI: https://doi.org/10.1007/s11042-023-15449-z