Text Detection and Recognition from the Scene Images Using RCNN and EasyOCR

Chaitra, Y. L.; Roopa, M. J.; Gopalakrishna, M. T.; Swetha, M. D.; Aditya, C. R.

doi:10.1007/978-981-99-3761-5_8

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 720)

Included in the following conference series:

International Conference on Information and Communication Technology for Intelligent Systems


Abstract

Detecting text regions in scene images and recognizing their content remains one of the most challenging and enduring research problems in computer vision. Over the past few decades, researchers have worked to increase the accuracy of text recognition in scene images while addressing several challenges, and there is accordingly a high demand for commercial text recognizers for natural scenes. Conventional optical character recognition (OCR) demands clean backgrounds, crisp layouts, and neat text, conditions that natural scene images frequently fail to meet. A major challenge in scene text recognition is text orientation: text may be horizontal, vertical, diagonal, or off-diagonal, and is sometimes curved rather than straight. Another challenge is that text is embedded in non-uniform backgrounds and complicated environments. Extracting such text is difficult because of noisy backgrounds, diverse fonts, and varying text sizes. This research proposes a comprehensive solution that uses Faster RCNN for text detection and EasyOCR for text recognition to increase accuracy. The proposed algorithm is tested on the benchmark datasets ICDAR13 and ICDAR15. The detection F-score improves by 2.1% and 1.4% on the ICDAR13 and ICDAR15 datasets, respectively, and recognition accuracy improves by 1.8%.
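The detection F-score reported in the abstract is the harmonic mean of precision and recall over matched text boxes. The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes axis-aligned boxes, an IoU threshold of 0.5, and greedy one-to-one matching; the official ICDAR protocols differ in detail. The names `iou` and `detection_f_score` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f_score(
    predicted: List[Box],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[float, float, float]:
    """Greedily match each prediction to at most one unmatched ground-truth
    box and return (precision, recall, f_score)."""
    unmatched = list(range(len(ground_truth)))
    true_positives = 0
    for pred in predicted:
        best_idx, best_iou = None, iou_threshold
        for idx in unmatched:
            score = iou(pred, ground_truth[idx])
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            unmatched.remove(best_idx)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Toy boxes for illustration only; these are not data from the paper.
    truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 10), (50, 50, 60, 60), (21, 20, 30, 30)]
    print(detection_f_score(preds, truth))
```

In the toy example, two of the three predictions match a ground-truth box, so precision is 2/3 and recall is 1.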


Author information

Authors and Affiliations

Department of CSE, SJB Institute of Technology, Affiliated to Visvesvaraya Technological University, Bengaluru, Karnataka, India
Y. L. Chaitra, M. J. Roopa & M. T. Gopalakrishna
Department of CSE, BNM Institute of Technology, Affiliated to Visvesvaraya Technological University, Bengaluru, Karnataka, India
M. D. Swetha
Department of CSE, Vidyavardhaka College of Engineering, Affiliated to Visvesvaraya Technological University, Mysuru, Karnataka, India
C. R. Aditya

Corresponding author

Correspondence to Y. L. Chaitra.

Editor information

Editors and Affiliations

Hertfordshire Business School, University of Hertfordshire, Hatfield, Hertfordshire, UK
Jyoti Choudrie
Department of AI and DS, Vishwakarma Institute of Information Technology, Pune, India
Parikshit N. Mahalle
Department of Computer Science, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia
Thinagaran Perumal
Global Knowledge Research Foundation, Ahmedabad, Gujarat, India
Amit Joshi


About this paper

Cite this paper

Chaitra, Y.L., Roopa, M.J., Gopalakrishna, M.T., Swetha, M.D., Aditya, C.R. (2023). Text Detection and Recognition from the Scene Images Using RCNN and EasyOCR. In: Choudrie, J., Mahalle, P.N., Perumal, T., Joshi, A. (eds) IOT with Smart Systems. ICTIS 2023. Lecture Notes in Networks and Systems, vol 720. Springer, Singapore. https://doi.org/10.1007/978-981-99-3761-5_8


DOI: https://doi.org/10.1007/978-981-99-3761-5_8
Published: 31 August 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3760-8
Online ISBN: 978-981-99-3761-5
eBook Packages: Intelligent Technologies and Robotics (R0)
