
Text Detection and Recognition from the Scene Images Using RCNN and EasyOCR

  • Conference paper
IOT with Smart Systems (ICTIS 2023)

Abstract

Detecting the location of text in scene images and recognizing it remains one of the most challenging and enduring research problems in computer vision. Over the past few decades, researchers have worked to increase the accuracy of text recognition in scene images by addressing several challenges, and there is strong demand for commercial text recognizers for natural scenes. Conventional optical character recognition (OCR) requires clean backgrounds, crisp layouts, and neat text, conditions that natural scene images frequently fail to meet. A major challenge in scene text recognition is text orientation, which may be horizontal, vertical, diagonal, or off-diagonal, and is sometimes curved rather than straight. Another challenge is that text is embedded in non-uniform backgrounds and complicated environments. Extracting such text is difficult because of noisy backgrounds, diverse fonts, and varying text sizes. This work proposes a comprehensive solution that uses Faster RCNN for text detection and EasyOCR for text recognition to increase accuracy. The proposed algorithm is tested on the benchmark datasets ICDAR13 and ICDAR15. The detection F-score improves by 2.1% and 1.4% on ICDAR13 and ICDAR15, respectively, and recognition accuracy improves by 1.8%.
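
The abstract reports detection quality as an F-score. As a rough illustration of how such a score is commonly computed for bounding-box detections, the sketch below matches predicted boxes to ground-truth boxes by intersection over union (IoU) and derives precision, recall, and F-score. The greedy matching, the 0.5 IoU threshold, the axis-aligned box format, and all function names are assumptions made for illustration; they are not taken from the paper, and the official ICDAR evaluation protocols differ in their details.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); an assumed format.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f_score(
    predicted: List[Box],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not from the paper
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) using greedy one-to-one matching."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predicted:
        # Pick the still-unmatched ground-truth box that overlaps most.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Toy boxes, invented purely to exercise the functions.
    truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 9), (50, 50, 60, 60), (21, 20, 30, 30)]
    print(detection_f_score(preds, truth))
```

In the toy example, two of the three predictions overlap a ground-truth box above the threshold, so precision is 2/3 and recall is 1. The F-score is the harmonic mean of the two, which penalizes a detector that achieves high recall only by emitting many spurious boxes.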




Corresponding author

Correspondence to Y. L. Chaitra.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.


Cite this paper

Chaitra, Y.L., Roopa, M.J., Gopalakrishna, M.T., Swetha, M.D., Aditya, C.R. (2023). Text Detection and Recognition from the Scene Images Using RCNN and EasyOCR. In: Choudrie, J., Mahalle, P.N., Perumal, T., Joshi, A. (eds) IOT with Smart Systems. ICTIS 2023. Lecture Notes in Networks and Systems, vol 720. Springer, Singapore. https://doi.org/10.1007/978-981-99-3761-5_8
