
A light-weight natural scene text detection and recognition system

Multimedia Tools and Applications

Abstract

Scene text recognition is a computer vision task that analyses a scene image and recognizes the text present in it. The task has many applications and becomes more valuable if it can run on handheld devices. Existing methods often rely on complex architectures with a large number of parameters, which produce large model files that are difficult to deploy on mobile devices. This paper therefore proposes a light-weight model, that is, a model with fewer parameters, a smaller file size and lower complexity, that can run on platforms with limited resources while achieving accuracy comparable to that of heavy-weight models. The proposed models rely on deep learning to handle most of the steps automatically, take less time and give precise results despite the many challenges involved. The scene text recognition model is a Convolutional-Recurrent Neural Network: the convolutional network extracts features from cropped scene text images, and the recurrent network processes the variable-length sequential data present in those images. After training, this model produces a weight file of 12 MB with 1 M parameters. To reduce the parameter count and file size further, and to show the trade-off between efficiency and accuracy, MobileNetV2 is used in place of the convolutional network, which yields a weight file of 6 MB with 0.5 M parameters. Results on the ICDAR 2013, IIIT 5K and Total-Text datasets show that the proposed work performs well in detecting and recognizing text in natural scene images.
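The abstract does not spell out why a MobileNetV2 backbone needs fewer parameters than a conventional convolutional network. One well-known contributor is that MobileNetV2 is built around depthwise separable convolutions, which split a standard convolution into a per-channel spatial filter and a 1x1 pointwise mix. The sketch below is a minimal illustration of that parameter arithmetic, not the authors' architecture: the function names and the example layer size (3x3 kernel, 128 input channels, 256 output channels) are assumptions chosen for illustration. It also computes the storage per parameter implied by the file sizes quoted in the abstract, taking 1 MB as 10^6 bytes.

```python
def conv_params(kernel, c_in, c_out, bias=False):
    """Number of weights in a standard 2-D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(kernel, c_in, c_out, bias=False):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def bytes_per_parameter(file_size_mb, n_params):
    """Average storage per parameter, assuming 1 MB = 10**6 bytes."""
    return file_size_mb * 1_000_000 / n_params


# Hypothetical layer, not taken from the paper.
standard = conv_params(3, 128, 256)
separable = depthwise_separable_params(3, 128, 256)
print(standard, separable, round(standard / separable, 1))  # 294912 33920 8.7

# Figures quoted in the abstract.
print(bytes_per_parameter(12, 1_000_000))  # 12.0 (convolutional-recurrent model)
print(bytes_per_parameter(6, 500_000))     # 12.0 (MobileNetV2 variant)
```

For this hypothetical layer the separable form needs roughly one ninth of the weights. Both reported models work out to about 12 bytes per parameter, which is more than the 4 bytes a bare 32-bit float would occupy; the abstract does not say what else the weight files contain.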

Availability of data and materials

The details of data and materials are given under Section 3 of this paper.

Funding

This project is self-funded.

Author information

Contributions

Jyoti Ghosh proposed the models, performed the experiments and wrote the paper. Anjan Kumar Talukdar supervised the whole work and the experiments conducted. Kandarpa Kumar Sarma supervised the originality and novelty of the whole work. All authors reviewed and edited the manuscript and approved the final version.

Corresponding author

Correspondence to Jyoti Ghosh.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no conflict of interest or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Fig. 15. Detected and recognized scene text on the Street View Text dataset

Appendix B

Fig. 16. Detected and recognized scene text on the CUTE 80 dataset

Fig. 17. Recognized scene text on the IIIT 5K dataset

Appendix C

Appendix D

Fig. 18. Detected and recognized scene text on real-time images

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ghosh, J., Talukdar, A.K. & Sarma, K.K. A light-weight natural scene text detection and recognition system. Multimed Tools Appl 83, 6651–6683 (2024). https://doi.org/10.1007/s11042-023-15696-0

  • DOI: https://doi.org/10.1007/s11042-023-15696-0
