A light-weight natural scene text detection and recognition system

Ghosh, Jyoti; Talukdar, Anjan Kumar; Sarma, Kandarpa Kumar

doi:10.1007/s11042-023-15696-0

Published: 13 June 2023

Volume 83, pages 6651–6683, (2024)

Multimedia Tools and Applications

Jyoti Ghosh ORCID: orcid.org/0000-0003-3300-9442¹,
Anjan Kumar Talukdar¹ &
Kandarpa Kumar Sarma¹

Abstract

Scene text recognition is a computer vision task that analyses a scene image and recognizes the text present in it. The task has many applications and becomes more valuable if it can run on handheld devices. The problem with existing methods is that a model with a large number of parameters and a complex architecture also has a large file size, which makes it difficult to deploy on mobile devices. The aim of this paper is therefore to propose a light-weight model, that is, a model with fewer parameters, a smaller file size and lower complexity, that can be used on platforms with limited resources while achieving accuracy comparable to that of heavy-weight models. The proposed models rely on deep learning to handle most of the steps automatically, take less time and give precise results despite the many challenges of natural scene images. The proposed scene text recognition model is a Convolutional-Recurrent Neural Network in which the convolutional network extracts features from cropped images of scene text and the recurrent network processes the variable-length sequential data present in those images. After training, the scene text recognition model produces a weight file of 12 MB with 1 M parameters. To reduce the number of parameters and the weight file size, and to show the trade-off between efficiency and accuracy, MobileNetV2 is used in place of the convolutional network, which produces a weight file of 6 MB with 0.5 M parameters. Results on the ICDAR 2013, IIIT 5K and Total-Text datasets show that the proposed work performs well in detecting and recognizing text in natural scene images.
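The parameter and file-size figures in the abstract can be related with some simple arithmetic. The sketch below is illustrative only and is not the authors' code: it compares the parameter count of a standard convolution with that of a depthwise separable convolution (the kind of layer MobileNetV2 is built around), and computes the storage per parameter implied by the reported figures. The layer dimensions (3×3 kernel, 128 channels) and all function names are hypothetical, chosen only for the example.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Parameter count of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    """Parameter count of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def bytes_per_param(file_size_mb, n_params):
    """Storage per parameter implied by a file size (1 MB taken as 10**6 bytes)."""
    return file_size_mb * 1_000_000 / n_params


# Hypothetical layer, not taken from the paper: 3x3 kernel, 128 -> 128 channels.
standard = conv2d_params(3, 128, 128)
separable = depthwise_separable_params(3, 128, 128)
print(f"standard conv:       {standard} parameters")
print(f"depthwise separable: {separable} parameters")
print(f"reduction factor:    {standard / separable:.1f}x")

# Figures reported in the abstract (rounded there, so the ratios are approximate).
print(f"CRNN:        {bytes_per_param(12, 1_000_000):.1f} bytes per parameter")
print(f"MobileNetV2: {bytes_per_param(6, 500_000):.1f} bytes per parameter")
```

Both reported configurations work out to roughly 12 bytes per parameter, which is more than the 4 bytes a 32-bit float weight needs on its own; the abstract does not say what else the weight files contain.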

Availability of data and materials

The details of data and materials are given under Section 3 of this paper.

Funding

This project is self-funded by the authors.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Gauhati University, Jalukbari, Guwahati, 781014, Assam, India
Jyoti Ghosh, Anjan Kumar Talukdar & Kandarpa Kumar Sarma

Contributions

Jyoti Ghosh proposed the models, performed the experiments and wrote the paper. Anjan Kumar Talukdar supervised the overall work and the experiments conducted. Kandarpa Kumar Sarma supervised the originality and novelty of the work. All authors reviewed and edited the manuscript and approved the final version.

Corresponding author

Correspondence to Jyoti Ghosh.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no conflicts of interest or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ghosh, J., Talukdar, A.K. & Sarma, K.K. A light-weight natural scene text detection and recognition system. Multimed Tools Appl 83, 6651–6683 (2024). https://doi.org/10.1007/s11042-023-15696-0

Received: 20 September 2021
Revised: 15 January 2022
Accepted: 23 April 2023
Published: 13 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15696-0
