Two-step text detection framework in natural scenes based on Pseudo-Zernike moments and CNN

Multimedia Tools and Applications

Abstract

Advanced Driver-Assistance Systems (ADAS) are an increasingly active research topic. Their goal is to make driving easier. The rising number of accidents shows how much drivers need assistance from machines. In this paper, we propose a new framework that uses a single camera to detect text in natural scenes. First, a filtering phase extracts candidate text regions using pseudo-Zernike moments. Then, a new convolutional neural network architecture, the Scene Text Detection Network (STDN), handles the classification phase. The results show that the proposed model reaches ≈ 40 fps with an mAP of 88.12 %, combining low computing time with competitive accuracy.
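The abstract does not give the moment formulas, so the following is only a minimal sketch of how pseudo-Zernike moment magnitudes could be computed for a candidate patch. It uses the standard textbook definition of the pseudo-Zernike radial polynomial and a simple pixel-sum approximation of the integral over the unit disk. The function names (`pz_radial`, `pz_moment`, `pz_features`) and the choice of `max_order=3` are assumptions made for illustration, not the author's implementation, and the fast computation schemes discussed in the literature are not used here.

```python
import cmath
import math


def pz_radial(n, m, r):
    """Pseudo-Zernike radial polynomial R_{n,|m|}(r), valid for 0 <= |m| <= n."""
    m = abs(m)
    total = 0.0
    for s in range(n - m + 1):
        coeff = ((-1) ** s * math.factorial(2 * n + 1 - s)
                 / (math.factorial(s)
                    * math.factorial(n + m + 1 - s)
                    * math.factorial(n - m - s)))
        total += coeff * r ** (n - s)
    return total


def pz_moment(patch, n, m):
    """Discrete approximation of the moment A_{n,m} of a square grey-level patch.

    `patch` is a list of equally long rows (hypothetical input format).
    """
    size = len(patch)
    acc = 0j
    for row in range(size):
        for col in range(size):
            # Map pixel centres onto the square [-1, 1] x [-1, 1].
            x = (2 * col + 1 - size) / size
            y = (2 * row + 1 - size) / size
            r = math.hypot(x, y)
            if r > 1.0:
                continue  # pixels outside the unit disk are ignored
            theta = math.atan2(y, x)
            # Multiply by the complex conjugate of the basis function.
            acc += patch[row][col] * pz_radial(n, m, r) * cmath.exp(-1j * m * theta)
    # (n + 1) / pi normalisation; each pixel covers an area of (2 / size) ** 2.
    return (n + 1) / math.pi * acc * (2.0 / size) ** 2


def pz_features(patch, max_order=3):
    """Moment magnitudes |A_{n,m}| for 0 <= m <= n <= max_order."""
    return [abs(pz_moment(patch, n, m))
            for n in range(max_order + 1)
            for m in range(n + 1)]
```

Only the magnitudes are kept because rotating the patch changes the phase of each moment but not its modulus, which is the usual reason these moments are used as shape descriptors. How such features would be thresholded to keep or reject a candidate region is not stated in the abstract and is left out of the sketch.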

Author information

Corresponding author

Correspondence to Guezouli Larbi.

Cite this article

Larbi, G. Two-step text detection framework in natural scenes based on Pseudo-Zernike moments and CNN. Multimed Tools Appl 82, 10595–10616 (2023). https://doi.org/10.1007/s11042-022-13690-6

  • DOI: https://doi.org/10.1007/s11042-022-13690-6
