STDNet: A CNN-based approach to single-/mixed-script detection

  • S.I.: Verifiability in Systems and Data Engineering
  • Journal: Innovations in Systems and Software Engineering

Abstract

Script identification guides the recognition of scene text through optical character recognition (OCR), although it is not a principal concern of the OCR engine itself. Before script identification, it is important to identify the script type, because scene text in natural images today does not consist of a single script only; words that mix scripts at the character level are encountered very often. Such words appear in many settings, such as signboards, t-shirt graffiti, hoardings, and banners, and are often written in an artistic way. In this work, a CNN-based deep learning framework named STDNet (Script-Type Detection Network) was developed to detect single-/mixed-script images. To assess the feasibility of the presented system, tests were also undertaken with an outlier composed of a wide range of single scripts. Experiments were performed on over 20K images, and a highest accuracy of 99.53% was reached. The approach was compared with state-of-the-art deep learning techniques and handcrafted feature-based methodologies, and it obtained better performance.

Author information

Corresponding author

Correspondence to Kaushik Roy.

Cite this article

Ghosh, M., Mukherjee, H., Obaidullah, S.M. et al. STDNet: A CNN-based approach to single-/mixed-script detection. Innovations Syst Softw Eng 17, 277–288 (2021). https://doi.org/10.1007/s11334-021-00395-6

  • DOI: https://doi.org/10.1007/s11334-021-00395-6
