STDNet: A CNN-based approach to single-/mixed-script detection

Ghosh, Mridul; Mukherjee, Himadri; Obaidullah, Sk Md; Roy, Kaushik

doi:10.1007/s11334-021-00395-6

S.I. : Verifiability in Systems and Data Engineering
Published: 27 April 2021

Volume 17, pages 277–288, (2021)
Innovations in Systems and Software Engineering

Mridul Ghosh¹,
Himadri Mukherjee²,
Sk Md Obaidullah³ &
…
Kaushik Roy ORCID: orcid.org/0000-0002-3360-7576²

Abstract

Script identification guides the recognition of scene text by optical character recognition (OCR), but it is not a principal concern of the OCR engine itself. Before script identification, it is important to identify the script type, because scene text in natural images today does not always consist of a single script; words that mix scripts at the character level are encountered very often. Such words appear in many settings, such as signboards, t-shirt graffiti, hoardings, and banners, and are often written in an artistic way. In this work, a CNN-based deep learning framework named STDNet (Script-Type Detection Network) was developed to detect single-/mixed-script images. To determine the feasibility of the presented system, tests were also undertaken with an outlier composed of a wide range of single scripts. Experiments were performed with over 20K images, and a highest accuracy of 99.53% was reached. The approach was compared with state-of-the-art deep learning techniques and handcrafted feature-based methodologies, and the proposed approach obtained better performance.
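
As a rough illustration of what a CNN-style single-/mixed-script classifier computes, the sketch below runs one forward pass (convolution, ReLU, global max pooling, dense layer, softmax) over a toy grayscale image. It is not the STDNet architecture, which the abstract does not specify: the layer layout, the 8x8 input size, the random untrained weights, and all function names are assumptions made for illustration only.

```python
import math
import random

# The two classes named in the abstract.
LABELS = ("single-script", "mixed-script")


def conv2d_valid(image, kernel):
    """2-D cross-correlation with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise max(0, x) over a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_max_pool(fmap):
    """Reduce a feature map to its single largest activation."""
    return max(max(row) for row in fmap)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image, kernels, weights, biases):
    """Forward pass: conv -> ReLU -> global max pool -> dense -> softmax."""
    features = [global_max_pool(relu(conv2d_valid(image, k))) for k in kernels]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


# Hypothetical toy input and random (untrained) parameters.
rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
biases = [0.0, 0.0]

label, probs = predict(image, kernels, weights, biases)
print(label, [round(p, 3) for p in probs])
```

Because the weights are random, the printed label carries no meaning; in a real system the kernels and dense weights would be learned from labelled single- and mixed-script images.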

Author information

Authors and Affiliations

Department of Computer Science, Shyampur Siddheswari Mahavidyalaya, Howrah, India
Mridul Ghosh
Department of Computer Science, West Bengal State University, Kolkata, India
Himadri Mukherjee & Kaushik Roy
Department of Computer Science and Engineering, Aliah University, Kolkata, India
Sk Md Obaidullah

Corresponding author

Correspondence to Kaushik Roy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghosh, M., Mukherjee, H., Obaidullah, S.M. et al. STDNet: A CNN-based approach to single-/mixed-script detection. Innovations Syst Softw Eng 17, 277–288 (2021). https://doi.org/10.1007/s11334-021-00395-6

Received: 12 January 2021
Accepted: 15 April 2021
Published: 27 April 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11334-021-00395-6
