Abstract
Script identification guides the recognition of scene text through optical character recognition (OCR), although it is not a principal concern of the OCR engine itself. Prior to script identification, it is important to identify the script type, because scene text in natural images today is not limited to a single script; words that mix scripts at the character level are encountered very often. Such words appear in many settings, such as signboards, t-shirt graffiti, hoardings, and banners, and are often written in an artistic style. In this work, a CNN-based deep learning framework named STDNet (Script-Type Detection Network) was developed to detect single-/mixed-script images. To assess the feasibility of the proposed system, tests were also conducted with an outlier composed of a wide range of single scripts. Experiments were performed on over 20K images, and the highest accuracy reached was 99.53%. The approach was compared with state-of-the-art deep learning techniques and handcrafted feature-based methods, and it outperformed both.
Cite this article
Ghosh, M., Mukherjee, H., Obaidullah, S.M. et al. STDNet: A CNN-based approach to single-/mixed-script detection. Innovations Syst Softw Eng 17, 277–288 (2021). https://doi.org/10.1007/s11334-021-00395-6
DOI: https://doi.org/10.1007/s11334-021-00395-6