
Hybrid deep neural network with adaptive galactic swarm optimization for text extraction from scene images


Text in natural scenes carries rich semantic information, so it is widely used in applications that interpret image content and retrieve visual information. This semantic information helps humans understand the surrounding environment. However, text in natural images varies widely in appearance under unconstrained conditions, which makes text detection and character recognition challenging. This framework therefore uses a weighted naïve Bayes classifier (WNBC)-based deep learning process to detect text and recognize characters in scene images. Natural scene images often contain noise, so a guided image filter is applied at the pre-processing stage to remove it. Features useful for classification are extracted using the Gabor transform and the stroke width transform. With these extracted features, text detection and character recognition are performed by the WNBC and a deep neural network with adaptive galactic swarm optimization. Finally, accuracy, precision, recall, F1-score, mean absolute error, and mean square error are evaluated to assess the effectiveness of the proposed method.
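As an illustration of the evaluation step, the six metrics named in the abstract can be computed from predicted and ground-truth labels. The sketch below is not the authors' code: it assumes a binary labelling (1 = text, 0 = non-text), and the function name `evaluate` and the sample labels are hypothetical.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, MAE and MSE.

    Assumption: labels are binary, with 1 = text and 0 = non-text.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    n = len(pairs)

    # Confusion-matrix counts for the positive (text) class.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # Error metrics computed directly on the label values.
    mae = sum(abs(t - p) for t, p in pairs) / n
    mse = sum((t - p) ** 2 for t, p in pairs) / n

    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mae": mae,
        "mse": mse,
    }


if __name__ == "__main__":
    # Hypothetical labels for eight candidate regions.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 0, 1, 0, 1, 0, 1, 0]
    for name, value in evaluate(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

With binary labels, the mean absolute error and mean square error both equal the misclassification rate (1 minus accuracy), so they add information only when the outputs are scores or multi-valued labels.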





No funding was provided for the preparation of this manuscript.

Author information


Corresponding author

Correspondence to Digvijay Pandey.

Ethics declarations

Conflict of interest

The authors Digvijay Pandey, Binay Kumar Pandey, and Subodh Wairya declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

About this article

Cite this article

Pandey, D., Pandey, B.K. & Wairya, S. Hybrid deep neural network with adaptive galactic swarm optimization for text extraction from scene images. Soft Comput 25, 1563–1580 (2021).


  • Text extraction
  • Guided image filter
  • Watershed segmentation
  • Naïve Bayes
  • Deep neural network (DNN)
  • Emperor penguin optimization (EPO)