Abstract
Automatic text localization in natural environments is a key component of many applications, including self-driving cars, vehicle identification, and providing scene information to visually impaired people. However, text in natural and irregular scenes varies widely in orientation, shape, and color, which makes it difficult to detect. In this paper, an accurate multi-oriented scene text localization method (MOSTL) based on convolutional neural networks is presented to detect text with high efficiency. In the proposed method, an improved ReLU layer (i.ReLU) and an improved inception layer (i.inception) are introduced. First, the proposed structure is used to extract low-level visual features. Then, an extra layer is used to improve the feature extraction. The i.ReLU and i.inception layers improve the extraction of information that is valuable for text detection. The i.ReLU layers allow certain low-level features to be extracted appropriately. The i.inception layers (especially their 3 × 3 convolutions) capture text of widely varying sizes more effectively than a linear chain of convolution layers without inception layers. The outputs of the i.ReLU and i.inception layers are fed to an extra layer, which enables MOSTL to detect multi-oriented text, including curved and vertical text. We conducted text detection experiments on well-known databases, including ICDAR 2019, ICDAR 2017, ICDAR 2015, ICDAR 2003, and MSRA-TD500. MOSTL achieved remarkable performance improvements in these experiments.
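The abstract credits the i.inception layers with handling text of widely varying sizes better than a linear chain of convolutions. The exact i.inception design is not given here, so the sketch below shows only the standard GoogLeNet-style inception module of Szegedy et al. ("Going deeper with convolutions", listed in the references) as an assumed baseline: parallel 1 × 1, 3 × 3 and 5 × 5 convolution branches plus a pooled branch look at different receptive fields, and their outputs are concatenated along the channel axis. The function names, branch widths, and He-style random initialization are illustrative assumptions, not MOSTL's settings.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def conv2d_same(x, kernels):
    """Zero-padded 'same' 2-D convolution (cross-correlation, as in CNN layers).

    x:       input feature maps, shape (C_in, H, W)
    kernels: weights, shape (C_out, C_in, k, k) with odd k
    returns: output feature maps, shape (C_out, H, W)
    """
    c_out, _, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window centred on (i, j)
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def max_pool_same(x, k=3):
    """k x k max pooling with stride 1; padding with -inf keeps the spatial size."""
    pad = k // 2
    xp = np.pad(x.astype(float), ((0, 0), (pad, pad), (pad, pad)),
                constant_values=-np.inf)
    c, h, w = x.shape
    out = np.empty((c, h, w))
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out


def init_weights(c_in, rng, n1=4, n3r=2, n3=4, n5r=2, n5=4, npool=4):
    """Hypothetical branch widths; He-style normal initialization per kernel."""
    def he(c_out, c_in_, k):
        std = np.sqrt(2.0 / (c_in_ * k * k))
        return rng.normal(0.0, std, size=(c_out, c_in_, k, k))
    return {
        "1x1": he(n1, c_in, 1),
        "3x3_reduce": he(n3r, c_in, 1),   # 1x1 bottleneck before the 3x3 branch
        "3x3": he(n3, n3r, 3),
        "5x5_reduce": he(n5r, c_in, 1),   # 1x1 bottleneck before the 5x5 branch
        "5x5": he(n5, n5r, 5),
        "pool_proj": he(npool, c_in, 1),  # 1x1 projection after max pooling
    }


def inception_block(x, w):
    """Run four parallel branches and concatenate them along the channel axis."""
    b1 = relu(conv2d_same(x, w["1x1"]))
    b2 = relu(conv2d_same(relu(conv2d_same(x, w["3x3_reduce"])), w["3x3"]))
    b3 = relu(conv2d_same(relu(conv2d_same(x, w["5x5_reduce"])), w["5x5"]))
    b4 = relu(conv2d_same(max_pool_same(x), w["pool_proj"]))
    return np.concatenate([b1, b2, b3, b4], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((3, 16, 16))  # toy 3-channel 16 x 16 input
    y = inception_block(x, init_weights(3, rng))
    print(y.shape)  # (16, 16, 16): 4 + 4 + 4 + 4 output channels
```

Because every branch keeps the spatial size, small and large patterns contribute to the same output location through different kernel sizes; a plain chain of 3 × 3 layers would need two stacked layers to reach a 5 × 5 receptive field.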
Data Availability
All databases used in this study are freely available on their respective websites.
References
A. Aggarwal, M. Kumar, T.K. Rawat, Design of two-dimensional FIR filters with quadrantally symmetric properties using the 2D L1-method. IET Signal Proc. 13(3), 262–272 (2018)
A. Aggarwal, M. Kumar, T.K. Rawat, D.K. Upadhyay, Optimal design of 2D FIR filters with quadrantally symmetric properties using fractional derivative constraints. Circuits Syst. Signal Process. 35(6), 2213–2257 (2016)
A. Aggarwal, M. Kumar, T.K. Rawat, D.K. Upadhyay, Optimal design of 2-D FIR digital differentiator using L1-norm based cuckoo-search algorithm. Multidimens. Syst. Signal Process. 28(4), 1569–1587 (2017)
Y. Aramaki, Y. Matsui, T. Yamasaki, K. Aizawa, Text detection in manga by combining connected-component-based and region-based classifications, in IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA (2016)
S. Baabou, A.B. Fradj, M.A. Farah, A.G. Abubakr, F. Bremond, A. Kachouri, A comparative study and state-of-the-art evaluation for pedestrian detection, in 2019 19th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA) (IEEE, 2019), pp. 485–490
X. Bai, B. Shi, C. Zhang, X. Cai, L. Qi, Text/non-text image classification in the wild with convolutional neural networks. Pattern Recogn. 66, 437–446 (2017)
Y. Bengio, Practical recommendations for gradient-based training of deep architectures, in Neural Networks: Tricks of the Trade. ed. by G. Montavon, G.B. Orr, K.R. Müller (Springer, Berlin, 2012), pp. 437–478
A.F. Biten, R. Tito, A. Mafla, L. Gomez, M. Rusinol, M. Mathew, C.V. Jawahar, E. Valveny, D. Karatzas, ICDAR 2019 competition on scene text visual question answering, in 2019 International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2019), pp. 1563–1570
G. Cheng, P. Zhou, J. Han, Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 54(12), 7405–7415 (2016)
H. Cho, M. Sung, B. Jun, Canny text detector: fast and robust scene text localization algorithm, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 3566–3573
B. Epshtein, E. Ofek, Y. Wexler, Detecting text in natural scenes with stroke width transform, in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE, 2010), pp. 2963–2970
J. Han, X. Yao, G. Cheng, X. Feng, D. Xu, P-CNN: part-based convolutional neural networks for fine-grained visual categorization. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2933510
W. He, X.-Y. Zhang, F. Yin, Z. Luo, J.-M. Ogier, C.-L. Liu, Realtime multi-scale scene text detection with scale-based region proposal network. Pattern Recogn. 98, 107026 (2020)
S. Hong, B. Roh, K.-H. Kim, Y. Cheon, M. Park, PVANet: lightweight deep neural networks for real-time object detection (2016). arXiv preprint https://arxiv.org/abs/1611.08588
L. Huang, Y. Yang, Y. Deng, Y. Yu, DenseBox: unifying landmark localization with end to end object detection (2015). arXiv preprint https://arxiv.org/abs/1509.04874
W. Huang, Y. Qiao, X. Tang, Robust scene text detection with convolution neural network induced MSER trees, in European Conference on Computer Vision (Springer, 2014), pp. 497–511
M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 116(1), 1–20 (2016)
M. Jaderberg, A. Vedaldi, A. Zisserman, Deep features for text spotting, in European Conference on Computer Vision (Springer, Cham, 2014), pp. 512–528
Y. Jiang, X. Zhu, X. Wang, S. Yang, W. Li, H. Wang, P. Fu, Z. Luo, R2CNN: rotational region CNN for orientation robust scene text detection (2017). arXiv preprint https://arxiv.org/abs/1706.09579
D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L.G. i Bigorda, S.R. Mestre, J. Mas, D.F. Mota, J.A. Almazan, L.P. De Las Heras, ICDAR 2013 robust reading competition, in 2013 12th International Conference on Document Analysis and Recognition (IEEE, 2013), pp. 1484–1493
K.-H. Kim, S. Hong, B. Roh, Y. Cheon, M. Park, PVANet: deep but lightweight neural networks for real-time object detection (2016). arXiv preprint https://arxiv.org/abs/1608.08021
C.C. Lee, P.S. Chung, M.S. Hwang, A survey on attribute-based encryption schemes of access control in cloud environments. IJ Netw. Secur. 15(4), 231–240 (2013)
M. Liao, B. Shi, X. Bai, TextBoxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. 27, 3676–3690 (2018)
M. Liao, B. Shi, X. Bai, X. Wang, W. Liu, TextBoxes: a fast text detector with a single deep neural network, in 31st AAAI Conference on Artificial Intelligence (2017)
F. Liu, C. Chen, D. Gu, J. Zheng, FTPN: scene text detection with feature pyramid based text proposal network. IEEE Access 7, 44219–44228 (2019)
S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, ICDAR 2003 robust reading competitions, in 7th International Conference on Document Analysis and Recognition, 2003. Proceedings. Citeseer (2003), pp. 682–687
P. Lyu, C. Yao, W. Wu, S. Yan, X. Bai, Multi-oriented scene text detection via corner localization and region segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 7553–7563
J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, X. Xue, Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 20, 3111–3122 (2018)
A. Mishra, K. Alahari, C. Jawahar, Scene text recognition using higher order language priors (2012)
F. Naiemi, V. Ghods, H. Khalesi, An efficient character recognition method using enhanced HOG for spam image detection. Soft. Comput. 23(22), 11759–11774 (2019)
F. Naiemi, V. Ghods, H. Khalesi, Scene text detection using enhanced extremal region and convolutional neural network. Multimed. Tools Appl. 79(37), 27137–27159 (2020)
F. Naiemi, V. Ghods, H. Khalesi, A novel pipeline framework for multi oriented scene text image detection and recognition. Expert Syst. Appl. 170, 114549 (2021)
N. Nayef, F. Yin, I. Bizid, H. Choi, Y. Feng, D. Karatzas, Z. Luo, U. Pal, C. Rigaud, J. Chazalon, W. Khlif, ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification (RRC-MLT), in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1 (IEEE, 2017), pp. 1454–1459
L. Neumann, J. Matas, Real-time scene text localization and recognition, in 2012 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 3538–3545
L. Neumann, J. Matas, A method for text localization and recognition in real-world images, in Asian Conference on Computer Vision (Springer, 2010), pp. 770–783
T. Novikova, O. Barinova, P. Kohli, V. Lempitsky, Large-lexicon attribute-consistent text recognition in natural images, in European Conference on Computer Vision (Springer, 2012), pp. 752–765
X. Ren, Y. Zhou, Z. Huang, J. Sun, X. Yang, K. Chen, A novel text structure feature extractor for Chinese scene text detection and recognition. IEEE Access 5, 3193–3204 (2017)
T. Saito, M. Rehmsmeier, The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3), e0118432 (2015)
B. Shi, X. Wang, P. Lyu, C. Yao, X. Bai, Robust scene text recognition with automatic rectification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 4168–4176
C. Shi, C. Wang, B. Xiao, Y. Zhang, S. Gao, Z. Zhang, Scene text recognition using part-based tree-structured character detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013), pp. 2961–2968
L.N. Smith, Cyclical learning rates for training neural networks, in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (IEEE, 2017), pp. 464–472
L.N. Smith, A disciplined approach to neural network hyper-parameters: Part 1—learning rate, batch size, momentum, and weight decay (2018). arXiv preprint https://arxiv.org/abs/1803.09820
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 1–9
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 2818–2826
S. Uchida, Y. Shigeyoshi, Y. Kunishige, F. Yaokai, A keypoint-based approach toward scenery character detection, in 2011 International Conference on Document Analysis and Recognition (IEEE, 2011), pp. 819–823
K. Wang, B. Babenko, S. Belongie, End-to-end scene text recognition, in 2011 International Conference on Computer Vision (IEEE, 2011), pp. 1457–1464
R. Wang, N. Sang, C. Gao, Text detection approach based on confidence map and context information. Neurocomputing 157, 153–165 (2015)
T. Wang, D.J. Wu, A. Coates, A.Y. Ng, End-to-end text recognition with convolutional neural networks, in Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012) (IEEE, 2012), pp. 3304–3308
D. Wei, Y.M. Li, Generalized sampling expansions with multiple sampling rates for lowpass and bandpass signals in the fractional Fourier transform domain. IEEE Trans. Signal Process. 64(18), 4861–4874 (2016)
D. Wei, Y.M. Li, Convolution and multichannel sampling for the offset linear canonical transform and their applications. IEEE Trans. Signal Process. 67(23), 6009–6024 (2019)
S. Yadav, R. Yadav, A. Kumar, M. Kumar, A novel approach to design optimal 2-D digital differentiator using vortex search optimization algorithm. Multimed. Tools Appl. 80, 5901–5916 (2021)
S. Yadav, R. Yadav, A. Kumar, M. Kumar, Design of optimal two-dimensional FIR filters with quadrantally symmetric properties using vortex search algorithm. J. Circuits Syst. Comput. 29(10), 2050155 (2020)
Q. Yang, M. Cheng, W. Zhou, Y. Chen, M. Qiu, W. Lin, W. Chu, IncepText: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection (2018). arXiv preprint https://arxiv.org/abs/1805.01167
C. Yao, X. Bai, W. Liu, A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process. 23, 4737–4749 (2014)
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, in 2012 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 1083–1090
C. Yao, X. Bai, N. Sang, X. Zhou, S. Zhou, Z. Cao, Scene text detection via holistic, multi-channel prediction (2016). arXiv preprint https://arxiv.org/abs/1606.09002
C. Yao, X. Bai, B. Shi, W. Liu, Strokelets: a learned multi-scale representation for scene text recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 4042–4049
Q. Ye, D. Doermann, Text detection and recognition in imagery: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1480–1500 (2014)
F. Zhan, H. Zhu, S. Lu, Scene text synthesis for efficient and effective deep network training (2019). arXiv preprint https://arxiv.org/abs/1901.09193
D. Zhang, D. Meng, J. Han, Co-saliency detection via a self-paced multiple-instance learning framework. IEEE Trans. Pattern Anal. Mach. Intell. 39(5), 865–878 (2016)
J. Zhang, Q. Gao, H. Wang, Detecting anomalies from high-dimensional wireless network data streams: a case study. Soft Comput. 15(6), 1195–1215 (2011)
Z. Zhang, W. Shen, C. Yao, X. Bai, Symmetry-based text line detection in natural scenes, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 2558–2567
Z. Zhong, L. Sun, Q. Huo, An anchor-free region proposal network for Faster R-CNN-based text detection approaches. Int. J. Doc. Anal. Recogn. 22, 315–327 (2019)
X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, J. Liang, EAST: an efficient and accurate scene text detector, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 5551–5560
Z. Zhu, M. Liao, B. Shi, X. Bai, Feature fusion for scene text detection, in 2018 13th IAPR International Workshop on Document Analysis Systems (DAS) (IEEE, 2018), pp. 193–198
Funding
For this study, no funding was received.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Naiemi, F., Ghods, V. & Khalesi, H. MOSTL: An Accurate Multi-Oriented Scene Text Localization. Circuits Syst Signal Process 40, 4452–4473 (2021). https://doi.org/10.1007/s00034-021-01674-0
DOI: https://doi.org/10.1007/s00034-021-01674-0