Semantic Segmentation Architecture for Text Detection with an Attention Module

Naim, Soufiane; Moumkine, Noureddine

doi:10.1007/978-3-031-35251-5_35

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 712)

Included in the following conference series:

International Conference on Advanced Intelligent Systems for Sustainable Development

Abstract

Scene text detection is challenging because text in natural images varies widely in size, font, color and orientation. Most current research builds models that predict bounding-box coordinates around the text areas of an image. In this work, we instead treat the problem as semantic segmentation and produce masks that highlight these areas. Our model is a fully convolutional network inspired by the UNet architecture. It is a pixel-wise classification framework that outputs a prediction map with the same width and height as the input image; each pixel in this map is labeled as belonging or not belonging to a text area. To improve performance, we use transposed convolutions to upsample the feature maps in the decoder. We also add an attention module that recalibrates channel-wise features, which increases overall performance by 1%. Because text generally occupies only a small fraction of the pixels in an image, we propose using an intersection-over-union loss to address the resulting class imbalance. We train and test the model on the ICDAR 2015 dataset, where it reaches a MeanIoU score of 44.20%.
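The abstract gives no formulas, so the following is only a minimal NumPy sketch of three ideas it mentions: channel-wise recalibration, an intersection-over-union loss, and a mean IoU score. Everything here is an assumption made for illustration, not the authors' implementation: the squeeze-and-excitation-style gating, the function names, the weight shapes, the soft IoU formulation, and the two-class averaging used for the metric.

```python
import numpy as np


def se_recalibrate(features, w_reduce, w_expand):
    """Channel-wise recalibration in the squeeze-and-excitation style.

    features: (C, H, W) feature map.
    w_reduce: (C // r, C) and w_expand: (C, C // r) are hypothetical
    weight matrices (biases omitted for brevity).
    """
    squeezed = features.mean(axis=(1, 2))               # global average pool, (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)       # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))  # sigmoid, one gate per channel
    return features * gates[:, None, None]              # rescale each channel


def soft_iou_loss(probs, target, eps=1e-7):
    """1 - soft IoU between predicted text probabilities and a binary mask."""
    intersection = np.sum(probs * target)
    union = np.sum(probs) + np.sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


def mean_iou(pred_mask, target_mask):
    """IoU averaged over the background (0) and text (1) classes."""
    ious = []
    for cls in (0, 1):
        pred_cls = pred_mask == cls
        target_cls = target_mask == cls
        union = np.logical_or(pred_cls, target_cls).sum()
        if union == 0:  # class absent from both masks, skip it
            continue
        ious.append(np.logical_and(pred_cls, target_cls).sum() / union)
    return float(np.mean(ious))


# Toy 10x10 ground truth in which text covers only 4 of 100 pixels.
target = np.zeros((10, 10), dtype=int)
target[4:6, 4:6] = 1
all_background = np.zeros_like(target)

pixel_accuracy = float((all_background == target).mean())  # 0.96
score = mean_iou(all_background, target)                    # 0.48
loss = soft_iou_loss(all_background.astype(float), target.astype(float))
```

In this toy case a prediction that labels every pixel as background is 96% accurate pixel by pixel, yet its soft IoU loss is close to 1 and its mean IoU is only 0.48. This illustrates why an overlap-based loss is less easily satisfied by ignoring a small text region than a per-pixel measure would be.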

Author information

Authors and Affiliations

Mathematical Computer Science and Applications Laboratory, Mohammedia, Morocco
Soufiane Naim & Noureddine Moumkine

Corresponding author

Correspondence to Soufiane Naim.

Editor information

Editors and Affiliations

Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Janusz Kacprzyk
Abdelmalek Essaâdi University, Tangier, Morocco
Mostafa Ezziyyani
Department of Automatics and Applied Software, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas

About this paper

Cite this paper

Naim, S., Moumkine, N. (2023). Semantic Segmentation Architecture for Text Detection with an Attention Module. In: Kacprzyk, J., Ezziyyani, M., Balas, V.E. (eds) International Conference on Advanced Intelligent Systems for Sustainable Development. AI2SD 2022. Lecture Notes in Networks and Systems, vol 712. Springer, Cham. https://doi.org/10.1007/978-3-031-35251-5_35

DOI: https://doi.org/10.1007/978-3-031-35251-5_35
Published: 09 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35250-8
Online ISBN: 978-3-031-35251-5
eBook Packages: Intelligent Technologies and Robotics (R0)