Semantic Segmentation Architecture for Text Detection with an Attention Module

  • Conference paper
International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 712)

Abstract

Scene text detection is a challenging problem because text in natural images varies widely in size, font, color, and orientation. Most current research builds models that predict bounding-box coordinates around the text areas in an image. In this work, we instead treat the problem as semantic segmentation and produce masks that highlight these areas. Our model is a fully convolutional network inspired by the UNet architecture. It is a pixel-wise classification framework that outputs a prediction map with the same width and height as the input image, in which each pixel is labeled as belonging to a text area or not. To improve performance, we use transposed convolutions to upsample the feature maps in the decoder. We also add an attention module that recalibrates channel-wise features, which increases overall performance by 1%. To address class imbalance, since text generally occupies only a small fraction of the pixels in an image, we propose using an intersection-over-union loss. We train and test on the ICDAR 2015 dataset, where our model reaches a MeanIoU score of 44.20%.
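
To make the imbalance argument concrete, the following is a minimal sketch of a soft intersection-over-union loss on a toy flattened mask. It is not the authors' implementation: the function names, the `eps` smoothing term, the 0.5 threshold, and the toy data are all assumptions chosen for illustration. It compares an "everything is background" prediction, which scores high pixel accuracy on a sparse text mask, with a prediction that actually covers the text pixels.

```python
def soft_iou_loss(probs, mask, eps=1e-6):
    """Return 1 - soft IoU between text probabilities and a binary mask.

    probs: per-pixel probabilities of the text class, in [0, 1]
    mask:  per-pixel ground truth, 1 for text and 0 for background
    eps:   small smoothing constant (an assumption) to avoid division by zero
    """
    intersection = sum(p * t for p, t in zip(probs, mask))
    union = sum(p + t - p * t for p, t in zip(probs, mask))
    return 1.0 - (intersection + eps) / (union + eps)


def pixel_accuracy(probs, mask, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the mask."""
    hits = sum((p >= threshold) == bool(t) for p, t in zip(probs, mask))
    return hits / len(mask)


# Toy image of 100 pixels in which only 5 belong to text.
mask = [1] * 5 + [0] * 95

# Degenerate prediction: every pixel is labeled background.
all_background = [0.0] * 100

# Prediction that is confident on text and nearly silent elsewhere.
covers_text = [0.9] * 5 + [0.01] * 95

for name, probs in [("all background", all_background),
                    ("covers text", covers_text)]:
    acc = pixel_accuracy(probs, mask)
    loss = soft_iou_loss(probs, mask)
    print(f"{name}: accuracy={acc:.2f}, soft IoU loss={loss:.3f}")
```

In this toy case the all-background prediction is correct on 95 of 100 pixels yet receives a loss close to 1, because its overlap with the text region is zero. A loss built on overlap rather than on per-pixel agreement therefore cannot be minimized by ignoring the rare text class.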

Corresponding author

Correspondence to Soufiane Naim.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Naim, S., Moumkine, N. (2023). Semantic Segmentation Architecture for Text Detection with an Attention Module. In: Kacprzyk, J., Ezziyyani, M., Balas, V.E. (eds) International Conference on Advanced Intelligent Systems for Sustainable Development. AI2SD 2022. Lecture Notes in Networks and Systems, vol 712. Springer, Cham. https://doi.org/10.1007/978-3-031-35251-5_35
