Abstract
Scene text detection is challenging because text varies widely in size, font, color, and orientation. Most current research builds models that predict bounding-box coordinates around the text areas of an image. In this work, we instead treat the problem as semantic segmentation and produce masks that highlight these areas. Our model is a fully convolutional network inspired by the UNet architecture. It is a pixel-wise classification framework that outputs a prediction map with the same width and height as the input image, in which each pixel is labeled as belonging to a text area or not. To improve performance, we use transposed convolutions to upsample the feature maps in the decoder. We also add an attention module that recalibrates channel-wise features, which increases overall performance by 1%. Because text generally occupies only a small fraction of the pixels in an image, we propose using an intersection-over-union loss to address this class imbalance. We train and test on the ICDAR 2015 dataset, where our model reaches a MeanIoU score of 44.20%.
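The following minimal sketch illustrates why an intersection-over-union loss is less sensitive to class imbalance than pixel accuracy. The abstract does not give the exact loss or metric definitions, so this uses one common differentiable ("soft") IoU formulation and a two-class mean IoU as assumptions. The function names `soft_iou_loss` and `mean_iou` are hypothetical, and masks are represented as flattened lists of pixels for simplicity.

```python
from typing import Sequence


def soft_iou_loss(probs: Sequence[float], targets: Sequence[int],
                  eps: float = 1e-7) -> float:
    """Return 1 - soft IoU between text probabilities and a binary mask.

    Assumed formulation (not taken from the paper):
    intersection = sum(p * t), union = sum(p + t - p * t).
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    # eps keeps the ratio defined when both masks are empty.
    return 1.0 - (intersection + eps) / (union + eps)


def mean_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Average the IoU over background (0) and text (1).

    A class that appears in neither mask is skipped (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for cls in (0, 1):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy image: 10 pixels, only one of which is text.
target = [0] * 9 + [1]
all_background = [0] * 10

# Predicting "no text anywhere" is 90% pixel-accurate,
# yet the IoU loss stays close to its maximum of 1.
loss_trivial = soft_iou_loss([0.0] * 10, target)
loss_perfect = soft_iou_loss([float(t) for t in target], target)
miou_trivial = mean_iou(all_background, target)
```

In this toy case the all-background prediction receives a loss near 1 and a mean IoU of 0.45, while the perfect prediction receives a loss of 0, so the rare text pixels dominate the training signal rather than being drowned out by background.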
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Naim, S., Moumkine, N. (2023). Semantic Segmentation Architecture for Text Detection with an Attention Module. In: Kacprzyk, J., Ezziyyani, M., Balas, V.E. (eds) International Conference on Advanced Intelligent Systems for Sustainable Development. AI2SD 2022. Lecture Notes in Networks and Systems, vol 712. Springer, Cham. https://doi.org/10.1007/978-3-031-35251-5_35
DOI: https://doi.org/10.1007/978-3-031-35251-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35250-8
Online ISBN: 978-3-031-35251-5
eBook Packages: Intelligent Technologies and Robotics (R0)