Semantic segmentation based on double pyramid network with improved global attention mechanism

Applied Intelligence

Abstract

Scene semantic segmentation is an important and challenging task that requires accurately labeling the category of every pixel in an image. The encoder-decoder framework, represented by the fully convolutional network (FCN), is well suited to semantic segmentation, but it still struggles to segment small targets and object boundaries. This paper therefore proposes a global attention double pyramid network (GADPNet), built on an improved global attention mechanism, to improve semantic segmentation performance. GADPNet is composed of the deep convolutional neural network ResNet-101, an atrous spatial pyramid pooling (ASPP) module, the proposed pyramid decoder structure, and the improved global attention module. ResNet-101 serves as the backbone and extracts features at different stages. The ASPP module captures multi-scale features from the high-level feature branch. The pyramid decoder combines the multi-scale features from the ASPP module with low-level multi-scale feature maps from different backbone stages, guided by the improved global attention module, which strengthens the network's ability to capture multi-scale features. GADPNet is trained end to end. It achieves an mIoU of 80.5% on the Pascal VOC 2012 test set and 72.9% on the Cityscapes val set, which indicates higher semantic segmentation accuracy than the methods it is compared against.
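For readers unfamiliar with the metric reported in the abstract, mIoU (mean intersection over union) is commonly computed from a class confusion matrix: the per-class IoU is TP / (TP + FP + FN), averaged over classes. The following is a minimal pure-Python sketch of that standard definition, not the authors' evaluation code. The function names, the toy labels, and the `ignore_index` value of 255 (a common convention for unlabeled pixels) are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels; rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed convention: skip unlabeled pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                          # class c pixels predicted as something else
        fp = sum(matrix[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): a flattened 2x3 label map with 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
matrix = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(matrix)  # per-class IoU: 1/2, 2/3, 1 -> mean 13/18
```

Benchmark evaluations usually accumulate one confusion matrix over the whole dataset and compute the IoU once at the end, rather than averaging per-image scores.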

Acknowledgments

This work was supported by the Hunan Provincial Natural Science Foundation (2020JJ5218), the Scientific Research Fund of the Education Department of Hunan Province (22A0417), the Hunan Provincial Innovation Foundation for Postgraduate (CX20201114), the General Project of the Hunan Water Resources Department (XSKJ2021000-13), the Open Fund of the Education Department of Hunan Province (20K062), and the Hunan University Students Innovation and Entrepreneurship Training Project (2021-20-3151).

Author information

Corresponding authors

Correspondence to Wujing Li or Shuixiang Yu.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ou, X., Wang, H., Zhang, G. et al. Semantic segmentation based on double pyramid network with improved global attention mechanism. Appl Intell 53, 18898–18909 (2023). https://doi.org/10.1007/s10489-023-04463-1

  • DOI: https://doi.org/10.1007/s10489-023-04463-1
