Position attention optimized deep semantic segmentation

Multimedia Tools and Applications

Abstract

Semantic segmentation is used in many areas of computer vision, such as scene understanding. To help intelligent machines detect and recognize objects, a model that delivers accurate semantic segmentation is urgently needed. However, labelling semantic objects in a complex scene is challenging because sufficiently discriminative features are lacking. In this work, we address the semantic segmentation task by proposing a Position Attention Optimized Deep Semantic Segmentation (PADSS) framework built on the popular Discriminative Feature Network (DFN), which comprises a Smooth Network (SN) and a Border Network (BN). Because features at different levels possess distinct discriminative power, our PADSS framework embeds two types of attention blocks in the Smooth Network to select more discriminative features. Specifically, in the low stages of the network, we introduce the Position Attention Block (PAB) to obtain more effective features and strengthen the dependencies among similar features. In the high stages of the network, the Channel Attention Block (CAB) is used to capture richer context information. We combine PAB with CAB to make the prediction of semantic labels more accurate. The proposed model is validated on two public benchmark datasets, and the experimental results show that PADSS outperforms its competitors and achieves state-of-the-art (SOTA) performance: 87.8% mean IoU on PASCAL VOC 2012 and 80.8% mean IoU on Cityscapes. The results suggest that the proposed PADSS is robust enough to learn a variety of discriminative features across various semantic segmentation tasks.
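The mean IoU scores quoted in the abstract rest on a standard metric: for each class, the intersection over union (IoU) of predicted and ground-truth pixels, averaged over classes. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `mean_iou`, the flat label sequences, and the ignore label of 255 (a common convention for void pixels) are assumptions made for illustration; none of them is stated in the abstract.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union for flat sequences of class labels.

    Assumptions (illustrative only): labels are integers in
    [0, num_classes), and pixels whose ground truth equals
    ignore_label are excluded from the score.
    """
    intersection = [0] * num_classes  # pixels where pred == target == c
    union = [0] * num_classes         # pixels where pred == c or target == c

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip void / unlabelled pixels
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch adds one pixel to the union of both classes involved.
            union[p] += 1
            union[t] += 1

    # Average only over classes that occur in the prediction or ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, `mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], 3)` gives per-class IoUs of 1/2, 2/3, and 1, so the mean is 13/18, roughly 0.722. Benchmark evaluations typically accumulate the per-class counts over the whole test set before dividing, rather than averaging per-image scores.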

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61401127, the Natural Science Foundation of Heilongjiang Province under Grant LH2022F038, the Cultivation Project of the National Natural Science Foundation under Grant XPPY202208, and the Graduate Innovation Fund of Harbin Normal University under Grant HSDSSCX2019-29.

Author information

Contributions

Rui Zhao: methodology, software programming, writing (original draft). Xiaoyan Yu: supervision, writing (review and editing). Xianwei Rong: resources, data curation.

Corresponding author

Correspondence to Xianwei Rong.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, R., Yu, X. & Rong, X. Position attention optimized deep semantic segmentation. Multimed Tools Appl 83, 29531–29545 (2024). https://doi.org/10.1007/s11042-023-16022-4

  • DOI: https://doi.org/10.1007/s11042-023-16022-4
