Position attention optimized deep semantic segmentation

Zhao, Rui; Yu, Xiaoyan; Rong, Xianwei

doi:10.1007/s11042-023-16022-4

Published: 13 September 2023

Multimedia Tools and Applications, Volume 83, pages 29531–29545 (2024)

Abstract

Semantic segmentation underpins many computer vision applications, such as scene understanding. Helping intelligent machines detect and recognize objects requires a model that segments scenes accurately. However, labelling semantic objects in a complex scene remains challenging because sufficiently discriminative features are lacking. In this work, we address the semantic segmentation task by proposing a Position Attention Optimized Deep Semantic Segmentation (PADSS) framework built on the popular Discriminative Feature Network (DFN), which comprises the Smooth Network (SN) and the Border Network (BN). Because features at different levels possess distinct discriminative power, our PADSS framework embeds two types of attention blocks in the Smooth Network to select more discriminative features. Specifically, in the low stage of the network, we introduce the Position Attention Block (PAB) to obtain more effective features and strengthen the dependencies among similar features. In the high stage of the network, the Channel Attention Block (CAB) is used to capture richer context information. Combining PAB with CAB makes the prediction of semantic labels more accurate. The proposed model is validated on two public benchmark datasets, and the experimental results show that PADSS outperforms its competitors and achieves state-of-the-art (SOTA) performance: 87.8% mean IoU on PASCAL VOC 2012 and 80.8% mean IoU on Cityscapes. These results suggest that PADSS is robust enough to learn a variety of discriminative features across various semantic segmentation tasks.
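The results above are reported as mean IoU (intersection over union averaged across classes). As a rough illustration of that metric only, the following minimal sketch computes it from flattened per-pixel labels: for each class, IoU is the number of pixels where prediction and ground truth agree on that class, divided by the number of pixels where either assigns it. The function name `mean_iou`, the `ignore_index` parameter, and the toy labels are assumptions made for this example; the abstract does not describe the authors' evaluation code, and benchmark protocols typically accumulate these counts over the whole dataset rather than a single image.

```python
from typing import Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: Optional[int] = None) -> float:
    """Mean intersection-over-union over flattened per-pixel class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled class c in the ground truth

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabelled pixels (hypothetical convention)
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # classes absent from both maps are left out of the mean
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (made-up labels): six pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mean IoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A production evaluation would more likely build a confusion matrix with an array library, but the counting logic is the same.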

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61401127, the Natural Science Foundation of Heilongjiang Province under Grant LH2022F038, the Cultivation Project of National Natural Science Foundation under Grant XPPY202208, and the Graduate Innovation Fund of Harbin Normal University under Grant HSDSSCX2019-29.

Author information

Authors and Affiliations

School of Physics and Electronic Engineering, Harbin Normal University, Harbin, 150025, China
Rui Zhao, Xiaoyan Yu & Xianwei Rong

Contributions

Rui Zhao: Methodology, software programming, Writing—original draft. Xiaoyan Yu: Supervision, Writing—review & editing. Xianwei Rong: Resources, Data curation.

Corresponding author

Correspondence to Xianwei Rong.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, R., Yu, X. & Rong, X. Position attention optimized deep semantic segmentation. Multimed Tools Appl 83, 29531–29545 (2024). https://doi.org/10.1007/s11042-023-16022-4

Received: 05 February 2022
Revised: 15 April 2023
Accepted: 11 June 2023
Published: 13 September 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16022-4
