Abstract
Semantic segmentation is applied across many areas of computer vision, such as scene understanding. To help intelligent machines detect and recognize objects, a model that achieves accurate semantic segmentation is needed. However, labelling semantic objects in a complex scene is challenging because the available features are often not discriminative enough. In this work, we address semantic segmentation by proposing a Position Attention Optimized Deep Semantic Segmentation (PADSS) framework built on the popular Discriminative Feature Network (DFN), which comprises the Smooth Network (SN) and the Border Network (BN). Because features at different levels possess distinct discriminative power, PADSS embeds two types of attention blocks in the Smooth Network to select more discriminative features. Specifically, in the low stages of the network, we introduce the Position Attention Block (PAB) to obtain more effective features and strengthen the dependencies among similar features. In the high stages of the network, the Channel Attention Block (CAB) is used to capture richer context information. Combining PAB with CAB makes the prediction of semantic labels more accurate. The proposed model is validated on two public benchmark datasets, and the experimental results show that PADSS outperforms its competitors and achieves state-of-the-art (SOTA) performance: 87.8% mean IoU on PASCAL VOC 2012 and 80.8% mean IoU on Cityscapes. These results suggest that PADSS is robust enough to learn a variety of discriminative features across various semantic segmentation tasks.
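The idea of strengthening dependencies among similar features can be illustrated with a minimal sketch. The abstract does not specify the internals of PAB, so the code below assumes a generic dot-product self-attention over spatial positions with a residual connection, and should not be read as the authors' implementation. The names `position_attention`, `softmax`, and `gamma` are hypothetical, and the learned projection layers a real network would use are omitted.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def position_attention(features, gamma=1.0):
    """Hypothetical position-attention sketch (not the paper's exact PAB).

    features: list of N position vectors, each a list of C floats
              (a C x H x W feature map flattened to N = H * W positions).
    gamma:    residual scale; assumed fixed here, usually learned in practice.
    Returns (output, attention), where attention[i][j] is the weight that
    position i places on position j.
    """
    n = len(features)
    channels = len(features[0])
    # Pairwise dot-product similarity between all positions.
    scores = [[sum(a * b for a, b in zip(features[i], features[j]))
               for j in range(n)] for i in range(n)]
    # Normalize each row so the weights for one position sum to 1.
    attention = [softmax(row) for row in scores]
    output = []
    for i in range(n):
        # Aggregate features from all positions, weighted by similarity.
        agg = [sum(attention[i][j] * features[j][c] for j in range(n))
               for c in range(channels)]
        # Residual connection keeps the original feature at position i.
        output.append([gamma * agg[c] + features[i][c]
                       for c in range(channels)])
    return output, attention

# Toy 2 x 2 feature map with 2 channels, flattened to 4 positions.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
out, attn = position_attention(feats, gamma=1.0)
```

In this toy input, positions 0 and 1 carry similar features, so position 0 places more weight on position 1 than on positions 2 or 3. Setting `gamma=0.0` returns the input unchanged, which shows that the attention term acts as an additive refinement on top of the original features.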
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 61401127, the Natural Science Foundation of Heilongjiang Province under Grant LH2022F038, the Cultivation Project of National Natural Science Foundation under Grant XPPY202208, and the Graduate Innovation Fund of Harbin Normal University under Grant HSDSSCX2019-29.
Author information
Contributions
Rui Zhao: Methodology, Software programming, Writing - original draft. Xiaoyan Yu: Supervision, Writing - review & editing. Xianwei Rong: Resources, Data curation.
Ethics declarations
Competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhao, R., Yu, X. & Rong, X. Position attention optimized deep semantic segmentation. Multimed Tools Appl 83, 29531–29545 (2024). https://doi.org/10.1007/s11042-023-16022-4
DOI: https://doi.org/10.1007/s11042-023-16022-4