Semantic segmentation network with multi-path structure, attention reweighting and multi-scale encoding

Lin, Zhongkang; Sun, Wei; Tang, Bo; Li, Jinda; Yao, Xinyuan; Li, Yu

doi:10.1007/s00371-021-02360-7

Original article
Published: 26 January 2022

The Visual Computer, volume 39, pages 597–608 (2023)

Zhongkang Lin^1,2,
Wei Sun ORCID: orcid.org/0000-0002-3424-9643^1,2,
Bo Tang^1,2,
Jinda Li^1,2,
Xinyuan Yao^1,2 &
Yu Li^1,2

Abstract

Semantic segmentation is an active field of computer vision that provides semantic information for many applications. In semantic segmentation tasks, spatial information, context information, and high-level semantic information all play an important role in improving segmentation accuracy. In this paper, a semantic segmentation network with a multi-path structure, attention reweighting, and multi-scale encoding is proposed. First, three parallel paths are designed: a pyramid spatial path with a pyramid image input, a context path composed of a lightweight backbone network, and a semantic graph path composed of spatial graph convolutional layers. Second, a feature fusion module is designed to perform a weighted fusion of the output features of the different paths based on the channel attention mechanism. Then, the semantic segmentation datasets CamVid and Cityscapes are used for network training. Finally, ablation experiments are carried out to verify the effectiveness of the proposed network components and to analyze the computational efficiency and segmentation accuracy of the model. The experimental results show that the network can improve the accuracy of semantic segmentation by combining multi-scale information, high-level semantic information, and global context information while maintaining high computational efficiency.
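The abstract does not give the internals of the feature fusion module, so the following is only a minimal sketch of one common way to do channel-attention-weighted fusion (a squeeze-and-excitation style gate), not the authors' implementation. It assumes the outputs of the paths share the same spatial size and are concatenated along the channel axis; the function names, the weight matrices `w1` and `w2`, and the hidden size are all hypothetical. Plain Python lists are used so the sketch runs without any dependencies.

```python
import math


def global_average_pool(feature):
    """Squeeze a C x H x W feature map into one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature, w1, w2):
    """SE-style gate (assumed design): pool -> linear -> ReLU -> linear -> sigmoid.

    w1 has shape hidden x C and w2 has shape C x hidden (hypothetical weights).
    Returns one weight in (0, 1) per channel.
    """
    pooled = global_average_pool(feature)
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]


def fuse_paths(path_features, w1, w2):
    """Concatenate path outputs along channels, then reweight each channel."""
    stacked = [ch for feat in path_features for ch in feat]
    gates = channel_attention(stacked, w1, w2)
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, stacked)]


if __name__ == "__main__":
    # Three toy path outputs, one 2 x 2 channel each (illustrative values only).
    spatial = [[[1.0, 2.0], [3.0, 4.0]]]
    context = [[[0.5, 0.5], [0.5, 0.5]]]
    graph = [[[2.0, 0.0], [0.0, 2.0]]]
    w1 = [[0.1, 0.2, 0.3]]        # hidden size 1, 3 input channels
    w2 = [[1.0], [-1.0], [0.0]]   # one gate per channel
    fused = fuse_paths([spatial, context, graph], w1, w2)
    print(len(fused), "channels after fusion")
```

Because each gate is a sigmoid output, every channel is scaled by a factor between 0 and 1, which lets the fused representation emphasize the paths that are most informative for a given input.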

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 51874217), the Foundation of Hubei Provincial Education Department (No. B2020011), and the WUST National Defense Pre-research Foundation (No. GF202008).

Author information

Authors and Affiliations

Key Laboratory of Metallurgical Equipment and Control Technology, Wuhan University of Science and Technology, Wuhan, 430081, China
Zhongkang Lin, Wei Sun, Bo Tang, Jinda Li, Xinyuan Yao & Yu Li
Engineering Research Center of Metallurgical Automation and Measurement Technology, Wuhan University of Science and Technology, Wuhan, 430081, China
Zhongkang Lin, Wei Sun, Bo Tang, Jinda Li, Xinyuan Yao & Yu Li

Corresponding authors

Correspondence to Wei Sun or Bo Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, Z., Sun, W., Tang, B. et al. Semantic segmentation network with multi-path structure, attention reweighting and multi-scale encoding. Vis Comput 39, 597–608 (2023). https://doi.org/10.1007/s00371-021-02360-7

Accepted: 10 November 2021
Published: 26 January 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00371-021-02360-7
