Semantic segmentation network with multi-path structure, attention reweighting and multi-scale encoding

  • Original article
The Visual Computer

Abstract

Semantic segmentation is an active area of computer vision that supplies semantic information to many applications. Spatial information, context information, and high-level semantic information all play an important role in improving segmentation accuracy. This paper proposes a semantic segmentation network with a multi-path structure, attention reweighting, and multi-scale encoding. First, three parallel paths are designed: a pyramid spatial path that takes a pyramid image input, a context path built on a lightweight backbone network, and a semantic graph path composed of spatial graph convolutional layers. Second, a feature fusion module based on the channel attention mechanism performs a weighted fusion of the output features of the different paths. The network is then trained on the CamVid and Cityscapes semantic segmentation datasets. Finally, ablation experiments verify the effectiveness of the proposed network components and analyze the computational efficiency and segmentation accuracy of the model. The experimental results show that, by combining multi-scale information, high-level semantic information, and global context information, the network improves segmentation accuracy while maintaining high computational efficiency.
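
The weighted fusion of path outputs described in the abstract can be illustrated with a minimal sketch. The paper's actual module is not visible in this preview, so everything below is an assumption: a squeeze-and-excitation-style gate (global average pooling followed by a sigmoid) applied to the channel-wise concatenation of the three path outputs. The names `global_avg_pool`, `sigmoid`, and `channel_attention_fuse` are hypothetical, and the learned layers that a real module would contain are replaced here by a fixed `scale` and `bias`.

```python
import math

def global_avg_pool(channel):
    """Return the mean of one H x W channel given as a list of rows."""
    total = sum(sum(row) for row in channel)
    count = sum(len(row) for row in channel)
    return total / count

def sigmoid(x):
    """Logistic function, used here as the attention gate."""
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention_fuse(path_features, scale=1.0, bias=0.0):
    """Concatenate per-path feature maps along channels and reweight each channel.

    path_features: one feature map per path; each feature map is a list of
    channels, and each channel is an H x W nested list. All paths are assumed
    to share the same spatial size. `scale` and `bias` are placeholders for
    learned parameters (an assumption, not the paper's design).
    """
    # Channel-wise concatenation of the path outputs.
    channels = [ch for fmap in path_features for ch in fmap]
    # Squeeze: one scalar per channel. Excite: map it to a weight in (0, 1).
    weights = [sigmoid(scale * global_avg_pool(ch) + bias) for ch in channels]
    # Reweight: multiply every value in a channel by that channel's weight.
    fused = [[[w * v for v in row] for row in ch]
             for ch, w in zip(channels, weights)]
    return fused, weights

# Toy 2 x 2 single-channel outputs standing in for the three paths.
spatial = [[[1.0, 1.0], [1.0, 1.0]]]
context = [[[0.0, 0.0], [0.0, 0.0]]]
graph = [[[-1.0, -1.0], [-1.0, -1.0]]]
fused, weights = channel_attention_fuse([spatial, context, graph])
```

In this toy input the channel with the largest mean activation receives the largest weight, which is the intuition behind attention reweighting; a trained module would learn the mapping from pooled statistics to weights rather than use a fixed sigmoid.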

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 51874217), the Foundation of Hubei Provincial Education Department (No. B2020011), and the WUST National Defense Pre-research Foundation (No. GF202008).

Author information

Corresponding authors

Correspondence to Wei Sun or Bo Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, Z., Sun, W., Tang, B. et al. Semantic segmentation network with multi-path structure, attention reweighting and multi-scale encoding. Vis Comput 39, 597–608 (2023). https://doi.org/10.1007/s00371-021-02360-7

  • DOI: https://doi.org/10.1007/s00371-021-02360-7
