CGNet: cross-guidance network for semantic segmentation

Abstract

Semantic segmentation is a fundamental task in image analysis. Its central challenge is to extract discriminative features that distinguish different objects and recognize hard examples, and most existing methods are limited in how well they address it. To tackle this problem, we identify the contributions of edge and saliency information to segmentation and present a novel end-to-end network, termed the cross-guidance network (CGNet), that leverages them to benefit semantic segmentation. Edge and saliency detection networks are unified into CGNet, which models the intrinsic information among them to guide the extraction of discriminative features. Specifically, CGNet extracts segmentation, edge, and salient features simultaneously. It then passes them to the cross-guidance module (CGM), which generates pre-knowledge features based on the modeled information and thereby optimizes the context feature extraction process. The proposed approach is extensively evaluated on PASCAL VOC 2012, PASCAL-Person-Part, and Cityscapes, and achieves state-of-the-art performance.
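The abstract does not specify how the CGM combines the three feature streams. As a rough illustration only, the sketch below shows one way edge and saliency features could guide segmentation features: each auxiliary stream is collapsed to a single-channel gate that rescales the segmentation features. The function name `cross_guidance`, the sigmoid gating, and the residual form are all assumptions made for this example, not the design described in the paper.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def cross_guidance(seg_feat, edge_feat, sal_feat):
    """Hypothetical guidance step (NOT the paper's CGM).

    All inputs are feature maps of shape (C, H, W). Edge and saliency
    features are each reduced to a single-channel gate in (0, 1), and the
    segmentation features are rescaled by those gates.
    """
    # Average over channels, then squash to (0, 1); result has shape (1, H, W).
    edge_gate = sigmoid(edge_feat.mean(axis=0, keepdims=True))
    sal_gate = sigmoid(sal_feat.mean(axis=0, keepdims=True))
    # Residual gating: the original features are kept and amplified
    # wherever the edge or saliency cues respond strongly.
    return seg_feat * (1.0 + edge_gate + sal_gate)

# Toy inputs standing in for the three feature streams.
rng = np.random.default_rng(0)
seg = rng.standard_normal((8, 4, 4))
edge = rng.standard_normal((8, 4, 4))
sal = rng.standard_normal((8, 4, 4))

guided = cross_guidance(seg, edge, sal)
print(guided.shape)
```

The residual form is chosen here so that uninformative auxiliary cues only rescale the segmentation features rather than suppress them; a learned module would normally replace the fixed channel averaging with trainable convolutions.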

Acknowledgements

This work was supported in part by the Science and Technology Innovation 2030-Major Project of Artificial Intelligence of the Ministry of Science and Technology of China (Grant No. 2018AAA01028) and in part by the National Natural Science Foundation of China (Grant No. 61632018).

Author information

Correspondence to Yanwei Pang.

About this article

Cite this article

Zhang, Z., Pang, Y. CGNet: cross-guidance network for semantic segmentation. Sci. China Inf. Sci. 63, 120104 (2020). https://doi.org/10.1007/s11432-019-2718-7

Keywords

  • semantic segmentation
  • fully convolutional networks
  • pyramid network
  • edge detection
  • saliency detection
  • cross-guidance