Advertisement

GINet: Graph Interaction Network for Scene Parsing

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)

Abstract

Recently, context reasoning using image regions beyond local convolution has shown great potential for scene parsing. In this work, we explore how to incorperate the linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss). The GI unit is capable of enhancing feature representations of convolution networks over high-level semantics and learning the semantic coherency adaptively to each sample. Specifically, the dataset-based linguistic knowledge is first incorporated in the GI unit to promote context reasoning over the visual graph, then the evolved representations of the visual graph are mapped to each local representation to enhance the discriminated capability for scene parsing. GI unit is further improved by the SC-loss to enhance the semantic representations over the exemplar-based semantic graph. We perform full ablation studies to demonstrate the effectiveness of each component in our approach. Particularly, the proposed GINet outperforms the state-of-the-art approaches on the popular benchmarks, including Pascal-Context and COCO Stuff.

Keywords

Scene parsing Context reasoning Graph interaction 

Supplementary material

504472_1_En_3_MOESM1_ESM.pdf (1.3 mb)
Supplementary material 1 (pdf 1310 KB)

References

  1. 1.
    Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)CrossRefGoogle Scholar
  2. 2.
    Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4), 509–522 (2002)CrossRefGoogle Scholar
  3. 3.
    Biederman, I., Mezzanotte, R.J., Rabinowitz, J.C.: Scene perception: detecting and judging objects undergoing relational violations. Cogn. Pychol. 14(2), 143–177 (1982)CrossRefGoogle Scholar
  4. 4.
    Caesar, H., Uijlings, J., Ferrari, V.: Coco-stuff: thing and stuff classes in context. In: CVPR (2018)Google Scholar
  5. 5.
    Chaurasia, A., Culurciello, E.: Linknet: exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing (VCIP) (2017)Google Scholar
  6. 6.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062 (2014)
  7. 7.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)CrossRefGoogle Scholar
  8. 8.
    Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)
  9. 9.
    Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01234-2_49CrossRefGoogle Scholar
  10. 10.
    Chen, Y., Rohrbach, M., Yan, Z., Shuicheng, Y., Feng, J., Kalantidis, Y.: Graph-based global reasoning networks. In: CVPR (2019)Google Scholar
  11. 11.
    Cheng, B., et al.: Spgnet: semantic prediction guidance for scene parsing. In: ICCV (2019)Google Scholar
  12. 12.
    Ding, H., Jiang, X., Shuai, B., Liu, A.Q., Wang, G.: Semantic correlation promoted shape-variant context for segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8885–8894 (2019)Google Scholar
  13. 13.
    Ding, H., Jiang, X., Shuai, B., Qun Liu, A., Wang, G.: Context contrasted feature and gated multi-scale aggregation for scene segmentation. In: CVPR (2018)Google Scholar
  14. 14.
    Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: ICCV (2015)Google Scholar
  15. 15.
    Fu, J., et al.: Dual attention network for scene segmentation. In: CVPR (2019)Google Scholar
  16. 16.
    Fu, J., et al.: Adaptive context network for scene parsing. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6748–6757 (2019)Google Scholar
  17. 17.
    Gong, K., Gao, Y., Liang, X., Shen, X., Wang, M., Lin, L.: Graphonomy: Universal Human Parsing Via Graph Transfer Learning. In: CVPR (2019)Google Scholar
  18. 18.
    He, H., Zhang, J., Zhang, Q., Tao, D.: Grapy-ml: Graph pyramid mutual learning for cross-dataset human parsing. CoRR abs/1911.12053 (2019). http://arxiv.org/abs/1911.12053
  19. 19.
    He, J., Deng, Z., Qiao, Y.: Dynamic multi-scale filters for semantic segmentation. In: The IEEE International Conference on Computer Vision (ICCV), October 2019Google Scholar
  20. 20.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  21. 21.
    Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: Ccnet: Criss-cross attention for semantic segmentation. arXiv preprint arXiv:1811.11721 (2018)
  22. 22.
    Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., Mikolov, T.: Fasttext. zip: Compressing text classification models. arXiv preprint arXiv:1612.03651 (2016)
  23. 23.
    Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks (2016)Google Scholar
  24. 24.
    Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., Liu, H.: Expectation-maximization attention networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9167–9176 (2019)Google Scholar
  25. 25.
    Li, Y., Gupta, A.: Beyond grids: learning graph representations for visual recognition. In: Advances in Neural Information Processing Systems, pp. 9225–9235 (2018)Google Scholar
  26. 26.
    Li, Y., Gu, C., Dullien, T., Vinyals, O., Kohli, P.: Graph matching networks for learning the similarity of graph structured objects. arXiv preprint arXiv:1904.12787 (2019)
  27. 27.
    Liang, X., Hu, Z., Zhang, H., Lin, L., Xing, E.P.: Symbolic graph reasoning meets convolutions. In: Advances in Neural Information Processing Systems (2018)Google Scholar
  28. 28.
    Liang, X., Zhou, H., Xing, E.: Dynamic-structured semantic propagation network. In: CVPR (2018)Google Scholar
  29. 29.
    Lin, G., Shen, C., Van Den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: CVPR (2016)Google Scholar
  30. 30.
    Liu, W., Rabinovich, A., Berg, A.C.: Parsenet: Looking wider to see better. arXiv preprint arXiv:1506.04579 (2015)
  31. 31.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)Google Scholar
  32. 32.
    Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
  33. 33.
    Mottaghi, R., et al.: The role of context for object detection and semantic segmentation in the wild. In: CVPR (2014)Google Scholar
  34. 34.
    Peng, C., Zhang, X., Yu, G., Luo, G., Sun, J.: Large kernel matters-improve semantic segmentation by global convolutional network. In: CVPR (2017)Google Scholar
  35. 35.
    Pennington, J., Socher, R., Manning, C.: Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014)Google Scholar
  36. 36.
    Pohlen, T., Hermans, A., Mathias, M., Leibe, B.: Full-resolution residual networks for semantic segmentation in street scenes. In: CVPR (2017)Google Scholar
  37. 37.
    Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention (2015)Google Scholar
  38. 38.
    Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016Google Scholar
  39. 39.
    Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: International Conference on Machine Learning (2013)Google Scholar
  40. 40.
    Tian, Z., He, T., Shen, C., Yan, Y.: Decoders matter for semantic segmentation: data-dependent decoding enables flexible feature aggregation. In: CVPR (2019)Google Scholar
  41. 41.
    Tu, Z., Chen, X., Yuille, A.L., Zhu, S.C.: Image parsing: unifying segmentation,detection, and recognition. Int. J. Comput. Vis. 63, 113–140 (2005)CrossRefGoogle Scholar
  42. 42.
    Wang, Q., Guo, G.: LS-CNN: characterizing local patches at multiple scales for face recognition. IEEE Trans. Inf. Forensics Secur. 15, 1640–1653 (2019)CrossRefGoogle Scholar
  43. 43.
    Wang, Q., Wu, T., Zheng, H., Guo, G.: Hierarchical pyramid diverse attention networks for face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020Google Scholar
  44. 44.
    Wang, Q., Zheng, Y., Yang, G., Jin, W., Chen, X., Yin, Y.: Multiscale rotation-invariant convolutional neural networks for lung texture classification. IEEE J. Biomed. Health Inf. 22(1), 184–195 (2018)CrossRefGoogle Scholar
  45. 45.
    Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks (2017)Google Scholar
  46. 46.
    Wu, H., Zhang, J., Huang, K., Liang, K., Yu, Y.: Fastfcn: Rethinking dilated convolution in the backbone for semantic segmentation. arXiv preprint arXiv:1903.11816 (2019)
  47. 47.
    Wu, T., Tang, S., Zhang, R., Cao, J., Li, J.: Tree-structured kronecker convolutional network for semantic segmentation. In: ICME (2019)Google Scholar
  48. 48.
    Wu, T., Tang, S., Zhang, R., Guo, G., Zhang, Y.: Consensus feature network for scene parsing. arXiv preprint arXiv:1907.12411 (2019)
  49. 49.
    Wu, T., Tang, S., Zhang, R., Zhang, Y.: Cgnet: A light-weight context guided network for semantic segmentation. arXiv preprint arXiv:1811.08201 (2018)
  50. 50.
    Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: Learning a discriminative feature network for semantic segmentation. In: CVPR (2018)Google Scholar
  51. 51.
    Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
  52. 52.
    Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. arXiv preprint arXiv:1909.11065 (2019)
  53. 53.
    Yuan, Y., Wang, J.: Ocnet: Object context network for scene parsing. arXiv preprint arXiv:1809.00916 (2018)
  54. 54.
    Zhang, H., et al.: Context encoding for semantic segmentation. In: CVPR (2018)Google Scholar
  55. 55.
    Zhang, H., Zhang, H., Wang, C., Xie, J.: Co-occurrent features in semantic segmentation. In: CVPR (2019)Google Scholar
  56. 56.
    Zhang, S., Yan, S., He, X.: Latentgnn: Learning efficient non-local relations for visual recognition. arXiv preprint arXiv:1905.11634 (2019)
  57. 57.
    Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)Google Scholar
  58. 58.
    Zhao, H., et al.: PSANet: point-wise spatial attention network for scene parsing. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 270–286. SpringerSpringer, Cham (2018).  https://doi.org/10.1007/978-3-030-01240-3_17CrossRefGoogle Scholar
  59. 59.
    Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ade20k dataset. In: CVPR (2017)Google Scholar
  60. 60.
    Zhou, L., Zhang, C., Wu, M.: D-linknet: linknet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction. In: CVPR Workshops (2018)Google Scholar
  61. 61.
    Zhu, Z., Xu, M., Bai, S., Huang, T., Bai, X.: Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp, 593–602 (2019)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Institute of Deep Learning, Baidu ResearchBeijingChina
  2. 2.National Engineering Laboratory for Deep Learning Technology and ApplicationBeijingChina
  3. 3.Beijing University of Posts and TelecommunicationsBeijingChina

Personalised recommendations