
Tensor Low-Rank Reconstruction for Semantic Segmentation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)

Abstract

Context information plays an indispensable role in the success of semantic segmentation. Recently, non-local self-attention methods have proved effective for collecting context information. Since the desired context consists of spatial- and channel-wise attention, a 3D representation is the appropriate formulation. However, these non-local methods describe 3D context information with a 2D similarity matrix, and this spatial compression may cause channel-wise attention to be lost. An alternative is to model the contextual information directly, without compression, but this effort confronts a fundamental difficulty: the high-rank property of context information. In this paper, we propose a new approach to modeling 3D context representations that both avoids spatial compression and tackles the high-rank difficulty. Inspired by tensor canonical-polyadic (CP) decomposition theory, i.e., that a high-rank tensor can be expressed as a combination of rank-1 tensors, we design a low-rank-to-high-rank context reconstruction framework (RecoNet). Specifically, we first introduce the tensor generation module (TGM), which generates a number of rank-1 tensors that capture fragments of the context feature. We then use these rank-1 tensors to recover the high-rank context features through our proposed tensor reconstruction module (TRM). Extensive experiments show that our method achieves state-of-the-art performance on various public datasets. In addition, our method has more than 100 times lower computational cost than conventional non-local-based methods.
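The CP-decomposition idea at the core of RecoNet can be made concrete: the 3D context tensor A (of size C x H x W) is reconstructed as a weighted sum of r rank-1 tensors, A ≈ Σ_{i=1..r} λ_i · c_i ∘ h_i ∘ w_i, where c_i, h_i, and w_i are channel-, height-, and width-wise attention vectors. Below is a minimal PyTorch sketch of this low-rank-to-high-rank reconstruction. The module names TGM and TRM follow the abstract, but the pooling and 1x1-convolution details and the learnable per-term scales λ_i are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class TensorGenerationModule(nn.Module):
    # TGM: produce r rank-1 context fragments as (channel, height, width) vectors.
    def __init__(self, channels, rank):
        super().__init__()
        self.rank = rank
        # One 1x1 conv per mode and per fragment (an assumed design choice).
        self.c_convs = nn.ModuleList(nn.Conv2d(channels, channels, 1) for _ in range(rank))
        self.h_convs = nn.ModuleList(nn.Conv2d(channels, 1, 1) for _ in range(rank))
        self.w_convs = nn.ModuleList(nn.Conv2d(channels, 1, 1) for _ in range(rank))

    def forward(self, x):
        fragments = []
        for i in range(self.rank):
            # Pool away two modes, keep the third, squash to (0, 1).
            c = torch.sigmoid(self.c_convs[i](x.mean(dim=(2, 3), keepdim=True)))  # N x C x 1 x 1
            h = torch.sigmoid(self.h_convs[i](x.mean(dim=3, keepdim=True)))       # N x 1 x H x 1
            w = torch.sigmoid(self.w_convs[i](x.mean(dim=2, keepdim=True)))       # N x 1 x 1 x W
            fragments.append((c, h, w))
        return fragments

class TensorReconstructionModule(nn.Module):
    # TRM: sum the rank-1 outer products (CP form) into a high-rank 3D attention map.
    def __init__(self, rank):
        super().__init__()
        self.scales = nn.Parameter(torch.ones(rank))  # lambda_i, assumed learnable

    def forward(self, x, fragments):
        attention = torch.zeros_like(x)
        for lam, (c, h, w) in zip(self.scales, fragments):
            # Broadcasting N x C x 1 x 1, N x 1 x H x 1, and N x 1 x 1 x W
            # realizes the outer product c_i ∘ h_i ∘ w_i as an N x C x H x W tensor.
            attention = attention + lam * (c * h * w)
        return x * attention  # apply the reconstructed context to the features

# Usage: reconstruct a rank-8 context tensor for a 64-channel feature map.
features = torch.randn(2, 64, 32, 32)
tgm = TensorGenerationModule(channels=64, rank=8)
trm = TensorReconstructionModule(rank=8)
out = trm(features, tgm(features))  # same N x C x H x W shape as the input

Note the cost structure this sketch implies: each rank-1 fragment stores only C + H + W attention values rather than an (HW) x (HW) similarity matrix, which is consistent with the abstract's claim of a far lower computational cost than non-local methods.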

Keywords

Semantic segmentation · Low-rank reconstruction · Tensor decomposition

Supplementary material

Supplementary material 1 (PDF, 227 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The Chinese University of Hong Kong, New Territories, Hong Kong
  2. Shanghai Jiao Tong University, Shanghai, China
  3. ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China
  4. SmartMore, Shenzhen, China
