GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

The semantic segmentation of parts of objects in the wild is a challenging task in which multiple instances of objects and multiple parts within those objects must be detected in the scene. This problem remains only marginally explored, despite its fundamental importance for detailed object understanding. In this work, we propose a novel framework that combines higher object-level context conditioning with part-level spatial relationships to address the task. To tackle object-level ambiguity, a class-conditioning module is introduced to retain class-level semantics when learning part-level semantics. In this way, mid-level features also carry this information prior to the decoding stage. To tackle part-level ambiguity and localization, we propose a novel adjacency graph-based module that aims at matching the relative spatial relationships between ground truth and predicted parts. The experimental evaluation on the Pascal-Part dataset shows that we achieve state-of-the-art results on this task.
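The idea of comparing part adjacency between ground truth and prediction can be illustrated with a minimal sketch. It is not the formulation used in the paper, which the abstract does not specify: the 4-connected neighbour counting, the row normalization, the Frobenius distance, and the function names `part_adjacency`, `normalize_rows`, and `adjacency_mismatch` are all assumptions made here for illustration.

```python
import numpy as np


def part_adjacency(labels: np.ndarray, num_parts: int) -> np.ndarray:
    """Count 4-connected pixel pairs whose part labels differ (assumed definition)."""
    adj = np.zeros((num_parts, num_parts), dtype=np.float64)
    # Horizontal neighbour pairs, then vertical neighbour pairs.
    pairs = [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]
    for a, b in pairs:
        mask = a != b  # keep only pairs that straddle a part boundary
        np.add.at(adj, (a[mask], b[mask]), 1.0)
        np.add.at(adj, (b[mask], a[mask]), 1.0)  # keep the matrix symmetric
    return adj


def normalize_rows(adj: np.ndarray) -> np.ndarray:
    """Scale each row to sum to 1; rows of parts with no neighbours stay zero."""
    sums = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, sums, out=np.zeros_like(adj), where=sums > 0)


def adjacency_mismatch(gt: np.ndarray, pred: np.ndarray, num_parts: int) -> float:
    """Frobenius distance between normalized adjacency matrices (assumed metric)."""
    a_gt = normalize_rows(part_adjacency(gt, num_parts))
    a_pred = normalize_rows(part_adjacency(pred, num_parts))
    return float(np.linalg.norm(a_gt - a_pred, ord="fro"))


# Toy 3x4 label maps with three hypothetical parts (0, 1, 2).
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [2, 2, 2, 2]])
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1]])  # part 2 is missing from the prediction

print(adjacency_mismatch(gt, gt, num_parts=3))
print(adjacency_mismatch(gt, pred, num_parts=3))
```

Hard label maps like these are not differentiable, so this sketch works only as an evaluation-style measure. A training loss in the same spirit would have to be computed from soft class probabilities instead.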

Keywords

Part parsing · Semantic segmentation · Graph matching · Deep learning

Supplementary material

Supplementary material 1: 504445_1_En_24_MOESM1_ESM.pdf (PDF, 6884 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Information Engineering, University of Padova, Padova, Italy