3D Compositional Zero-Shot Learning with DeCompositional Consensus

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Parts represent a basic unit of geometric and semantic similarity across different objects. We argue that part knowledge should be composable beyond the observed object classes. Towards this, we present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes for semantic segmentation. We provide a structured study by benchmarking the task with the proposed Compositional-PartNet dataset. This dataset is created by processing the original PartNet to maximize part overlap across different objects. Existing point cloud part segmentation methods fail to generalize to unseen object classes in this setting. As a solution, we propose DeCompositional Consensus, which combines a part segmentation network with a part scoring network. The key intuition behind our approach is that a segmentation mask over some parts should be in consensus with its part scores when each part is taken apart. The two networks reason over different part combinations defined in a per-object part prior to generate the most suitable segmentation mask. We demonstrate that our method enables compositional zero-shot segmentation and generalized zero-shot classification, and establishes the state of the art on both tasks.
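The abstract only outlines the consensus idea, so the toy sketch below is our own illustration, not the authors' implementation. It assumes a hypothetical per-object part prior (`PART_PRIOR`), stand-in segmentation logits given as one `{part: logit}` dictionary per point, and a hypothetical `toy_score_part` callable in place of the part scoring network; "points" are reduced to single heights. For each object class in the prior, the sketch restricts the segmentation to that class's parts, takes each predicted part apart, scores it in isolation, and keeps the mask whose isolated part scores agree most with the labels it assigned. Averaging the agreements is our simplification.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-object part prior: the part combination each class may contain.
PART_PRIOR: Dict[str, List[str]] = {
    "chair": ["back", "seat", "leg"],
    "table": ["top", "leg"],
}

Logits = Sequence[Dict[str, float]]  # one {part: logit} dict per point (stand-in for a network)
PartScorer = Callable[[List[float]], Dict[str, float]]  # stand-in for the part scoring network


def restricted_mask(logits: Logits, allowed: List[str]) -> List[str]:
    """Label each point with its highest-logit part among the allowed parts only."""
    return [max(allowed, key=lambda part: point[part]) for point in logits]


def consensus_score(mask: List[str], points: Sequence[float], score_part: PartScorer) -> float:
    """Take each predicted part apart, score it in isolation, and average the
    probability the scorer gives to the label the mask assigned to it."""
    agreements = []
    for part in sorted(set(mask)):
        segment = [pt for pt, label in zip(points, mask) if label == part]
        agreements.append(score_part(segment).get(part, 0.0))
    return sum(agreements) / len(agreements) if agreements else 0.0


def decompositional_consensus(
    logits: Logits,
    points: Sequence[float],
    score_part: PartScorer,
    prior: Dict[str, List[str]] = PART_PRIOR,
) -> Tuple[str, List[str], float]:
    """Try every part combination in the prior and keep the most self-consistent mask."""
    best: Tuple[str, List[str], float] = ("", [], float("-inf"))
    for cls, parts in prior.items():
        mask = restricted_mask(logits, parts)
        score = consensus_score(mask, points, score_part)
        if score > best[2]:
            best = (cls, mask, score)
    return best


# Toy scorer (hypothetical): judges a segment only by its mean height.
def toy_score_part(segment: List[float]) -> Dict[str, float]:
    h = sum(segment) / len(segment)
    if h < 0.3:
        return {"leg": 0.9}
    if h < 0.7:
        return {"seat": 0.7, "top": 0.3}
    return {"back": 0.8, "top": 0.2}


points = [0.1, 0.2, 0.5, 0.95]
logits = [
    {"leg": 2.0, "seat": 0.1, "back": 0.0, "top": 0.0},
    {"leg": 1.8, "seat": 0.2, "back": 0.0, "top": 0.1},
    {"leg": 0.3, "seat": 1.5, "back": 0.2, "top": 1.4},
    {"leg": 0.0, "seat": 0.2, "back": 1.6, "top": 1.0},
]
cls, mask, score = decompositional_consensus(logits, points, toy_score_part)
print(cls, mask)  # chair ['leg', 'leg', 'seat', 'back']
```

Under the table prior, the same logits label the two upper points `top`, but taken apart that segment looks like a back to the toy scorer (agreement 0.2), so the table hypothesis averages 0.55 while the chair hypothesis averages about 0.8 and is selected.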

M. F. Naeem and E. P. Örnek contributed equally.



Author information


Correspondence to Muhammad Ferjad Naeem.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4748 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Naeem, M.F., Örnek, E.P., Xian, Y., Van Gool, L., Tombari, F. (2022). 3D Compositional Zero-Shot Learning with DeCompositional Consensus. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13688. Springer, Cham. https://doi.org/10.1007/978-3-031-19815-1_41

  • DOI: https://doi.org/10.1007/978-3-031-19815-1_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19814-4

  • Online ISBN: 978-3-031-19815-1

  • eBook Packages: Computer Science, Computer Science (R0)
