Skip to main content

Planes vs. Chairs: Category-Guided 3D Shape Learning Without any 3D Cues

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13661))

Included in the following conference series:

Abstract

We present a novel 3D shape reconstruction method which learns to predict an implicit 3D shape representation from a single RGB image. Our approach uses a set of single-view images of multiple object categories without viewpoint annotation, forcing the model to learn across multiple object categories without 3D supervision. To facilitate learning with such minimal supervision, we use category labels to guide shape learning with a novel categorical metric learning approach. We also utilize adversarial and viewpoint regularization techniques to further disentangle the effects of viewpoint and shape. We obtain the first results for large-scale (more than 50 categories) single-viewpoint shape prediction using a single model. We are also the first to examine and quantify the benefit of class information in single-view supervised 3D shape reconstruction. Our method achieves superior performance over state-of-the-art methods on ShapeNet-13, ShapeNet-55 and Pascal3D+.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    When only a single view of each object instance is available for training, there are an infinite number of possible 3D shapes that could explain the image under appropriate camera viewpoints.

  2. 2.

    Masks are leveraged with additional losses that enforce consistency between the shape and the input mask. For conciseness, we omit them here and refer readers to [25] for more details.

  3. 3.

    Their method has an optional per-sample test-time optimization to further refine the voxels and convert into meshes. Using the authors’ implementation of this optimization did not lead to improved results in our experiments, so we use the voxel prediction as the output for evaluation.

References

  1. Beyer, L., Hermans, A., Leibe, B.: Biternion nets: continuous head pose regression from discrete training labels. In: Gall, J., Gehler, P., Leibe, B. (eds.) GCPR 2015. LNCS, vol. 9358, pp. 157–168. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24947-6_13

    Chapter  Google Scholar 

  2. Chang, A.X., et al.: Shapenet: sn information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)

  3. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)

    Google Scholar 

  4. Choy, C.B., Xu, D., Gwak, J.Y., Chen, K., Savarese, S.: 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 628–644. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_38

    Chapter  Google Scholar 

  5. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

    Google Scholar 

  6. Gadelha, M., Maji, S., Wang, R.: 3D shape induction from 2D views of multiple objects. In: 2017 International Conference on 3D Vision (3DV), pp. 402–411. IEEE (2017)

    Google Scholar 

  7. Gkioxari, G., Malik, J., Johnson, J.: Mesh R-CNN. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019

    Google Scholar 

  8. Goel, S., Kanazawa, A., Malik, J.: Shape and viewpoint without keypoints. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12360, pp. 88–104. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58555-6_6

    Chapter  Google Scholar 

  9. Goodfellow, I.J., et al.: Generative adversarial networks. arXiv preprint arXiv:1406.2661 (2014)

  10. Groueix, T., Fisher, M., Kim, V.G., Russell, B., Aubry, M.: AtlasNet: A Papier-Mâché approach to learning 3D surface generation. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

    Google Scholar 

  11. Ha, D., Dai, A., Le, Q.V.: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016)

  12. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 2, pp. 1735–1742. IEEE (2006)

    Google Scholar 

  13. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)

    Google Scholar 

  14. Henderson, P., Ferrari, V.: Learning single-image 3d reconstruction by generative modelling of shape, pose and shading. Int. J. Comput. Vis. 128(4), 835–854 (2019). https://doi.org/10.1007/s11263-019-01219-8

    Article  MATH  Google Scholar 

  15. Henzler, P., Mitra, N.J., Ritschel, T.: Escaping plato’s cave: 3D shape from adversarial rendering. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9984–9993 (2019)

    Google Scholar 

  16. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)

  17. Insafutdinov, E., Dosovitskiy, A.: Unsupervised learning of shape and pose with differentiable point clouds. arXiv preprint arXiv:1810.09381 (2018)

  18. Kanazawa, A., Tulsiani, S., Efros, A.A., Malik, J.: Learning category-specific mesh reconstruction from image collections. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 386–402. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_23

    Chapter  Google Scholar 

  19. Kar, A., Häne, C., Malik, J.: Learning a multi-view stereo machine. arXiv preprint arXiv:1708.05375 (2017)

  20. Kato, H., Harada, T.: Learning view priors for single-view 3D reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9778–9787 (2019)

    Google Scholar 

  21. Kato, H., Ushiku, Y., Harada, T.: Neural 3d mesh renderer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3907–3916 (2018)

    Google Scholar 

  22. Khosla, P., et al.: Supervised contrastive learning. arXiv preprint arXiv:2004.11362 (2020)

  23. Li, X., et al.: Self-supervised single-view 3D reconstruction via semantic consistency. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 677–693. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_40

    Chapter  Google Scholar 

  24. Lin, C.H., Kong, C., Lucey, S.: Learning efficient point cloud generation for dense 3D object reconstruction. In: AAAI Conference on Artificial Intelligence (AAAI) (2018)

    Google Scholar 

  25. Lin, C.H., Wang, C., Lucey, S.: Sdf-srn: Learning signed distance 3D object reconstruction from static images. arXiv preprint arXiv:2010.10505 (2020)

  26. Liu, S., Li, T., Chen, W., Li, H.: Soft rasterizer: a differentiable renderer for image-based 3D reasoning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7708–7717 (2019)

    Google Scholar 

  27. Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: Sphereface: deep hypersphere embedding for face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 212–220 (2017)

    Google Scholar 

  28. Lorensen, W.E., Cline, H.E.: Marching cubes: A high resolution 3d surface construction algorithm. ACM siggraph computer graphics 21(4), 163–169 (1987)

    Article  Google Scholar 

  29. Mescheder, L., Geiger, A., Nowozin, S.: Which training methods for gans do actually converge? In: International conference on machine learning. pp. 3481–3490. PMLR (2018)

    Google Scholar 

  30. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: Learning 3d reconstruction in function space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4460–4470 (2019)

    Google Scholar 

  31. Michalkiewicz, M., Parisot, S., Tsogkas, S., Baktashmotlagh, M., Eriksson, A., Belilovsky, E.: Few-shot single-view 3-D object reconstruction with compositional priors. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 614–630. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58595-2_37

    Chapter  Google Scholar 

  32. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)

  33. Miyato, T., Koyama, M.: CGANs with projection discriminator. arXiv preprint arXiv:1802.05637 (2018)

  34. Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 360–368 (2017)

    Google Scholar 

  35. Mustikovela, S.K., et al.: Self-supervised viewpoint learning from image collections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3971–3981 (2020)

    Google Scholar 

  36. Navaneet, K., Mathew, A., Kashyap, S., Hung, W.C., Jampani, V., Babu, R.V.: From image collections to point clouds with self-supervised shape and pose networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1132–1140 (2020)

    Google Scholar 

  37. Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Differentiable volumetric rendering: learning implicit 3d representations without 3D supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3504–3515 (2020)

    Google Scholar 

  38. Oord, A.V.D., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)

  39. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: Pifu: pxel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2304–2314 (2019)

    Google Scholar 

  40. Simoni, A., Pini, S., Vezzani, R., Cucchiara, R.: Multi-category mesh reconstruction from image collections. In: 2021 International Conference on 3D Vision (3DV), pp. 1321–1330. IEEE (2021)

    Google Scholar 

  41. Sun, X., et al.: Pix3d: dataset and methods for single-image 3d shape modeling. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

    Google Scholar 

  42. Tatarchenko, M., Richter, S.R., Ranftl, R., Li, Z., Koltun, V., Brox, T.: What do single-view 3D reconstruction networks learn? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3405–3414 (2019)

    Google Scholar 

  43. Thai, A., Stojanov, S., Upadhya, V., Rehg, J.M.: 3D reconstruction of novel object shapes from single images. arXiv preprint arXiv:2006.07752 (2020)

  44. Tulsiani, S., Kulkarni, N., Gupta, A.: Implicit mesh reconstruction from unannotated image collections. arXiv preprint arXiv:2007.08504 (2020)

  45. Tulsiani, S., Zhou, T., Efros, A.A., Malik, J.: Multi-view supervision for single-view reconstruction via differentiable ray consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2626–2634 (2017)

    Google Scholar 

  46. Wallace, B., Hariharan, B.: Few-shot generalization for single-image 3D reconstruction via priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3818–3827 (2019)

    Google Scholar 

  47. Wang, F., Xiang, X., Cheng, J., Yuille, A.L.: Normface: L2 hypersphere embedding for face verification. In: Proceedings of the 25th ACM international conference on Multimedia, pp. 1041–1049 (2017)

    Google Scholar 

  48. Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10(2), 207–244 (2009)

    MATH  Google Scholar 

  49. Wu, S., Rupprecht, C., Vedaldi, A.: Unsupervised learning of probably symmetric deformable 3D objects from images in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1–10 (2020)

    Google Scholar 

  50. Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: a benchmark for 3D object detection in the wild. In: IEEE Winter Conference on Applications of Computer Vision, pp. 75–82. IEEE (2014)

    Google Scholar 

  51. Xu, Q., Wang, W., Ceylan, D., Mech, R., Neumann, U.: DISN: deep implicit surface network for high-quality single-view 3D reconstruction. arXiv preprint arXiv:1905.10711 (2019)

  52. Yan, X., Yang, J., Yumer, E., Guo, Y., Lee, H.: Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. arXiv preprint arXiv:1612.00814 (2016)

  53. Yang, G., Cui, Y., Belongie, S., Hariharan, B.: Learning single-view 3D reconstruction with limited pose supervision. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 90–105. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_6

    Chapter  Google Scholar 

  54. Yariv, L., et al.: Multiview neural surface reconstruction by disentangling geometry and appearance. arXiv preprint arXiv:2003.09852 (2020)

  55. Ye, Y., Tulsiani, S., Gupta, A.: Shelf-supervised mesh prediction in the wild. arXiv preprint arXiv:2102.06195 (2021)

  56. Zhai, A., Wu, H.Y.: Classification is a strong baseline for deep metric learning. arXiv preprint arXiv:1811.12649 (2018)

  57. Zhang, X., Zhang, Z., Zhang, C., Tenenbaum, J.B., Freeman, W.T., Wu, J.: Learning to reconstruct shapes from unseen classes. arXiv preprint arXiv:1812.11166 (2018)

  58. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)

    Google Scholar 

Download references

Acknowledgments

We thank Miao Liu for helpful comments and discussions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zixuan Huang .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5485 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Huang, Z., Stojanov, S., Thai, A., Jampani, V., Rehg, J.M. (2022). Planes vs. Chairs: Category-Guided 3D Shape Learning Without any 3D Cues. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_42

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_42

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics