Semi-supervised Single-View 3D Reconstruction via Prototype Shape Priors

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

The performance of existing single-view 3D reconstruction methods heavily relies on large-scale 3D annotations. However, such annotations are tedious and expensive to collect. Semi-supervised learning serves as an alternative way to mitigate the need for manual labels, but remains unexplored in 3D reconstruction. Inspired by the recent success of semi-supervised image classification tasks, we propose SSP3D, a semi-supervised framework for 3D reconstruction. In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction. We further introduce a discriminator-guided module to incentivize better shape generation, as well as a regularizer to tolerate noisy training samples. On the ShapeNet benchmark, the proposed approach outperforms previous supervised methods by clear margins under various labeling ratios (i.e., 1%, 5%, 10%, and 20%). Moreover, our approach also performs well when transferring to the real-world Pix3D dataset under a labeling ratio of 10%. We further demonstrate that our method can transfer to novel categories with only a few labeled samples. Experiments on the popular ShapeNet dataset show that our method outperforms the zero-shot baseline by over 12%, and we perform rigorous ablations and analyses to validate our approach. Code is available at https://github.com/ChenHsing/SSP3D.
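To make the semi-supervised setting concrete, the sketch below illustrates one common ingredient of such pipelines: confidence-thresholded pseudo-labeling of per-voxel occupancy predictions on unlabeled images. This is an illustrative sketch only, not the authors' code; the function names, the confidence threshold of 0.85, and the use of plain Python lists in place of voxel tensors are all assumptions made for clarity.

```python
# Hedged sketch of pseudo-labeling for voxel occupancy, as used in many
# semi-supervised pipelines (names and threshold are illustrative assumptions).

def pseudo_label_voxels(teacher_probs, threshold=0.85):
    """Binarize confident voxel predictions and mask out uncertain ones.

    teacher_probs: per-voxel occupancy probabilities from a teacher model.
    Returns (labels, mask): labels[i] is a 0/1 occupancy pseudo-label;
    mask[i] is True only where the teacher is confident enough
    (prob >= threshold or prob <= 1 - threshold) to supervise a student.
    """
    labels, mask = [], []
    for p in teacher_probs:
        mask.append(p >= threshold or p <= 1.0 - threshold)
        labels.append(1 if p >= 0.5 else 0)
    return labels, mask


def masked_voxel_agreement(student_preds, labels, mask):
    """Fraction of confident voxels where the student matches the pseudo-label."""
    kept = [s == l for s, l, m in zip(student_preds, labels, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


# Example: the 0.5-probability voxel is excluded from the supervision mask.
labels, mask = pseudo_label_voxels([0.95, 0.5, 0.05, 0.9])
```

In a full pipeline, the masked pseudo-labels would drive a reconstruction loss on the student's predictions for unlabeled images, alongside the supervised loss on the small labeled set.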


Notes

  1. Please refer to the Appendix for more details.


Acknowledgement

Y.-G. Jiang was sponsored in part by “Shuguang Program” supported by Shanghai Education Development Foundation and Shanghai Municipal Education Commission (No. 20SG01). Z. Wu was supported by NSFC under Grant No. 62102092.

Author information

Corresponding author: Zuxuan Wu


Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 664 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xing, Z., Li, H., Wu, Z., Jiang, Y.G. (2022). Semi-supervised Single-View 3D Reconstruction via Prototype Shape Priors. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol. 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer Science (R0)
