Abstract
3D shape reconstruction from a single-view image is a severely ill-posed and challenging problem, while multi-view methods can reconstruct an object’s shape from raw images alone. However, those raw images must be shot in a static scene so that corresponding features across the images map to the same spatial location. Recent single-view methods need only single-view images of static or dynamic objects, turning to prior knowledge to mine the latent multi-view information in single-view images. Some of them feed their models with novel-view images generated by prior models (e.g. rendering-based or style-transfer-based ones), which are, however, not sufficiently accurate. In this paper, we present the Augmented Self-Supervised 3D Reconstruction with Monotonous Material (ASRMM) approach, trained end-to-end in a self-supervised manner, which reconstructs the 3D shape of a category-specific object without any prior model for generating novel-view images. Our approach draws on two observations: (1) high-quality multi-view images are difficult to obtain, and (2) the shape of an object made of a single material is easier to infer visually than that of an object made of multiple complex materials. To put these observations into practice, ASRMM makes the material monotonous in its diffuse component by setting the reflectance to a uniform value, and applies this idea to both the source and the reconstructed images. Experiments show that our model reasonably reconstructs 3D models of faces, cats, cars and birds from collections of single-view images. They also show that our approach generalizes to different reconstruction tasks, including unsupervised depth-based reconstruction and 2D-supervised mesh reconstruction, achieving promising improvements in the quality of the reconstructed shape and texture.
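The core idea of making the material monotonous can be illustrated with a minimal sketch: under a simple Lambertian diffuse model, replacing the per-pixel reflectance (albedo) with one uniform value forces all remaining image variation to come from geometry. The function names, the plain Lambertian shading, and the constant value 0.5 below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def lambertian_render(normals, albedo, light_dir):
    """Diffuse (Lambertian) shading: I = albedo * max(0, n . l)."""
    shading = np.clip(normals @ light_dir, 0.0, None)  # (H, W)
    return albedo * shading[..., None]                 # (H, W, 3)

def monotonize_material(albedo, value=0.5):
    """Replace the per-pixel diffuse reflectance with one uniform value,
    so the rendered image is driven by geometry (normals) alone."""
    return np.full_like(albedo, value)

# Toy example: a camera-facing flat patch with a complex texture,
# rendered before and after monotonizing the material.
H, W = 4, 4
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0                     # all normals face the camera
albedo = np.random.rand(H, W, 3)          # complex multi-material texture
light = np.array([0.0, 0.0, 1.0])         # frontal light

textured = lambertian_render(normals, albedo, light)
uniform = lambertian_render(normals, monotonize_material(albedo), light)
# `uniform` is constant across pixels here: once reflectance is fixed,
# any remaining image variation must come from shape, which is the
# signal a shape-reconstruction loss can exploit.
```

Applying the same uniform-reflectance operation to both the source image and the reconstruction keeps the two comparable while suppressing texture as a confounding factor.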
Acknowledgements
This study was funded by the Basic and Applied Basic Research of Guangdong Province under grant No. 2015A0308018, for which the authors express their thanks.
Cite this article
Fang, B., Xiao, N. Self-supervised reflectance-guided 3d shape reconstruction from single-view images. Appl Intell 53, 6966–6977 (2023). https://doi.org/10.1007/s10489-022-03724-9