Multi-modal Feature Guided Detailed 3D Face Reconstruction from a Single Image

Wang, Jingting; Yu, Cuican; Li, Huibin

doi:10.1007/978-981-99-8432-9_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14426)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Reconstructing a 3D face model with high-quality geometry and texture from a single face image is an ill-posed and challenging problem. On the one hand, many methods rely heavily on large amounts of training data, which are not easy to obtain. On the other hand, position-based local features of the face surface cannot capture global information about the entire face. Because of these challenges, existing methods can hardly reconstruct detailed geometry and realistic textures. To address these issues, we propose a multi-modal feature guided 3D face reconstruction method, named MMFG, which does not require any training data and can generate detailed geometry from a single image. Specifically, we represent the reconstructed 3D face as a signed distance field and propose combining the position-based local feature with multi-modal global features to reconstruct a detailed 3D face. To obtain region-aware information, a Swin Transformer is used as the global feature extractor to extract multi-modal global features from rendered multi-view RGB images and depth images. Furthermore, considering the different effects of RGB and depth information on albedo and shading, we use the global features from each modality to guide the recovery of the corresponding BRDF component during differentiable rendering. Experimental results demonstrate that the proposed method can generate more detailed 3D faces, achieving state-of-the-art results on texture reconstruction and competitive results on shape reconstruction on the NoW dataset.
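To make two terms from the abstract concrete, the following is a minimal sketch of a signed distance field and of a colour formed as albedo times shading. It is not the MMFG method: the analytic sphere, the finite-difference normals, the Lambertian shading model, and all function names (sphere_sdf, sdf_normals, lambertian_render) are assumptions chosen for illustration only.

```python
import numpy as np


def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere centred at the origin.

    Negative inside, zero on the surface, positive outside.
    (Stand-in shape; a face would need a learned or fitted SDF.)
    """
    return np.linalg.norm(points, axis=-1) - radius


def sdf_normals(sdf, points, eps=1e-4):
    """Estimate unit normals as the normalised central-difference gradient of the SDF."""
    grads = np.zeros_like(points, dtype=float)
    for axis in range(3):
        offset = np.zeros(3)
        offset[axis] = eps
        grads[:, axis] = (sdf(points + offset) - sdf(points - offset)) / (2 * eps)
    return grads / np.linalg.norm(grads, axis=-1, keepdims=True)


def lambertian_render(albedo, normals, light_dir):
    """Colour = albedo * shading, with shading = max(0, n . l) (assumed Lambertian model)."""
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir = light_dir / np.linalg.norm(light_dir)
    shading = np.clip(normals @ light_dir, 0.0, None)
    return albedo * shading[:, None], shading


# Two points on the unit sphere, a constant (hypothetical) skin-like albedo,
# and a light shining along +z.
surface_points = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
albedo = np.array([[0.8, 0.6, 0.5], [0.8, 0.6, 0.5]])
normals = sdf_normals(sphere_sdf, surface_points)
colors, shading = lambertian_render(albedo, normals, light_dir=[0.0, 0.0, 1.0])
```

In this toy setting the point facing the light keeps its full albedo and the point at grazing angle is dark, which shows why geometry (through the normals) and albedo contribute separately to the rendered colour.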

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61976173, the Shaanxi Fundamental Science Research Project for Mathematics and Physics (Grant No. 22JSY011), and the MoE-CMCC Artificial Intelligence Project (No. MCM20190701).

Author information

Authors and Affiliations

School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, China
Jingting Wang, Cuican Yu & Huibin Li

Corresponding author

Correspondence to Huibin Li.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Wang, J., Yu, C., Li, H. (2024). Multi-modal Feature Guided Detailed 3D Face Reconstruction from a Single Image. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_29

DOI: https://doi.org/10.1007/978-981-99-8432-9_29
Published: 24 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9
eBook Packages: Computer Science, Computer Science (R0)
