Abstract
Reconstructing a 3D face model with high-quality geometry and texture from a single face image is an ill-posed and challenging problem. On the one hand, many methods rely heavily on large amounts of training data, which are not easy to obtain. On the other hand, local positional features of a face surface cannot reflect the global information of the entire face. Because of these challenges, existing methods can hardly reconstruct detailed geometry and realistic textures. To address these issues, we propose a multi-modal feature guided 3D face reconstruction method, named MMFG, which does not require any training data and can generate detailed geometry from a single image. Specifically, we represent the reconstructed 3D face as a signed distance field and propose combining local positional features with multi-modal global features to reconstruct a detailed 3D face. To obtain region-aware information, a Swin Transformer is used as the global feature extractor to extract multi-modal global features from the rendered multi-view RGB images and depth images. Furthermore, considering the different effects of RGB and depth information on albedo and shading, we use the global features from the different modalities to guide the recovery of the respective BRDF components during differentiable rendering. Experimental results demonstrate that the proposed method can generate more detailed 3D faces, achieving state-of-the-art results on texture reconstruction and competitive results on shape reconstruction on the NoW dataset.
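As a rough illustration of two ideas named in the abstract, the sketch below shows a signed distance function and one simple way to pair a per-point local feature with a shared global feature. It is a minimal sketch under stated assumptions, not the MMFG implementation: the unit sphere stands in for a face surface, the sin/cos encoding is an assumed choice of local positional feature, and the names `sphere_sdf`, `positional_encoding`, and `conditioned_input` are hypothetical. The Swin Transformer feature extractor and the differentiable renderer are not reproduced here.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere.

    Negative inside the surface, zero on it, positive outside.
    The sphere is only a stand-in for a face surface.
    """
    return math.dist(p, center) - radius


def positional_encoding(p, num_freqs=2):
    """Assumed local positional feature: the raw coordinates followed by
    sin/cos of each coordinate at frequencies 1, 2, 4, ..."""
    feats = list(p)
    for k in range(num_freqs):
        for x in p:
            feats.append(math.sin((2 ** k) * x))
            feats.append(math.cos((2 ** k) * x))
    return feats


def conditioned_input(p, global_feature, num_freqs=2):
    """Concatenate the per-point local feature with a global feature
    vector that is shared by every query point (hypothetical fusion)."""
    return positional_encoding(p, num_freqs) + list(global_feature)


if __name__ == "__main__":
    print(sphere_sdf((0.0, 0.0, 0.0)))  # centre of the sphere, inside
    print(sphere_sdf((2.0, 0.0, 0.0)))  # outside the sphere
    print(len(conditioned_input((0.1, 0.2, 0.3), [0.0] * 4)))
```

In a learned setting, a vector like the one returned by `conditioned_input` would typically be fed to a network that predicts the signed distance; concatenation is only one of several possible ways to combine local and global features.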
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant (No. 61976173), Shaanxi Fundamental Science Research Project for Mathematics and Physics (Grant No. 22JSY011) and the MoE-CMCC Artificial Intelligence Project (No. MCM20190701).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, J., Yu, C., Li, H. (2024). Multi-modal Feature Guided Detailed 3D Face Reconstruction from a Single Image. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_29
DOI: https://doi.org/10.1007/978-981-99-8432-9_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9
eBook Packages: Computer Science, Computer Science (R0)