Abstract
We present a self-supervised learning approach to learning monocular 3D face reconstruction with a pose guidance network (PGN). First, we unveil the bottleneck of pose estimation in prior parametric 3D face learning methods, and propose to utilize 3D face landmarks for estimating pose parameters. With our specially designed PGN, our model can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images. Our network is further augmented with a self-supervised learning scheme, which exploits face geometry information embedded in multiple frames of the same person, to alleviate the ill-posed nature of regressing 3D face geometry from a single image. These three insights yield a single approach that combines the complementary strengths of parametric model learning and data-driven learning techniques. We conduct a rigorous evaluation on the challenging AFLW2000-3D, Florence and FaceWarehouse datasets, and show that our method outperforms the state-of-the-art for all metrics.
P. Liu—Work mainly done during an internship at Huya AI.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Blanz, V., Vetter, T.: Face recognition based on fitting a 3D morphable model. In: TPAMI (2003)
Paysan, P., Knothe, R., Amberg, B., Romdhani, S., Vetter, T.: A 3D face model for pose and illumination invariant face recognition. In: AVSS (2009)
Nagano, K., et al.: pagan: real-time avatars using dynamic textures. In: SIGGRAPH Asia (2018)
Hu, L., et al.: Avatar digitization from a single image for real-time rendering. TOG 36(6), 1–14 (2017)
Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., Nießner, M.: Face2face: real-time face capture and reenactment of RGB videos. In: CVPR (2016)
Kim, H., et al.: Deep video portraits. TOG 37(4), 1–14 (2018)
Blanz, V., Vetter, T., et al.: A morphable model for the synthesis of 3D faces. In: SIGGRAPH (1999)
Saito, S., Li, T., Li, H.: Real-time facial segmentation and performance capture from RGB input. In: ECCV (2016)
Cao, C., Hou, Q., Zhou, K.: Displaced dynamic expression regression for real-time facial tracking and animation. TOG 33(4), 1–10 (2014)
Cao, C., Weng, Y., Lin, S., Zhou, K.: 3D shape regression for real-time facial animation. TOG 32(4), 1–10 (2013)
Sanyal, S., Bolkart, T., Feng, H., Black, M.J.: Learning to regress 3D face shape and expression from an image without 3D supervision. In: CVPR (2019)
Yi, H., et al.: Mmface: a multi-metric regression network for unconstrained face reconstruction. In: CVPR (2019)
Chang, F.J., Tran, A.T., Hassner, T., Masi, I., Nevatia, R., Medioni, G.: Expnet: Landmark-free, deep, 3D facial expressions. In: FG (2018)
Wu, F., et al.: MVF-net: multi-view 3D face morphable model regression. In: CVPR (2019)
Zhu, X., Lei, Z., Liu, X., Shi, H., Li, S.Z.: Face alignment across large poses: a 3D solution. In: CVPR (2016)
Feng, Y., Wu, F., Shao, X., Wang, Y., Zhou, X.: Joint 3D face reconstruction and dense alignment with position map regression network. In: ECCV (2018)
Bulat, A., Tzimiropoulos, G.: How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3d facial landmarks). In: ICCV (2017)
Bagdanov, A.D., Del Bimbo, A., Masi, I.: The florence 2D/3D hybrid face dataset. In: Proceedings of the 2011 Joint ACM Workshop on Human Gesture and Behavior Understanding (2011)
Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: Facewarehouse: a 3D facial expression database for visual computing. TVCG 20(3), 413–425 (2013)
Liu, Y., Jourabloo, A., Ren, W., Liu, X.: Dense face alignment. In: ICCV (2017)
Gou, C., Wu, Y., Wang, F.Y., Ji, Q.: Shape augmented regression for 3D face alignment. In: ECCV (2016)
Romdhani, S., Vetter, T.: Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In: CVPR (2005)
Dou, P., Shah, S.K., Kakadiaris, I.A.: End-to-end 3D face reconstruction with deep neural networks. In: CVPR (2017)
Tuan Tran, A., Hassner, T., Masi, I., Medioni, G.: Regressing robust and discriminative 3D morphable models with a very deep neural network. In: CVPR (2017)
Jackson, A.S., Bulat, A., Argyriou, V., Tzimiropoulos, G.: Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In: ICCV (2017)
Tewari, A., et al.: Mofa: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: ICCV (2017)
Genova, K., Cole, F., Maschinot, A., Sarna, A., Vlasic, D., Freeman, W.T.: Unsupervised training for 3D morphable model regression. In: CVPR (2018)
Chang, F.-J., Tran, A.T., Hassner, T., Masi, I., Nevatia, R., Medioni, G.: Deep, landmark-free fame: face alignment, modeling, and expression estimation. Int. J. Comput. Vis. 127(6), 930–956 (2019). https://doi.org/10.1007/s11263-019-01151-x
Piotraschke, M., Blanz, V.: Automated 3D face reconstruction from multiple images using quality measures. In: CVPR (2016)
Tewari, A., et al.: FML: face model learning from videos. In: CVPR (2019)
Hafner, D., Demetz, O., Weickert, J.: Why is the census transform good for robust optic flow computation? In: SSVM (2013)
Liu, P., King, I., Lyu, M.R., Xu, J.: Ddflow: learning optical flow with unlabeled data distillation. In: AAAI (2019)
Xiong, X., De la Torre, F.: Global supervised descent method. In: CVPR (2015)
Yu, R., Saito, S., Li, H., Ceylan, D., Li, H.: Learning dense facial correspondences in unconstrained images. In: ICCV (2017)
Bhagavatula, C., Zhu, C., Luu, K., Savvides, M.: Faster than real-time facial alignment: a 3D spatial transformer network approach in unconstrained poses. In: ICCV (2017)
Zollhöfer, M., et al.: State of the art on monocular 3D face reconstruction, tracking, and applications. In: Computer Graphics Forum, Wiley Online Library (2018)
Deng, J., et al.: The Menpo benchmark for multi-pose 2D and 3D facial landmark localisation and tracking. Int. J. Comput. Vis. 127(6), 599–624 (2018). https://doi.org/10.1007/s11263-018-1134-y
Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: ICCV (2015)
Shen, J., Zafeiriou, S., Chrysos, G.G., Kossaifi, J., Tzimiropoulos, G., Pantic, M.: The first facial landmark tracking in-the-wild challenge: benchmark and results. In: ICCVW (2015)
Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S.: Multi-pie. Image Vis. Comput. 28(5), 807–813 (2010)
Koestinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: ICCVW (2011)
Tewari, A., et al.: Self-supervised multi-level face model learning for monocular reconstruction at over 250 HZ. In: CVPR (2018)
Tran, L., Liu, X.: Nonlinear 3D face morphable model. In: CVPR (2018)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Kim, H., Zollhöfer, M., Tewari, A., Thies, J., Richardt, C., Theobalt, C.: Inversefacenet: deep monocular inverse face rendering. In: CVPR (2018)
Acknowledgement
This work was partially supported by the RRC of the Hong Kong Special Administrative Region (No. CUHK 14210717 of the General Research Fund) and National Key Research and Development Program of China (No. 2018AAA0100204). We also thank Yao Feng, Feng Liu and Ayush Tewari for kind help.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, P., Han, X., Lyu, M., King, I., Xu, J. (2021). Learning 3D Face Reconstruction with a Pose Guidance Network. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12626. Springer, Cham. https://doi.org/10.1007/978-3-030-69541-5_10
Download citation
DOI: https://doi.org/10.1007/978-3-030-69541-5_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69540-8
Online ISBN: 978-3-030-69541-5
eBook Packages: Computer ScienceComputer Science (R0)