Learning 3D Face Reconstruction with a Pose Guidance Network

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12626)

Abstract

We present a self-supervised approach to monocular 3D face reconstruction with a pose guidance network (PGN). First, we unveil the bottleneck of pose estimation in prior parametric 3D face learning methods, and propose to utilize 3D face landmarks for estimating pose parameters. With our specially designed PGN, our model can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images. Our network is further augmented with a self-supervised learning scheme, which exploits face geometry information embedded in multiple frames of the same person, to alleviate the ill-posed nature of regressing 3D face geometry from a single image. These three insights yield a single approach that combines the complementary strengths of parametric model learning and data-driven learning techniques. We conduct a rigorous evaluation on the challenging AFLW2000-3D, Florence, and FaceWarehouse datasets, and show that our method outperforms the state of the art on all metrics.
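
The abstract states that pose parameters are estimated from 3D face landmarks but does not give the procedure. As an illustration only, the sketch below assumes two sets of corresponding 3D landmarks (one on the parametric face model, one predicted for the image) and recovers scale, rotation, and translation with the standard closed-form least-squares similarity alignment (Umeyama's method). The function name and argument names are hypothetical, and this is not claimed to be the authors' implementation.

```python
import numpy as np


def estimate_pose_from_landmarks(model_lms, target_lms):
    """Least-squares similarity transform between two 3D landmark sets.

    Hypothetical helper (not from the paper). Both inputs are (N, 3) arrays
    with row-wise correspondence. Returns (s, R, t) such that
    target_lms[i] is approximately s * R @ model_lms[i] + t.
    """
    model_lms = np.asarray(model_lms, dtype=float)
    target_lms = np.asarray(target_lms, dtype=float)
    n = model_lms.shape[0]

    # Center both point sets on their means.
    mu_m = model_lms.mean(axis=0)
    mu_t = target_lms.mean(axis=0)
    xm = model_lms - mu_m
    xt = target_lms - mu_t

    # Cross-covariance between target and model coordinates.
    cov = xt.T @ xm / n
    u, sing, vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    corr = np.diag([1.0, 1.0, d])
    rot = u @ corr @ vt

    # Optimal isotropic scale and translation.
    var_m = (xm ** 2).sum() / n
    scale = float((sing * np.diag(corr)).sum() / var_m)
    trans = mu_t - scale * rot @ mu_m
    return scale, rot, trans
```

Given landmark arrays `model_lms` and `target_lms`, calling `s, R, t = estimate_pose_from_landmarks(model_lms, target_lms)` yields a pose under which `s * model_lms @ R.T + t` approximates `target_lms`. A closed-form solve like this needs no learned parameters, which is one reason landmark-based pose estimation is attractive as a guidance signal.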

P. Liu—Work mainly done during an internship at Huya AI.

Acknowledgement

This work was partially supported by the Research Grants Council (RGC) of the Hong Kong Special Administrative Region (No. CUHK 14210717 of the General Research Fund) and the National Key Research and Development Program of China (No. 2018AAA0100204). We also thank Yao Feng, Feng Liu, and Ayush Tewari for their kind help.

Author information

Corresponding author

Correspondence to Pengpeng Liu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, P., Han, X., Lyu, M., King, I., Xu, J. (2021). Learning 3D Face Reconstruction with a Pose Guidance Network. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12626. Springer, Cham. https://doi.org/10.1007/978-3-030-69541-5_10

  • DOI: https://doi.org/10.1007/978-3-030-69541-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69540-8

  • Online ISBN: 978-3-030-69541-5

  • eBook Packages: Computer Science, Computer Science (R0)
