JNR: Joint-Based Neural Rig Representation for Compact 3D Face Modeling

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)

Abstract

In this paper, we introduce a novel approach to learning a 3D face model using a joint-based face rig and a neural skinning network. Thanks to the joint-based representation, our model enjoys several significant advantages over prior blendshape-based models. First, it is very compact: it is orders of magnitude smaller while still keeping strong modeling capacity. Second, because each joint has its own semantic meaning, interactive facial geometry editing becomes easier and more intuitive. Third, through skinning, our model supports adding the mouth interior and eyes, as well as accessories (hair, eyeglasses, etc.), in a simpler, more accurate, and more principled way. We argue that because the human face is highly structured and topologically consistent, it does not need to be learned entirely from data. Instead, we can leverage prior knowledge in the form of a human-designed 3D face rig to reduce the data dependency, and learn a compact yet strong face model from only a small dataset (fewer than one hundred 3D scans). To further improve the modeling capacity, we train a skinning weight generator through adversarial learning. Experiments on fitting high-quality 3D scans (both neutral and expressive), noisy depth images, and RGB images demonstrate that the model's capacity is on par with state-of-the-art face models, such as FLAME and FaceWarehouse, even though the model is 10 to 20 times smaller. This suggests broad value in both graphics and vision applications on mobile and edge devices.
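
To make the idea of deforming a mesh through joints and skinning weights concrete, the following is a minimal sketch of standard linear blend skinning in plain Python. The abstract does not state which skinning formulation the paper uses, so linear blend skinning is an assumption here, and every name (`rotation_z`, `apply_transform`, `linear_blend_skinning`) and the three-vertex, two-joint toy data are hypothetical illustrations rather than the authors' implementation.

```python
import math

def rotation_z(angle_rad, pivot):
    """Build a 3x4 rigid transform: rotation about the z-axis through `pivot`."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    px, py, _ = pivot
    # R * (v - p) + p  ==  R * v + (p - R * p)
    tx = px - (c * px - s * py)
    ty = py - (s * px + c * py)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, 0.0]]

def apply_transform(transform, vertex):
    """Apply a 3x4 transform to a single (x, y, z) vertex."""
    x, y, z = vertex
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in transform)

def linear_blend_skinning(vertices, weights, transforms):
    """Blend per-joint transformed positions with per-vertex skinning weights.

    vertices:   list of (x, y, z) rest-pose positions
    weights:    weights[i][j] is the influence of joint j on vertex i
                (each row is assumed to sum to 1)
    transforms: one 3x4 rigid transform per joint
    """
    skinned = []
    for vertex, weight_row in zip(vertices, weights):
        blended = [0.0, 0.0, 0.0]
        for weight, transform in zip(weight_row, transforms):
            moved = apply_transform(transform, vertex)
            for axis in range(3):
                blended[axis] += weight * moved[axis]
        skinned.append(tuple(blended))
    return skinned

# Toy data (hypothetical): three vertices on a line, two joints.
IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
transforms = [IDENTITY, rotation_z(math.pi / 2, pivot=(1.0, 0.0, 0.0))]

posed = linear_blend_skinning(vertices, weights, transforms)
```

In this kind of representation, a pose is described by one small transform per joint plus a per-vertex weight table, rather than by a full displacement field per blendshape. That is one plausible reading of why a joint-based model can be stored compactly.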

Keywords

Face modeling, 3D face reconstruction, GANs

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Microsoft Cloud and AI, Pittsburgh, USA
