Generating 3D Faces Using Convolutional Mesh Autoencoders

  • Anurag RanjanEmail author
  • Timo Bolkart
  • Soubhik Sanyal
  • Michael J. Black
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)


Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they can not capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured over 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We show that, replacing the expression space of an existing state-of-the-art face model with our model, achieves a lower reconstruction error. Our data, model and code are available at



We thank T. Alexiadis and J. Márquez for data aquisition; H. Feng for rendering the figures; S. Wuhrer for advice on mesh convolutions; and G. Pavlakos, D. Paschalidou and S. Pujades for helping us with paper revisions.

Supplementary material

474178_1_En_43_MOESM1_ESM.pdf (30.8 mb)
Supplementary material 1 (pdf 31561 KB)


  1. 1.
    Amberg, B., Knothe, R., Vetter, T.: Expression invariant 3D face recognition with a morphable model. In: International Conference on Automatic Face Gesture Recognition, pp. 1–6 (2008)Google Scholar
  2. 2.
    Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: SIGGRAPH, pp. 187–194 (1999)Google Scholar
  3. 3.
    Booth, J., Roussos, A., Ponniah, A., Dunaway, D., Zafeiriou, S.: Large scale 3D morphable models. Int. J. Comput. Vis. 126, 1–22 (2017)MathSciNetGoogle Scholar
  4. 4.
    Boscaini, D., Masci, J., Melzi, S., Bronstein, M.M., Castellani, U., Vandergheynst, P.: Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. In: Eurographics Symposium on Geometry Processing, pp. 13–23 (2015)CrossRefGoogle Scholar
  5. 5.
    Boscaini, D., Masci, J., Rodolà, E., Bronstein, M.: Learning shape correspondence with anisotropic convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 3189–3197 (2016)Google Scholar
  6. 6.
    Bouaziz, S., Wang, Y., Pauly, M.: Online modeling for realtime facial animation. ACM Trans. Graph. 32(4), 40 (2013)CrossRefGoogle Scholar
  7. 7.
    Breidt, M., Bülthoff, H.H., Curio, C.: Robust semantic analysis by synthesis of 3D facial motion. In: International Conference on Automatic Face and Gesture Recognition and Workshops, pp. 713–719 (2011)Google Scholar
  8. 8.
    Brock, A., Lim, T., Ritchie, J.M., Weston, N.: Generative and discriminative voxel modeling with convolutional neural networks. arXiv preprint arXiv:1608.04236 (2016)
  9. 9.
    Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Numerical Geometry of Non-Rigid Shapes. Springer, Heidelberg (2008). Scholar
  10. 10.
    Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric deep learning: going beyond euclidean data. Signal Process. Mag. 34(4), 18–42 (2017)CrossRefGoogle Scholar
  11. 11.
    Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. CoRR abs/1312.6203 (2013)Google Scholar
  12. 12.
    Brunton, A., Bolkart, T., Wuhrer, S.: Multilinear wavelets: a statistical shape space for human faces. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 297–312. Springer, Cham (2014). Scholar
  13. 13.
    Brunton, A., Salazar, A., Bolkart, T., Wuhrer, S.: Review of statistical shape spaces for 3D data with comparative analysis for human faces. Comput. Vis. Image Underst. 128, 1–17 (2014)CrossRefGoogle Scholar
  14. 14.
    Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: Facewarehouse: a 3D facial expression database for visual computing. Trans. Vis. Comput. Graph. 20(3), 413–425 (2014)CrossRefGoogle Scholar
  15. 15.
    Chung, F.R.K.: Spectral Graph Theory, vol. 92. American Mathematical Soc., Providence (1997)zbMATHGoogle Scholar
  16. 16.
    Cosker, D., Krumhuber, E., Hilton, A.: A FACS valid 3D dynamic action unit database with applications to 3D dynamic morphable facial modeling. In: International Conference on Computer Vision, pp. 2296–2303 (2011)Google Scholar
  17. 17.
    Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems, pp. 3844–3852 (2016)Google Scholar
  18. 18.
    Abrevaya, V.F., Wuhrer, S., Boyer, E.: Multilinear autoencoder for 3D face model learning. In: Winter Conference on Applications of Computer Vision, pp. 1–9 (2018)Google Scholar
  19. 19.
    Ferrari, C., Lisanti, G., Berretti, S., Bimbo, A.D.: Dictionary learning based 3D morphable model construction for face recognition with varying expression and pose. In: International Conference on 3D Vision, pp. 509–517 (2015)Google Scholar
  20. 20.
    Garland, M., Heckbert, P.S.: Surface simplification using quadric error metrics. In: Proceedings of the 24th Annual Conference on Computer Graphics And Interactive Techniques, pp. 209–216. ACM Press/Addison-Wesley Publishing Co. (1997)Google Scholar
  21. 21.
    Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Fourteenth International Conference on Artificial Intelligence and Statistics (2011)Google Scholar
  22. 22.
    Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)Google Scholar
  23. 23.
    Hammond, D.K., Vandergheynst, P., Gribonval, R.: Wavelets on graphs via spectral graph theory. Appl. Comput. Harmonic Anal. 30(2), 129–150 (2011)MathSciNetCrossRefGoogle Scholar
  24. 24.
    Henaff, M., Bruna, J., LeCun, Y.: Deep convolutional networks on graph-structured data. CoRR abs/1506.05163 (2015)Google Scholar
  25. 25.
    Jackson, A.S., Bulat, A., Argyriou, V., Tzimiropoulos, G.: Large pose 3D face reconstruction from a single image via direct volumetric CNN regression. In: International Conference on Computer Vision (2017)Google Scholar
  26. 26.
    Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (2016)Google Scholar
  27. 27.
    Li, H., Weise, T., Pauly, M.: Example-based facial rigging. ACM Trans. Graph. 29(4), 32 (2010)Google Scholar
  28. 28.
    Li, T., Bolkart, T., Black, M.J., Li, H., Romero, J.: Learning a model of facial shape and expression from 4D scans. ACM Trans. Graph. 36(6), 194 (2017)Google Scholar
  29. 29.
    Litany, O., Bronstein, A., Bronstein, M., Makadia, A.: Deformable shape completion with graph convolutional autoencoders. arXiv preprint arXiv:1712.00268 (2017)
  30. 30.
    Maron, H., et al.: Convolutional neural networks on surfaces via seamless toric covers. ACM Trans. Graph. 36(4), 71:1–71:10 (2017)CrossRefGoogle Scholar
  31. 31.
    Masci, J., Boscaini, D., Bronstein, M., Vandergheynst, P.: Geodesic convolutional neural networks on Riemannian manifolds. In: International Conference on Computer Vision Workshops, pp. 37–45 (2015)Google Scholar
  32. 32.
    Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., Bronstein, M.M.: Geometric deep learning on graphs and manifolds using mixture model CNNs (2017)Google Scholar
  33. 33.
    Neumann, T., Varanasi, K., Wenger, S., Wacker, M., Magnor, M., Theobalt, C.: Sparse localized deformation components. Trans. Graph. (Proc. SIGGRAPH Asia) 32(6), 179:1–179:10 (2013)Google Scholar
  34. 34.
    van den Oord, A., et al.: WaveNet: a generative model for raw audio. CoRR abs/1609.03499 (2016)Google Scholar
  35. 35.
    Oord, A.V.D., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016)
  36. 36.
    Paysan, P., Knothe, R., Amberg, B., Romdhani, S., Vetter, T.: A 3D face model for pose and illumination invariant face recognition. In: International Conference on Advanced Video and Signal Based Surveillance, pp. 296–301 (2009)Google Scholar
  37. 37.
    Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)MathSciNetzbMATHGoogle Scholar
  38. 38.
    Savran, A., et al.: Bosphorus database for 3D face analysis. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (eds.) BioID 2008. LNCS, vol. 5372, pp. 47–56. Springer, Heidelberg (2008). Scholar
  39. 39.
    Sinha, A., Bai, J., Ramani, K.: Deep learning 3D shape surfaces using geometry images. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 223–240. Springer, Cham (2016). Scholar
  40. 40.
    Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)Google Scholar
  41. 41.
    Tewari, A., et al.: MoFA: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: International Conference on Computer Vision (2017)Google Scholar
  42. 42.
    Thies, J., Zollhöfer, M., Nießner, M., Valgaerts, L., Stamminger, M., Theobalt, C.: Real-time expression transfer for facial reenactment. Trans. Graph. 34(6), 183:1–183:14 (2015)CrossRefGoogle Scholar
  43. 43.
    Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M.: Face2Face: real-time face capture and reenactment of RGB videos. In: Conference on Computer Vision and Pattern Recognition, pp. 2387–2395 (2016)Google Scholar
  44. 44.
    Tran, A.T., Hassner, T., Masi, I., Paz, E., Nirkin, Y., Medioni, G.: Extreme 3D face reconstruction: looking past occlusions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)Google Scholar
  45. 45.
    Verma, N., Boyer, E., Verbeek, J.: Dynamic filters in graph convolutional networks. CoRR abs/1706.05206 (2017)Google Scholar
  46. 46.
    Vlasic, D., Brand, M., Pfister, H., Popović, J.: Face transfer with multilinear models. Trans. Graph. 24(3), 426–433 (2005)CrossRefGoogle Scholar
  47. 47.
    Yang, F., Wang, J., Shechtman, E., Bourdev, L., Metaxas, D.: Expression flow for 3D-aware face component transfer. Trans. Graph. 30(4), 60:1–60:10 (2011)CrossRefGoogle Scholar
  48. 48.
    Yi, L., Su, H., Guo, X., Guibas, L.J.: SyncSpecCNN: synchronized spectral CNN for 3D shape segmentation (2017)Google Scholar
  49. 49.
    Yin, L., Chen, X., Sun, Y., Worm, T., Reale, M.: A high-resolution 3D dynamic facial expression database. In: International Conference on Automatic Face and Gesture Recognition, pp. 1–6 (2008)Google Scholar
  50. 50.
    Yin, L., Wei, X., Sun, Y., Wang, J., Rosato, M.J.: A 3D facial expression database for facial behavior research. In: International Conference on Automatic Face and Gesture Recognition, pp. 211–216 (2006)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Anurag Ranjan
    • 1
    Email author
  • Timo Bolkart
    • 1
  • Soubhik Sanyal
    • 1
  • Michael J. Black
    • 1
  1. 1.Max Planck Institute for Intelligent SystemsTübingenGermany

Personalised recommendations