Towards Fast, Accurate and Stable 3D Dense Face Alignment

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12364)


Existing methods of 3D dense face alignment mainly concentrate on accuracy, thus limiting the scope of their practical applications. In this paper, we propose a novel regression framework that strikes a balance among speed, accuracy and stability. First, on the basis of a lightweight backbone, we propose a meta-joint optimization strategy to dynamically regress a small set of 3DMM parameters, which greatly enhances speed and accuracy simultaneously. To further improve stability on videos, we present a virtual synthesis method that transforms one still image into a short video incorporating in-plane and out-of-plane face motion. While maintaining high accuracy and stability, our model runs at 50 fps on a single CPU core and outperforms other state-of-the-art heavy models. Experiments on several challenging datasets validate the efficiency of our method. The code and models will be available at
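
As a rough illustration of the in-plane half of the virtual synthesis idea, the sketch below turns one set of 2D face landmarks into a short sequence of frames by chaining small random similarity transforms (rotation, scale, translation). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the choice to work on landmarks rather than pixels, and all perturbation ranges are hypothetical, and out-of-plane motion is not covered.

```python
import math
import random


def similarity_transform(points, angle_rad, scale, tx, ty, center):
    """Rotate and scale 2D points about `center`, then translate by (tx, ty)."""
    cx, cy = center
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + scale * (cos_a * dx - sin_a * dy) + tx,
                    cy + scale * (sin_a * dx + cos_a * dy) + ty))
    return out


def synthesize_in_plane_clip(points, n_frames=5, max_angle_deg=2.0,
                             max_scale_delta=0.02, max_shift=1.5, seed=0):
    """Return `n_frames` landmark sets; frame 0 is the untouched input.

    Each later frame applies a small random similarity transform to the
    previous frame, mimicking smooth in-plane motion between adjacent frames.
    All perturbation ranges are hypothetical, not values from the paper.
    """
    rng = random.Random(seed)
    frames = [list(points)]
    for _ in range(n_frames - 1):
        prev = frames[-1]
        # Rotate and scale about the centroid of the previous frame.
        cx = sum(p[0] for p in prev) / len(prev)
        cy = sum(p[1] for p in prev) / len(prev)
        angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
        scale = 1.0 + rng.uniform(-max_scale_delta, max_scale_delta)
        tx = rng.uniform(-max_shift, max_shift)
        ty = rng.uniform(-max_shift, max_shift)
        frames.append(similarity_transform(prev, angle, scale, tx, ty, (cx, cy)))
    return frames
```

Calling `synthesize_in_plane_clip(landmarks, n_frames=4)` yields four landmark sets whose adjacent frames differ only slightly, which is the kind of pairing a stability-oriented training loss could compare.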


Keywords: 3D dense face alignment, 3D face reconstruction



This work was supported in part by the National Key Research & Development Program (No. 2020YFC2003901), Chinese National Natural Science Foundation Projects #61872367, #61876178, #61806196, #61976229.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. CBSR & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  3. College of Software, Beihang University, Beijing, China
  4. School of Engineering, Westlake University, Hangzhou, China