
“Look Ma, No Landmarks!” – Unsupervised, Model-Based Dense Face Alignment

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)

Abstract

In this paper, we show how to train an image-to-image network to predict dense correspondence between a face image and a 3D morphable model using only the model for supervision. We show that both geometric parameters (shape, pose and camera intrinsics) and photometric parameters (texture and lighting) can be inferred directly from the correspondence map using linear least squares and our novel inverse spherical harmonic lighting model. The least squares residuals provide an unsupervised training signal that allows us to avoid artefacts common in the literature such as shrinking and conservative underfitting. Our approach uses a network that is 10× smaller than parameter regression networks, significantly reduces sensitivity to image alignment and allows known camera calibration or multi-image constraints to be incorporated during inference. We achieve results competitive with state-of-the-art but without any auxiliary supervision used by previous methods.
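To illustrate the core idea of recovering geometric parameters from a correspondence map by linear least squares, here is a minimal sketch (not the paper's exact formulation): given per-pixel 2D-3D correspondences between foreground image pixels and model surface points, an affine camera can be fitted in closed form, and the projection residuals of that fit provide exactly the kind of self-supervised error signal the abstract describes. The function name and the affine camera simplification are illustrative assumptions.

```python
import numpy as np

def fit_affine_camera(X, uv):
    """Least-squares fit of a 2x4 affine camera from dense 2D-3D correspondences.

    X  : (N, 3) model surface points, one per foreground pixel
         (sampled from the predicted correspondence map)
    uv : (N, 2) image coordinates of those pixels
    Returns the camera matrix P and per-point projection residuals.
    """
    n = X.shape[0]
    Xh = np.hstack([X, np.ones((n, 1))])          # homogeneous model points, (N, 4)
    # Solve Xh @ P.T ~= uv in the least-squares sense
    P_t, _, _, _ = np.linalg.lstsq(Xh, uv, rcond=None)
    proj = Xh @ P_t                               # reprojected 2D points
    residuals = np.linalg.norm(proj - uv, axis=1)  # unsupervised error signal
    return P_t.T, residuals
```

With noise-free correspondences the residuals vanish; with a mispredicted correspondence map they grow, which is what makes them usable as a training loss without any landmark annotations.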

Keywords

3D morphable model · Dense correspondence · Face alignment · Landmark · Unsupervised · Self-supervised

Notes

Acknowledgements

W. Smith is supported by a Royal Academy of Engineering/The Leverhulme Trust Senior Research Fellowship.

Supplementary material

Supplementary material 1: 504434_1_En_41_MOESM1_ESM.pdf (PDF, 23.7 MB)

Supplementary material 2 (MP4, 998 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Canon Inc., Tokyo, Japan
  2. University of York, York, UK
