
Towards High Fidelity Face Frontalization in the Wild

  • Jie Cao
  • Yibo Hu
  • Hongwen Zhang
  • Ran He
  • Zhenan Sun
Article
Part of the following topical collections:
  1. Special Issue: Generating Realistic Visual Data of Human Behavior

Abstract

Face frontalization refers to the process of synthesizing the frontal view of a face from a given profile. Due to self-occlusion and appearance distortion in the wild, it is extremely challenging to recover faithful high-resolution results while preserving texture details. This paper proposes a high fidelity pose-invariant model (HF-PIM) to produce photorealistic and identity-preserving results. HF-PIM frontalizes profiles through a novel texture fusion warping procedure and leverages a dense correspondence field to bind the 2D and 3D surface spaces. We decompose the prerequisite of warping into dense correspondence field estimation and facial texture map recovery, both of which are well addressed by deep networks. Unlike reconstruction methods that rely on 3D data, we propose adversarial residual dictionary learning to supervise facial texture map recovery with only monocular images. Furthermore, a multi-perception guided loss is proposed to address the practical misalignment between ground-truth frontal and profile faces, allowing HF-PIM to effectively utilize multiple images during training. Quantitative and qualitative evaluations on five controlled and uncontrolled databases show that the proposed method not only boosts the performance of pose-invariant face recognition but also improves the visual quality of high-resolution frontalized faces.
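The warping step described above can be illustrated with a minimal sketch: given a facial texture map and a dense correspondence field that assigns each output pixel a (u, v) location on that map, the frontal view is obtained by bilinear sampling. This is not the authors' implementation (HF-PIM estimates both inputs with deep networks); the function name `warp_texture` and the array layouts are illustrative assumptions only.

```python
import numpy as np

def warp_texture(texture, corr_field):
    """Bilinearly sample a facial texture map at the locations given by a
    dense correspondence field, producing a frontalized face image.

    texture:    (H_t, W_t, C) facial texture map in UV space.
    corr_field: (H, W, 2) per-pixel (u, v) coordinates in [0, 1] mapping
                each output pixel to a location on the texture map.
    """
    h_t, w_t, _ = texture.shape
    u = corr_field[..., 0] * (w_t - 1)   # continuous column coordinate
    v = corr_field[..., 1] * (h_t - 1)   # continuous row coordinate

    # Integer corners of the sampling cell, clamped so u0+1 / v0+1 stay valid.
    u0 = np.clip(np.floor(u).astype(int), 0, w_t - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, h_t - 2)
    du = (u - u0)[..., None]
    dv = (v - v0)[..., None]

    # Standard bilinear interpolation over the four neighbouring texels.
    return (texture[v0, u0] * (1 - du) * (1 - dv)
            + texture[v0, u0 + 1] * du * (1 - dv)
            + texture[v0 + 1, u0] * (1 - du) * dv
            + texture[v0 + 1, u0 + 1] * du * dv)
```

Because the sampling is differentiable with respect to both inputs, gradients can flow back to the networks estimating the correspondence field and the texture map, which is what makes the decomposition trainable end to end.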

Keywords

Face frontalization · Realistic face generation · Pose-invariant face recognition

Notes

Acknowledgements

This work is funded by the National Key Research and Development Program of China (Grant Nos. 2016YFB1001001, 2017YFC0821602), the National Natural Science Foundation of China (Grant Nos. 61622310, 61427811, U1836217), and Beijing Natural Science Foundation (Grant No. JQ18017).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Center for Research on Intelligent Perception and Computing, CASIA, Beijing, China
  2. National Laboratory of Pattern Recognition, CASIA, Beijing, China
  3. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  4. Center for Excellence in Brain Science and Intelligence Technology, CAS, Beijing, China
