
Towards High Fidelity Face Frontalization in the Wild

  • Published in: International Journal of Computer Vision

Abstract

Face frontalization refers to the process of synthesizing the frontal view of a face from a given profile. Because of self-occlusion and appearance distortion in the wild, it is extremely challenging to recover faithful high-resolution results while preserving texture details. This paper proposes a high fidelity pose-invariant model (HF-PIM) to produce photographic and identity-preserving results. HF-PIM frontalizes the profiles through a novel texture fusion warping procedure and leverages a dense correspondence field to bind the 2D and 3D surface spaces. We decompose the prerequisite of warping into dense correspondence field estimation and facial texture map recovery, both of which are well addressed by deep networks. Unlike reconstruction methods that rely on 3D data, we also propose adversarial residual dictionary learning to supervise facial texture map recovery with only monocular images. Furthermore, a multi-perception guided loss is proposed to address the practical misalignment between ground truth frontal and profile faces, allowing HF-PIM to effectively utilize multiple images during training. Quantitative and qualitative evaluations on five controlled and uncontrolled databases show that the proposed method not only boosts the performance of pose-invariant face recognition but also improves the visual quality of high-resolution frontalization results.
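To make the warping step concrete, the following is a minimal PyTorch sketch of the core idea: rendering a frontal view by bilinearly sampling a recovered facial texture map through a dense correspondence field. The tensor shapes, function name, and sampler choice are illustrative assumptions, not the authors' implementation; in HF-PIM both inputs are predicted by deep networks.

    import torch
    import torch.nn.functional as F

    def warp_texture_to_frontal(texture_map, corr_field):
        # texture_map: (B, 3, H, W) facial texture map in UV space.
        # corr_field:  (B, 2, H, W) dense correspondence field; each
        #              frontal-view pixel stores the (u, v) texture
        #              coordinate it reads from, normalized to [-1, 1]
        #              as grid_sample expects.
        grid = corr_field.permute(0, 2, 3, 1)           # -> (B, H, W, 2)
        return F.grid_sample(texture_map, grid,
                             mode='bilinear', align_corners=False)

    # Toy shapes only; real inputs come from the estimation networks.
    tex = torch.rand(1, 3, 128, 128)
    field = torch.rand(1, 2, 128, 128) * 2 - 1          # uniform in [-1, 1]
    frontal = warp_texture_to_frontal(tex, field)       # (1, 3, 128, 128)

Under this reading, the decomposition in the abstract falls out naturally: one network estimates corr_field from the profile, another recovers texture_map, and the warp above ties them together.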




Notes

  1. https://github.com/1adrianb/face-alignment (see the usage sketch after these notes).

  2. Visualization results produced by other methods are released by their authors. Because different methods usually report visual examples of different identities, we have tried our best to select the identities reported by most of the compared methods.
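As a usage note for footnote 1, a minimal sketch of extracting 68-point 2D landmarks with that library is given below. The input filename is a placeholder and the enum spelling varies across releases (older versions use face_alignment.LandmarksType._2D); this is an assumption-laden illustration, not the paper's actual preprocessing code.

    import face_alignment
    from skimage import io

    # 68-point 2D landmark detector; flip_input=False skips the
    # horizontally flipped second pass.
    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D,
                                      flip_input=False, device='cpu')

    image = io.imread('profile_face.jpg')   # placeholder input path
    landmarks = fa.get_landmarks(image)     # list of (68, 2) arrays, one per face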


Acknowledgements

This work is funded by the National Key Research and Development Program of China (Grant Nos. 2016YFB1001001, 2017YFC0821602), the National Natural Science Foundation of China (Grant Nos. 61622310, 61427811, U1836217), and Beijing Natural Science Foundation (Grant No. JQ18017).

Author information


Corresponding author

Correspondence to Zhenan Sun.

Additional information

Communicated by Xavier Alameda-Pineda, Elisa Ricci, Albert Ali Salah, Nicu Sebe, Shuicheng Yan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Cao, J., Hu, Y., Zhang, H. et al. Towards High Fidelity Face Frontalization in the Wild. Int J Comput Vis 128, 1485–1504 (2020). https://doi.org/10.1007/s11263-019-01229-6

