Abstract
Learning disentangled representations of data is a key problem in deep learning. In particular, disentangling 2D facial landmarks into separate factors (e.g., identity and expression) is widely used in applications such as face reconstruction, face reenactment, and talking-head generation. However, due to the sparsity of landmarks and the lack of accurate labels for the underlying factors, learning a disentangled representation of landmarks is difficult. To address these problems, we propose a simple and effective model, FLD-VAE, that disentangles arbitrary facial landmarks into identity and expression latent representations within a Variational Autoencoder framework. In addition, we propose three invariance loss functions, at both the latent and data levels, that constrain the representations to remain invariant during training. Moreover, we introduce an identity preservation loss to further strengthen the representational ability of the identity factor. To the best of our knowledge, this is the first work to disentangle identity and expression factors simultaneously, end-to-end, from a single set of facial landmarks.
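To make the factorized-VAE idea concrete, the following is a minimal illustrative sketch, not the authors' FLD-VAE: a VAE that encodes flattened 2D landmark coordinates into separate identity and expression latent codes and decodes their concatenation back to landmarks. The layer widths, latent dimensions, and KL weight are assumptions for illustration; the paper's invariance and identity preservation losses are not reproduced here.

```python
# Illustrative sketch of a landmark VAE with two latent factors (PyTorch).
# Architecture sizes and the KL weight are assumed, not taken from the paper.
import torch
import torch.nn as nn

class LandmarkVAE(nn.Module):
    def __init__(self, n_points=68, id_dim=16, exp_dim=16, hidden=256):
        super().__init__()
        in_dim = n_points * 2  # flattened (x, y) coordinates
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate heads predict the mean and log-variance of each factor.
        self.id_head = nn.Linear(hidden, 2 * id_dim)
        self.exp_head = nn.Linear(hidden, 2 * exp_dim)
        self.decoder = nn.Sequential(
            nn.Linear(id_dim + exp_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim),
        )

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, landmarks):
        h = self.encoder(landmarks.flatten(1))
        id_mu, id_logvar = self.id_head(h).chunk(2, dim=-1)
        exp_mu, exp_logvar = self.exp_head(h).chunk(2, dim=-1)
        z_id = self.reparameterize(id_mu, id_logvar)
        z_exp = self.reparameterize(exp_mu, exp_logvar)
        recon = self.decoder(torch.cat([z_id, z_exp], dim=-1))
        return recon, (id_mu, id_logvar), (exp_mu, exp_logvar)

def vae_loss(recon, target, id_stats, exp_stats, kl_weight=1e-3):
    # Reconstruction term plus KL terms for both latent factors.
    rec = nn.functional.mse_loss(recon, target.flatten(1))
    kl = 0.0
    for mu, logvar in (id_stats, exp_stats):
        kl = kl + (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)).mean()
    return rec + kl_weight * kl
```

Under this kind of split, swapping z_id between two samples while keeping z_exp fixed is the natural way to probe whether identity and expression have actually been separated.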
Funding
Supported by the National Natural Science Foundation of China (61210007).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liang, S., Zhou, Zz., Guo, Yd. et al. Facial landmark disentangled network with variational autoencoder. Appl. Math. J. Chin. Univ. 37, 290–305 (2022). https://doi.org/10.1007/s11766-022-4589-0
DOI: https://doi.org/10.1007/s11766-022-4589-0