
Personalized smile synthesis using attention-guided global parametric model and local non-parametric model

Published in: Multimedia Tools and Applications

A Correction to this article was published on 23 December 2022

Abstract

This study proposes a new learning-based smile synthesis system that automatically transforms a given neutral facial image into a smiling one in a specific style. Although example-based face synthesis frameworks have made great progress recently, several problems remain unresolved, including the construction of robust transformations, the preservation of personal characteristics, and the production of high-quality images. The proposed framework addresses these problems with a new expression-attention-guided global parametric model and a local non-parametric model. Our key innovations are (a) a flexible framework design that produces expression attention regions using only expression category labels as supervision, (b) a novel smile style analysis framework that discovers different smile styles in the training samples, which are then used to guide more robust face modeling, and (c) a two-step expression transformation approach that integrates a global parametric model for robust prediction of expression geometry with a local non-parametric model for high-quality image generation. Experimental results show that, in a limited training data scenario, the facial images obtained using the proposed framework are more vivid than those generated by existing synthesis methods. In addition, the proposed method extends directly to image-to-image translation tasks to produce high-quality face hallucination results, which is of great importance in digital entertainment.
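To make the two-step design concrete, the following minimal sketch mirrors the pipeline at a high level: a global parametric (regression) step predicts smile geometry from the neutral face's landmarks, and a local non-parametric step refines the expression-attention regions by nearest-neighbour patch lookup over training exemplars. All names, shapes, and the linear-regression choice are illustrative assumptions, not the authors' implementation.

```python
"""Minimal sketch of a two-step smile synthesis pipeline (illustrative
assumptions only, not the paper's exact method)."""
import numpy as np

def global_parametric_step(neutral_landmarks: np.ndarray,
                           W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Step 1: predict smile geometry with a learned parametric mapping.
    neutral_landmarks is a flattened (2N,) landmark vector; W and b are
    regression parameters fitted on neutral/smile training pairs."""
    return W @ neutral_landmarks + b

def local_nonparametric_step(warped_face: np.ndarray,
                             regions: list,
                             exemplars: list) -> np.ndarray:
    """Step 2: refine each expression-attention region by copying the
    best-matching training patch (nearest neighbour), preserving
    person-specific texture that a parametric model would smooth away."""
    out = warped_face.copy()
    for y, x, s in regions:                    # attention-guided regions
        query = out[y:y + s, x:x + s]
        dists = [np.sum((query - p) ** 2) for p in exemplars]
        out[y:y + s, x:x + s] = exemplars[int(np.argmin(dists))]
    return out

# Toy usage: 3 landmarks, a 32x32 face, one 8x8 attention region.
rng = np.random.default_rng(0)
geometry = global_parametric_step(rng.normal(size=6),
                                  rng.normal(size=(6, 6)), np.zeros(6))
face = rng.random((32, 32))
patches = [rng.random((8, 8)) for _ in range(5)]
result = local_nonparametric_step(face, [(12, 12, 8)], patches)
```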

Data availability

The data that support the findings of this study are available from the corresponding author, Ching-Ting Tu, upon reasonable request.

Change history

References

  1. Bouaziz S, Pauly M (2014) Semi-supervised facial animation retargeting. EPFL Technical Report #202143

  2. Bozorgtabar B, Mahapatra D, Thiran J-P (2020) ExprADA: adversarial domain adaptation for facial expression analysis. Pattern Recognit 107111

  3. Choi Y, Choi M-J, Kim M, Ha J-W, Kim S, Choo J (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. CVPR:8789–8797

  4. Choi Y, Uh Y, Yoo J, Ha J-W (2020) StarGAN v2: diverse image synthesis for multiple domains. CVPR:8185–8194

  5. Chowdhary CL, Patel PV, Kathrotia KJ, Attique M, Kumaresan P, Ijaz MF (2020) Analytical study of hybrid techniques for image encryption and decryption. Sensors 5162

  6. Deng Z, Neumann U, Lewis JP, Kim TY, Bulut M, Narayanan S (2006) Expressive facial animation synthesis by learning speech coarticulation and expression spaces. IEEE Trans Vis Comput Graph 12:1523–1534

  7. Etoundi CML, Nkapkop JDD, Tsafack N, Ngono JM, Ele P, Wozniak M, Shafi J, Ijaz MF (2022) A novel compound-coupled hyperchaotic map for image encryption. Symmetry 493

  8. Fan G-F, Zhang L-Z, Yu M, Hong W-C, Dong S-Q (2022) Applications of random forest in multivariable response surface for short-term load forecasting. Int J Electr Power Energy Syst

  9. Freeman WT, Pasztor EC (1999) Learning low-level vision. ICCV:1182–1189

  10. Ghahramani Z, Hinton GE (1997) The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1

  11. Gong B, Wang Y, Liu J, Tang X (2009) Automatic facial expression recognition on a single 3D face by exploring shape deformation. ACM Multimedia:569–572

  12. Huang D, Torre FDL (2010) Bilinear kernel reduced rank regression for facial expression synthesis. ECCV:364–377

  13. Huang L, Su C (2006) Facial expression synthesis using manifold learning and belief propagation. Soft Comput:1193–1200

  14. Kanade T, Cohn JF, Tian Y (2000) Comprehensive database for facial expression analysis. Int Conf Autom Face Gesture Recognit:46–53

  15. Khan N, Akram A, Mahmood A, Ashraf S, Murtaza K (2020) Masked linear regression for learning local receptive fields for facial expression synthesis. Int J Comput Vis 128:1433–1454

  16. Li K, Dai Q, Wang R, Liu Y, Xu F, Wang J (2014) A data-driven approach for facial expression retargeting in video. IEEE Trans Multimedia 16:299–310

  17. Liu W, Chen W, Yang Z, Shen L (2021) Translate the facial regions you like using self-adaptive region translation. AAAI 35:2180–2188

  18. Lu Z, Hu T, Song L, Zhang Z, He R (2018) Conditional expression synthesis with face parsing transformation. ACM Multimedia:1083–1091

  19. Mohammed U, Prince SJD, Kautz J (2009) Visio-lization: generating novel facial images. SIGGRAPH

  20. Noh JY, Neumann U (2006) Expression cloning. ACM SIGGRAPH Courses

  21. Peng Y, Yin H (2019) ApprGAN: appearance-based GAN for facial expression synthesis. IET Image Process 13:2706–2715

  22. Pumarola A, Agudo A, Martínez AM, Sanfeliu A, Moreno-Noguer F (2020) GANimation: one-shot anatomically consistent facial animation. Int J Comput Vis 128:698–713

  23. Sahoo KK, Dutta I, Ijaz MF, Wozniak M, Singh PK (2021) TLEFuzzyNet: fuzzy rank-based ensemble of transfer learning models for emotion recognition from human speeches. IEEE Access 9:166518–166530

  24. Song Y, Bao L, Yang Q, Yang M-H (2014) Real-time exemplar-based face sketch synthesis. ECCV:800–813

  25. Tamang J, Nkapkop JDD, Ijaz MF, Prasad PK, Tsafack N, Saha A, Kengne J, Son Y (2021) Dynamical properties of ion-acoustic waves in space plasma and its application to image encryption. IEEE Access 9:18762–18782

  26. Tang H, Liu H, Xu D, Torr PHS, Sebe N (2021) AttentionGAN: unpaired image-to-image translation using attention-guided generative adversarial networks. IEEE Trans Neural Netw Learn Syst

  27. Torralba A, Murphy KP, Freeman WT (2007) Sharing visual features for multiclass and multi-view object detection. IEEE Trans Pattern Anal Mach Intell 29:854–869

  28. Tran DL, Walecki RT, Rudovic O, Eleftheriadis S, Schuller BW, Pantic M (2017) DeepCoder: semi-parametric variational autoencoders for automatic facial action coding. ICCV:3209–3218

  29. Wang S, Gu XD, Qin H (2008) Automatic non-rigid registration of 3D dynamic data for facial expression. CVPR:1–8

  30. Xia J, Quynh DTP, He Y, Chen X, Hoi SCH (2012) Modeling and compressing 3-D facial expressions using geometry videos. IEEE Trans Circuits Syst Video Technol 22:77–90

  31. Xu W, Xie X, Lai J (2021) RelightGAN: instance-level generative adversarial network for face illumination transfer. IEEE Trans Image Process 30:3450–3460

  32. Yun T, Guan L (2013) A deformable 3-D facial expression model for dynamic human emotional state recognition. IEEE Trans Circuits Syst Video Technol:142–157

  33. Zhang Q, Liu Z, Guo G, Terzopoulos D, Shum HY (2006) Geometry-driven photorealistic facial expression synthesis. IEEE Trans Vis Comput Graph 12(1):48–60

  34. Zhang Y, Ji Q, Zhu Z, Yi B (2008) Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters. IEEE Trans Circuits Syst Video Technol 18:1383–1396

  35. Zhang F, Zhang T, Mao Q, Xu C (2020) Geometry guided pose-invariant facial expression recognition. IEEE Trans Image Process:4445–4460

  36. Zhang F, Zhang T, Mao Q, Xu C (2020) A unified deep model for joint facial expression recognition, face synthesis, and face alignment. IEEE Trans Image Process 29:6574–6589

  37. Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV:2242–2251

Acknowledgements

This work was supported by the Ministry of Science and Technology of Taiwan under Grant number MOST 109-2221-E-005-056-MY2. We thank the anonymous reviewers for their insightful comments, which improved this paper. We also thank all authors who released their source code publicly, allowing us to adapt it for the comparison methods in this study.

Funding

This work was supported by the Ministry of Science and Technology of Taiwan under Grant number MOST 109-2221-E-005-056-MY2.

Author information

Corresponding author

Correspondence to Ching-Ting Tu.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: The affiliation of the 2nd and 3rd authors in the original publication of this article was incorrect.

Appendix

The detailed algorithm for extracting expression-variant patches is shown in Algorithm 1, where smiling and neutral images in the training set are taken as positive and negative samples, respectively. The goal of the algorithm is to extract a set of discriminative features that separate these positive and negative samples; the facial regions associated with the extracted features are defined as expression-variant patches (EVPs) in this study.

Algorithm 1

Expression-variant patch extraction by the GentleBoost algorithm [27].
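The algorithm figure itself is rendered as an image in the published version. As a rough reconstruction of the idea described above, the sketch below selects discriminative patch features with a simplified stump-based boosting loop in the spirit of GentleBoost [27]. The feature representation, candidate thresholds, and round count are illustrative assumptions, not the authors' exact procedure.

```python
"""Simplified boosting-based EVP selection (illustrative sketch only):
smiling images are positives (+1), neutral images negatives (-1), and the
facial patches behind the selected features become the EVPs."""
import numpy as np

def gentleboost_select(X: np.ndarray, y: np.ndarray, n_rounds: int = 10):
    """X: (n_samples, n_features) with one feature per candidate facial
    patch; y: labels in {-1, +1}. Returns indices of selected patches."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                     # sample weights
    selected = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                      # one stump per feature
            for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
                pred = np.where(X[:, j] > thr, 1.0, -1.0)
                err = np.sum(w * (pred - y) ** 2)  # weighted squared error
                if best is None or err < best[0]:
                    best = (err, j, thr)
        _, j, thr = best
        pred = np.where(X[:, j] > thr, 1.0, -1.0)
        w *= np.exp(-y * pred)                  # upweight mistakes
        w /= w.sum()
        selected.append(j)
    return sorted(set(selected))                # patch indices = EVPs

# Toy usage: 40 samples (20 smiling, 20 neutral), 12 candidate patches;
# patch 3 is made artificially discriminative so it should be selected.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 12))
X[:20, 3] += 2.0
y = np.r_[np.ones(20), -np.ones(20)]
evp_indices = gentleboost_select(X, y, n_rounds=5)
```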

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tu, CT., Hsieh, SH., Chen, KL. et al. Personalized smile synthesis using attention-guided global parametric model and local non-parametric model. Multimed Tools Appl 82, 21585–21609 (2023). https://doi.org/10.1007/s11042-022-14260-6
