
Lip syncing method for realistic expressive 3D face model

Published in: Multimedia Tools and Applications

Abstract

Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and it continues to grow in use and importance. A high level of realism is demanded in applications such as computer games and cinema, yet authoring lip syncing with complex and subtle expressions remains difficult and fraught with problems of realism. This research proposes a lip syncing method for a realistic, expressive 3D face model. Animating the lips requires a 3D face model capable of representing the myriad shapes the human face assumes during speech, together with a method for producing the correct lip shape at the correct time. The paper presents a 3D face model designed to support lip syncing aligned with an input audio file. The model deforms using a Raised Cosine Deformation (RCD) function that is grafted onto the input facial geometry, and it is based on the MPEG-4 Facial Animation (FA) standard. The paper also proposes a method to animate the face model over time, creating lip syncing from a canonical set of visemes covering all pairwise combinations of a reduced phoneme set called ProPhone. Finally, the method integrates emotions, drawing on the Ekman model and Plutchik's wheel, with emotive eye movements implemented through the Emotional Eye Movements Markup Language (EEMML), to produce a realistic 3D face model.
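To illustrate the raised-cosine idea described in the abstract, the following is a minimal Python sketch of a raised-cosine falloff used to graft a feature-point displacement onto surrounding mesh vertices, together with a simple time blend between a neutral shape and a viseme shape. The function names, landmark positions and parameters are hypothetical and are not taken from the paper's implementation; this is only a sketch of the general technique, not the authors' RCD or MPEG-4 FAP pipeline.

```python
import numpy as np

def raised_cosine_weights(distances, radius):
    """Raised-cosine falloff: 1 at the feature point, smoothly decaying to 0 at `radius`."""
    w = np.zeros_like(distances)
    inside = distances < radius
    w[inside] = 0.5 * (1.0 + np.cos(np.pi * distances[inside] / radius))
    return w

def deform(vertices, feature_point, displacement, radius):
    """Graft a feature-point displacement onto nearby vertices with raised-cosine weighting.

    vertices      : (N, 3) mesh vertex positions
    feature_point : (3,) position of the driven landmark (e.g. a lip corner)
    displacement  : (3,) target displacement of that landmark
    radius        : region of influence around the landmark
    """
    d = np.linalg.norm(vertices - feature_point, axis=1)
    return vertices + raised_cosine_weights(d, radius)[:, None] * displacement

# Toy example: blend from a closed-mouth shape towards an open-mouth viseme over t = 0..1.
verts = np.random.rand(1000, 3) * 0.1                  # placeholder geometry (metres)
lower_lip = np.array([0.05, 0.02, 0.08])               # hypothetical lip landmark position
open_mouth = np.array([0.0, -0.008, 0.0])              # 8 mm downward lip displacement

for t in np.linspace(0.0, 1.0, 5):                     # normalised time between viseme keyframes
    frame = deform(verts, lower_lip, t * open_mouth, radius=0.03)
```

In the paper itself, such deformations would be driven by MPEG-4 facial animation parameters and by viseme timing derived from the input audio; the sketch only shows the falloff weighting and the per-frame blending step.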



References

  1. Anh LQ, Pelachaud C (2011) Expressive gesture model for humanoid robot. Springer-Verlag, Berlin Heidelberg, pp 224–231

  2. Bailly G, Raidt S, Elisei F (2010) Gaze, conversational agents and face-to-face communication. Speech Commun 52(6):598–612

  3. Balcı K (2007a) Xface: MPEG-4 based open source toolkit for 3D facial animation. Proceedings of the 15th international conference on multimedia. ACM, pp 399–402

  4. Balcı K (2007b) Xface: MPEG-4 based open source toolkit for 3D facial animation. ITC-irst, Cogn Commun Technol

  5. Balcı K, Zancanaro M, Pianesi F (2007) Xface open source project and SMIL-agent scripting language for creating and animating embodied conversational agents. Proceedings of the 15th international conference on multimedia. ACM, pp 1013–1016

  6. Bao C (2011) A facial animation system for generating complex expressions. APSIPA ASC 2011

  7. Black AW, Clark R, Richmond K, King S, Zen H, Taylor P, Caley R (2006) The Festival speech synthesis system. [Online]. Available: http://www.cstr.ed.ac.uk/projects/festival

  8. Cassell J, Vilhjálmsson HH, Bickmore T (2001) BEAT: the Behavior Expression Animation Toolkit. Proceedings of the 28th annual conference on computer graphics and interactive techniques, pp 477–486

  9. Cerekovic A, Pandžic IS (2011) Multimodal behavior realization for embodied conversational agents. Multimed Tools Appl 54(1):143–164

  10. Cerekovic A, Pejša T, Pandžic IS (2010) A controller-based animation system for synchronizing and realizing human-like conversational behaviors. pp 80–91

  11. D'Mello S, Olney A, Williams C, Hays P (2012) Gaze tutor: a gaze-reactive intelligent tutoring system. Int J Hum Comput Stud 70(5):377–398

  12. Ekman P (1999) Basic emotions. In: Handbook of cognition and emotion, chapter 3

  13. FaceFX (2015) [Online]. Available: http://www.facefx.com/

  14. Frantz S, Rohr K, Stiehl HS (1998) Multi-step procedures for the localization of 2D and 3D point landmarks and automatic ROI size selection. Computer Vision (ECCV '98). Springer, pp 687–703

  15. Frantz S, Rohr K, Stiehl HS (2000) Localization of 3D anatomical point landmarks in 3D tomographic images using deformable models. Medical image computing and computer-assisted intervention (MICCAI). Springer-Verlag, Berlin, pp 492–501

  16. Gillies M, Pan X, Slater M (2010) Piavca: a framework for heterogeneous interactions with virtual characters. Virtual Real 14(4):221–228

  17. Hong P, Wen Z, Huang TS (2002) Real-time speech-driven face animation with expressions using neural networks. IEEE Trans Neural Netw 13(4):916–927

  18. Kessler B, Treiman R (2002) Syllable structure and the distribution of phonemes in English syllables. J Mem Lang. [Online]. Available: http://www.artsci.wustl.edu/~bkessler/SyllStructDistPhon/CVC.html

  19. Kolivand H, Sunar MS (2015) A survey of shadow volume algorithms in computer graphics. IETE Tech Rev 30(1):38–46

  20. Kowler E (2011) Eye movements: the past 25 years. Vis Res 51(13):1457–1483

  21. Lee C, Lee S, Chin S (2011) Multi-layer structural wound synthesis on 3D face. Comput Animat Virtual Worlds 22(2–5):177–185

  22. Lee S, Carlson G, Jones S, Johnson A, Leigh J, Renambot L (2010) Designing an expressive avatar of a real person. In: Intelligent virtual agents, pp 64–76

  23. Lee Y, Terzopoulos D, Walters K (1995) Realistic modeling for facial animation. Proceedings of the 22nd annual conference on computer graphics and interactive techniques (SIGGRAPH '95), pp 55–62

  24. Leone GR, Paci G, Cosi P (2012) LUCIA: an open source 3D expressive avatar for multimodal h.m.i. Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp 193–202

  25. Leuski A, Richmond T (2014) Mobile personal healthcare mediated by virtual humans. IUI 2014 demonstration, pp 21–24

  26. Li Z, Mao X (2011) EEMML: the emotional eye movement animation toolkit. Multimed Tools Appl 60(1):181–201

  27. Li B, Zhang Q, Zhou D, Wei X (2013) Facial animation based on feature points. TELKOMNIKA 11(3)

  28. Pandzic IS, Forchheimer R (2003) MPEG-4 facial animation: the standard, implementation and applications. John Wiley & Sons, New York

  29. Pasquariello S, Pelachaud C (2001) Greta: a simple facial animation engine. Proc 6th online world conference on soft computing in industrial applications

  30. Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572

  31. Queiroz RB, Cohen M, Musse SR (2009) An extensible framework for interactive facial animation with facial expressions, lip synchronization and eye behavior. Comput Entertain 7(4):1

  32. Raouzaiou A, Tsapatsoulis N, Karpouzis K, Kollias S (2002) Parameterized facial expression synthesis based on MPEG-4. EURASIP J Appl Signal Process 10:1021–1038

  33. Serra J, Ribeiro M, Freitas J, Orvalho V (2012) A proposal for a visual speech animation system. Springer-Verlag, Berlin Heidelberg, pp 267–276

  34. Shapiro A (2011) Building a character animation system. LNCS 7060. Springer-Verlag, Berlin Heidelberg, pp 98–109

  35. Singular Inversions (2006) FaceGen software

  36. Somasundaram A (2006) Audio-visual speech. The Ohio State University

  37. Sphinx Group, Carnegie Mellon University (2006) CMU Sphinx project. [Online]. Available: http://cmusphinx.sourceforge.net

  38. Taylor SL, Mahler M, Theobald B, Matthews I (2012) Dynamic units of visual speech. Eurographics/ACM SIGGRAPH symposium on computer animation, pp 245–250

  39. TRueSpel (2001a) English-truespel (USA accent) text conversion tool. [Online]. Available: http://www.foreignword.com/dictionary/truespel/transpel.htm

  40. TRueSpel (2001b) English-truespel (USA accent) text conversion tool

  41. Vezzetti E, Marcolin F (2014) Geometry-based 3D face morphology analysis: soft-tissue landmark formalization. Multimed Tools Appl:895–929

  42. Vezzetti E, Marcolin F, Stola V (2013) 3D human face soft tissues landmarking method: an advanced approach. Comput Ind. ISSN 0166-3615

  43. Wei L, Deng Z (2015) A practical model for live speech-driven lip-sync. IEEE Comput Graph Appl 35(2):70–78

  44. Xu Y, Feng AW, Marsella S, Shapiro A (2013) A practical and configurable lip sync method for games. Proceedings of Motion in Games (MIG), pp 109–118

  45. Zhang S, Wu Z, Meng HM, Cai L (2010) Facial expression synthesis based on emotion dimensions for affective talking avatar. In: Nishida T (ed), pp 109–132

  46. Zhao X, Dellandréa E, Zou J, Chen L (2013) A unified probabilistic framework for automatic 3D facial expression analysis based on a Bayesian belief inference and statistical feature models. Image Vis Comput 31(3):231–245


Author information


Corresponding author

Correspondence to Itimad Raheem Ali.


About this article


Cite this article

Ali, I.R., Kolivand, H. & Alkawaz, M.H. Lip syncing method for realistic expressive 3D face model. Multimed Tools Appl 77, 5323–5366 (2018). https://doi.org/10.1007/s11042-017-4437-z

