
Lip syncing method for realistic expressive 3D face model

Published in: Multimedia Tools and Applications

Abstract

Lip synchronization of 3D face models is now used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and it continues to grow in use and importance. A high level of realism is demanded in applications such as computer games and cinema, yet authoring lip syncing with complex and subtle expressions remains difficult and fraught with problems of realism. This research proposes a lip syncing method for a realistic, expressive 3D face model. Animating the lips requires a 3D face model capable of representing the myriad shapes the human face assumes during speech, together with a method for producing the correct lip shape at the correct time. The paper presents a 3D face model designed to support lip syncing aligned with an input audio file. The model deforms using a Raised Cosine Deformation (RCD) function that is grafted onto the input facial geometry, and it is based on the MPEG-4 Facial Animation (FA) standard. The paper also proposes a method to animate the face model over time, creating lip syncing from a canonical set of visemes covering all pairwise combinations of a reduced phoneme set called ProPhone. Finally, the method integrates emotions, drawing on the Ekman model and Plutchik's wheel, with emotive eye movements implemented through the Emotional Eye Movements Markup Language (EEMML), to produce a realistic 3D face model.
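To illustrate the raised-cosine idea described in the abstract, the following is a minimal Python sketch of a raised-cosine falloff used to graft a feature-point displacement onto surrounding mesh vertices, together with a simple time blend between a neutral shape and a viseme shape. The function names, landmark positions and parameters are hypothetical and are not taken from the paper's implementation; this is only a sketch of the general technique, not the authors' RCD or MPEG-4 FAP pipeline.

```python
import numpy as np

def raised_cosine_weights(distances, radius):
    """Raised-cosine falloff: 1 at the feature point, smoothly decaying to 0 at `radius`."""
    w = np.zeros_like(distances)
    inside = distances < radius
    w[inside] = 0.5 * (1.0 + np.cos(np.pi * distances[inside] / radius))
    return w

def deform(vertices, feature_point, displacement, radius):
    """Graft a feature-point displacement onto nearby vertices with raised-cosine weighting.

    vertices      : (N, 3) mesh vertex positions
    feature_point : (3,) position of the driven landmark (e.g. a lip corner)
    displacement  : (3,) target displacement of that landmark
    radius        : region of influence around the landmark
    """
    d = np.linalg.norm(vertices - feature_point, axis=1)
    return vertices + raised_cosine_weights(d, radius)[:, None] * displacement

# Toy example: blend from a closed-mouth shape towards an open-mouth viseme over t = 0..1.
verts = np.random.rand(1000, 3) * 0.1                  # placeholder geometry (metres)
lower_lip = np.array([0.05, 0.02, 0.08])               # hypothetical lip landmark position
open_mouth = np.array([0.0, -0.008, 0.0])              # 8 mm downward lip displacement

for t in np.linspace(0.0, 1.0, 5):                     # normalised time between viseme keyframes
    frame = deform(verts, lower_lip, t * open_mouth, radius=0.03)
```

In the paper itself, such deformations would be driven by MPEG-4 facial animation parameters and by viseme timing derived from the input audio; the sketch only shows the falloff weighting and the per-frame blending step.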



References

  1. Anh LQ, Pelachaud C (2011) Expressive gesture model for humanoid robot. Springer-Verlag, Berlin Heidelberg, pp 224–231

  2. Bailly G, Raidt S, Elisei F (2010) Gaze, conversational agents and face-to-face communication. Speech Commun 52(6):598–612

  3. Balcı K (2007a) Xface: MPEG-4 based open source toolkit for 3D facial animation. Proceedings of the 15th international conference on multimedia. ACM, pp 399–402

  4. Balcı K (2007b) Xface: MPEG-4 based open source toolkit for 3D facial animation. ITC-irst, Cogn Commun Technol

  5. Balcı K, Zancanaro M, Pianesi F (2007) Xface open source project and SMIL-agent scripting language for creating and animating embodied conversational agents. Proceedings of the 15th international conference on multimedia. ACM, pp 1013–1016

  6. Bao C (2011) A facial animation system for generating complex expressions. APSIPA ASC 2011

  7. Black AW, Clark R, Richmond K, King S, Zen H, Taylor P, Caley R (2006) The Festival speech synthesis system. [Online]. Available: http://www.cstr.ed.ac.uk/projects/festival

  8. Cassell J, Vilhjálmsson HH, Bickmore T (2001) BEAT: the Behavior Expression Animation Toolkit. Proceedings of the 28th annual conference on computer graphics and interactive techniques, pp 477–486

  9. Cerekovic A, Pandžic IS (2011) Multimodal behavior realization for embodied conversational agents. Multimed Tools Appl 54(1):143–164

  10. Cerekovic A, Pejša T, Pandžic IS (2010) A controller-based animation system for synchronizing and realizing human-like conversational behaviors. pp 80–91

  11. D'Mello S, Olney A, Williams C, Hays P (2012) Gaze tutor: a gaze-reactive intelligent tutoring system. Int J Hum Comput Stud 70(5):377–398

  12. Ekman P (1999) Basic emotions. In: Handbook of cognition and emotion, chapter 3

  13. FaceFX (2015) [Online]. Available: http://www.facefx.com/

  14. Frantz S, Rohr K, Stiehl HS (1998) Multi-step procedures for the localization of 2D and 3D point landmarks and automatic ROI size selection. Computer Vision (ECCV '98). Springer, pp 687–703

  15. Frantz S, Rohr K, Stiehl HS (2000) Localization of 3D anatomical point landmarks in 3D tomographic images using deformable models. Medical image computing and computer-assisted intervention (MICCAI). Springer-Verlag, Berlin, pp 492–501

  16. Gillies M, Pan X, Slater M (2010) Piavca: a framework for heterogeneous interactions with virtual characters. Virtual Real 14(4):221–228

  17. Hong P, Wen Z, Huang TS (2002) Real-time speech-driven face animation with expressions using neural networks. IEEE Trans Neural Netw 13(4):916–927

  18. Kessler B, Treiman R (2002) Syllable structure and the distribution of phonemes in English syllables. J Mem Lang. [Online]. Available: http://www.artsci.wustl.edu/~bkessler/SyllStructDistPhon/CVC.html

  19. Kolivand H, Sunar MS (2015) A survey of shadow volume algorithms in computer graphics. IETE Tech Rev 30(1):38–46

  20. Kowler E (2011) Eye movements: the past 25 years. Vis Res 51(13):1457–1483

  21. Lee C, Lee S, Chin S (2011) Multi-layer structural wound synthesis on 3D face. Comput Animat Virtual Worlds 22(2–5):177–185

  22. Lee S, Carlson G, Jones S, Johnson A, Leigh J, Renambot L (2010) Designing an expressive avatar of a real person. In: Intelligent virtual agents, pp 64–76

  23. Lee Y, Terzopoulos D, Walters K (1995) Realistic modeling for facial animation. Proceedings of the 22nd annual conference on computer graphics and interactive techniques (SIGGRAPH '95), pp 55–62

  24. Leone GR, Paci G, Cosi P (2012) LUCIA: an open source 3D expressive avatar for multimodal h.m.i. Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp 193–202

  25. Leuski A, Richmond T (2014) Mobile personal healthcare mediated by virtual humans. IUI 2014 demonstration, pp 21–24

  26. Li Z, Mao X (2011) EEMML: the emotional eye movement animation toolkit. Multimed Tools Appl 60(1):181–201

  27. Li B, Zhang Q, Zhou D, Wei X (2013) Facial animation based on feature points. TELKOMNIKA 11(3)

  28. Pandzic IS, Forchheimer R (2003) MPEG-4 facial animation: the standard, implementation and applications. John Wiley & Sons, New York

  29. Pasquariello S, Pelachaud C (2001) Greta: a simple facial animation engine. Proc 6th online world conference on soft computing in industrial applications

  30. Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572

  31. Queiroz RB, Cohen M, Musse SR (2009) An extensible framework for interactive facial animation with facial expressions, lip synchronization and eye behavior. Comput Entertain 7(4):1

  32. Raouzaiou A, Tsapatsoulis N, Karpouzis K, Kollias S (2002) Parameterized facial expression synthesis based on MPEG-4. EURASIP J Appl Signal Process 10:1021–1038

  33. Serra J, Ribeiro M, Freitas J, Orvalho V (2012) A proposal for a visual speech animation system. Springer-Verlag, Berlin Heidelberg, pp 267–276

  34. Shapiro A (2011) Building a character animation system. LNCS 7060. Springer-Verlag, Berlin Heidelberg, pp 98–109

  35. Singular Inversions (2006) FaceGen software

  36. Somasundaram A (2006) Audio-visual speech. The Ohio State University

  37. Sphinx Group, Carnegie Mellon University (2006) CMU Sphinx project. [Online]. Available: http://cmusphinx.sourceforge.net

  38. Taylor SL, Mahler M, Theobald B, Matthews I (2012) Dynamic units of visual speech. Eurographics/ACM SIGGRAPH symposium on computer animation, pp 245–250

  39. TRueSpel (2001a) English-truespel (USA accent) text conversion tool. [Online]. Available: http://www.foreignword.com/dictionary/truespel/transpel.htm

  40. TRueSpel (2001b) English-truespel (USA accent) text conversion tool

  41. Vezzetti E, Marcolin F (2014) Geometry-based 3D face morphology analysis: soft-tissue landmark formalization. Multimed Tools Appl:895–929

  42. Vezzetti E, Marcolin F, Stola V (2013) 3D human face soft tissues landmarking method: an advanced approach. Comput Ind. ISSN 0166-3615

  43. Wei L, Deng Z (2015) A practical model for live speech-driven lip-sync. IEEE Comput Graph Appl 35(2):70–78

  44. Xu Y, Feng AW, Marsella S, Shapiro A (2013) A practical and configurable lip sync method for games. Proceedings of Motion in Games (MIG), pp 109–118

  45. Zhang S, Wu Z, Meng HM, Cai L (2010) Facial expression synthesis based on emotion dimensions for affective talking avatar. In: Nishida T (ed), pp 109–132

  46. Zhao X, Dellandréa E, Zou J, Chen L (2013) A unified probabilistic framework for automatic 3D facial expression analysis based on a Bayesian belief inference and statistical feature models. Image Vis Comput 31(3):231–245


Author information


Corresponding author

Correspondence to Itimad Raheem Ali.


About this article


Cite this article

Ali, I.R., Kolivand, H. & Alkawaz, M.H. Lip syncing method for realistic expressive 3D face model. Multimed Tools Appl 77, 5323–5366 (2018). https://doi.org/10.1007/s11042-017-4437-z

