Abstract
Lip synchronization of 3D face models is now used in many important fields. It brings a more human, social, and dramatic realism to computer games, films, and interactive multimedia, and its use and importance are growing. Demanding applications such as computer games and cinema need a high level of realism. Authoring lip syncing with complex and subtle expressions remains difficult and prone to problems of realism. This research proposes a lip syncing method for a realistic, expressive 3D face model. Animating lips requires a 3D face model capable of representing the myriad shapes the human face takes during speech, and a method to produce the correct lip shape at the correct time. The paper presents a 3D face model designed to support lip syncing that aligns with an input audio file. The model deforms using a Raised Cosine Deformation (RCD) function that is grafted onto the input facial geometry, and it is based on the MPEG-4 Facial Animation (FA) standard. The paper also proposes a method to animate the 3D face model over time to create animated lip syncing, using a canonical set of visemes for all pairwise combinations of a reduced phoneme set called ProPhone. Finally, the research integrates emotions by drawing on the Ekman model and Plutchik’s wheel, together with emotive eye movements implemented through the Emotional Eye Movements Markup Language (EEMML), to produce a realistic 3D face model.
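To give a rough feel for what a raised cosine deformation does, the sketch below displaces mesh vertices around a control point with a weight that falls smoothly from 1 at the point to 0 at a chosen radius. This is a generic raised-cosine falloff, not the paper's RCD formulation, which the abstract does not specify. The function names, the spherical region of influence, and the `radius` parameter are all assumptions made for illustration.

```python
import math


def raised_cosine_weight(distance, radius):
    """Falloff weight: 1 at the control point, 0 at or beyond `radius`.

    Assumed generic form: w = 0.5 * (1 + cos(pi * d / r)) for d < r.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    if distance >= radius:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * distance / radius))


def deform(vertices, control_point, displacement, radius):
    """Move each vertex by `displacement` scaled by its falloff weight.

    `vertices` is a list of (x, y, z) tuples; the region of influence is
    assumed to be a sphere of `radius` around `control_point`.
    """
    deformed = []
    for v in vertices:
        w = raised_cosine_weight(math.dist(v, control_point), radius)
        deformed.append(tuple(c + w * d for c, d in zip(v, displacement)))
    return deformed


# Hypothetical usage: lift a lip control point by one unit along y.
mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
moved = deform(mesh, control_point=(0.0, 0.0, 0.0),
               displacement=(0.0, 1.0, 0.0), radius=2.0)
```

Because the weight and its slope both reach zero at the boundary of the region, the displaced patch blends into the untouched geometry without a visible crease, which is the usual reason for choosing a raised cosine over a linear falloff.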
Ali, I.R., Kolivand, H. & Alkawaz, M.H. Lip syncing method for realistic expressive 3D face model. Multimed Tools Appl 77, 5323–5366 (2018). https://doi.org/10.1007/s11042-017-4437-z
DOI: https://doi.org/10.1007/s11042-017-4437-z