Abstract
In this paper, the authors explore different approaches to animating 3D facial emotions, some relying on manual keyframe animation and some on machine learning. To compare the approaches, the authors conducted an experiment consisting of side-by-side comparisons of animation clips generated with four facial animation techniques: skeleton-based, blendshape-based, audio-driven, and vision-based capture. Ninety-five participants viewed twenty facial animation clips of characters expressing five distinct emotions (anger, sadness, happiness, fear, neutral), created with the four techniques. After viewing each clip, participants were asked to identify the emotion the character appeared to convey and to rate its naturalness. Findings showed that the naturalness ratings for the happy emotion tended to be consistent across the four methods, whereas the fear emotion created with skeletal animation received significantly higher naturalness ratings than the same emotion produced by the other methods. Recognition of the sad and neutral expressions was very low for all methods compared with the other emotions. Overall, the skeleton approach received significantly higher naturalness ratings and a higher recognition rate than the other methods.
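As background for the blendshape technique named above, the following is a minimal NumPy sketch of the standard linear blendshape model, in which the deformed face is the neutral mesh plus a weighted sum of per-target vertex offsets. The mesh sizes, target names, and weights are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

def blend(neutral: np.ndarray, deltas: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Standard linear blendshape model.

    neutral: (V, 3) vertex positions of the rest (neutral) face.
    deltas:  (K, V, 3) offsets for K expression targets (target - neutral).
    weights: (K,) activation of each target, typically in [0, 1].
    Returns the blended (V, 3) mesh.
    """
    # Contract the K weights against the K per-target delta meshes.
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy example: a 4-vertex "face" with two hypothetical targets
# (e.g. "smile" and "brow raise"); values are random placeholders.
neutral = np.zeros((4, 3))
deltas = np.random.default_rng(0).normal(size=(2, 4, 3)) * 0.01
weights = np.array([0.8, 0.3])  # mostly "smile", slight "brow raise"
print(blend(neutral, deltas, weights))
```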
Acknowledgments
This work is supported by NSF-Cyberlearning award 1821894: Multimodal Affective Animated Pedagogical Agents for Different Types of Learners.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Wei, M., Adamo, N., Giri, N., Chen, Y. (2023). A Comparative Study of Four 3D Facial Animation Methods: Skeleton, Blendshape, Audio-Driven, and Vision-Based Capture. In: Brooks, A.L. (ed.) ArtsIT, Interactivity and Game Creation. ArtsIT 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 479. Springer, Cham. https://doi.org/10.1007/978-3-031-28993-4_3
DOI: https://doi.org/10.1007/978-3-031-28993-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28992-7
Online ISBN: 978-3-031-28993-4
eBook Packages: Computer Science, Computer Science (R0)