Abstract
In this paper, the authors explore different approaches to animating 3D facial emotions, some relying on manual keyframe animation and some on machine learning. To compare the approaches, the authors conducted an experiment consisting of side-by-side comparisons of animation clips generated with four facial animation techniques: skeleton-based, blendshape-based, audio-driven, and vision-based capture. Ninety-five participants viewed twenty facial animation clips of characters expressing five distinct emotions (anger, sadness, happiness, fear, neutral), created with the four techniques. After viewing each clip, participants were asked to identify the emotion the character appeared to convey and to rate its naturalness. Findings showed that the naturalness ratings for the happy emotion tended to be consistent across the four methods, whereas the fear emotion created with skeletal animation received significantly higher naturalness ratings than the same emotion produced by the other methods. Recognition of the sad and neutral expressions was very low for all methods compared with the other emotions. Overall, the skeleton approach received significantly higher naturalness ratings and a higher recognition rate than the other methods.
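As background for the blendshape technique named above, the following is a minimal NumPy sketch of the standard linear blendshape model, in which the deformed face is the neutral mesh plus a weighted sum of per-target vertex offsets. The mesh sizes, target names, and weights are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

def blend(neutral: np.ndarray, deltas: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Standard linear blendshape model.

    neutral: (V, 3) vertex positions of the rest (neutral) face.
    deltas:  (K, V, 3) offsets for K expression targets (target - neutral).
    weights: (K,) activation of each target, typically in [0, 1].
    Returns the blended (V, 3) mesh.
    """
    # Contract the K weights against the K per-target delta meshes.
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy example: a 4-vertex "face" with two hypothetical targets
# (e.g. "smile" and "brow raise"); values are random placeholders.
neutral = np.zeros((4, 3))
deltas = np.random.default_rng(0).normal(size=(2, 4, 3)) * 0.01
weights = np.array([0.8, 0.3])  # mostly "smile", slight "brow raise"
print(blend(neutral, deltas, weights))
```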
Acknowledgments
This work is supported by NSF-Cyberlearning award 1821894: Multimodal Affective Animated Pedagogical Agents for Different Types of Learners.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Wei, M., Adamo, N., Giri, N., Chen, Y. (2023). A Comparative Study of Four 3D Facial Animation Methods: Skeleton, Blendshape, Audio-Driven, and Vision-Based Capture. In: Brooks, A.L. (ed.) ArtsIT, Interactivity and Game Creation. ArtsIT 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 479. Springer, Cham. https://doi.org/10.1007/978-3-031-28993-4_3
DOI: https://doi.org/10.1007/978-3-031-28993-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28992-7
Online ISBN: 978-3-031-28993-4
eBook Packages: Computer Science, Computer Science (R0)