A Comparative Study of Four 3D Facial Animation Methods: Skeleton, Blendshape, Audio-Driven, and Vision-Based Capture

Conference paper · ArtsIT, Interactivity and Game Creation (ArtsIT 2022)

Abstract

In this paper, the authors explore different approaches to animating 3D facial emotions, some based on manual keyframe animation and some on machine learning. To compare these approaches, the authors conducted an experiment consisting of side-by-side comparisons of animation clips generated with skeleton, blendshape, audio-driven, and vision-based capture facial animation techniques. Ninety-five participants viewed twenty facial animation clips of characters expressing five distinct emotions (anger, sadness, happiness, fear, and neutral), created using the four techniques. After viewing each clip, participants were asked to identify the emotion the character appeared to be conveying and to rate its naturalness. Findings showed that naturalness ratings for the happy emotion tended to be consistent across the four methods, whereas naturalness ratings for the fear emotion were significantly higher for skeletal animation than for the other methods. Recognition of the sad and neutral emotions was very low for all methods compared to the other emotions. Overall, the skeleton approach received significantly higher naturalness ratings and a higher recognition rate than the other methods.
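Of the four techniques compared, blendshape animation is the easiest to summarize concretely: a facial pose is the neutral mesh plus a weighted sum of sculpted expression deltas, and animating the weights over time produces the clip. The Python sketch below is a minimal illustration of that model only; it is not the authors' pipeline, and the mesh, shape names, and weights are hypothetical stand-ins for the dozens of targets a production rig (such as the Faceit rig referenced in the notes) would carry.

```python
import numpy as np

def blend(neutral: np.ndarray, targets: dict, weights: dict) -> np.ndarray:
    """Classic delta blendshape model: result = neutral + sum_k w_k * (target_k - neutral).

    neutral : (V, 3) array of rest-pose vertex positions.
    targets : mapping from shape name to a (V, 3) array of the fully-posed shape.
    weights : mapping from shape name to a weight, conventionally clamped to [0, 1].
    """
    result = neutral.copy()
    for name, w in weights.items():
        result += np.clip(w, 0.0, 1.0) * (targets[name] - neutral)
    return result

# Hypothetical 4-vertex "mesh" with two illustrative expression targets.
neutral = np.zeros((4, 3))
targets = {
    "smile": neutral + np.array([0.0, 0.2, 0.0]),       # mouth corners raised
    "brow_raise": neutral + np.array([0.0, 0.0, 0.1]),  # brows pushed forward/up
}
pose = blend(neutral, targets, {"smile": 0.8, "brow_raise": 0.3})
```

Keyframing those per-target weights over time yields a blendshape clip; the skeleton method instead keyframes joint rotations on a facial rig, while the audio-driven and vision-based capture methods estimate comparable controls automatically from speech audio or tracked video.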

This work is supported by NSF-IIS award #1821894: Multimodal Affective Pedagogical Agents for Different Types of Learners.


Notes

  1. https://www.autodesk.com
  2. https://www.nvidia.com/en-us/omniverse/apps/audio2face/
  3. https://blendermarket.com/products/faceit
  4. https://www.mocapx.com/
  5. https://charactergenerator.autodesk.com/
  6. https://www.youtube.com/watch?v=4awHbEVvcjM&list=PL637gQB3PR-rtQUUe-AeNDRRSiZ08a5il&index=12
  7. https://www.prolific.co/
  8. https://purdue.ca1.qualtrics.com/jfe/form/SV_77iAhuvDxAikaSG
  9. https://github.com/weimingzhu101/A-Comparative-Study-of-Four-3D-Facial-Animation-Methods/blob/main/Fig4.png
  10. https://github.com/weimingzhu101/A-Comparative-Study-of-Four-3D-Facial-Animation-Methods/blob/main/Fig5.png
  11. https://github.com/weimingzhu101/A-Comparative-Study-of-Four-3D-Facial-Animation-Methods/blob/main/Fig6-7.png
  12. https://github.com/weimingzhu101/A-Comparative-Study-of-Four-3D-Facial-Animation-Methods
  13. https://docs.unrealengine.com/5.0/en-US/live-link-in-unreal-engine/
  14. https://www.unrealengine.com
  15. https://www.reallusion.com/iclone


Acknowledgments

This work is supported by NSF-Cyberlearning award 1821894: Multimodal Affective Animated Pedagogical Agents for Different Types of Learners.

Author information

Correspondence to Mingzhu Wei.


Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Wei, M., Adamo, N., Giri, N., Chen, Y. (2023). A Comparative Study of Four 3D Facial Animation Methods: Skeleton, Blendshape, Audio-Driven, and Vision-Based Capture. In: Brooks, A.L. (ed.) ArtsIT, Interactivity and Game Creation. ArtsIT 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 479. Springer, Cham. https://doi.org/10.1007/978-3-031-28993-4_3

  • DOI: https://doi.org/10.1007/978-3-031-28993-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28992-7

  • Online ISBN: 978-3-031-28993-4

  • eBook Packages: Computer Science (R0)
