Abstract
User-Generated Animation (UGA) has seen only limited development compared with the rapid growth of user participatory culture and the resulting User-Generated Content (UGC), largely because easy-to-use tools are lacking. In this paper, we develop a machine learning based tool, WeAnimate, that generates an animation clip from only a character picture and a source video: users supply the character's movements through the video. Within the tool, a classifier model is trained to identify the motion in every video frame and thereby extract the motion sequence. A strategy combining skeletal animation with a neural network then produces multiple auxiliary images from the single original image of the new character. Finally, these images are spliced into a new animation following the temporal order of the motion sequence of the video frames. We evaluate the capability, effects, and performance of this animation generation tool in practical applications; the evaluation demonstrates the usability and effectiveness of WeAnimate.
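To make the three-stage dataflow described above concrete (motion classification, auxiliary-image synthesis, temporal splicing), the following is a minimal Python sketch under stated assumptions: every function body is a stub over toy string data, and all names (classify_motions, synthesize_auxiliary_images, splice_animation) are hypothetical placeholders, not the authors' WeAnimate API.

```python
"""Minimal sketch of the three-stage pipeline described in the abstract.
All names and data representations are illustrative assumptions, not the
authors' code: frames and images are stand-in strings, and the classifier
and renderer are stubs."""

from typing import List, Sequence


def classify_motions(frames: Sequence[str]) -> List[str]:
    # Stage 1 (stub): a trained classifier labels the motion in each
    # source-video frame, yielding the motion sequence in time order.
    return [f"motion_of_{frame}" for frame in frames]


def synthesize_auxiliary_images(character_image: str,
                                motions: Sequence[str]) -> List[str]:
    # Stage 2 (stub): skeletal animation combined with a neural network
    # renders one auxiliary image of the new character per detected motion.
    return [f"{character_image}_posed_as_{m}" for m in motions]


def splice_animation(images: Sequence[str]) -> List[str]:
    # Stage 3 (stub): the auxiliary images are spliced into an animation
    # clip, preserving the temporal order of the motion sequence.
    return list(images)


if __name__ == "__main__":
    video_frames = ["frame0", "frame1", "frame2"]
    motions = classify_motions(video_frames)
    images = synthesize_auxiliary_images("new_character.png", motions)
    clip = splice_animation(images)
    print(clip)
```

In the actual system, stage 1 corresponds to the trained motion classifier and stage 2 to the skeletal-animation-plus-neural-network image generator; the splicing step itself needs only the time order of the detected motion sequence.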