
Encoder–decoder recurrent network model for interactive character animation generation

The Visual Computer

Abstract

In this paper, we propose a generative recurrent model for human–character interaction. Our model is an encoder-recurrent-decoder network: a recurrent core composed of multiple long short-term memory (LSTM) layers, with an encoder network before it and a decoder network after it. With the proposed model, the virtual character's animation is generated on the fly while it interacts with the human player: each upcoming frame of the character's motion is generated automatically from the motion history of both the character itself and its opponent. We evaluated our model on public motion capture databases as well as our own recorded motion data. Experimental results demonstrate that the LSTM layers enable the character to learn from a long history of human dynamics when animating itself, and that the encoder and decoder networks significantly improve the stability of the generated animation. This method can thus automatically animate a virtual character responding to a human player.
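
To make the described architecture concrete, the sketch below outlines an encoder-recurrent-decoder network of this shape in PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation: the pose dimension, hidden size, layer count, and the ReLU encoder are values chosen for the example, and the paper's training details (loss, normalization, data preparation) are omitted.

```python
# Minimal sketch (assumed shapes, not the authors' code) of an
# encoder-recurrent-decoder network as described in the abstract:
# an encoder, a stack of LSTM layers, and a decoder, predicting the
# character's next pose from the motion history of both characters.
import torch
import torch.nn as nn

class EncoderRecurrentDecoder(nn.Module):
    def __init__(self, pose_dim=63, hidden_dim=512, num_lstm_layers=3):
        super().__init__()
        # Encoder: lifts the concatenated character/opponent poses
        # into a latent feature space.
        self.encoder = nn.Sequential(
            nn.Linear(2 * pose_dim, hidden_dim),
            nn.ReLU(),
        )
        # Recurrent core: stacked LSTM layers accumulate the motion
        # history of both characters across frames.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim,
                            num_layers=num_lstm_layers, batch_first=True)
        # Decoder: maps the recurrent features back to pose space.
        self.decoder = nn.Linear(hidden_dim, pose_dim)

    def forward(self, char_poses, opp_poses, state=None):
        # char_poses, opp_poses: (batch, frames, pose_dim)
        x = torch.cat([char_poses, opp_poses], dim=-1)
        features, state = self.lstm(self.encoder(x), state)
        return self.decoder(features), state

# On-the-fly generation: step one frame at a time, feeding the
# predicted pose back in while streaming the player's captured pose
# (random tensors stand in for mocap input here) and carrying the
# LSTM state between frames.
model = EncoderRecurrentDecoder()
char_pose, state = torch.zeros(1, 1, 63), None
with torch.no_grad():
    for _ in range(120):
        opp_pose = torch.randn(1, 1, 63)   # player's pose this frame
        char_pose, state = model(char_pose, opp_pose, state)
```

Carrying the LSTM state across frames is what lets the recurrent core condition each predicted pose on the full interaction history rather than on the current frame alone.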



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61471359 and by the National Key Technology R&D Program of China under Grant No. 2015BAH53F01.

Author information


Corresponding author

Correspondence to Wujun Che.


Cite this article

Wang, Y., Che, W. & Xu, B. Encoder–decoder recurrent network model for interactive character animation generation. Vis Comput 33, 971–980 (2017). https://doi.org/10.1007/s00371-017-1378-5
