The Visual Computer, Volume 36, Issue 1, pp 141–160

Adversarial learning for modeling human motion

  • Qi Wang
  • Thierry Artières
  • Mickael Chen
  • Ludovic Denoyer
Original Article


We investigate how adversarial learning may be used for various animation tasks related to human motion synthesis. We propose a learning framework that we instantiate to build models for different needs: a random synthesis generator that produces realistic motion capture trajectories at random; conditional variants that allow controlling the synthesis by providing high-level features that the animation should match; and a style transfer model that transforms an existing animation into the style of another one. Our work builds on the adversarial learning strategy, proposed in the machine learning field recently (in 2014) for learning accurate generative models of complex data, which has been shown to provide impressive results, mainly on image data. We report both objective and subjective evaluation results on the Emilya dataset, a motion capture dataset of daily actions performed with emotion. Our results show the potential of our proposals for building models for a variety of motion synthesis tasks.
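The adversarial strategy mentioned above (the GAN formulation of Goodfellow et al., 2014) pits a generator G, which maps random noise to motion sequences, against a discriminator D, which learns to tell generated sequences from real motion capture data while G is trained to fool it. The NumPy sketch below is a generic illustration under stated assumptions, not the authors' recurrent architecture or training code: it shows only the standard discriminator loss, the common non-saturating generator loss, and one possible way to condition a sequence generator by appending a fixed label (for instance a one-hot emotion) to every noise time step. The function names, including `condition_sequence`, are hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) when a discriminator output saturates at 0 or 1


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the discriminator D.

    d_real: D's probabilities that real sequences are real, shape (batch,)
    d_fake: D's probabilities that generated sequences are real, shape (batch,)
    Minimizing this pushes d_real toward 1 and d_fake toward 0.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_real + EPS) + np.log(1.0 - d_fake + EPS)))


def generator_loss(d_fake):
    """Non-saturating generator loss: minimizing it pushes d_fake toward 1,
    i.e. G tries to make D classify generated sequences as real."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_fake + EPS)))


def condition_sequence(noise, cond):
    """Hypothetical conditioning helper: append the same condition vector
    to every time step of a noise sequence fed to a sequence generator.

    noise: array of shape (batch, time, noise_dim)
    cond:  array of shape (batch, cond_dim), e.g. one-hot emotion labels
    returns: array of shape (batch, time, noise_dim + cond_dim)
    """
    noise = np.asarray(noise, dtype=float)
    cond = np.asarray(cond, dtype=float)
    batch, time, _ = noise.shape
    # Repeat each condition vector along the time axis without copying data.
    tiled = np.broadcast_to(cond[:, None, :], (batch, time, cond.shape[1]))
    return np.concatenate([noise, tiled], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(2, 5, 3))   # 2 sequences, 5 time steps, 3 noise dims
    y = np.eye(4)[[0, 2]]            # two one-hot labels over 4 classes
    print(condition_sequence(z, y).shape)  # (2, 5, 7)
    print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
    print(generator_loss([0.1, 0.2]))
```

When D is maximally confused (all outputs equal to 0.5), the discriminator loss equals 2 log 2 and the generator loss equals log 2. The non-saturating generator loss is a common practical choice because it gives G stronger gradients early in training, when D easily rejects generated samples.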


Adversarial learning, Generative models, Recurrent neural networks, Motion capture data, Motion synthesis, Style transfer



We warmly thank Catherine Pélachaud (CNRS, France) for providing the Emilya dataset. Part of this work was done within the framework of the French-funded ANR Deep in France project (ANR-16-CE23-0006). Qi Wang's thesis is funded by the China Scholarship Council.

Supplementary material

Supplementary material 1 (pdf 127 KB)

Supplementary material 2 (mp4 44929 KB)

Supplementary material 3 (mp4 24630 KB)

Supplementary material 4 (mp4 108408 KB)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Aix Marseille Univ, Univ. de Toulon, CNRS, LIS, Marseille, France
  2. Ecole Centrale Marseille, Marseille, France
  3. Sorbonne Université, CNRS, Paris, France