Hierarchical Style-Based Networks for Motion Synthesis

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)


Generating diverse and natural human motion is one of the long-standing goals for creating intelligent characters in the animated world. In this paper, we propose an unsupervised method for generating long-range, diverse and plausible behaviors that reach a specified goal location. Our method learns to model human motion by decomposing the long-range generation task in a hierarchical manner. Given the starting and ending states, a memory bank is used to retrieve motion references as source material for short-range clip generation. We first propose to explicitly disentangle the provided motion material into style and content components via bi-linear transformation modelling, where diverse synthesis is achieved by free-form combination of these two components. The short-range clips are then connected to form a long-range motion sequence. Without ground truth annotation, we propose a parameterized bi-directional interpolation scheme to guarantee the physical validity and visual naturalness of the generated results. On a large-scale skeleton dataset, we show that the proposed method is able to synthesise long-range, diverse and plausible motion, and that it generalizes to unseen motion data during testing. Moreover, we demonstrate that the generated sequences are useful as subgoals for actual physical execution in the animated world. Please refer to our project page for more synthesised results.
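As a rough illustration of two ideas named in the abstract, the sketch below combines a style code and a content code through a bilinear map, and blends a forward clip with a backward clip so that the result starts on one and ends on the other. It is not the paper's implementation: the function names, the toy dimensions, the random weight tensor, and the fixed linear blending weight are all assumptions made for this example (the paper describes a parameterized, learned interpolation scheme).

```python
import random


def bilinear_combine(style, content, weights):
    """Return y with y[k] = sum_i sum_j style[i] * weights[k][i][j] * content[j]."""
    return [
        sum(s * w_kij * c
            for s, row in zip(style, w_k)
            for w_kij, c in zip(row, content))
        for w_k in weights
    ]


def blend_bidirectional(forward, backward):
    """Blend two equal-length clips frame by frame.

    The weight on the forward clip falls linearly from 1 at the first frame
    to 0 at the last, so the result starts on `forward` and ends on `backward`.
    A fixed linear schedule is an assumption of this sketch.
    """
    n = len(forward)
    blended = []
    for t, (f, b) in enumerate(zip(forward, backward)):
        alpha = 1.0 - t / (n - 1) if n > 1 else 0.5
        blended.append([alpha * fi + (1.0 - alpha) * bi for fi, bi in zip(f, b)])
    return blended


# Toy dimensions (hypothetical): 2-D style code, 3-D content code, 4-D pose.
rng = random.Random(0)
W = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)] for _ in range(4)]
style_a, style_b = [1.0, 0.0], [0.0, 1.0]
content = [0.5, -0.2, 0.1]

pose_a = bilinear_combine(style_a, content, W)  # content rendered with style A
pose_b = bilinear_combine(style_b, content, W)  # same content, style B

# Two short toy clips of 2-D frames, e.g. one predicted from the start state
# and one predicted backwards from the goal state.
forward = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
backward = [[4.0, 0.0], [3.0, 1.0], [2.0, 4.0]]
clip = blend_bidirectional(forward, backward)
```

Because the map is linear in each argument separately, swapping the style code while holding the content code fixed changes the output without touching the content representation, which is what makes free-form recombination possible in a model of this kind.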


Long-range motion generation, Motion style transfer



This work was supported in part by BAIR and BDD. This work was also supported in part by grant No. 18DZ1112300, No. 61976137, No. 61527804, No. U1611461, No. U19B2035, and No. 2016YFB1001003.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Shanghai Jiao Tong University, Shanghai, China
  2. University of California, Berkeley, USA
  3. University of California, San Diego, USA
