History Repeats Itself: Human Motion Prediction via Motion Attention

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)

Abstract

Human motion prediction aims to forecast future human poses given past motion. Whether based on recurrent or feed-forward neural networks, existing methods fail to model the observation that human motion tends to repeat itself, even in complex sports actions and cooking activities. Here, we introduce an attention-based feed-forward network that explicitly leverages this observation. In particular, instead of modeling frame-wise attention via pose similarity, we extract motion attention that captures the similarity between the current motion context and historical motion sub-sequences. Aggregating the relevant past motions and processing the result with a graph convolutional network allows us to effectively exploit motion patterns from the long-term history to predict future poses. Our experiments on Human3.6M, AMASS, and 3DPW demonstrate the benefits of our approach for both periodic and non-periodic actions. Thanks to our attention model, our approach yields state-of-the-art results on all three datasets. Our code is available at https://github.com/wei-mao-2019/HisRepItself.
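
To make the mechanism above concrete, the following is a minimal PyTorch sketch of the motion-attention idea, not the authors' implementation (see the repository above for that). All names (MotionAttention, query_net, key_net), the context length, and the hidden size are illustrative assumptions: a query encodes the last few observed frames (the current motion context), keys encode historical sub-sequences of the same length, and the attention weights aggregate the motion that followed each sub-sequence; the aggregated motion would then be passed to a predictor such as the graph convolutional network mentioned in the abstract.

# A minimal sketch of motion attention, assuming hypothetical names and
# sizes; the authors' actual model is at the repository linked above.
import torch
import torch.nn as nn


class MotionAttention(nn.Module):
    """Attend over historical motion sub-sequences rather than single frames.

    The query encodes the current motion context (the last ctx_len observed
    frames); each key encodes one historical sub-sequence of the same length.
    The attention weights aggregate the values, i.e. the motion that followed
    each historical sub-sequence.
    """

    def __init__(self, pose_dim: int, ctx_len: int, hidden: int = 256):
        super().__init__()
        self.ctx_len = ctx_len
        self.query_net = nn.Sequential(
            nn.Linear(pose_dim * ctx_len, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.key_net = nn.Sequential(
            nn.Linear(pose_dim * ctx_len, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, history: torch.Tensor, future_len: int):
        # history: (B, T, pose_dim) -- the full observed past.
        B, T, D = history.shape
        ctx = history[:, -self.ctx_len:].reshape(B, -1)  # current context
        q = self.query_net(ctx)                          # (B, hidden)

        # Slide a window over the past to form sub-sequences (keys) and the
        # motion that followed each of them (values).
        keys, values = [], []
        for s in range(T - self.ctx_len - future_len + 1):
            sub = history[:, s:s + self.ctx_len].reshape(B, -1)
            keys.append(self.key_net(sub))               # (B, hidden)
            values.append(
                history[:, s + self.ctx_len:s + self.ctx_len + future_len])
        K = torch.stack(keys, dim=1)                     # (B, N, hidden)
        V = torch.stack(values, dim=1)                   # (B, N, F, D)

        # Similarity between the context and each historical sub-sequence.
        scores = torch.einsum('bh,bnh->bn', q, K)
        attn = torch.softmax(scores / K.shape[-1] ** 0.5, dim=1)  # (B, N)

        # Aggregate the motions that followed the most similar sub-sequences.
        agg = torch.einsum('bn,bnfd->bfd', attn, V)      # (B, F, D)
        return agg, attn


# Usage: the aggregated past motion would be fed, together with the observed
# frames, to a predictor (e.g. a GCN) that outputs the future poses.
if __name__ == '__main__':
    B, T, D = 4, 50, 66  # batch, observed frames, 22 joints x 3 coordinates
    model = MotionAttention(pose_dim=D, ctx_len=10)
    past = torch.randn(B, T, D)
    aggregated, weights = model(past, future_len=10)
    print(aggregated.shape, weights.shape)  # (4, 10, 66), (4, 31)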

Keywords

Human motion prediction · Motion attention

Acknowledgements

This research was supported in part by the Australian Research Council DECRA Fellowship (DE180100628) and ARC Discovery Grant (DP200102274). The authors would like to thank NVIDIA for the donation of a Titan V GPU.

Supplementary material

Supplementary material 1 (mp4 2649 KB)

Supplementary material 2 (pdf 311 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

1. Australian National University, Canberra, Australia
2. EPFL–CVLab and ClearSpace, Lausanne, Switzerland
