Multi-level Motion Attention for Human Motion Prediction

Mao, Wei; Liu, Miaomiao; Salzmann, Mathieu; Li, Hongdong

doi:10.1007/s11263-021-01483-7

Published: 16 June 2021

International Journal of Computer Vision
Volume 129, pages 2513–2535 (2021)

Wei Mao ORCID: orcid.org/0000-0002-8876-8983¹,
Miaomiao Liu¹,
Mathieu Salzmann² &
Hongdong Li¹

Abstract

Human motion prediction aims to forecast future human poses given a historical motion. Whether based on recurrent or feed-forward neural networks, existing learning-based methods fail to model the observation that human motion tends to repeat itself, even for complex sports actions and cooking activities. Here, we introduce an attention-based feed-forward network that explicitly leverages this observation. In particular, instead of modeling frame-wise attention via pose similarity, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences. In this context, we study the use of different types of attention, computed at the joint, body part, and full pose levels. Aggregating the relevant past motions and processing the result with a graph convolutional network allows us to effectively exploit motion patterns from the long-term history to predict the future poses. Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions. Thanks to our attention model, our approach yields state-of-the-art results on all three datasets. Our code is available at https://github.com/wei-mao-2019/HisRepItself.
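The idea of comparing the current motion context with historical sub-sequences can be illustrated with a minimal sketch. This is not the released implementation. The function name `motion_attention`, the use of raw pose vectors instead of learned features, and the scaled dot-product softmax scoring are all assumptions made for illustration, and the graph convolutional network that processes the aggregated result is omitted.

```python
import numpy as np

def motion_attention(history, context_len, horizon):
    """Aggregate past sub-sequences by similarity to the current context.

    history: (N, D) array of N observed poses, D features per pose.
    context_len: number of frames in the current motion context (assumed name).
    horizon: number of frames following each key that go into its value.
    """
    n_frames = history.shape[0]
    window = context_len + horizon
    n_windows = n_frames - window + 1
    if n_windows < 1:
        raise ValueError("history is too short for the requested window")

    # Query: the most recent context_len frames, flattened to one vector.
    query = history[-context_len:].reshape(-1)

    # Keys: the first context_len frames of every sliding sub-sequence.
    keys = np.stack([history[i:i + context_len].reshape(-1)
                     for i in range(n_windows)])

    # Values: each full sub-sequence, i.e. the key frames plus what followed.
    values = np.stack([history[i:i + window] for i in range(n_windows)])

    # Illustrative scoring: scaled dot product followed by a softmax.
    scores = keys @ query / np.sqrt(query.size)
    scores = scores - scores.max()          # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()

    # Attention-weighted sum of sub-sequences, shape (window, D).
    aggregated = np.tensordot(weights, values, axes=1)
    return weights, aggregated
```

Under these assumptions, attention at the joint or body part level could be approximated by calling the function on the corresponding column subset of `history`, while the full pose level uses all columns.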

Fig. 2: Overview of our motion attention pipeline.

Fig. 4: Different fusion models.

Notes

Described at https://github.com/nghorbani/amass.
Available at https://amass.is.tue.mpg.de/dataset.

Acknowledgements

This research was supported in part by the Australian Research Council DECRA Fellowship (DE180100628) and ARC Discovery Grant (DP200102274). The authors would like to thank NVIDIA for the donated GPU (Titan V).

Author information

Authors and Affiliations

Australian National University, Canberra, Australia
Wei Mao, Miaomiao Liu & Hongdong Li
EPFL–CVLab & ClearSpace, Lausanne, Switzerland
Mathieu Salzmann

Corresponding author

Correspondence to Wei Mao.

Additional information

Communicated by Javier Romero.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

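Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 288 KB)
Supplementary material 3 (pdf 1562 KB)
Supplementary material 4 (pdf 1220 KB)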

About this article

Cite this article

Mao, W., Liu, M., Salzmann, M. et al. Multi-level Motion Attention for Human Motion Prediction. Int J Comput Vis 129, 2513–2535 (2021). https://doi.org/10.1007/s11263-021-01483-7

Received: 14 September 2020
Accepted: 24 May 2021
Published: 16 June 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11263-021-01483-7
