Decomposed Human Motion Prior for Video Pose Estimation via Adversarial Training

Chen, Wenshuo; Zhou, Xiang; Yu, Zhengdi; Gu, Weixi; Zhang, Kai

doi:10.1007/978-3-031-53963-3_35

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 920)

Included in the following conference series:

Future of Information and Communication Conference

Abstract

Estimating human pose from video receives considerable attention because of its applicability in numerous 3D fields. The complexity of prior knowledge about human body movement makes it challenging for neural network models to regress keypoints. In this paper, we address this problem by incorporating a motion prior in an adversarial way. Unlike previous methods, we decompose the holistic motion prior into joint motion priors, making it easier for neural networks to learn from prior knowledge and thereby boosting performance on the task. We also use a novel regularization loss to balance accuracy against the smoothness introduced by the motion prior. Tested on 3DPW, our method achieves 9% lower PA-MPJPE and 29% lower acceleration error than previous methods. The estimator also demonstrates its robustness by achieving impressive performance on an in-the-wild dataset.
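The two quantities the abstract trades off, accuracy and smoothness, are commonly measured by a per-joint position error and an acceleration error. The following is a minimal sketch of those two measures as they are usually defined in the video pose literature, not the authors' evaluation code. It assumes a sequence is a list of frames, each frame a list of 3D joint tuples. The function names are hypothetical, no frame-rate scaling is applied, and the Procrustes alignment that turns MPJPE into PA-MPJPE is deliberately omitted.

```python
import math


def acceleration(seq):
    """Per-joint second-order finite difference: x[t-1] - 2*x[t] + x[t+1]."""
    return [
        [
            tuple(p - 2.0 * c + n for p, c, n in zip(prev_j, cur_j, next_j))
            for prev_j, cur_j, next_j in zip(prev_f, cur_f, next_f)
        ]
        for prev_f, cur_f, next_f in zip(seq, seq[1:], seq[2:])
    ]


def mean_joint_distance(a, b):
    """Mean Euclidean distance between matching joints of two sequences."""
    dists = [
        math.dist(ja, jb)
        for fa, fb in zip(a, b)
        for ja, jb in zip(fa, fb)
    ]
    return sum(dists) / len(dists)


def mpjpe(pred, gt):
    """Mean per-joint position error (no Procrustes alignment applied)."""
    return mean_joint_distance(pred, gt)


def accel_error(pred, gt):
    """Mean distance between predicted and ground-truth joint accelerations."""
    return mean_joint_distance(acceleration(pred), acceleration(gt))


# Toy data (assumed for illustration): one joint moving at constant velocity,
# and a prediction that is close in position but jitters from frame to frame.
gt = [[(float(t), 0.0, 0.0)] for t in range(4)]
jitter = [[(float(t), 0.1 * (-1) ** t, 0.0)] for t in range(4)]

print("MPJPE:", mpjpe(jitter, gt))
print("Acceleration error:", accel_error(jitter, gt))
```

In this toy case the jittering prediction is off by only about 0.1 units per joint, yet its acceleration error is about 0.4, which illustrates why a position loss alone does not guarantee smooth motion.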

Acknowledgment

This work was supported by the Key-Area Research and Development Program of Guangdong Province (2020B0909050003).

Author information

Authors and Affiliations

Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Wenshuo Chen, Xiang Zhou & Kai Zhang
Durham University, Durham, UK
Zhengdi Yu
China Academy of Industrial Internet, Beijing, China
Weixi Gu
Research Institute of Tsinghua, Pearl River Delta, Beijing, China
Kai Zhang

Corresponding authors

Correspondence to Weixi Gu or Kai Zhang.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai

About this paper

Cite this paper

Chen, W., Zhou, X., Yu, Z., Gu, W., Zhang, K. (2024). Decomposed Human Motion Prior for Video Pose Estimation via Adversarial Training. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 920. Springer, Cham. https://doi.org/10.1007/978-3-031-53963-3_35

DOI: https://doi.org/10.1007/978-3-031-53963-3_35
Published: 17 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53962-6
Online ISBN: 978-3-031-53963-3
eBook Packages: Intelligent Technologies and Robotics (R0)
