
MGAPoseNet: multiscale graph-attention for 3D human pose estimation

  • Original Paper
Signal, Image and Video Processing

Abstract

Despite considerable advances in 3D human pose estimation from single-view images, previous studies have often overlooked the exploration of global and local correlations. To address this limitation, we present MGAPoseNet, a network architecture designed to improve the accuracy of 3D pose estimation. Our approach extracts local and global features simultaneously by running a Local Graph-based Joint Connection (LGC) module and a Global Attention-based Body Constraint (GAC) module in parallel. A subsequent Spatial-Channel Graph MLP-Like Architecture (SC-GraphMLP) module further improves performance: it exploits spatial and channel information to model the interactions and dependencies among joint features, which refines the pose estimates. Experiments on the Human3.6M and MPI-INF-3DHP benchmark datasets show that MGAPoseNet achieves state-of-the-art performance in 3D human pose estimation.
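To make the data flow described in the abstract more concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation, and every detail here is an assumption chosen for illustration: the toy five-joint skeleton, the row-normalized adjacency aggregation standing in for LGC, the projection-free scaled dot-product self-attention standing in for GAC, fusion of the two parallel branches by element-wise sum, the MLP-Mixer-style spatial-then-channel mixing standing in for SC-GraphMLP, the 2D-to-3D output layer, and all function names (`mga_block`, `local_graph_branch`, and so on) and weights are hypothetical.

```python
import math

# Hypothetical toy skeleton: 0 pelvis, 1 spine, 2 head, 3 left hip, 4 right hip.
EDGES = [(0, 1), (1, 2), (0, 3), (0, 4)]
NUM_JOINTS = 5


def adjacency(num_joints, edges):
    """Row-normalised adjacency with self-loops: each joint averages itself and its neighbours."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def add(a, b):
    return [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(a, b)]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def local_graph_branch(x, adj):
    """Stand-in for LGC (assumption): each joint aggregates features of its skeleton neighbours only."""
    return matmul(adj, x)


def global_attention_branch(x):
    """Stand-in for GAC (assumption): self-attention over all joints, without learned projections."""
    d = len(x[0])
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x] for q in x]
    weights = [softmax(row) for row in scores]
    return matmul(weights, x)


def spatial_channel_mix(x, w_spatial, w_channel):
    """Stand-in for SC-GraphMLP (assumption): mix across joints, then across feature channels,
    each as a linear map followed by ReLU and a residual connection."""
    x = add(x, relu(matmul(w_spatial, x)))  # (J x J) @ (J x C) -> J x C
    return add(x, relu(matmul(x, w_channel)))  # (J x C) @ (C x C) -> J x C


def mga_block(x, adj, w_spatial, w_channel):
    # Local and global branches run in parallel on the same input features.
    fused = add(local_graph_branch(x, adj), global_attention_branch(x))
    # The fused features then pass sequentially through spatial-channel mixing.
    return spatial_channel_mix(fused, w_spatial, w_channel)


# Usage with toy 2D keypoints (one row per joint) and hand-picked weights.
adj = adjacency(NUM_JOINTS, EDGES)
x = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [-0.5, 0.0], [0.5, 0.0]]
w_spatial = [[0.0 if i == j else 0.1 for j in range(NUM_JOINTS)] for i in range(NUM_JOINTS)]
w_channel = [[0.5, 0.0], [0.0, 0.5]]
features = mga_block(x, adj, w_spatial, w_channel)
w_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical 2 -> 3 output layer
pose_3d = matmul(features, w_out)
print(len(pose_3d), len(pose_3d[0]))  # 5 3
```

The sketch illustrates the division of roles the abstract describes: the graph branch only sees skeleton neighbours, the attention branch can relate any pair of joints, and the mixing stage then operates on the combined features along both the joint (spatial) and feature (channel) axes.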


Figs. 1, 2, 3 and 4 (images not available in this preview)


Data availability

The data used in this work were obtained from publicly available datasets and will be made accessible for research purposes. The authors acknowledge the importance of data sharing for promoting reproducibility and further advances in 3D human pose estimation.


Author information


Contributions

Liu wrote the main manuscript text and prepared Figs. 1, 2 and 3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Minghao Liu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest that could influence the interpretation of the results or the objective presentation of the research findings. This research is purely academic and does not involve any financial or personal relationships that could lead to bias.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, M., Wang, W. MGAPoseNet: multiscale graph-attention for 3D human pose estimation. SIViP (2024). https://doi.org/10.1007/s11760-024-03256-4


  • DOI: https://doi.org/10.1007/s11760-024-03256-4
