MGAPoseNet: multiscale graph-attention for 3D human pose estimation

Liu, Minghao; Wang, Wenshan

doi:10.1007/s11760-024-03256-4

Original Paper
Published: 18 May 2024

Signal, Image and Video Processing

Abstract

Despite considerable advances in 3D human pose estimation from single-view images, previous studies have often overlooked the exploration of global and local correlations. To address this limitation, we present MGAPoseNet, a network architecture designed to improve the accuracy of 3D pose estimation. The approach extracts local and global features simultaneously by integrating, in parallel, a Local Graph-based Joint Connection (LGC) module and a Global Attention-based Body Constraint (GAC) module. A sequential Spatial-Channel Graph MLP-Like Architecture (SC-GraphMLP) module further improves performance: it leverages spatial and channel information to model the interactions and dependencies among joint features, thereby refining the estimated pose. Experiments on the benchmark datasets Human3.6M and MPI-INF-3DHP show that MGAPoseNet achieves state-of-the-art performance.
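
The data flow described in the abstract (a local graph branch and a global attention branch run in parallel, followed by sequential spatial and channel mixing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the layers, so the 5-joint chain skeleton, the single-head attention, the fusion by summation, the tanh nonlinearity, the weight shapes, and every function and parameter name below are assumptions chosen only to make the idea concrete.

```python
import numpy as np

# Hypothetical 5-joint chain skeleton (NOT the Human3.6M skeleton), for illustration only.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    adj = np.eye(num_joints)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def local_graph_branch(x, adj_norm, w):
    """Local branch: each joint aggregates features from its skeleton neighbours."""
    return adj_norm @ x @ w


def global_attention_branch(x, wq, wk, wv):
    """Global branch: single-head self-attention, every joint attends to all joints."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v, attn


def spatial_channel_mlp(x, w_spatial, w_channel):
    """Sequential refinement: mix across joints, then across channels, with residuals."""
    x = x + np.tanh(w_spatial @ x)       # spatial mixing: (J, J) @ (J, C)
    return x + np.tanh(x @ w_channel)    # channel mixing: (J, C) @ (C, C)


def init_params(rng, num_joints, channels):
    """Small random weights; shapes and scale are illustrative assumptions."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "w_in": w(2, channels), "w_local": w(channels, channels),
        "wq": w(channels, channels), "wk": w(channels, channels),
        "wv": w(channels, channels), "w_spatial": w(num_joints, num_joints),
        "w_channel": w(channels, channels), "w_out": w(channels, 3),
    }


def forward(pose_2d, params, adj_norm):
    """Lift (J, 2) joint coordinates to (J, 3) coordinates."""
    x = pose_2d @ params["w_in"]                       # embed 2D joints to (J, C)
    local = local_graph_branch(x, adj_norm, params["w_local"])
    global_, attn = global_attention_branch(
        x, params["wq"], params["wk"], params["wv"])
    fused = x + local + global_                        # fusion by summation (assumed)
    refined = spatial_channel_mlp(
        fused, params["w_spatial"], params["w_channel"])
    return refined @ params["w_out"], attn


rng = np.random.default_rng(0)
params = init_params(rng, NUM_JOINTS, channels=8)
adj_norm = normalized_adjacency(NUM_JOINTS, EDGES)
pose_2d = rng.normal(size=(NUM_JOINTS, 2))             # random stand-in for detected 2D joints
pose_3d, attn = forward(pose_2d, params, adj_norm)
print(pose_3d.shape)
```

In this sketch the adjacency matrix restricts the local branch to directly connected joints, while the attention matrix is dense, so the global branch can relate any pair of joints. That contrast is the local-versus-global distinction the abstract draws; how the published modules realize it may differ.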

Data availability

The data used in this research work is obtained from publicly available datasets and will be made accessible for research purposes. The researchers acknowledge the importance of data sharing to promote reproducibility and further advancements in the field of 3D human pose estimation.

Author information

Minghao Liu and Wenshan Wang have contributed equally to this work.

Authors and Affiliations

Faculty of Science, Dalian Minzu University, No. 31, Jinshi Road, Jinshitan, Dalian, 116000, Liaoning Province, China
Minghao Liu & Wenshan Wang

Contributions

Liu wrote the main manuscript text and prepared Figs. 1, 2 and 3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Minghao Liu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest that could influence the interpretation of the results or the objective presentation of the research findings. This research is purely academic and does not involve any financial or personal relationships that could lead to bias.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, M., Wang, W. MGAPoseNet: multiscale graph-attention for 3D human pose estimation. SIViP (2024). https://doi.org/10.1007/s11760-024-03256-4

Received: 05 March 2024
Revised: 23 April 2024
Accepted: 28 April 2024
Published: 18 May 2024
DOI: https://doi.org/10.1007/s11760-024-03256-4
