Abstract
Despite considerable progress in 3D human pose estimation from single-view images, previous studies have often overlooked global and local correlations. To address this limitation, we present MGAPoseNet, a network architecture designed to improve the accuracy of 3D pose estimation. Our approach extracts local and global features simultaneously by running the Local Graph-based Joint Connection (LGC) and Global Attention-based Body Constraint (GAC) modules in parallel. A subsequent Spatial-Channel Graph MLP-Like Architecture (SC-GraphMLP) module further improves performance: it uses spatial and channel information to model the interactions and dependencies among joint features, thereby refining the estimated pose. Experiments on benchmark datasets, including Human3.6M and MPI-INF-3DHP, show that MGAPoseNet achieves state-of-the-art performance in 3D human pose estimation.
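The data flow described in the abstract (a local graph branch and a global attention branch applied in parallel, followed by spatial and channel mixing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the additive fusion of the two branches, the residual connections, the layer sizes, the random weights, and the 17-joint skeleton edge list are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def normalized_adjacency(edges, num_joints):
    # Symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph.
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = np.diag(a.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ a @ d_inv_sqrt

def local_graph_branch(x, a_hat, w):
    # Hypothetical stand-in for LGC: each joint aggregates its skeleton neighbors.
    return a_hat @ x @ w

def global_attention_branch(x, wq, wk, wv):
    # Hypothetical stand-in for GAC: every joint attends to every other joint.
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

def spatial_channel_mixing(x, w_spatial, w_channel):
    # Hypothetical stand-in for SC-GraphMLP: mix across joints, then across channels.
    x = x + w_spatial @ x                  # spatial (joint-wise) mixing
    x = x + np.maximum(x @ w_channel, 0)   # channel mixing with ReLU
    return x

def block(x, a_hat, p):
    # Parallel branches fused by addition (an assumption), then sequential mixing.
    local = local_graph_branch(x, a_hat, p["w_local"])
    glob = global_attention_branch(x, p["wq"], p["wk"], p["wv"])
    return spatial_channel_mixing(x + local + glob, p["w_spatial"], p["w_channel"])

# Illustrative 17-joint skeleton (assumed edge list) and random joint features.
J, C = 17, 32
edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
         (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16)]
rng = np.random.default_rng(0)
params = {name: 0.1 * rng.standard_normal((C, C))
          for name in ("w_local", "wq", "wk", "wv", "w_channel")}
params["w_spatial"] = 0.1 * rng.standard_normal((J, J))

a_hat = normalized_adjacency(edges, J)
x = rng.standard_normal((J, C))   # one feature vector per joint
out = block(x, a_hat, params)     # same shape as x: (J, C)
```

In this sketch the local branch only propagates information along skeleton edges, while the attention branch relates all joint pairs. That is the local/global distinction the abstract draws; how the actual modules are parameterized and fused is not specified here.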
Data availability
The data used in this research work is obtained from publicly available datasets and will be made accessible for research purposes. The researchers acknowledge the importance of data sharing to promote reproducibility and further advancements in the field of 3D human pose estimation.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest that could influence the interpretation of the results or the objective presentation of the research findings. This research is purely academic and does not involve any financial or personal relationships that could lead to bias.
About this article
Cite this article
Liu, M., Wang, W. MGAPoseNet: multiscale graph-attention for 3D human pose estimation. SIViP (2024). https://doi.org/10.1007/s11760-024-03256-4
DOI: https://doi.org/10.1007/s11760-024-03256-4