MS-HRNet: multi-scale high-resolution network for human pose estimation

Wang, Yanxia; Wang, Renjie; Shi, Hu; Liu, Dan

doi:10.1007/s11227-024-06125-6

Published: 25 April 2024


The Journal of Supercomputing



Abstract

Human pose estimation has important applications in medical diagnosis (such as the early diagnosis of autism in children and assisting with the diagnosis of Parkinson’s disease), human-computer interaction, animation, and other fields. Currently, many human pose estimation algorithms are based on deep learning. However, most research focuses only on increasing the depth and width of the network model. This approach overlooks the fact that merely enlarging the network’s depth and width leads to excessive parameterization without enhancing the model’s effective receptive field or its ability to extract multi-scale features. Hence, this paper constructs a network model, named MS-HRNet (Multi-Scale High-Resolution Network), for human pose estimation. Specifically, we propose a more concise and efficient version of the HRNet framework as the backbone network of MS-HRNet. This addresses HRNet’s complex structure and large number of parameters, which make training difficult, as well as its inadequacy in handling multi-scale information. Additionally, we design a module of parallel multi-scale convolutional kernels, named MSBlock (Multi-Scale Block), as the basic block of MS-HRNet. By introducing coordinate attention modules and ASFF (Adaptive Spatial Feature Fusion) modules, the model’s ability to extract information is effectively increased, and the issue of feature conflict during the fusion of features with different resolutions is resolved, with only a small increase in the number of model parameters. To evaluate the effectiveness of the proposed model, we conducted comparison and ablation experiments against multiple existing human pose estimation models on popular human pose estimation datasets, including COCO 2017 and MPII. On the COCO 2017 dataset, MS-HRNet has 41% fewer parameters and 59% lower computational complexity than the baseline model HRNet, while its detection accuracy (mAP) is 2.4 points higher.
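To make the fusion idea in the abstract more concrete, the following is a minimal sketch of ASFF-style adaptive fusion: each spatial position receives softmax-normalized weights over the resolution branches, and the fused value is the weighted sum. This is an illustration under stated assumptions, not the authors' implementation. The function names (`softmax`, `adaptive_fuse`), the toy inputs, and the use of plain Python lists in place of tensors are all hypothetical, and the feature maps are assumed to have already been resized to a common resolution. In a trained network the weight logits would be predicted by learned layers; here they are fixed by hand.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(features, weight_logits):
    """Fuse same-sized feature maps with per-position softmax weights.

    features:      list of L maps, each a flat list of N floats
                   (assumed already resized to a common resolution).
    weight_logits: list of L maps of N floats; hand-set here, whereas a
                   real network would predict them (assumption of this sketch).
    """
    n_levels = len(features)
    n_positions = len(features[0])
    fused = []
    for p in range(n_positions):
        # Weights over the branches sum to 1 at every position.
        w = softmax([weight_logits[l][p] for l in range(n_levels)])
        fused.append(sum(w[l] * features[l][p] for l in range(n_levels)))
    return fused

# Hypothetical toy inputs: three resolution branches, four spatial positions.
features = [
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0, 3.0],
]
weight_logits = [
    [0.0, 5.0, 0.0, -5.0],
    [0.0, 0.0, 5.0, -5.0],
    [0.0, 0.0, 0.0, 5.0],
]
fused = adaptive_fuse(features, weight_logits)
print(fused)
```

With equal logits (the first position) the branches are averaged. Where one branch has a much larger logit, the fused value follows that branch almost entirely, which is how position-dependent weighting can suppress conflicting information from the other resolutions.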


Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

We thank all participants who supported our study and the reviewers for constructive suggestions on the manuscript.

Author information

Authors and Affiliations

College of Computer and Information Science, Chongqing Normal University, University City Middle Road, Chongqing, 401331, China
Yanxia Wang, Renjie Wang, Hu Shi & Dan Liu


Contributions

RW was responsible for the design and implementation of the experiments and the overall writing of the manuscript. YW was responsible for the review and revision of the manuscript. HS and DL were responsible for some of the data visualization. All authors contributed to the article and approved the submitted version.

Corresponding author

Correspondence to Renjie Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wang, Y., Wang, R., Shi, H. et al. MS-HRNet: multi-scale high-resolution network for human pose estimation. J Supercomput (2024). https://doi.org/10.1007/s11227-024-06125-6


Accepted: 04 April 2024
Published: 25 April 2024
DOI: https://doi.org/10.1007/s11227-024-06125-6
