
Structure guided network for human pose estimation


Abstract

Humans have an impressive ability to reliably perceive pose through semantic descriptions (e.g. both arms up or left leg bent). To leverage this transitive structure for human pose estimation, we explore a part descriptor that qualitatively describes structural consistency across varying appearances. Meanwhile, we utilize the fixed bone constraint to fully exploit structural knowledge. In this paper, we propose an effective network that jointly models the part descriptor and a bone heatmap as structure information and dynamically learns from compositional features. Specifically, the part descriptor distills structural consistency as external guidance via feature injection, while the introduced bone detection provides internal guidance through multi-level feature fusion. The proposed method thus enables the network to effectively incorporate higher-level structure into lower-level keypoint detection, which yields more robust features for pose estimation. The effectiveness of the proposed method has been evaluated on the LSP, MPII, LIP, COCO and CrowdPose datasets. The experimental results demonstrate that it outperforms most state-of-the-art methods on these widely used benchmarks with lower complexity.
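Since the full architecture is not reproduced on this page, the following PyTorch-style sketch is only a hypothetical illustration of the data flow described in the abstract: a bone-heatmap branch is fused back into the backbone features (internal guidance), and a part-descriptor vector is injected into the fused features (external guidance) before the keypoint heatmap head. All module names, shapes and hyperparameters are assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): keypoint detection with
# bone-heatmap guidance (internal, via feature fusion) and a part-descriptor
# embedding (external, via feature injection). Shapes and sizes are assumed.
import torch
import torch.nn as nn


class StructureGuidedHead(nn.Module):
    def __init__(self, in_ch=256, num_joints=16, num_bones=15, desc_dim=32):
        super().__init__()
        # Bone branch: one heatmap per bone (limb) as internal guidance.
        self.bone_head = nn.Conv2d(in_ch, num_bones, kernel_size=1)
        # Fuse backbone features with bone heatmaps (multi-level fusion
        # simplified here to a single concatenation + 3x3 conv).
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch + num_bones, in_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Part descriptor: a global semantic code (e.g. "left leg bent")
        # projected and injected into the fused features as external guidance.
        self.desc_proj = nn.Linear(desc_dim, in_ch)
        # Final keypoint heatmap head.
        self.kpt_head = nn.Conv2d(in_ch, num_joints, kernel_size=1)

    def forward(self, feats, part_desc):
        # feats: (B, C, H, W) backbone features; part_desc: (B, desc_dim)
        bone_maps = self.bone_head(feats)                        # (B, num_bones, H, W)
        fused = self.fuse(torch.cat([feats, bone_maps], dim=1))  # internal guidance
        inj = self.desc_proj(part_desc).unsqueeze(-1).unsqueeze(-1)
        guided = fused + inj                                     # external guidance
        kpt_maps = self.kpt_head(guided)                         # (B, num_joints, H, W)
        return kpt_maps, bone_maps


# Example usage with dummy tensors.
if __name__ == "__main__":
    head = StructureGuidedHead()
    feats = torch.randn(2, 256, 64, 64)
    desc = torch.randn(2, 32)
    kpt, bones = head(feats, desc)
    print(kpt.shape, bones.shape)  # (2, 16, 64, 64) and (2, 15, 64, 64)
```

In the paper the fusion is described as multi-level and the part descriptor is learned jointly with keypoint detection; the sketch compresses both into a single stage only to make the guidance paths concrete.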



Funding

This work was supported by the Key-Area Research and Development Program of Guangdong Province (2021B0101400002) and the Guangzhou Key Laboratory of Scene Understanding and Intelligent Interaction (202201000001).

Author information


Corresponding author

Correspondence to Xuemei Xie.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Y., Xie, X., Yin, W. et al. Structure guided network for human pose estimation. Appl Intell 53, 21012–21026 (2023). https://doi.org/10.1007/s10489-023-04521-8

