Abstract
Humans can reliably perceive pose from semantic descriptions (e.g., both arms up or left leg bent). To leverage this transitive structural characteristic for human pose estimation, we explore a part descriptor that qualitatively describes structural consistency across varied appearances. We also use a fixed bone constraint to further exploit structural knowledge. In this paper, we propose a network that jointly models the part descriptor and bone heatmaps as structure information, so that it learns dynamically from compositional features. Specifically, the part descriptor distills structural consistency and serves as external guidance via feature injection, while the introduced bone detection serves as internal guidance through multi-level feature fusion. The proposed method thus enables the network to incorporate higher-level structure into lower-level keypoint detection models, which leads to more robust features for pose estimation. We evaluate the method on the LSP, MPII, LIP, COCO, and CrowdPose datasets. The experimental results show that it outperforms most state-of-the-art methods on these widely used benchmarks with lower complexity.
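To make the notion of a bone heatmap concrete, the following is a minimal sketch of one common way such a target could be rendered from two annotated joints: each pixel receives a Gaussian response based on its distance to the line segment connecting the joints. The abstract does not specify the exact formulation, so the function name `bone_heatmap`, the Gaussian form, and the `sigma` parameter are illustrative assumptions rather than the authors' definition.

```python
import numpy as np


def bone_heatmap(joint_a, joint_b, height, width, sigma=1.5):
    """Render a heatmap that peaks along the segment joint_a -> joint_b.

    Hypothetical helper: the Gaussian-of-distance form is an assumption,
    not the formulation used in the paper.
    joint_a, joint_b: (x, y) pixel coordinates of the two bone endpoints.
    """
    a = np.asarray(joint_a, dtype=np.float64)
    b = np.asarray(joint_b, dtype=np.float64)

    # Pixel grid as (x, y) pairs, shape (H, W, 2).
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys], axis=-1).astype(np.float64)

    # Project every pixel onto the segment and clamp to its endpoints.
    ab = b - a
    length_sq = float(ab @ ab)
    if length_sq == 0.0:
        # Degenerate bone: both joints coincide, so fall back to a point.
        t = np.zeros((height, width))
    else:
        t = np.clip(((pixels - a) @ ab) / length_sq, 0.0, 1.0)
    closest = a + t[..., None] * ab

    # Gaussian response on the squared distance to the closest segment point.
    dist_sq = np.sum((pixels - closest) ** 2, axis=-1)
    return np.exp(-dist_sq / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # A horizontal bone from (2, 2) to (6, 2) on an 8 x 10 grid.
    heat = bone_heatmap((2, 2), (6, 2), height=8, width=10)
    print(heat.shape)
    print(np.round(heat[2], 3))
```

In this sketch, pixels that lie on the bone receive a response of 1 and the response decays smoothly with distance, so a network supervised with such maps would be encouraged to localize the limb connecting two keypoints rather than the keypoints alone.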
Funding
This work was supported by the Key-Area Research and Development Program of Guangdong Province (2021B0101400002) and the Guangzhou Key Laboratory of Scene Understanding and Intelligent Interaction (202201000001).
Ethics declarations
Conflict of Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Y., Xie, X., Yin, W. et al. Structure guided network for human pose estimation. Appl Intell 53, 21012–21026 (2023). https://doi.org/10.1007/s10489-023-04521-8
DOI: https://doi.org/10.1007/s10489-023-04521-8