TSNet : Tree structure network for human pose estimation

  • Original Paper
Signal, Image and Video Processing

Abstract

Multi-person pose estimation in natural scenes has been a hot topic in recent years. The prediction speed of top-down methods depends on the number of people in the scene, so bottom-up methods have an advantage in natural scenes. However, this study found that the accuracy of human margin joints (those farther from the body center, such as the wrist and ankle) is consistently lower than that of joints closer to the body center (such as the shoulder and hip), and that the accuracy gap between joint categories is large. Inspired by the structural characteristics of the human body, this paper proposes a tree structure network (TSNet) for human pose estimation, which divides the joints into several levels according to body structure and predicts them stepwise from the body center to the margin. Combined with global features, the joint features of the next layer are predicted by extracting the correlation between the joint features of the current layer and those of the previous layer. Each joint therefore carries not only the joint information of the current and previous layers but also background information. Experimental results show that this method effectively alleviates the uneven precision across joints, and that TSNet further improves the accuracy of lower-body joints by setting different activation values for different joints. Extensive experiments on the MPII dataset demonstrate the effectiveness of the proposed model and method.
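
As a rough illustration of dividing joints into levels from the body center to the margin, the sketch below groups the 16 MPII joints by their depth in a skeleton tree rooted at the pelvis. The parent map, the choice of root, and the helper names `joint_depth` and `group_by_level` are assumptions made for this example; the abstract does not state the grouping TSNet actually uses.

```python
from collections import defaultdict

# Hypothetical parent map over the 16 MPII joints (an assumption for
# illustration; the paper's actual tree is not given in the abstract).
PARENT = {
    "pelvis": None,
    "thorax": "pelvis",
    "right_hip": "pelvis",
    "left_hip": "pelvis",
    "upper_neck": "thorax",
    "right_shoulder": "thorax",
    "left_shoulder": "thorax",
    "right_knee": "right_hip",
    "left_knee": "left_hip",
    "head_top": "upper_neck",
    "right_elbow": "right_shoulder",
    "left_elbow": "left_shoulder",
    "right_ankle": "right_knee",
    "left_ankle": "left_knee",
    "right_wrist": "right_elbow",
    "left_wrist": "left_elbow",
}


def joint_depth(joint, parent=PARENT):
    """Count the edges between a joint and the root of the tree."""
    depth = 0
    while parent[joint] is not None:
        joint = parent[joint]
        depth += 1
    return depth


def group_by_level(parent=PARENT):
    """Return a list of joint lists, ordered from body center to margin."""
    levels = defaultdict(list)
    for joint in parent:
        levels[joint_depth(joint, parent)].append(joint)
    return [levels[d] for d in sorted(levels)]


LEVELS = group_by_level()
for depth, joints in enumerate(LEVELS):
    print(depth, joints)
```

Under this assumed tree the wrists fall in the outermost level and the ankles one level closer to the root, so a stepwise predictor would reach them last, after the joints nearer the center.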



Acknowledgements

This work was supported by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082, in part by the Science and Technology Bureau of Quanzhou under Grant 2018C113R, and in part by the National Natural Science Foundation of China under Grant 61901183.

Author information

Corresponding author

Correspondence to YanMin Luo.

Ethics declarations

Funding

This work was funded by the Natural Science Foundation of Fujian Province, China, under Grant 2020J01082, in part by the Science and Technology Bureau of Quanzhou under Grant 2018C113R, and in part by the National Natural Science Foundation of China under Grant 61901183.

Conflicts of interest

There are no conflicts of interest.

Availability of data and material

The data come from a commonly used public dataset (MPII).

Code availability

Custom code

About this article

Cite this article

Wan, T., Luo, Y., Zhang, Z. et al. TSNet : Tree structure network for human pose estimation. SIViP 16, 551–558 (2022). https://doi.org/10.1007/s11760-021-01999-y

  • DOI: https://doi.org/10.1007/s11760-021-01999-y
