
Localization of hard joints in human pose estimation based on residual down-sampling and attention mechanism

  • Original article
  • Published in The Visual Computer

Abstract

Hard-joint localization in human pose estimation is challenging for several reasons, such as the disappearance of joint points caused by clothing and lighting, occlusion caused by complex environments, and the disruption of dependencies among joint points. Most existing approaches to hard-joint pose estimation achieve high accuracy by extracting more high-level feature information. However, most networks suffer from information loss caused by down-sampling, which degrades joint localization. Compensating for this loss introduces useless information into network learning and hinders the extraction of features associated with hard joints. Herein, a residual down-sampling module is proposed that replaces the pooling layer and fuses high-level features with low-resolution feature maps, addressing the information-loss issue. In addition, an attention-based strategy is proposed to guide network learning so that the network focuses on useful feature information: a convolutional block attention module is combined with a residual module outside the basic sub-network, allowing the network to learn more effective high-level features. Using an eight-stack hourglass as the basic network, the proposed method is validated on the MPII and LSP human pose datasets. Compared with the eight-stack hourglass and HRNet, the proposed method achieves higher accuracy for hard-joint localization, and the experimental results demonstrate its effectiveness.
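The two components described above — a residual down-sampling block that fuses a learned strided-convolution branch with a pooled branch, and a convolutional block attention module (CBAM) applied alongside a residual module — can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the module names, channel counts, and the exact fusion (addition of the two branches) are illustrative choices, and only the general structure follows the abstract and the original CBAM design (channel attention followed by spatial attention).

```python
import torch
import torch.nn as nn


class ResidualDownsample(nn.Module):
    """Hypothetical sketch: replace a plain pooling layer with a block that
    fuses a strided-convolution branch (retaining high-level features)
    with the pooled low-resolution branch."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        # Fuse the learned down-sampling branch with the pooled identity branch.
        return self.conv(x) + self.pool(x)


class CBAM(nn.Module):
    """Minimal convolutional block attention module:
    channel attention followed by spatial attention."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP for channel attention over pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 convolution for the spatial attention map.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: shared MLP over average- and max-pooled features.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: conv over channel-wise average and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(s))
```

Both modules preserve the channel count, so they can be dropped into an hourglass sub-network wherever a pooling layer or residual block would otherwise sit; `ResidualDownsample` halves the spatial resolution while `CBAM` leaves it unchanged.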


Availability of data and materials

The datasets generated and/or analyzed during the current study are available in the MPII Human Pose Dataset, http://human-pose.mpi-inf.mpg.de/.


Author information


Contributions

Qiaoning Yang: Conceptualization, Methodology, Writing, reviewing, and editing; Weimin Shi: Software, Data curation, Writing the original draft; Juan Chen: Supervision and Validation; Yang Tang: Data preprocessing.

Corresponding author

Correspondence to Qiaoning Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (MP4 40676 kb)


About this article


Cite this article

Yang, Q., Shi, W., Chen, J. et al. Localization of hard joints in human pose estimation based on residual down-sampling and attention mechanism. Vis Comput 38, 2447–2459 (2022). https://doi.org/10.1007/s00371-021-02122-5
