Multi-scale Attention Aided Multi-Resolution Network for Human Pose Estimation

  • Srinika Selvam
  • Deepak Mishra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11941)

Abstract

In this paper, we propose attention maps at multiple scales on top of a multi-resolution feature-extractor baseline network for human pose estimation. The baseline network captures information across scales through repeated bottom-up and top-down processing, using successive pooling and up-sampling. We also propose a network, named Refinement Net, that regresses the predicted heatmaps to 2D joint locations to remove ambiguities in the predicted positions. We experiment with three levels of attention schemes: global, heatmap, and multi-resolution. Attention masks help generate a basin of attraction that guides the network in deciding where to “look”. The performance of the proposed network is on par with state-of-the-art two-dimensional pose estimation methods on the MPII dataset.
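
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention mask obtained by a softmax over spatial positions and shared across channels, and it decodes a heatmap to a 2D location with a soft-argmax, which is swapped in here as a simple non-learned stand-in for the learned Refinement Net. The function names `spatial_softmax`, `apply_attention`, and `soft_argmax` are hypothetical.

```python
import numpy as np

def spatial_softmax(scores):
    """Normalize an (H, W) score map into an attention mask that sums to 1."""
    flat = scores.reshape(-1)
    flat = flat - flat.max()  # subtract the max for numerical stability
    weights = np.exp(flat)
    return (weights / weights.sum()).reshape(scores.shape)

def apply_attention(features, scores):
    """Reweight (C, H, W) features with one mask shared across channels.

    Assumption: a single spatial mask; the paper's global, heatmap and
    multi-resolution schemes are not reproduced here.
    """
    mask = spatial_softmax(scores)
    return features * mask[None, :, :]

def soft_argmax(heatmap):
    """Decode an (H, W) heatmap to (x, y) as the expected pixel position.

    Assumption: a fixed decoding rule standing in for a learned regressor.
    """
    probs = spatial_softmax(heatmap)
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]  # row and column index grids
    return float((probs * xs).sum()), float((probs * ys).sum())

# Toy heatmap with one sharp peak at column 5, row 2.
heatmap = np.zeros((8, 8))
heatmap[2, 5] = 50.0
x, y = soft_argmax(heatmap)

# Toy feature tensor reweighted by the same score map.
features = np.ones((4, 8, 8))
attended = apply_attention(features, heatmap)
```

On this toy input the decoded location lies very close to the peak, and the attended features concentrate almost all of their mass at the same position, which is the "where to look" effect the abstract describes.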

Keywords

Human pose estimation, Multi-resolution, Attention maps

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Indian Institute of Space Science and Technology, Thiruvananthapuram, India
