
Multi-person Pose Estimation for Pose Tracking with Enhanced Cascaded Pyramid Network

  • Dongdong Yu
  • Kai Su
  • Jia Sun
  • Changhu Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

Multi-person pose estimation is a fundamental yet challenging task in machine learning. In parallel, recent progress in pose estimation has increased interest in pose tracking. In this work, we propose an efficient and powerful method to locate and track human poses. Our method builds upon a state-of-the-art single-person pose estimation system, the Cascaded Pyramid Network, and adopts an IoU-tracker module to identify people in the wild. We conduct experiments on the released PoseTrack2018 multi-person video pose estimation benchmark to validate the effectiveness of our network. Our model achieves 80.9% on the validation set and 77.1% on the test set under the mean Average Precision (mAP) metric, and 64.0% on the validation set and 57.4% on the test set under the Multi-Object Tracking Accuracy (MOTA) metric.
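The abstract says only that an IoU-tracker module links people across frames; it gives no details. The following is a minimal sketch of generic greedy IoU-based association, not the authors' implementation. It assumes each person is represented by a box `(x1, y1, x2, y2)` per frame, a hypothetical matching threshold `iou_threshold=0.5`, and greedy highest-IoU-first matching against the previous frame only; the names `iou` and `track` are illustrative.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track(frames, iou_threshold=0.5):
    """Assign a track ID to every box in every frame (hypothetical helper).

    frames: list of per-frame lists of person boxes.
    Returns a list of per-frame lists of IDs aligned with the boxes.
    """
    next_id = count()
    prev = []  # (box, id) pairs carried over from the previous frame
    all_ids = []
    for boxes in frames:
        # Score every (previous track, current box) pair, best IoU first.
        pairs = sorted(
            ((iou(pb, cb), pi, ci)
             for pi, (pb, _) in enumerate(prev)
             for ci, cb in enumerate(boxes)),
            reverse=True,
        )
        ids = [None] * len(boxes)
        used_prev = set()
        for score, pi, ci in pairs:
            if score < iou_threshold:
                break  # remaining pairs overlap too little to be the same person
            if pi in used_prev or ids[ci] is not None:
                continue  # each track and each box is matched at most once
            ids[ci] = prev[pi][1]
            used_prev.add(pi)
        # Unmatched boxes start new tracks.
        for ci in range(len(boxes)):
            if ids[ci] is None:
                ids[ci] = next(next_id)
        prev = list(zip(boxes, ids))
        all_ids.append(ids)
    return all_ids


frames = [
    [(0, 0, 10, 10), (20, 20, 30, 30)],
    [(21, 21, 31, 31), (1, 1, 11, 11)],  # same two people, slightly moved
    [(100, 100, 110, 110)],              # a new person appears
]
print(track(frames))
```

Here the two people in the second frame keep their IDs even though their order in the detection list is swapped, and the unrelated box in the third frame gets a fresh ID. Greedy matching is chosen for brevity; an optimal bipartite assignment (e.g. the Hungarian algorithm) is a common alternative.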

Keywords

Pose estimation · Pose tracking

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ByteDance AI Lab, Beijing, China
  2. MOE Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing, China