Hierarchical Online Multi-person Pose Tracking with Multiple Cues

  • Chuanzhi Xu
  • Yue Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11306)

Abstract

Multi-person articulated pose tracking is a recently proposed computer vision task that aims to associate the articulated joints of the same person across frames to form pose trajectories. In this paper, we propose a region-based deep appearance model combined with an LSTM pose model to measure the similarity between identities. We also propose a novel hierarchical association method to reduce the time spent on deep feature extraction: the association procedure is divided into two stages, and deep features are extracted only when pairs of identities are difficult to distinguish. Extensive experiments are conducted on the recently released multi-person pose tracking benchmark PoseTrack. The results show that adopting multiple association cues clearly improves tracking accuracy and that the hierarchical association method clearly improves tracking speed.
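
The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cheap first-stage cue (bounding-box IoU), the thresholds `high` and `low`, the greedy second-stage matching, and the `deep_similarity` callable (a stand-in for the appearance and pose models) are all assumptions made for illustration. Pairs that the cheap cue resolves unambiguously are matched directly, and the expensive similarity is computed only for the remaining pairs.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hierarchical_associate(
    tracks: List[Box],
    detections: List[Box],
    deep_similarity: Callable[[int, int], float],  # hypothetical expensive cue
    high: float = 0.7,  # assumed threshold for a confident cheap match
    low: float = 0.3,   # assumed threshold for a plausible candidate
) -> Tuple[Dict[int, int], int]:
    """Return ({track index: detection index}, number of deep-feature calls)."""
    cheap = [[iou(t, d) for d in detections] for t in tracks]
    plausible = [
        (ti, di)
        for ti in range(len(tracks))
        for di in range(len(detections))
        if cheap[ti][di] >= low
    ]
    per_track = Counter(ti for ti, _ in plausible)
    per_det = Counter(di for _, di in plausible)

    # Stage 1: accept pairs that are confident and have no competing candidate.
    matches: Dict[int, int] = {}
    hard: List[Tuple[int, int]] = []
    for ti, di in plausible:
        if cheap[ti][di] >= high and per_track[ti] == 1 and per_det[di] == 1:
            matches[ti] = di
        else:
            hard.append((ti, di))

    # Stage 2: compute the expensive similarity only for the ambiguous pairs,
    # then match greedily from the highest score down.
    scored = sorted(((deep_similarity(ti, di), ti, di) for ti, di in hard), reverse=True)
    used = set(matches.values())
    for _, ti, di in scored:
        if ti not in matches and di not in used:
            matches[ti] = di
            used.add(di)
    return matches, len(hard)


# Toy data: one isolated person and two overlapping people.
tracks = [(0, 0, 10, 10), (20, 0, 30, 10), (23, 0, 33, 10)]
detections = [(1, 0, 11, 10), (21, 0, 31, 10), (24, 0, 34, 10)]
# Stand-in for a learned model: high similarity only for the true identity.
matches, deep_calls = hierarchical_associate(
    tracks, detections, lambda ti, di: 0.9 if ti == di else 0.1
)
print(matches, deep_calls)
```

In this toy case the isolated person is matched from the cheap cue alone, and only the four pairs involving the two overlapping people reach the second stage, instead of all nine track-detection pairs.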

Keywords

Multi-person pose tracking, Hierarchical association, Region-based deep network, LSTM pose model

Notes

Acknowledgments

This work is supported by National High-Tech R&D Program (863 Program) under Grant 2015AA016402.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China