Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views

  • Julian Tanke
  • Juergen Gall
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11824)

Abstract

In this work we propose an approach for estimating the 3D poses of multiple people from a set of calibrated cameras. Estimating 3D human poses from multiple views has several compelling properties: poses are estimated within a global coordinate space, and multiple cameras provide an extended field of view, which helps resolve ambiguities, occlusions and motion blur. Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views. We use bipartite matching to track multiple people over multiple frames. This proves especially efficient, as problems associated with greedy matching, such as occlusion, can be easily resolved in 3D. Our approach achieves state-of-the-art results on popular benchmarks and may serve as a baseline for future work.
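The abstract states only that people are tracked across frames with bipartite matching. The following is a minimal sketch of one frame-to-frame step, not the authors' implementation. It rests on three assumptions:

- Each 3D pose is a list of (x, y, z) joints in a fixed order.
- The matching cost between a track and a detection is the mean per-joint Euclidean distance.
- Matches costlier than a hypothetical threshold `max_dist` are rejected.

The assignment itself is solved with a generic textbook Hungarian (Kuhn-Munkres) algorithm. The names `hungarian`, `pose_distance`, `track_step` and `max_dist` are illustrative.

```python
import math

def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Textbook O(n^2 m) Hungarian (Kuhn-Munkres) algorithm with potentials.
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)        # p[j] = row matched to column j, 0 = free
    way = [0] * (m + 1)      # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            # Grow the alternating tree by the cheapest reduced-cost column.
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column
                break
        while j0:            # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

def pose_distance(a, b):
    """Mean Euclidean distance over corresponding 3D joints (assumed cost)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def track_step(tracks, detections, next_id, max_dist):
    """Associate this frame's 3D poses with existing tracks.

    tracks: dict track_id -> 3D pose from the previous frame.
    detections: list of 3D poses estimated in the current frame.
    Returns (updated tracks, next unused track id).
    """
    ids = list(tracks)
    updated, matched = {}, set()
    if ids and detections:
        cost = [[pose_distance(tracks[t], d) for d in detections] for t in ids]
        if len(ids) <= len(detections):
            pairs = list(enumerate(hungarian(cost)))
        else:  # the solver expects rows <= columns, so transpose
            cols = [list(col) for col in zip(*cost)]
            pairs = [(r, c) for c, r in enumerate(hungarian(cols))]
        for r, c in pairs:
            if cost[r][c] <= max_dist:   # reject implausible jumps
                updated[ids[r]] = detections[c]
                matched.add(c)
    # Unmatched detections start new tracks; unmatched tracks are dropped.
    for c, d in enumerate(detections):
        if c not in matched:
            updated[next_id] = d
            next_id += 1
    return updated, next_id

# Two people whose detections arrive in swapped order in the next frame.
tracks = {0: [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
          1: [(5.0, 0.0, 0.0), (5.0, 1.0, 0.0)]}
dets = [[(5.1, 0.0, 0.0), (5.1, 1.0, 0.0)],
        [(0.1, 0.0, 0.0), (0.1, 1.0, 0.0)]]
tracks, next_id = track_step(tracks, dets, next_id=2, max_dist=0.5)
# -> track 0 takes dets[1], track 1 takes dets[0]; next_id stays 2
```

In this sketch, unmatched tracks are simply terminated and every unmatched detection opens a new track. A practical tracker would likely keep unmatched tracks alive for a few frames to bridge short gaps. Matching in 3D rather than per image is what lets a plain distance cost work here, since the poses already live in one global coordinate frame.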

Notes

Acknowledgement

The work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) GA 1927/5-1 (FOR 2535 Anticipating Human Behavior) and the ERC Starting Grant ARCA (677650).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Bonn, Bonn, Germany