Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views
Abstract
In this work we propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras. Estimating 3D human poses from multiple views has several compelling properties: human poses are estimated within a global coordinate space, and multiple cameras provide an extended field of view, which helps in resolving ambiguities, occlusions and motion blur. Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views. We utilize bipartite matching to track multiple people over multiple frames. This proves to be especially efficient, since problems associated with greedy matching, such as occlusions, can be easily resolved in 3D. Our approach achieves state-of-the-art results on popular benchmarks and may serve as a baseline for future work.
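The abstract says people are tracked over frames with bipartite matching. A minimal sketch of that idea follows, under several assumptions that are not taken from the paper: each 3D pose is a list of `(x, y, z)` joint positions, the matching cost is the mean per-joint Euclidean distance, matches above a hypothetical `max_dist` threshold are rejected, unmatched tracks are simply dropped, and every unmatched detection starts a new track. The names `hungarian`, `assign`, `pose_distance` and `Tracker` are illustrative only, and the matching is solved with the classical Hungarian (Kuhn-Munkres) algorithm; this is not the authors' implementation.

```python
import math


def hungarian(cost):
    """Min-cost assignment for an n x m matrix with n <= m (Hungarian
    algorithm with potentials). Entry i of the result is row i's column."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j]: row matched to column j, 0 = unmatched
    way = [0] * (m + 1)  # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path to add row i
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def assign(cost):
    """(row, col) pairs of a min-cost matching; transposes if rows > cols."""
    if not cost or not cost[0]:
        return []
    n, m = len(cost), len(cost[0])
    if n <= m:
        return list(enumerate(hungarian(cost)))
    transposed = [[cost[i][j] for i in range(n)] for j in range(m)]
    return [(i, j) for j, i in enumerate(hungarian(transposed))]


def pose_distance(a, b):
    """Mean per-joint Euclidean distance between two 3D poses."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


class Tracker:
    """Hypothetical frame-to-frame tracker using bipartite matching."""

    def __init__(self, max_dist=0.5):
        self.max_dist = max_dist  # assumed gating threshold (pose units)
        self.tracks = {}          # track id -> 3D pose in the last frame
        self.next_id = 0

    def step(self, detections):
        ids = list(self.tracks)
        cost = [[pose_distance(self.tracks[t], d) for d in detections]
                for t in ids]
        current, matched = {}, set()
        for r, c in assign(cost):
            if cost[r][c] <= self.max_dist:  # reject implausible matches
                current[ids[r]] = detections[c]
                matched.add(c)
        for c, d in enumerate(detections):   # new people get fresh ids
            if c not in matched:
                current[self.next_id] = d
                self.next_id += 1
        self.tracks = current
        return current
```

For two people who swap their order in the detection list between frames, the matching keeps each person's id attached to the nearest pose rather than to the list position. In practice a library solver such as SciPy's `scipy.optimize.linear_sum_assignment` could replace the hand-written `hungarian`.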
Notes
Acknowledgement
The work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) GA 1927/5-1 (FOR 2535 Anticipating Human Behavior) and the ERC Starting Grant ARCA (677650).