Abstract
Despite significant developments in 3D multi-view multi-person (3D MM) tracking, current frameworks target either footprint tracking or pose tracking in isolation. Frameworks designed for the former cannot be applied to the latter, because they obtain 3D positions directly on the ground plane via a homography projection, which is inapplicable to 3D poses above the ground. Conversely, frameworks designed for pose tracking generally treat multi-view and multi-frame association as separate steps and may not be sufficiently robust for footprint tracking, which uses fewer keypoints than pose tracking and thus offers weaker multi-view association cues within a single frame. This study presents a unified multi-view multi-person tracking framework that bridges the gap between footprint tracking and pose tracking. Without additional modification, the framework accepts monocular 2D bounding boxes or 2D poses as input and produces robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information is employed jointly to improve both association and triangulation. Our framework achieves state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, with comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
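The abstract contrasts two geometric primitives: ground-plane localization via a homography for footprint tracking, and multi-view triangulation for pose keypoints above the ground. As a minimal illustration only (not the paper's implementation; the function names and toy cameras below are hypothetical), both can be sketched in a few lines of NumPy:

```python
import numpy as np

def homography_to_ground(H, uv):
    """Map an image point (e.g. a bounding-box footprint) to ground-plane
    coordinates using a 3x3 image-to-ground homography H."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]  # dehomogenize

def triangulate_dlt(Ps, uvs):
    """Linear (DLT) triangulation of one 3D point from two or more views.
    Ps: 3x4 camera projection matrices; uvs: matching 2D observations."""
    rows = []
    for P, (u, v) in zip(Ps, uvs):
        rows.append(u * P[2] - P[0])  # each view contributes two
        rows.append(v * P[2] - P[1])  # linear constraints on X
    _, _, Vt = np.linalg.svd(np.stack(rows))
    X = Vt[-1]                        # null vector of the system
    return X[:3] / X[3]

# Toy setup: identity intrinsics, second camera shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.3, 4.0])
uv1, uv2 = [(P @ np.append(X_true, 1.0)) for P in (P1, P2)]
uv1, uv2 = uv1[:2] / uv1[2], uv2[:2] / uv2[2]
X_hat = triangulate_dlt([P1, P2], [uv1, uv2])  # recovers X_true
```

This makes the abstract's asymmetry concrete: a footprint tracker only needs the homography because its targets lie on the z = 0 plane, whereas keypoints above the ground are not constrained to any plane and require observations from multiple calibrated views.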
![](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs41095-023-0334-8/MediaObjects/41095_2023_334_Fig1_HTML.jpg)
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Fan Yang received his B.S. and Ph.D. degrees in information sciences from Nanjing University, China and Nara Institute of Science and Technology, Japan, in 2012 and 2021, respectively. He is currently a researcher at Fujitsu Research. His research focuses on action recognition, pose estimation, and multi-object tracking. He participates in tracking competitions at CVPR, ICCV, and ECCV, and has obtained three 1st places, two 2nd places, and one 4th place.
Shigeyuki Odashima received his B.E., M.E., and Ph.D. degrees from the University of Tokyo in 2008, 2010, and 2013, respectively. He is currently a research scientist at Fujitsu Research. His research interests include robotics, computer vision, ubiquitous computing, and data mining, including human activity recognition, human pose estimation, and human motion assessment.
Sosuke Yamao received his B.E. degree in information engineering and M.S. degree in information sciences from Tohoku University, Sendai, Japan, in 2013 and 2015, respectively. He is currently a researcher at Fujitsu Research. His research interests include image-based 3D scene modeling, human pose estimation, neural rendering, and machine learning for computer vision.
Hiroaki Fujimoto received his B.S. and M.S. degrees in engineering from Tokyo Metropolitan University in 1997 and 1999, respectively. He is currently a principal researcher at Fujitsu Research. His research interests include pose estimation from depth sensors and RGB camera images.
Shoichi Masui received his B.S. and M.S. degrees from Nagoya University, Japan, in 1982 and 1984, respectively. He received his Ph.D. degree from Tokyo Institute of Technology in 2006. From 1990 to 1992, he was a visiting scholar at Stanford University. In 1999, he joined Fujitsu Limited; from 2000 to 2007, he was with Fujitsu Laboratories Ltd., where he was engaged in various IC design projects. In 2001, he was a visiting scholar at the University of Toronto. From 2007 to 2012, he was a professor in the Research Institute of Electrical Communication of Tohoku University. In 2012 he returned to Fujitsu Laboratories. He is currently engaged in pose estimation from depth sensors and RGB cameras for sports applications at Fujitsu Research. He was the recipient of a commendation from the Japanese Minister of Education, Culture, Sports, Science, and Technology in 2004.
Shan Jiang is a project manager at Fujitsu Research. He received his doctoral degree in control engineering from Shanghai Jiao Tong University. His research interests include robotics, human interaction, 3D reconstruction, and image analysis and synthesis. He has served as Director of the Robotics Society of Japan since 2021.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, F., Odashima, S., Yamao, S. et al. A unified multi-view multi-person tracking framework. Comp. Visual Media 10, 137–160 (2024). https://doi.org/10.1007/s41095-023-0334-8