First Steps Towards 3D Pedestrian Detection and Tracking from Single Image

Mancusi, Gianluca; Fabbri, Matteo; Egidi, Sara; Verasani, Mattia; Scarabelli, Paolo; Calderara, Simone; Cucchiara, Rita

doi:10.1007/978-3-031-06430-2_28


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13232)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

For decades, the problem of multiple people tracking has been tackled using 2D data only. However, people move and interact in a three-dimensional space. For this reason, relying only on 2D data can be limiting and overly challenging, especially in the presence of occlusions and multiple overlapping people. In this paper, we take advantage of 3D synthetic data from the novel MOTSynth dataset to train our proposed 3D people detector, whose observations are fed to a tracker that works in the corresponding 3D space. Compared to conventional 2D trackers, we show an overall improvement in performance, with fewer identity switches, on both real and synthetic data. Additionally, we propose a tracker that jointly exploits 3D and 2D data, showing an improvement over the proposed baselines. Our experiments demonstrate that 3D data can be beneficial, and we believe this paper will pave the way for future efforts in leveraging 3D data to tackle multiple people tracking. The code is available at https://github.com/GianlucaMancusi/LoCO-Det.
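To make the idea of "a tracker that works in 3D space" concrete, here is a minimal sketch of frame-to-frame association on 3D positions, with an optional 2D term. It is not the authors' LoCO-Det code. Everything in it is an illustrative assumption: the pinhole back-projection helper `backproject`, the greedy matching (a stand-in for, e.g., Hungarian assignment), the 3D distance gate `max_dist` in metres, and the weight `alpha` that blends normalised 3D distance with 2D IoU. It shows why depth helps: two people whose boxes overlap heavily in the image can still be several metres apart in 3D.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Point3D = Tuple[float, float, float]     # (X, Y, Z) camera coordinates, metres


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Hypothetical helper: lift pixel (u, v) with an estimated depth to camera space (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pair_cost(t_point: Point3D, t_box: Box, d_point: Point3D, d_box: Box,
              alpha: float, max_dist: float) -> Optional[float]:
    """Blend normalised 3D distance with (1 - IoU); alpha=1.0 uses 3D only.
    Returns None when the 3D distance exceeds the gate (pair is not allowed)."""
    dist = math.dist(t_point, d_point)
    if dist > max_dist:
        return None
    return alpha * dist / max_dist + (1.0 - alpha) * (1.0 - iou(t_box, d_box))


def associate(tracks: List[Tuple[Point3D, Box]],
              detections: List[Tuple[Point3D, Box]],
              alpha: float = 1.0, max_dist: float = 1.0):
    """Greedy association: repeatedly accept the cheapest remaining (track, detection) pair.
    Returns (matches as (track_idx, det_idx) pairs, indices of unmatched detections)."""
    candidates = []
    for ti, (tp, tb) in enumerate(tracks):
        for di, (dp, db) in enumerate(detections):
            c = pair_cost(tp, tb, dp, db, alpha, max_dist)
            if c is not None:
                candidates.append((c, ti, di))
    candidates.sort()  # cheapest pairs first
    used_t, used_d, matches = set(), set(), []
    for _, ti, di in candidates:
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched


# Two pedestrians who overlap in the image but stand about 4 m apart in depth.
tracks = [((0.0, 0.0, 5.0), (100, 100, 150, 250)),   # track 0: near person
          ((0.1, 0.0, 9.0), (105, 100, 150, 240))]   # track 1: far person
detections = [((0.1, 0.0, 9.1), (106, 101, 151, 241)),
              ((0.05, 0.0, 5.1), (101, 100, 151, 250))]
matches, unmatched = associate(tracks, detections, alpha=1.0, max_dist=1.0)
print(matches, unmatched)  # [(1, 0), (0, 1)] []
```

Setting `alpha` between 0 and 1 mixes 2D box overlap into the cost, loosely in the spirit of the joint 3D/2D tracker mentioned in the abstract. A complete tracker would also need track creation and termination and some form of motion prediction, which this sketch deliberately omits.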



Acknowledgements

Partially supported by the PREVUE “PRediction of activities and Events by Vision in an Urban Environment” project (CUP E94I19000650001), PRIN National Research Program, Italian Ministry for Education, University and Research (MIUR), by ROADSTER “Road Sustainable Twins in Emilia Romagna” project, International Foundation Big Data and Artificial Intelligence for Human Development, and by Tetra Pak Packaging Solutions S.P.A.

Author information

Authors and Affiliations

University of Modena and Reggio Emilia, Modena, Italy
Gianluca Mancusi, Matteo Fabbri, Simone Calderara & Rita Cucchiara
Tetra Pak Packaging Solutions S.P.A., Modena, Italy
Gianluca Mancusi, Sara Egidi, Mattia Verasani & Paolo Scarabelli
GoatAI S.r.l., Modena, Italy
Matteo Fabbri


Corresponding author

Correspondence to Gianluca Mancusi.

Editor information

Editors and Affiliations

Boston University, Boston, MA, USA
Stan Sclaroff
National Research Council, Lecce, Italy
Cosimo Distante
National Research Council, Lecce, Italy
Marco Leo
University of Catania, Catania, Italy
Giovanni M. Farinella
Technische Universität München, Garching, Germany
Federico Tombari

About this paper

Cite this paper

Mancusi, G. et al. (2022). First Steps Towards 3D Pedestrian Detection and Tracking from Single Image. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13232. Springer, Cham. https://doi.org/10.1007/978-3-031-06430-2_28


DOI: https://doi.org/10.1007/978-3-031-06430-2_28
Published: 17 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06429-6
Online ISBN: 978-3-031-06430-2
eBook Packages: Computer Science, Computer Science (R0)
