
First Steps Towards 3D Pedestrian Detection and Tracking from Single Image

  • Conference paper
Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Abstract

For decades, the problem of multiple people tracking has been tackled using 2D data only. However, people move and interact in a three-dimensional space. For this reason, relying solely on 2D data can be limiting and overly challenging, especially in the presence of occlusions and multiple overlapping people. In this paper, we take advantage of 3D synthetic data from the novel MOTSynth dataset to train our proposed 3D people detector, whose observations are fed to a tracker that operates in the corresponding 3D space. Compared to conventional 2D trackers, we show an overall improvement in performance, with a reduction in identity switches on both real and synthetic data. Additionally, we propose a tracker that jointly exploits 3D and 2D data, which improves over the proposed baselines. Our experiments demonstrate that 3D data can be beneficial, and we believe this paper will pave the way for future efforts in leveraging 3D data to tackle multiple people tracking. The code is available at https://github.com/GianlucaMancusi/LoCO-Det.
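The abstract does not describe how the 3D tracker links detections across frames. The sketch below is a rough illustration, not the authors' method. It shows one common association scheme for a tracker that works in 3D space: greedy nearest-neighbour matching on Euclidean distance, with a distance gate. Several details are hypothetical and do not come from the paper or its repository: each detection is assumed to be an (x, y, z) point in camera coordinates in metres, and the `Tracker3D` class and its `max_dist`/`max_misses` values are invented for this example.

```python
import math
from dataclasses import dataclass, field

# Assumption: each detection is a 3D point (x, y, z) in metres.
Point3D = tuple[float, float, float]


@dataclass
class Track:
    track_id: int
    position: Point3D
    misses: int = 0  # consecutive frames without a matched detection


@dataclass
class Tracker3D:
    """Hypothetical online tracker using greedy nearest-neighbour matching in 3D."""
    max_dist: float = 1.0  # gating threshold in metres (assumed value)
    max_misses: int = 5    # frames an unmatched track is kept alive (assumed value)
    tracks: list[Track] = field(default_factory=list)
    next_id: int = 1

    def update(self, detections: list[Point3D]) -> list[int]:
        """Associate one frame of detections; return a track ID per detection."""
        # Candidate (distance, track index, detection index) pairs inside the gate.
        pairs = sorted(
            (math.dist(t.position, d), ti, di)
            for ti, t in enumerate(self.tracks)
            for di, d in enumerate(detections)
            if math.dist(t.position, d) <= self.max_dist
        )
        ids = [0] * len(detections)
        used_tracks: set[int] = set()
        used_dets: set[int] = set()
        # Greedily accept the closest remaining track/detection pair.
        for _, ti, di in pairs:
            if ti in used_tracks or di in used_dets:
                continue
            used_tracks.add(ti)
            used_dets.add(di)
            track = self.tracks[ti]
            track.position = detections[di]
            track.misses = 0
            ids[di] = track.track_id
        # Age unmatched tracks, then drop those missing for too long.
        for ti, track in enumerate(self.tracks):
            if ti not in used_tracks:
                track.misses += 1
        self.tracks = [t for t in self.tracks if t.misses <= self.max_misses]
        # Every unmatched detection starts a new track.
        for di, d in enumerate(detections):
            if di not in used_dets:
                self.tracks.append(Track(self.next_id, d))
                ids[di] = self.next_id
                self.next_id += 1
        return ids


tracker = Tracker3D()
# Two people who overlap in the image but stand at about 5 m and 9 m depth.
print(tracker.update([(0.0, 0.0, 5.0), (0.2, 0.0, 9.0)]))  # [1, 2]
# Next frame: same people, detections returned in swapped order.
print(tracker.update([(0.3, 0.0, 9.1), (0.1, 0.0, 5.1)]))  # [2, 1]
```

In this toy example the two people nearly coincide in image coordinates. Their depths, however, differ by about 4 m, which is far beyond the 1 m gate, so the cross pairings are never considered and the identities do not swap. This is the intuition behind the reduction in identity switches that the abstract reports for 3D tracking. A Hungarian (optimal) assignment could replace the greedy step. A joint 3D and 2D tracker could, for example, add an image-space term to the cost, but the paper's actual formulation is not given here.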



Acknowledgements

Partially supported by the PREVUE “PRediction of activities and Events by Vision in an Urban Environment” project (CUP E94I19000650001), PRIN National Research Program, Italian Ministry for Education, University and Research (MIUR); by the ROADSTER “Road Sustainable Twins in Emilia Romagna” project, International Foundation Big Data and Artificial Intelligence for Human Development; and by Tetra Pak Packaging Solutions S.p.A.

Author information

Correspondence to Gianluca Mancusi.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Mancusi, G. et al. (2022). First Steps Towards 3D Pedestrian Detection and Tracking from Single Image. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13232. Springer, Cham. https://doi.org/10.1007/978-3-031-06430-2_28


  • DOI: https://doi.org/10.1007/978-3-031-06430-2_28
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-031-06429-6
  • Online ISBN: 978-3-031-06430-2
  • eBook Packages: Computer Science, Computer Science (R0)
