
PTDS CenterTrack: pedestrian tracking in dense scenes with re-identification and feature enhancement

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Multi-object tracking (MOT) in dense scenes has long been a major difficulty in the field. Although some existing algorithms achieve excellent results on standard MOT benchmarks, they generalize poorly when transferred to more challenging dense scenarios. In this work, we propose PTDS (Pedestrian Tracking in Dense Scenes) CenterTrack, built on CenterTrack's object center-point detection and tracking. It compares object appearance features via dense inter-frame similarity to predict each object's position change between frames, extending CenterTrack, which relies on motion features alone. We further propose a feature-enhancement method based on a hybrid attention mechanism, which injects temporal information between frames into the features used for object detection and thereby couples the detection and tracking tasks. On the MOT20 benchmark, PTDS CenterTrack achieves 55.6% MOTA, 55.1% IDF1, and 45.1% HOTA, improvements of 10.1, 4.0, and 4.8 percentage points, respectively, over CenterTrack.
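The abstract names two ingredients: a hybrid attention module that enhances detection features with temporal context, and a dense inter-frame appearance comparison for association. The paper's exact modules are not reproduced here; as a rough illustration only, the following minimal PyTorch-style sketch shows one plausible reading: a CBAM-style channel-plus-spatial attention block applied to fused current/previous-frame features, and a cosine-similarity matrix over per-object embeddings. All module names, the fusion scheme, and the CBAM-style design are our assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridAttention(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style).

    Hypothetical sketch; the paper's hybrid attention module may differ.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: re-weight each location from pooled channel maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = x.mean(dim=(2, 3))                      # (B, C)
        mx = x.amax(dim=(2, 3))                       # (B, C)
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1)
        # Spatial attention from channel-pooled maps.
        avg_map = x.mean(dim=1, keepdim=True)         # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)         # (B, 1, H, W)
        sa = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * sa


class TemporalFusion(nn.Module):
    """Fuse current- and previous-frame backbone features, then enhance
    them with hybrid attention before the detection heads.

    Hypothetical stand-in for the paper's feature-enhancement step.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.attn = HybridAttention(channels)

    def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([feat_t, feat_prev], dim=1))
        return self.attn(fused)


def appearance_similarity(emb_t: torch.Tensor, emb_prev: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity matrix between per-object embeddings of two
    consecutive frames; matching (e.g. Hungarian) over this matrix is
    one common way to associate identities (our assumption)."""
    a = F.normalize(emb_t, dim=1)                     # (N, D)
    b = F.normalize(emb_prev, dim=1)                  # (M, D)
    return a @ b.t()                                  # (N, M)
```

As a usage sketch, `TemporalFusion(channels)(feat_t, feat_prev)` would replace the single-frame feature fed to CenterTrack's detection heads, and `appearance_similarity` would supplement its motion-offset association with an appearance cue.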



Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grant No. 62271166) and the Interdisciplinary Research Foundation of HIT (No. IR2021104).

Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 62271166) and the Interdisciplinary Research Foundation of HIT (No. IR2021104).

Author information


Corresponding author

Correspondence to Huanyu Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wen, J., Liu, H. & Li, J. PTDS CenterTrack: pedestrian tracking in dense scenes with re-identification and feature enhancement. Machine Vision and Applications 35, 54 (2024). https://doi.org/10.1007/s00138-024-01520-8

