
A hybrid approach to real-time multi-target tracking


Abstract

Multi-Object Tracking, also known as Multi-Target Tracking, is an important area of computer vision with applications across many domains. The advent of deep learning has had a profound impact on this field, prompting researchers to explore new avenues. Deep learning methods have become the cornerstone of today's state-of-the-art solutions, consistently delivering exceptional tracking results. However, the computational demands of deep learning models call for powerful hardware that does not always meet real-time tracking requirements, limiting their practical applicability in real-world scenarios. There is therefore a need to strike a balance by merging robust deep learning strategies with conventional approaches, enabling more accessible, cost-effective solutions that satisfy real-time constraints. This paper addresses this need by presenting a hybrid strategy for real-time multi-target tracking that combines a classical optical flow algorithm with a deep learning architecture tailored to human crowd tracking. The hybrid approach achieves a sound balance between tracking accuracy and computational efficiency. The proposed architecture, evaluated extensively in various settings, achieved a Multiple Object Tracking Accuracy (MOTA) of 0.608, ranking first on the MOT15 benchmark and surpassing the previous best of 0.549, while consistently ranking among the top models on the MOT17 and MOT20 benchmarks. Additionally, the optical flow phase nearly halved processing time while maintaining accuracy comparable to established techniques.
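To make the hybrid idea concrete, the sketch below shows one plausible way a detect-then-propagate loop can be organized: a deep detector runs only on every N-th frame, and a classical Lucas-Kanade optical flow step moves the existing boxes forward on the intermediate frames, which is where the reported savings in processing time would come from. This is a minimal illustration under assumed details, not the authors' implementation: `run_detector`, `propagate_boxes`, and the `DETECT_EVERY` interval are hypothetical names and values, and a complete system would add data association and identity management on top.

```python
import cv2
import numpy as np

DETECT_EVERY = 5  # hypothetical interval: run the deep detector once every N frames


def run_detector(frame):
    """Placeholder for the deep detection network; should return (x, y, w, h) boxes."""
    raise NotImplementedError


def propagate_boxes(prev_gray, curr_gray, boxes):
    """Shift each box by the median Lucas-Kanade flow of corners found inside it."""
    moved = []
    for (x, y, w, h) in boxes:
        roi = prev_gray[int(y):int(y + h), int(x):int(x + w)]
        pts = None
        if roi.size > 0:
            pts = cv2.goodFeaturesToTrack(roi, maxCorners=30, qualityLevel=0.01, minDistance=3)
        if pts is None:
            moved.append((x, y, w, h))  # no trackable corners: keep the box in place
            continue
        # bring corner coordinates back into full-image space
        pts = (pts.reshape(-1, 2) + np.array([x, y], dtype=np.float32)).astype(np.float32)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts.reshape(-1, 1, 2), None)
        good = status.reshape(-1) == 1
        if not good.any():
            moved.append((x, y, w, h))
            continue
        dx, dy = np.median(nxt.reshape(-1, 2)[good] - pts[good], axis=0)
        moved.append((x + dx, y + dy, w, h))
    return moved


def track(video_path):
    cap = cv2.VideoCapture(video_path)
    prev_gray, boxes, frame_idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None or frame_idx % DETECT_EVERY == 0:
            boxes = run_detector(frame)  # expensive deep model, run sparsely
        else:
            boxes = propagate_boxes(prev_gray, gray, boxes)  # cheap optical flow update
        prev_gray, frame_idx = gray, frame_idx + 1
    cap.release()
    return boxes
```

In a scheme like this, the detection interval is the main knob: a longer interval lowers average per-frame cost but lets flow-propagated boxes drift between detections, so it would be tuned to the scene dynamics of each benchmark sequence.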


Data Availability

The datasets used for training and testing may be provided upon request.

Notes

  1. The official GitHub repository of this work is freely available at https://github.com/Dantekk/A-hybrid-approach-to-Real-Time-Multi-Target-Tracking.


Author information


Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Vincenzo M. Scarrica.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Scarrica, V.M., Panariello, C., Ferone, A. et al. A hybrid approach to real-time multi-target tracking. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09799-4

