
A hybrid approach to real-time multi-target tracking


Abstract

Multi-Object Tracking, also known as Multi-Target Tracking, is an important area of computer vision with applications across many domains. The advent of deep learning has had a profound impact on this field, prompting researchers to explore new avenues. Deep learning methods have become the cornerstone of today's state-of-the-art solutions, consistently delivering exceptional tracking results. However, the computational demands of deep learning models call for powerful hardware that does not always meet real-time tracking requirements, limiting their practical applicability in real-world scenarios. There is therefore a need to strike a balance by merging robust deep learning strategies with conventional approaches, enabling more accessible, cost-effective solutions that satisfy real-time constraints. This paper addresses this need by presenting a hybrid strategy for real-time multi-target tracking that combines a classical optical flow algorithm with a deep learning architecture tailored to human crowd tracking. The hybrid approach achieves a sound balance between tracking accuracy and computational efficiency. The proposed architecture, evaluated extensively in various settings, achieved a Multiple Object Tracking Accuracy (MOTA) of 0.608, ranking first on the MOT15 benchmark and surpassing the previous best of 0.549, while consistently ranking among the top models on the MOT17 and MOT20 benchmarks. Additionally, the optical flow phase nearly halved processing time while maintaining accuracy comparable to established techniques.
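To make the hybrid idea concrete, the sketch below shows one plausible way a detect-then-propagate loop can be organized: a deep detector runs only on every N-th frame, and a classical Lucas-Kanade optical flow step moves the existing boxes forward on the intermediate frames, which is where the reported savings in processing time would come from. This is a minimal illustration under assumed details, not the authors' implementation: `run_detector`, `propagate_boxes`, and the `DETECT_EVERY` interval are hypothetical names and values, and a complete system would add data association and identity management on top.

```python
import cv2
import numpy as np

DETECT_EVERY = 5  # hypothetical interval: run the deep detector once every N frames


def run_detector(frame):
    """Placeholder for the deep detection network; should return (x, y, w, h) boxes."""
    raise NotImplementedError


def propagate_boxes(prev_gray, curr_gray, boxes):
    """Shift each box by the median Lucas-Kanade flow of corners found inside it."""
    moved = []
    for (x, y, w, h) in boxes:
        roi = prev_gray[int(y):int(y + h), int(x):int(x + w)]
        pts = None
        if roi.size > 0:
            pts = cv2.goodFeaturesToTrack(roi, maxCorners=30, qualityLevel=0.01, minDistance=3)
        if pts is None:
            moved.append((x, y, w, h))  # no trackable corners: keep the box in place
            continue
        # bring corner coordinates back into full-image space
        pts = (pts.reshape(-1, 2) + np.array([x, y], dtype=np.float32)).astype(np.float32)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts.reshape(-1, 1, 2), None)
        good = status.reshape(-1) == 1
        if not good.any():
            moved.append((x, y, w, h))
            continue
        dx, dy = np.median(nxt.reshape(-1, 2)[good] - pts[good], axis=0)
        moved.append((x + dx, y + dy, w, h))
    return moved


def track(video_path):
    cap = cv2.VideoCapture(video_path)
    prev_gray, boxes, frame_idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None or frame_idx % DETECT_EVERY == 0:
            boxes = run_detector(frame)  # expensive deep model, run sparsely
        else:
            boxes = propagate_boxes(prev_gray, gray, boxes)  # cheap optical flow update
        prev_gray, frame_idx = gray, frame_idx + 1
    cap.release()
    return boxes
```

In a scheme like this, the detection interval is the main knob: a longer interval lowers average per-frame cost but lets flow-propagated boxes drift between detections, so it would be tuned to the scene dynamics of each benchmark sequence.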


Data Availability

The datasets used for training and testing may be provided upon request.

Notes

  1. The official GitHub repository of this work is freely available at https://github.com/Dantekk/A-hybrid-approach-to-Real-Time-Multi-Target-Tracking.


Author information


Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Vincenzo M. Scarrica.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Scarrica, V.M., Panariello, C., Ferone, A. et al. A hybrid approach to real-time multi-target tracking. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09799-4

