Igtracker: task and instance information gaps in multiple object tracking

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Pedestrian multiple object tracking aims to track multiple pedestrian instances in real time. Recently, methods based on joint detection and embedding have improved performance by sharing features across tasks. However, they have two obvious shortcomings: inconsistent task information and ambiguous overlap between neighboring instances. Hence, both the information gap between branch tasks and the information gap between instances need to be carefully addressed. In this paper, IGTracker is proposed as a novel online tracking framework that reconciles the differing optimization requirements of the branch tasks from the perspective of task-specific information gaps and nearest-instance information gaps. First, to alleviate the competitive conflict between subtasks, we propose a shuffle involution decoupling (SID) module, which constructs task-specific features by focusing on local interaction information and global long-range dependencies of key points. Second, a nearest neighbor information enhancement (NNIE) strategy is proposed to reduce the ambiguity between similar instances by leveraging the information gap between adjacent key points. With these components, IGTracker achieves competitive performance compared with various existing methods on the MOTChallenge benchmarks.
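
The abstract does not describe how the SID module is built. As a loose illustration only, the "shuffle" in its name suggests a channel-shuffle step of the kind used in shuffle attention [66]. The sketch below shows that generic operation on a flat list of channel labels. The function name `channel_shuffle` and the list-based representation are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups (reshape, transpose, flatten).

    Hypothetical helper: it illustrates the generic channel-shuffle
    operation, not the SID module itself.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Treat the flat list as a (groups, per_group) grid and read it
    # column by column, so neighbours in the output come from different groups.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    # Six channels in two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

After the shuffle, each consecutive pair of channels mixes one channel from every group, which is what lets later group-wise operations exchange information between groups.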

Availability of data and materials

The datasets analyzed during the current study are available in Refs. [10, 11, 69].

References

  1. Ahmed I, Ahmad M, Ahmad A, Jeon G (2021) Top view multiple people tracking by detection using deep sort and yolov3 with transfer learning: within 5g infrastructure. Int J Mach Learn Cybern 12:3053–3067

  2. Oh S, Hoogs A, Perera A, Cuntoor N, Chen C-C, Lee JT, Mukherjee S, Aggarwal J, Lee H, Davis L (2011) A large-scale benchmark dataset for event recognition in surveillance video. In: CVPR 2011, pp. 3153–3160. IEEE

  3. Shao C, Yang Y, Juneja SG, Seetharam T (2022) Iot data visualization for business intelligence in corporate finance. Inf Process Manag 59(1):102736

  4. Liu Z, Zhang O, Gao Y, Zhao Y, Sun Y, Liu J (2022) Adaptive neural network-based fixed-time control for trajectory tracking of robotic systems. IEEE Trans Circ Syst II Express Briefs 70(1):241–245

  5. Tan S, Yang J, Ding H (2023) A prediction and compensation method of robot tracking error considering pose-dependent load decomposition. Robot Comput-Integr Manuf 80:102476

  6. Janai J, Güney F, Behl A, Geiger A (2020) Computer vision for autonomous vehicles: problems, datasets and state of the art. Found Trends® Comput Graph Vis 12(1–3):1–308

  7. Sun P, Kretzschmar H, Dotiwalla X, Chouard A, Patnaik V, Tsui P, Guo J, Zhou Y, Chai Y, Caine B (2020) Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446–2454

  8. Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V, Darrell T (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2636–2645

  9. Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) Motchallenge 2015: Towards a benchmark for multi-target tracking

  10. Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K (2016) MOT16: A benchmark for multi-object tracking

  11. Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, Roth S, Schindler K, Leal-Taixé L (2020) Mot20: A benchmark for multi object tracking in crowded scenes

  12. Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468. IEEE

  13. Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649. IEEE

  14. Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2016) Poi: Multiple object tracking with high performance detection and appearance feature. In: Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part II 14, pp. 36–42. Springer

  15. Zhou Q, Zhong B, Zhang Y, Li J, Fu Y (2018) Deep alignment network based multi-person tracking with occlusion and motion reasoning. IEEE Trans Multimedia 21(5):1183–1194

  16. Dai P, Wang X, Zhang W, Chen J (2018) Instance segmentation enabled hybrid data association and discriminative hashing for online multi-object tracking. IEEE Trans Multimedia 21(7):1709–1723

  17. Tan K, Xu T-B, Wei Z (2022) Online visual tracking via background-aware siamese networks. Int J Mach Learn Cybern 13(10):2825–2842

  18. Dollár P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: A benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 304–311. IEEE

  19. Leibe B, Seemann E, Schiele B (2005) Pedestrian detection in crowded scenes. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, pp. 878–885. IEEE

  20. Chen L, Liu H, Mo J, Zhang D, Yang J, Lin F, Zheng Z, Jia R (2022) Cross channel aggregation similarity network for salient object detection. Int J Mach Learn Cybern 13(8):2153–2169

  21. He S, Luo H, Wang P, Wang F, Li H, Jiang W (2021) Transreid: Transformer-based object re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15013–15022

  22. Ren M, He L, Liao X, Liu W, Wang Y, Tan T (2021) Learning instance-level spatial-temporal patterns for person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14930–14939

  23. Luo X, Jiang M, Kong J (2022) Selective relation-aware representations for person re-identification. Int J Mach Learn Cybern 13(11):3523–3541

  24. Zhang X, Cheng L, Li B, Hu H-M (2018) Too far to see? not really!-pedestrian detection with scale-aware localization policy. IEEE Trans Image Process 27(8):3703–3715. https://doi.org/10.1109/TIP.2018.2818018

  25. Xiaowei Z, Jianwei M, Hong L, Hai-Miao H, Peng Y (2022) Dual attentional siamese network for visual tracking. Displays: Technology and Applications

  26. Zhang X, Li L, Liu H, Yang P, Gao Y (2022) Disentangling classification and regression in siamese-based network for visual tracking. Concurrency and Computation: Practice and Experience 34

  27. Zhang Y, Wang C, Wang X, Zeng W, Liu W (2021) Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int J Comput Vision 129:3069–3087

  28. Li M, Wu J, Wang X, Chen C, Qin J, Xiao X, Wang R, Zheng M, Pan X (2023) AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

  29. Wang Z, Zheng L, Liu Y, Li Y, Wang S (2020) Towards real-time multi-object tracking. In: European Conference on Computer Vision, pp. 107–122. Springer

  30. Bergmann P, Meinhardt T, Leal-Taixe L (2019) Tracking without bells and whistles. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 941–951

  31. Lu Z, Rathod V, Votel R, Huang J (2020) Retinatrack: Online single stage joint detection and tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14668–14678

  32. Liang C, Zhang Z, Zhou X, Li B, Zhu S, Hu W (2022) Rethinking the competition between detection and reid in multiobject tracking. IEEE Trans Image Process 31:3182–3196

  33. Yu E, Li Z, Han S, Wang H (2022) Relationtrack: Relation-aware multiple object tracking with decoupled representation. IEEE Transactions on Multimedia

  34. Zhou C, Jiang M, Kong J (2023) Bgtracker: Cross-task bidirectional guidance strategy for multiple object tracking. IEEE Trans Multimedia 25:8132–8144. https://doi.org/10.1109/TMM.2023.3256761

  35. Mo E, Kong J, Jiang M, Liu T (2023) Motion information supplement for joint detection and embedding tracking. J Electron Imaging 32(5):053007–053007

  36. Liu J, Kong J, Jiang M, Liu T (2023) Caltracker: Cross-task association learning for multiple object tracking. IEEE Signal Process Lett 30:1622–1626. https://doi.org/10.1109/LSP.2023.3329419

  37. Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: Keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6569–6578

  38. Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: Point set representation for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9657–9666

  39. Law H, Deng J (2018) Cornernet: Detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750

  40. Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C (2020) Centripetalnet: Pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10519–10528

  41. Duan K, Xie L, Qi H, Bai S, Huang Q, Tian Q (2020) Corner proposal network for anchor-free, two-stage object detection. In: European Conference on Computer Vision, pp. 399–416. Springer

  42. Lan S, Ren Z, Wu Y, Davis LS, Hua G (2020) Saccadenet: A fast and accurate object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10397–10406

  43. Gao T, Pan H, Wang Z, Gao H (2021) A crf-based framework for tracklet inactivation in online multi-object tracking. IEEE Trans Multimedia 24:995–1007

  44. Guo M, Haque A, Huang D-A, Yeung S, Fei-Fei L (2018) Dynamic task prioritization for multitask learning. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 270–287

  45. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28

  46. Yang F, Choi W, Lin Y (2016) Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2129–2137

  47. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement

  48. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 21–37. Springer

  49. Kalman RE (1960) A new approach to linear filtering and prediction problems

  50. Bochinski E, Eiselein V, Sikora T (2017) High-speed tracking-by-detection without using image information. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6. IEEE

  51. Bochinski E, Senst T, Sikora T (2018) Extending iou based multi-object tracking by visual information. In: 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6. IEEE

  52. Zhu X, Wang Y, Dai J, Yuan L, Wei Y (2017) Flow-guided feature aggregation for video object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 408–417

  53. Guo S, Wang J, Wang X, Tao D (2021) Online multiple object tracking with cross-task synergy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8136–8145

  54. Voigtlaender P, Krause M, Osep A, Luiten J, Sekar BBG, Geiger A, Leibe B (2019) Mots: Multi-object tracking and segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7942–7951

  55. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969

  56. Zhou X, Koltun V, Krähenbühl P (2020) Tracking objects as points. In: European Conference on Computer Vision, pp. 474–490. Springer

  57. Xia Z, Pan X, Song S, Li LE, Huang G (2022) Vision transformer with deformable attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4794–4803

  58. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Advances in neural information processing systems 30

  59. Yu F, Wang D, Shelhamer E, Darrell T (2018) Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403–2412

  60. Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773

  61. Kuhn HW (1955) The hungarian method for the assignment problem. Naval research logistics quarterly 2(1–2):83–97

  62. Kuncheva LI (2010) Full-class set classification using the hungarian algorithm. Int J Mach Learn Cybern 1(1–4):53–61

  63. Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722

  64. Zhang Q, Yang Y-B (2021) Rest: An efficient transformer for visual recognition. Adv Neural Inf Process Syst 34:15475–15485

  65. Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258

  66. Zhang Q-L, Yang Y-B (2021) Sa-net: Shuffle attention for deep convolutional neural networks. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2235–2239. IEEE

  67. Li D, Hu J, Wang C, Li X, She Q, Zhu L, Zhang T, Chen Q (2021) Involution: Inverting the inherence of convolution for visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12321–12330

  68. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988

  69. Shao S, Zhao Z, Li B, Xiao T, Yu G, Zhang X, Sun J (2018) Crowdhuman: A benchmark for detecting human in a crowd

  70. Ess A, Leibe B, Schindler K, Van Gool L (2008) A mobile vision system for robust multi-person tracking. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE

  71. Zhang S, Benenson R, Schiele B (2017) Citypersons: A diverse dataset for pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3221

  72. Dollár P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: A benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 304–311. IEEE

  73. Xiao T, Li S, Wang B, Lin L, Wang X (2017) Joint detection and identification feature learning for person search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3415–3424

  74. Zheng L, Zhang H, Sun S, Chandraker M, Yang Y, Tian Q (2017) Person re-identification in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1367–1376

  75. Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the clear mot metrics. EURASIP Journal on Image and Video Processing 2008:1–10

  76. Luiten J, Osep A, Dendorfer P, Torr P, Geiger A, Leal-Taixé L, Leibe B (2021) Hota: A higher order metric for evaluating multi-object tracking. Int J Comput Vision 129:548–578

  77. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755. Springer

  78. Kingma DP, Ba J (2014) Adam: A method for stochastic optimization

  79. Pang B, Li Y, Zhang Y, Li M, Lu C (2020) Tubetk: Adopting tubes to track multi-object in a one-step training model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6308–6318

  80. Wang Y, Kitani K, Weng X (2021) Joint object detection and multi-object tracking with graph neural networks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13708–13715. IEEE

  81. Zhou X, Yin T, Koltun V, Krähenbühl P (2022) Global tracking transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8771–8780

  82. Lee S-H, Park D-H, Bae S-H (2023) Decode-mot: How can we hurdle frames to go beyond tracking-by-detection? IEEE Trans Image Process 32:4378–4392. https://doi.org/10.1109/TIP.2023.3298538

  83. Fukui H, Miyagawa T, Morishita Y (2023) Multi-object tracking as attention mechanism. In: 2023 IEEE International Conference on Image Processing (ICIP), pp. 505–509. https://doi.org/10.1109/ICIP49359.2023.10222207

  84. Stadler D, Beyerer J (2021) Improving multiple pedestrian tracking by track management and occlusion handling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10958–10967

  85. Liu Q, Chen D, Chu Q, Yuan L, Liu B, Zhang L, Yu N (2022) Online multi-object tracking with unsupervised re-identification learning and occlusion estimation. Neurocomputing 483:333–347

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (62371209, 62371208), the China Postdoctoral Science Foundation (2015M581720, 2016M600360), and the 111 Projects under Grant B12018.

Author information

Corresponding author

Correspondence to Jun Kong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, J., Kong, J., Jiang, M. et al. Igtracker: task and instance information gaps in multiple object tracking. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02182-8

  • DOI: https://doi.org/10.1007/s13042-024-02182-8