Abstract
Pedestrian multiple object tracking aims to track multiple pedestrian instances in real time. Recently, methods based on joint detection and embedding have improved performance by sharing features across tasks. However, these methods have two obvious shortcomings: inconsistent information between tasks and ambiguity caused by overlapping neighboring instances. Hence, both the information gap between branch tasks and the information gap between instances need to be carefully addressed. In this paper, we propose IGTracker, a novel online tracking framework that bridges the different optimization requirements of the branch tasks from the perspective of task-specific information gaps and nearest-instance information gaps. First, to alleviate the competitive conflict between subtasks, we propose a shuffle involution decoupling (SID) module, which constructs task-specific features by focusing on the local interaction information and the global long-range dependencies of key points. Second, we propose a nearest neighbor information enhancement (NNIE) strategy, which reduces the ambiguity between similar instances by leveraging the information gap between adjacent key points. On the MOTChallenge benchmarks, IGTracker achieves performance competitive with various existing methods.
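The abstract names the SID module only through its ingredients. As a rough illustration of two generic operations the name suggests, the sketch below implements involution (Li et al., CVPR 2021), in which a K×K kernel is generated at each location from that location's own feature vector and shared by all channels of a group, and a ShuffleNet-style channel shuffle. This is a minimal sketch under stated assumptions, not the authors' SID module: it assumes "shuffle" means a plain channel shuffle (the paper may use a different shuffle variant), and the names `channel_shuffle`, `involution` and `kernel_fn`, the list-of-grids feature layout, and the zero padding are hypothetical choices for illustration. How IGTracker combines these operations into task-specific detection and embedding branches is not reproduced here.

```python
from typing import Callable, List

# Hypothetical layout: a feature map is a list of C channels, each an H x W grid.
Grid = List[List[float]]
FeatureMap = List[Grid]
# Hypothetical kernel generator: maps one pixel's C-dim feature vector to G kernels (K x K each).
KernelFn = Callable[[List[float]], List[Grid]]


def channel_shuffle(x: FeatureMap, groups: int) -> FeatureMap:
    """ShuffleNet-style shuffle: view C as (groups, C // groups), transpose, flatten."""
    c = len(x)
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    # New order interleaves the groups: g0c0, g1c0, ..., g0c1, g1c1, ...
    return [x[g * per_group + i] for i in range(per_group) for g in range(groups)]


def involution(x: FeatureMap, kernel_fn: KernelFn, k: int, groups: int) -> FeatureMap:
    """Involution: a K x K kernel is generated at every pixel from that pixel's own
    feature vector, then shared by all channels belonging to the same group."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    r = k // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            pixel = [x[ch][i][j] for ch in range(c)]
            kernels = kernel_fn(pixel)  # location-specific: one K x K kernel per group
            for ch in range(c):
                kern = kernels[ch // per_group]
                acc = 0.0
                for u in range(-r, r + 1):
                    for v in range(-r, r + 1):
                        ii, jj = i + u, j + v
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                            acc += kern[u + r][v + r] * x[ch][ii][jj]
                out[ch][i][j] = acc
    return out
```

In the original involution design, the kernel generator is a small bottleneck of 1×1 convolutions mapping the C-dimensional pixel vector to K·K·G weights; here any callable returning `groups` K×K grids can stand in for it. Unlike a convolution, the kernel changes from pixel to pixel but is shared across the channels of a group, which is why it can model spatially varying local interactions cheaply.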
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (62371209, 62371208), the China Postdoctoral Science Foundation (2015M581720, 2016M600360), and the 111 Project under Grant B12018.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, J., Kong, J., Jiang, M. et al. IGTracker: task and instance information gaps in multiple object tracking. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02182-8
DOI: https://doi.org/10.1007/s13042-024-02182-8