CGTracker: Center Graph Network for One-Stage Multi-Pedestrian-Object Detection and Tracking

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Most current online multi-object tracking (MOT) methods consist of two steps: object detection and data association, where the data association step relies on both object feature extraction and affinity computation. This often incurs additional computational cost and degrades the efficiency of MOT methods. In this paper, we combine the object detection and data association modules into a unified framework, while dispensing with a separate feature extraction process, to achieve a better speed-accuracy trade-off for MOT. Considering that pedestrians are the most common object category in real-world scenes and exhibit distinctive inter-object relationships and motion patterns, we present a novel yet efficient one-stage pedestrian detection and tracking method, named CGTracker. In particular, CGTracker detects each pedestrian as the center point of the object and extracts the object features directly from the feature representation at that center point, which are then used to predict the axis-aligned bounding box. Meanwhile, the detected pedestrians are organized into an object graph to facilitate multi-object association, where the semantic features, displacement information, and relative positions of targets between two adjacent frames are used to perform reliable online tracking. CGTracker achieves multiple object tracking accuracy (MOTA) of 69.3% and 65.3% at 9 FPS on MOT17 and MOT20, respectively. Extensive experimental results under widely used evaluation metrics demonstrate that our method is among the best on the MOT17 and MOT20 challenge leaderboards at the time of submission of this work.
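
To make the pipeline concrete, the sketch below illustrates the two stages the abstract describes: decoding detections as heatmap center points whose feature vectors double as association embeddings, and matching detections across adjacent frames with a cost that mixes appearance similarity and center displacement. This is a minimal NumPy/SciPy sketch under stated assumptions, not the authors' implementation: the function names, the 3x3 peak test, the score threshold, and the cost weights are all illustrative, and the paper's graph construction with relative position relationships is simplified here into a pairwise cost matrix solved by Hungarian matching.

    # Illustrative sketch only: names, thresholds, and cost weights are
    # assumptions for exposition, not CGTracker's actual implementation.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def decode_centers(heatmap, features, score_thresh=0.3):
        """Pick local-maximum center points from an (H, W) confidence
        heatmap and gather the (H, W, C) feature vectors at those points;
        the same per-center features serve box regression and association."""
        H, W = heatmap.shape
        padded = np.pad(heatmap, 1, constant_values=-np.inf)
        # 3x3 local-maximum test, a stand-in for max-pooling NMS on the heatmap
        neighborhoods = np.stack([padded[dy:dy + H, dx:dx + W]
                                  for dy in range(3) for dx in range(3)])
        is_peak = heatmap >= neighborhoods.max(axis=0)
        ys, xs = np.where(is_peak & (heatmap > score_thresh))
        centers = np.stack([xs, ys], axis=1).astype(float)  # (N, 2) as (x, y)
        return centers, features[ys, xs]                    # (N, 2), (N, C)

    def associate(centers_prev, feats_prev, centers_cur, feats_cur,
                  w_app=1.0, w_disp=0.05):
        """Match detections between two adjacent frames with a pairwise cost
        mixing appearance distance and center displacement, solved by the
        Hungarian algorithm (a simplification of graph-based association)."""
        a = feats_prev / (np.linalg.norm(feats_prev, axis=1, keepdims=True) + 1e-8)
        b = feats_cur / (np.linalg.norm(feats_cur, axis=1, keepdims=True) + 1e-8)
        appearance_cost = 1.0 - a @ b.T                     # cosine distance
        displacement_cost = np.linalg.norm(
            centers_prev[:, None, :] - centers_cur[None, :, :], axis=2)
        cost = w_app * appearance_cost + w_disp * displacement_cost
        rows, cols = linear_sum_assignment(cost)            # optimal 1-to-1 match
        return list(zip(rows.tolist(), cols.tolist()))

In a full tracker, matched pairs would propagate identities forward, while unmatched detections would start new tracks and unmatched tracks would be kept alive for a few frames before termination.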

Author information

Corresponding author

Correspondence to Xin Feng.

Supplementary Information

ESM 1 (PDF 396 kb)

About this article

Cite this article

Feng, X., Wu, HM., Yin, YH. et al. CGTracker: Center Graph Network for One-Stage Multi-Pedestrian-Object Detection and Tracking. J. Comput. Sci. Technol. 37, 626–640 (2022). https://doi.org/10.1007/s11390-022-2204-8
