CGTracker: Center Graph Network for One-Stage Multi-Pedestrian-Object Detection and Tracking

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Most current online multi-object tracking (MOT) methods consist of two steps: object detection and data association, where the data association step relies on both object feature extraction and affinity computation. This often incurs additional computational cost and degrades the efficiency of MOT methods. In this paper, we combine the object detection and data association modules into a unified framework, while dispensing with a separate feature extraction process, to achieve a better speed-accuracy trade-off for MOT. Considering that pedestrians are the most common object category in real-world scenes and exhibit distinctive inter-object relationships and motion patterns, we present a novel yet efficient one-stage pedestrian detection and tracking method, named CGTracker. In particular, CGTracker detects each pedestrian as the center point of the object and extracts the object features directly from the feature representation at that center point, which are then used to predict the axis-aligned bounding box. Meanwhile, the detected pedestrians are organized into an object graph to facilitate multi-object association, where the semantic features, displacement information, and relative positions of targets between two adjacent frames are used to perform reliable online tracking. CGTracker achieves multiple object tracking accuracy (MOTA) of 69.3% and 65.3% at 9 FPS on MOT17 and MOT20, respectively. Extensive experimental results under widely used evaluation metrics demonstrate that our method is among the best on the MOT17 and MOT20 challenge leaderboards at the time of submission of this work.
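
To make the pipeline concrete, the sketch below illustrates the two stages the abstract describes: decoding detections as heatmap center points whose feature vectors double as association embeddings, and matching detections across adjacent frames with a cost that mixes appearance similarity and center displacement. This is a minimal NumPy/SciPy sketch under stated assumptions, not the authors' implementation: the function names, the 3x3 peak test, the score threshold, and the cost weights are all illustrative, and the paper's graph construction with relative position relationships is simplified here into a pairwise cost matrix solved by Hungarian matching.

    # Illustrative sketch only: names, thresholds, and cost weights are
    # assumptions for exposition, not CGTracker's actual implementation.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def decode_centers(heatmap, features, score_thresh=0.3):
        """Pick local-maximum center points from an (H, W) confidence
        heatmap and gather the (H, W, C) feature vectors at those points;
        the same per-center features serve box regression and association."""
        H, W = heatmap.shape
        padded = np.pad(heatmap, 1, constant_values=-np.inf)
        # 3x3 local-maximum test, a stand-in for max-pooling NMS on the heatmap
        neighborhoods = np.stack([padded[dy:dy + H, dx:dx + W]
                                  for dy in range(3) for dx in range(3)])
        is_peak = heatmap >= neighborhoods.max(axis=0)
        ys, xs = np.where(is_peak & (heatmap > score_thresh))
        centers = np.stack([xs, ys], axis=1).astype(float)  # (N, 2) as (x, y)
        return centers, features[ys, xs]                    # (N, 2), (N, C)

    def associate(centers_prev, feats_prev, centers_cur, feats_cur,
                  w_app=1.0, w_disp=0.05):
        """Match detections between two adjacent frames with a pairwise cost
        mixing appearance distance and center displacement, solved by the
        Hungarian algorithm (a simplification of graph-based association)."""
        a = feats_prev / (np.linalg.norm(feats_prev, axis=1, keepdims=True) + 1e-8)
        b = feats_cur / (np.linalg.norm(feats_cur, axis=1, keepdims=True) + 1e-8)
        appearance_cost = 1.0 - a @ b.T                     # cosine distance
        displacement_cost = np.linalg.norm(
            centers_prev[:, None, :] - centers_cur[None, :, :], axis=2)
        cost = w_app * appearance_cost + w_disp * displacement_cost
        rows, cols = linear_sum_assignment(cost)            # optimal 1-to-1 match
        return list(zip(rows.tolist(), cols.tolist()))

In a full tracker, matched pairs would propagate identities forward, while unmatched detections would start new tracks and unmatched tracks would be kept alive for a few frames before termination.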

Author information

Corresponding author

Correspondence to Xin Feng.

Supplementary Information

ESM 1 (PDF 396 kb)

About this article

Cite this article

Feng, X., Wu, HM., Yin, YH. et al. CGTracker: Center Graph Network for One-Stage Multi-Pedestrian-Object Detection and Tracking. J. Comput. Sci. Technol. 37, 626–640 (2022). https://doi.org/10.1007/s11390-022-2204-8
