Delving into monocular 3D vehicle tracking: a decoupled framework and a dedicated metric

Applied Intelligence

Abstract

Acquiring the 3D trajectories of on-road vehicles is an essential visual task for autonomous driving systems. Existing 3D vehicle tracking methods either rely on point cloud data or need to be trained on visual tracking datasets. In contrast, this paper proposes a decoupled monocular 3D vehicle tracking framework. Because our framework is the first of its kind, a previous decoupled LiDAR-based method is taken as the baseline by substituting its detector with a monocular one. On this foundation, we further employ global coordinates to cancel out ego motion and introduce the angular rate into the 3D Kalman filter. To tackle the problem of long-term association, a trajectory management scheme is proposed with our novel hibernation mechanism. Furthermore, we point out that current monocular 3D tracking methods have not been tailored to the depth estimation uncertainty produced by monocular 3D detectors. In this regard, we propose a depth-aware association strategy that gives more distant vehicles larger matching regions in the data association stage. As another contribution, we discuss the defects of current metrics for evaluating 3D tracking performance and devise a nonuniform metric dedicated to monocular vision. Extensive experiments on the KITTI tracking benchmark demonstrate the superiority of the proposed monocular 3D vehicle tracking framework and metric, both quantitatively and qualitatively.
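To make the idea of depth-aware association concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes a matching radius that grows linearly with the detection's depth, and it uses greedy nearest-first matching in place of whatever assignment solver the framework actually employs. The names `gate_radius`, `associate`, `base_m` and `growth_per_m`, as well as the numeric defaults, are hypothetical and chosen only for illustration.

```python
import math


def gate_radius(depth_m, base_m=1.0, growth_per_m=0.05):
    """Hypothetical matching radius (metres) that grows linearly with depth."""
    return base_m + growth_per_m * depth_m


def associate(tracks, detections, base_m=1.0, growth_per_m=0.05):
    """Greedy association on ground-plane positions.

    tracks, detections: lists of (x, z) tuples in metres, where z is depth.
    Returns a sorted list of (track_index, detection_index) pairs.
    """
    candidates = []
    for ti, (tx, tz) in enumerate(tracks):
        for di, (dx, dz) in enumerate(detections):
            dist = math.hypot(tx - dx, tz - dz)
            # A pair is admissible only if it falls inside the gate
            # computed from the detection's own depth.
            if dist <= gate_radius(dz, base_m, growth_per_m):
                candidates.append((dist, ti, di))

    # Assumption: greedy nearest-first matching, a simplification.
    candidates.sort()
    matches, used_tracks, used_dets = [], set(), set()
    for _, ti, di in candidates:
        if ti not in used_tracks and di not in used_dets:
            matches.append((ti, di))
            used_tracks.add(ti)
            used_dets.add(di)
    return sorted(matches)


if __name__ == "__main__":
    tracks = [(0.0, 10.0), (0.0, 60.0)]
    detections = [(0.0, 12.5), (0.0, 62.5)]
    print(associate(tracks, detections))
```

In this toy input both detections lie 2.5 m from their tracks, yet only the distant pair is matched, because the gate at 62.5 m depth is wider than the gate at 12.5 m. This mirrors the stated motivation: monocular depth estimates are less reliable for far vehicles, so a fixed threshold would either reject valid far matches or admit spurious near ones.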

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant U1964201.

Author information

Corresponding author

Correspondence to Weiyang Lin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gao, T., Jia, Z., Lin, W. et al. Delving into monocular 3D vehicle tracking: a decoupled framework and a dedicated metric. Appl Intell 53, 746–756 (2023). https://doi.org/10.1007/s10489-022-03432-4

  • DOI: https://doi.org/10.1007/s10489-022-03432-4
