Delving into monocular 3D vehicle tracking: a decoupled framework and a dedicated metric

Gao, Tianze; Jia, Zhixiang; Lin, Weiyang; Li, Yu

doi:10.1007/s10489-022-03432-4


Published: 20 April 2022

Volume 53, pages 746–756, (2023)

Applied Intelligence

Tianze Gao¹, Zhixiang Jia¹,², Weiyang Lin¹ (ORCID: orcid.org/0000-0002-0493-1289) & Yu Li³

Abstract

Acquiring 3D trajectories of on-road vehicles is an essential visual task for autonomous driving systems. Existing 3D vehicle tracking methods either rely on point cloud data or need to be trained on visual tracking datasets. In contrast, this paper proposes a decoupled monocular 3D vehicle tracking framework. Because the framework is the first of its kind, we take a previous decoupled LiDAR-based method as the baseline and replace its detector with a monocular one. On this foundation, we employ global coordinates to cancel out ego motion and introduce the angular rate into the 3D Kalman filter. To tackle the problem of long-term association, we propose a trajectory management scheme built on a novel hibernation mechanism. We further point out that current monocular 3D tracking methods are not tailored to the depth estimation uncertainty produced by monocular 3D detectors. To address this, we propose a depth-aware association strategy that gives more distant vehicles larger matching regions in the data association stage. As another contribution, we discuss the defects of current metrics for evaluating 3D tracking performance and devise a nonuniform metric dedicated to monocular vision. Extensive experiments on the KITTI tracking benchmark demonstrate the superiority of the proposed monocular 3D vehicle tracking framework and metric, both quantitatively and qualitatively.
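
The depth-aware association idea can be illustrated with a minimal sketch. The abstract does not give the actual matching rule, so everything here is an assumption made for illustration: the linear growth of the matching radius with depth, the parameter names `base_m` and `growth_per_m`, the function names, and the use of greedy nearest-neighbour matching on ground-plane positions in place of whatever assignment method the paper uses.

```python
import math


def gate_radius(depth_m, base_m=1.0, growth_per_m=0.05):
    """Hypothetical matching radius (metres) that grows linearly with depth."""
    return base_m + growth_per_m * depth_m


def associate(tracks, detections, base_m=1.0, growth_per_m=0.05):
    """Greedy nearest-neighbour association on ground-plane positions.

    tracks, detections: lists of (x, z) tuples in metres, where z is depth.
    Returns a sorted list of (track_index, detection_index) pairs.
    """
    candidates = []
    for ti, (tx, tz) in enumerate(tracks):
        for di, (dx, dz) in enumerate(detections):
            dist = math.hypot(tx - dx, tz - dz)
            # Assumption: the gate is evaluated at the detection's depth,
            # so distant detections tolerate a larger position error.
            if dist <= gate_radius(dz, base_m, growth_per_m):
                candidates.append((dist, ti, di))

    # Accept the closest pairs first; each track and detection is used once.
    candidates.sort()
    matches, used_tracks, used_dets = [], set(), set()
    for _, ti, di in candidates:
        if ti not in used_tracks and di not in used_dets:
            matches.append((ti, di))
            used_tracks.add(ti)
            used_dets.add(di)
    return sorted(matches)


if __name__ == "__main__":
    tracks = [(0.0, 10.0), (0.0, 60.0)]
    detections = [(0.0, 12.0), (0.0, 63.0)]
    print(associate(tracks, detections))
```

With these assumed parameters, a 2 m error at about 12 m depth falls outside the 1.6 m gate and is rejected, while a 3 m error at about 63 m depth falls inside the 4.15 m gate and is matched. That is the qualitative behaviour the abstract describes, not a reproduction of the paper's method.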



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant U1964201.

Author information

Authors and Affiliations

The Research Institute of Intelligent Control and Systems, Harbin Institute of Technology, Harbin, 150001, China
Tianze Gao, Zhixiang Jia & Weiyang Lin
Ningbo Institute of Intelligent Control and Systems, Harbin Institute of Technology, Harbin, 150001, China
Zhixiang Jia
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Yu Li


Corresponding author

Correspondence to Weiyang Lin.


About this article

Cite this article

Gao, T., Jia, Z., Lin, W. et al. Delving into monocular 3D vehicle tracking: a decoupled framework and a dedicated metric. Appl Intell 53, 746–756 (2023). https://doi.org/10.1007/s10489-022-03432-4


Accepted: 22 February 2022
Published: 20 April 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03432-4
