3D Object Detection in Autonomous Driving

A chapter in the book Autonomous Driving Perception

Abstract

3D object detection is an important perception module in autonomous driving systems. It interprets sensor observations to predict the locations, sizes, and orientations of key objects, providing both semantic and spatial information for high-level decision making. In this chapter, we first introduce and analyze the properties of the perceptual sensors commonly used in autonomous vehicles: cameras, LiDARs, and RADARs. We then define the research problem, detail the assumptions, and introduce the evaluation metrics of 3D object detection in the context of autonomous driving. The main body reviews state-of-the-art techniques and categorizes them into camera-based, LiDAR-based, RADAR-based, and multi-sensor fusion methods. For each category, we point out the main problems and their existing solutions. By analyzing the limitations of existing methods, we propose promising directions and open problems for future research.
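For concreteness, the outputs this chapter studies are conventionally parameterized as a class label plus an oriented 3D bounding box given by a center, a size, and a heading (yaw) angle. The sketch below is a minimal illustration of such a record; the type and field names are ours, not from the chapter.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Detection3D:
    """One detected object: a semantic label plus an oriented 3D box.

    Benchmarks such as KITTI and nuScenes parameterize the box by its
    center, its size, and a yaw (heading) angle about the up axis.
    """
    label: str                          # object class, e.g. "car", "pedestrian"
    score: float                        # detection confidence in [0, 1]
    center: Tuple[float, float, float]  # (x, y, z) box center, meters
    size: Tuple[float, float, float]    # (length, width, height), meters
    yaw: float                          # heading about the up axis, radians
```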

Peng Yun, Yuxuan Liu, and Xiaoyang Yan contributed equally.

Rui Fan and Ming Liu are co-corresponding authors.


Notes

  1. Note that, unlike the KITTI dataset, which computes \(AP_{BEV}\) using intersection over union, nuScenes computes \(AP_{BEV}\) using the 2D center distance on the ground plane; a minimal matching sketch follows these notes.

  2. If a point cloud is partitioned into a dense [10, 400, 352] grid, only around 5300 voxels are non-empty; the second sketch below illustrates this count.
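To make the distinction in note 1 concrete, here is a minimal sketch of distance-based matching in the spirit of nuScenes, which counts a detection as a true positive when its ground-plane center lies within a distance threshold of a still-unmatched ground truth (nuScenes averages AP over thresholds of 0.5, 1, 2, and 4 m). The helper name is ours, and the full precision-recall computation is omitted.

```python
import math

def match_by_center_distance(gt_centers, det_centers, threshold=2.0):
    """Greedily match detections to ground truths by 2D center distance.

    gt_centers, det_centers: lists of (x, y) ground-plane centers, meters;
    detections are assumed sorted by descending confidence.
    Returns the number of true positives at the given threshold.
    """
    unmatched = list(range(len(gt_centers)))
    true_positives = 0
    for dx, dy in det_centers:
        # nearest still-unmatched ground truth within the threshold
        best, best_dist = None, threshold
        for i in unmatched:
            gx, gy = gt_centers[i]
            dist = math.hypot(dx - gx, dy - gy)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None:
            unmatched.remove(best)
            true_positives += 1
    return true_positives
```

A KITTI-style evaluation would instead compute the overlap (intersection over union) between rotated boxes and threshold that, which penalizes the size and orientation errors that center distance ignores.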

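Note 2's sparsity figure is easy to reproduce: voxelize the points and count the occupied cells. A minimal NumPy sketch follows, with hypothetical argument names; the example values in the docstring are the VoxelNet-style KITTI car-range settings, one common choice that yields a (10, 400, 352) grid.

```python
import numpy as np

def count_nonempty_voxels(points, voxel_size, origin, grid_shape):
    """Count occupied cells when a point cloud is voxelized into a dense grid.

    points:     (N, 3) array of (x, y, z) coordinates, meters.
    voxel_size: per-axis cell size as (x, y, z), e.g. (0.2, 0.2, 0.4).
    origin:     minimum corner of the grid as (x, y, z), e.g. (0.0, -40.0, -3.0).
    grid_shape: number of cells per axis as (z, y, x), e.g. (10, 400, 352).
    """
    pts = np.asarray(points, dtype=np.float64)
    idx = np.floor((pts - origin) / voxel_size).astype(np.int64)  # (N, 3) in (x, y, z)
    idx = idx[:, ::-1]                                            # reorder to (z, y, x)
    in_bounds = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    return len(np.unique(idx[in_bounds], axis=0))                 # unique occupied cells
```

A [10, 400, 352] grid holds about 1.4 million cells, so a scan occupying only around 5300 of them is more than 99% empty; this is the sparsity that voxel-based detectors exploit.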

Author information

Corresponding author: Peng Yun.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Yun, P. et al. (2023). 3D Object Detection in Autonomous Driving. In: Fan, R., Guo, S., Bocus, M.J. (eds) Autonomous Driving Perception. Advances in Computer Vision and Pattern Recognition. Springer, Singapore. https://doi.org/10.1007/978-981-99-4287-9_5

  • DOI: https://doi.org/10.1007/978-981-99-4287-9_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4286-2

  • Online ISBN: 978-981-99-4287-9

  • eBook Packages: Computer Science (R0)
