Lidar Point Cloud Guided Monocular 3D Object Detection

Peng, Liang; Liu, Fei; Yu, Zhengxu; Yan, Senbo; Deng, Dan; Yang, Zheng; Liu, Haifeng; Cai, Deng

doi:10.1007/978-3-031-19769-7_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13661)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Monocular 3D object detection is a challenging task in the self-driving and computer vision communities. As a common practice, most previous works use manually annotated 3D box labels, which are expensive to produce. In this paper, we find that precisely and carefully annotated labels may be unnecessary in monocular 3D detection, which is an interesting and counterintuitive finding. A detector trained on rough labels that are randomly disturbed achieves accuracy very close to that of one trained on the ground-truth labels. We delve into the underlying mechanism and empirically find that, in terms of label accuracy, the 3D location part of the label matters more than the other parts. Motivated by these conclusions and considering the precise 3D measurements provided by LiDAR, we propose a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D object detection (LPCG). This framework is capable of either reducing the annotation costs or considerably boosting the detection accuracy without introducing extra annotation costs. Specifically, it generates pseudo labels from unlabeled LiDAR point clouds. Thanks to accurate LiDAR measurements in 3D space, such pseudo labels can replace manually annotated labels in the training of monocular 3D detectors, since their 3D location information is precise. LPCG can be applied to any monocular 3D detector to fully use massive unlabeled data in a self-driving system. As a result, on the KITTI benchmark, we take first place on both monocular 3D and BEV (bird’s-eye-view) detection by a significant margin. On the Waymo benchmark, our method using 10% labeled data achieves accuracy comparable to the baseline detector using 100% labeled data. The code is released at https://github.com/SPengLiang/LPCG.
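
To make the idea of "rough labels that are randomly disturbed" concrete, the following is a minimal Python sketch of how a 3D box label might be perturbed part by part. It is not the authors' code: the `Box3D` type, the `perturb_box` function, the uniform noise model, and the 5% ratio are all assumptions chosen for illustration, since the abstract does not specify how the disturbance is applied.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box3D:
    """A 3D box label: center (x, y, z), size (h, w, l), heading yaw in radians."""
    x: float
    y: float
    z: float
    h: float
    w: float
    l: float
    yaw: float


def perturb_box(box, ratio, rng, parts=("location", "dimension", "orientation")):
    """Return a copy of `box` with the selected parts randomly disturbed.

    Hypothetical noise model (an assumption, not taken from the paper):
    location and dimension values are scaled by a factor drawn uniformly
    from [1 - ratio, 1 + ratio]; yaw is shifted by up to ratio * pi radians.
    """
    def scale(value):
        return value * (1.0 + rng.uniform(-ratio, ratio))

    changes = {}
    if "location" in parts:
        changes.update(x=scale(box.x), y=scale(box.y), z=scale(box.z))
    if "dimension" in parts:
        changes.update(h=scale(box.h), w=scale(box.w), l=scale(box.l))
    if "orientation" in parts:
        changes["yaw"] = box.yaw + rng.uniform(-ratio, ratio) * math.pi
    return replace(box, **changes)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
label = Box3D(x=2.0, y=1.5, z=20.0, h=1.5, w=1.6, l=3.9, yaw=0.3)

# Disturb every part of the label, or only one part to isolate its effect.
rough_all = perturb_box(label, 0.05, rng)
rough_size_only = perturb_box(label, 0.05, rng, parts=("dimension",))
```

Restricting `parts` to a single component is one way to probe which part of the label a detector is most sensitive to, which is the kind of comparison the abstract summarizes when it singles out 3D location.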

Notes

1. From RTM3D official implementation.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2018AAA0101400), in part by the National Natural Science Foundation of China (Grant Nos. 62036009, U1909203, 61936006, 61973271), and in part by the Innovation Capability Support Program of Shaanxi (Program No. 2021TD-05).

Author information

Authors and Affiliations

State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
Liang Peng, Zhengxu Yu, Senbo Yan, Haifeng Liu & Deng Cai
Fabu Inc., Hangzhou, China
Liang Peng, Senbo Yan, Dan Deng, Zheng Yang & Deng Cai
State Key Lab of Industrial Control and Technology, Zhejiang University, Hangzhou, China
Fei Liu

Corresponding author

Correspondence to Deng Cai.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 18947 KB)

About this paper

Cite this paper

Peng, L. et al. (2022). Lidar Point Cloud Guided Monocular 3D Object Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_8

DOI: https://doi.org/10.1007/978-3-031-19769-7_8
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7
eBook Packages: Computer Science, Computer Science (R0)
