SVFNeXt: Sparse Voxel Fusion for LiDAR-Based 3D Object Detection

Zhao, Deze; Zhao, Shengjie; Liang, Shuang

doi:10.1007/978-981-99-7025-4_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14327)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Voxel-based 3D object detection methods have become increasingly popular in autonomous driving. However, because LiDAR point clouds are sparse, voxels produced by the conventional cubic partition represent distant objects incompletely, which poses significant challenges to 3D object perception. In this paper, we propose a novel 3D object detector dubbed SVFNeXt, a Sparse Voxel Fusion Network that performs cross-representation (X) feature learning. Because the cylindrical voxel representation accounts for the rotational, radial scanning pattern of LiDAR, it lets us better explore the inherent 3D geometric structure of point clouds. To further enhance cubic voxel features, we integrate the features of cylindrical voxels into cubic voxels, incorporating both local and global features. We attend in particular to informative voxels through two additional losses, striking a good speed-accuracy tradeoff. Extensive experiments on the Waymo Open Dataset (WOD) and KITTI datasets demonstrate consistent improvements over baselines. SVFNeXt achieves competitive results compared to state-of-the-art methods, especially for small objects (e.g., cyclist, pedestrian).
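
To make the contrast between the two partitions concrete, the following is a minimal NumPy sketch of how points could be assigned to cubic and to cylindrical voxel indices. It is not the authors' implementation: the function names, parameter names, and bin sizes are illustrative assumptions, and the feature fusion and losses described in the abstract are not modelled.

```python
import numpy as np


def cubic_voxel_indices(points, voxel_size):
    """Map (N, 3) Cartesian points to integer cubic voxel indices.

    voxel_size is a scalar or a length-3 sequence (hypothetical parameter).
    """
    points = np.asarray(points, dtype=np.float64)
    return np.floor(points / np.asarray(voxel_size)).astype(np.int64)


def cylindrical_voxel_indices(points, rho_size, phi_size, z_size):
    """Map (N, 3) Cartesian points to integer (rho, phi, z) voxel indices.

    rho_size and z_size are in the units of the input points; phi_size is
    in radians. All three are hypothetical parameters for illustration.
    """
    points = np.asarray(points, dtype=np.float64)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)      # radial distance from the sensor axis
    phi = np.arctan2(y, x)    # azimuth in [-pi, pi]
    # Shift the azimuth to [0, 2*pi) so that bin indices are non-negative
    # and the direction phi == pi wraps around to bin 0.
    phi = np.mod(phi + np.pi, 2.0 * np.pi)
    return np.stack(
        [
            np.floor(rho / rho_size),
            np.floor(phi / phi_size),
            np.floor(z / z_size),
        ],
        axis=1,
    ).astype(np.int64)
```

In the cubic partition every cell has the same extent regardless of range, whereas the arc length covered by one azimuth bin in the cylindrical partition grows with the radial distance. That growth follows the way a rotating LiDAR spreads its returns, which is the intuition behind the cylindrical representation mentioned in the abstract.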

Acknowledgements

This work is supported in part by the National Key Research and Development Project under Grant 2019YFB2102300, in part by the National Natural Science Foundation of China under Grants 61936014, 62076183, and 61976159, in part by the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0100, in part by the Shanghai Science and Technology Innovation Action Plan Project Nos. 22511105300 and 20511100700, in part by the Natural Science Foundation of Shanghai under Grant 20ZR147350, and in part by the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

College of Electronic and Information Engineering, Tongji University, Shanghai, China
Deze Zhao & Shengjie Zhao
School of Software Engineering, Tongji University, Shanghai, China
Shengjie Zhao & Shuang Liang

Corresponding authors

Correspondence to Shengjie Zhao or Shuang Liang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Fenrong Liu
SEEK Limited, Cremorne, NSW, Australia
Arun Anand Sadanandan
MIMOS Berhad, Kuala Lumpur, Malaysia
Duc Nghia Pham
Universitas Indonesia, Depok, Indonesia
Petrus Mursanto
Tabcorp Holdings Limited, Melbourne, VIC, Australia
Dickson Lukose

About this paper

Cite this paper

Zhao, D., Zhao, S., Liang, S. (2024). SVFNeXt: Sparse Voxel Fusion for LiDAR-Based 3D Object Detection. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_17

DOI: https://doi.org/10.1007/978-981-99-7025-4_17
Published: 10 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7024-7
Online ISBN: 978-981-99-7025-4
eBook Packages: Computer Science, Computer Science (R0)
