Abstract
Voxel-based 3D object detection methods have become increasingly popular in autonomous driving. However, because LiDAR point clouds are sparse, voxels produced by conventional cubic partition represent distant objects incompletely, which poses a significant challenge to 3D object perception. In this paper, we propose a novel 3D object detector dubbed SVFNeXt, a Sparse Voxel Fusion Network that performs cross-representation (X) feature learning. Because the cylindrical voxel representation accounts for the rotational or radial scanning of LiDAR, it lets us better explore the inherent 3D geometric structure of point clouds. To further enhance cubic voxel features, we integrate the features of cylindrical voxels into cubic voxels, incorporating both local and global features. We also attend specifically to informative voxels through two additional losses, striking a good speed-accuracy tradeoff. Extensive experiments on the Waymo Open Dataset (WOD) and KITTI demonstrate consistent improvements over baselines. SVFNeXt achieves competitive results compared to state-of-the-art methods, especially for small objects (e.g., cyclists and pedestrians).
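The difference between the cubic and cylindrical partitions mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, grid sizes, and range bounds below are hypothetical and were chosen only to make the example easy to follow.

```python
import numpy as np

# Hypothetical helper names and grid parameters, chosen only for illustration.


def cubic_voxel_indices(points, voxel_size, range_min):
    """Map (N, 3) Cartesian points to integer (x, y, z) cubic voxel indices."""
    points = np.asarray(points, dtype=np.float64)
    return np.floor((points - range_min) / voxel_size).astype(np.int64)


def cylindrical_voxel_indices(points, cell_size, range_min):
    """Map (N, 3) Cartesian points to integer (rho, phi, z) cylindrical voxel indices."""
    points = np.asarray(points, dtype=np.float64)
    rho = np.hypot(points[:, 0], points[:, 1])    # radial distance from the sensor
    phi = np.arctan2(points[:, 1], points[:, 0])  # azimuth angle in [-pi, pi]
    cyl = np.stack([rho, phi, points[:, 2]], axis=1)
    return np.floor((cyl - range_min) / cell_size).astype(np.int64)


# Assumed grids: 0.5 m cubes, and cylindrical cells of 0.5 m x 1 degree x 0.5 m.
CUBIC_SIZE = np.array([0.5, 0.5, 0.5])
CUBIC_MIN = np.array([0.0, -40.0, -3.0])
CYL_SIZE = np.array([0.5, np.pi / 180.0, 0.5])
CYL_MIN = np.array([0.0, -np.pi, -3.0])

# Two points on the same ray from the sensor, one near and one far.
pts = np.array([[3.0, 4.0, 1.0], [30.0, 40.0, 1.0]])
cubic_idx = cubic_voxel_indices(pts, CUBIC_SIZE, CUBIC_MIN)
cyl_idx = cylindrical_voxel_indices(pts, CYL_SIZE, CYL_MIN)
```

In the cylindrical grid both points share the same azimuth index, and the arc width of a cell (radius times angular step) grows with distance, so sparse far-range returns fall into physically larger cells than they would in a fixed-size cubic grid.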
Acknowledgements
This work is supported in part by the National Key Research and Development Project under Grant 2019YFB2102300, in part by the National Natural Science Foundation of China under Grants 61936014, 62076183, and 61976159, in part by the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0100, in part by the Shanghai Science and Technology Innovation Action Plan Project Nos. 22511105300 and 20511100700, in part by the Natural Science Foundation of Shanghai under Grant 20ZR147350, and in part by the Fundamental Research Funds for the Central Universities.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhao, D., Zhao, S., Liang, S. (2024). SVFNeXt: Sparse Voxel Fusion for LiDAR-Based 3D Object Detection. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_17
DOI: https://doi.org/10.1007/978-981-99-7025-4_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7024-7
Online ISBN: 978-981-99-7025-4
eBook Packages: Computer Science, Computer Science (R0)