SVFNeXt: Sparse Voxel Fusion for LiDAR-Based 3D Object Detection

  • Conference paper
PRICAI 2023: Trends in Artificial Intelligence (PRICAI 2023)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14327))

Abstract

Voxel-based 3D object detection methods have gained popularity in autonomous driving. However, because LiDAR point clouds are sparse, voxels produced by conventional cubic partitioning yield incomplete representations of objects at longer range, which poses significant challenges for 3D object perception. In this paper, we propose a novel 3D object detector dubbed SVFNeXt, a Sparse Voxel Fusion Network that performs cross-representation (X) feature learning. Because the cylindrical voxel representation accounts for the rotational, or radial, scanning pattern of LiDAR, it lets us better explore the inherent 3D geometric structure of point clouds. To further enhance cubic voxel features, we integrate the features of cylindrical voxels into cubic voxels, incorporating both local and global features. We attend in particular to informative voxels through two additional losses, striking a good speed-accuracy tradeoff. Extensive experiments on the WOD and KITTI datasets demonstrate consistent improvements over baselines. Our SVFNeXt achieves competitive results compared to state-of-the-art methods, especially for small objects (e.g., cyclist, pedestrian).
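The difference between the two partitions named in the abstract can be illustrated with a minimal sketch. It is not the SVFNeXt implementation: the function names, cell sizes, and grid origin below are assumptions chosen for illustration only. The sketch maps a single LiDAR point to a cubic voxel index and to a cylindrical voxel index, and shows that a cylindrical cell of fixed angular size covers a wider arc as range increases.

```python
import math

def cubic_voxel_index(point, voxel_size, origin):
    """Index of the axis-aligned (cubic) voxel containing an (x, y, z) point.

    voxel_size and origin are hypothetical grid parameters, one per axis.
    """
    return tuple(
        int(math.floor((p - o) / s)) for p, o, s in zip(point, origin, voxel_size)
    )

def cylindrical_voxel_index(point, rho_size, phi_size, z_size, z_min):
    """Index of the cylindrical voxel (range bin, azimuth bin, height bin).

    rho is the horizontal distance to the sensor, phi the azimuth in [0, 2*pi).
    """
    x, y, z = point
    rho = math.hypot(x, y)
    phi = math.atan2(y, x) % (2.0 * math.pi)
    return (
        int(math.floor(rho / rho_size)),
        int(math.floor(phi / phi_size)),
        int(math.floor((z - z_min) / z_size)),
    )

def arc_width(rho, phi_size):
    """Arc length spanned by one azimuth bin at horizontal range rho."""
    return rho * phi_size

point = (3.0, 4.0, 0.0)  # illustrative point, 5 m from the sensor
cubic = cubic_voxel_index(point, (0.5, 0.5, 0.5), (0.0, 0.0, -3.0))
cyl = cylindrical_voxel_index(point, 0.5, math.radians(2.0), 0.5, -3.0)
print(cubic, cyl)
print(arc_width(5.0, math.radians(2.0)), arc_width(50.0, math.radians(2.0)))
```

In this sketch the cubic cell has the same footprint everywhere, while the cylindrical cell at 50 m spans ten times the arc of the cell at 5 m, so distant, sparsely sampled regions are grouped into larger cells.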

Acknowledgements

This work is supported in part by the National Key Research and Development Project under Grant 2019YFB2102300, in part by the National Natural Science Foundation of China under Grants 61936014, 62076183, and 61976159, in part by the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0100, in part by the Shanghai Science and Technology Innovation Action Plan Projects No. 22511105300 and 20511100700, in part by the Natural Science Foundation of Shanghai under Grant 20ZR147350, and in part by the Fundamental Research Funds for the Central Universities.

Author information

Corresponding authors

Correspondence to Shengjie Zhao or Shuang Liang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhao, D., Zhao, S., Liang, S. (2024). SVFNeXt: Sparse Voxel Fusion for LiDAR-Based 3D Object Detection. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_17

  • DOI: https://doi.org/10.1007/978-981-99-7025-4_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7024-7

  • Online ISBN: 978-981-99-7025-4

  • eBook Packages: Computer Science, Computer Science (R0)
