
NV2P-RCNN: Feature Aggregation Based on Voxel Neighborhood for 3D Object Detection

Neural Processing Letters

Abstract

In this paper, we propose Neighbor Voxels to Point-RCNN (NV2P-RCNN), a two-stage 3D object detection framework for autonomous driving based on voxel-neighborhood feature aggregation. A point representation of a point cloud can encode fine-grained features, whereas a voxel representation supports efficient processing, so NV2P-RCNN exploits both. In the first stage, we add point density to the voxel feature encoding and extract voxel features with a 3D sparse convolutional network. In the second stage, features are extracted from the raw point cloud and fused with the voxel features. To aggregate voxel features into points quickly, we design a neighbor-voxel query method, NV-Query, which finds neighbor voxels directly from the voxel coordinates of each point. Results on the KITTI and ONCE datasets show that NV2P-RCNN achieves higher detection precision than the existing methods it is compared with.
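To make the density-augmented voxel encoding and the NV-Query idea concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. It assumes that each voxel feature is the mean point position plus a density term defined as point count divided by voxel volume, that non-empty voxels are stored in a hash map keyed by integer voxel coordinates, and that a point's neighborhood is the (2r+1)^3 block of voxels around its own voxel. The function names `voxel_coord`, `encode_voxels`, `nv_query`, and `aggregate`, and the max-pooling aggregation, are hypothetical choices for illustration. The 3D sparse convolution of the first stage is omitted.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxel_coord(p: Point, voxel_size: float) -> VoxelKey:
    """Integer voxel index of a point (floor division on each axis)."""
    return (int(p[0] // voxel_size), int(p[1] // voxel_size), int(p[2] // voxel_size))


def encode_voxels(points: Sequence[Point], voxel_size: float) -> Dict[VoxelKey, List[float]]:
    """Hypothetical voxel encoding: mean point position plus a density term.

    Assumption: density = number of points in the voxel / voxel volume.
    """
    buckets: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for p in points:
        buckets[voxel_coord(p, voxel_size)].append(p)

    volume = voxel_size ** 3
    features: Dict[VoxelKey, List[float]] = {}
    for key, pts in buckets.items():
        n = len(pts)
        mean = [sum(q[i] for q in pts) / n for i in range(3)]
        features[key] = mean + [n / volume]
    return features


def nv_query(p: Point, voxel_features: Dict[VoxelKey, List[float]],
             voxel_size: float, radius: int = 1) -> List[VoxelKey]:
    """Hypothetical neighbor-voxel query.

    Computes the point's voxel key and looks up the surrounding
    (2*radius+1)^3 keys directly in the hash map; no distance search
    over raw points is performed. Empty voxels are skipped.
    """
    cx, cy, cz = voxel_coord(p, voxel_size)
    neighbors = []
    for dx, dy, dz in product(range(-radius, radius + 1), repeat=3):
        key = (cx + dx, cy + dy, cz + dz)
        if key in voxel_features:
            neighbors.append(key)
    return neighbors


def aggregate(p: Point, voxel_features: Dict[VoxelKey, List[float]],
              voxel_size: float, radius: int = 1) -> Optional[List[float]]:
    """Max-pool the features of the non-empty neighbor voxels (assumed aggregation)."""
    keys = nv_query(p, voxel_features, voxel_size, radius)
    if not keys:
        return None
    dim = len(voxel_features[keys[0]])
    return [max(voxel_features[k][i] for k in keys) for i in range(dim)]


if __name__ == "__main__":
    pts = [(0.25, 0.25, 0.25), (0.75, 0.5, 0.25), (1.5, 0.25, 0.5), (5.0, 5.0, 5.0)]
    feats = encode_voxels(pts, voxel_size=1.0)
    print(nv_query((0.5, 0.5, 0.5), feats, 1.0))   # [(0, 0, 0), (1, 0, 0)]
    print(aggregate((0.5, 0.5, 0.5), feats, 1.0))  # [1.5, 0.375, 0.5, 2.0]
```

In this sketch each point costs a fixed (2r+1)^3 dictionary lookups, independent of how many raw points lie nearby, which illustrates why querying neighbors through voxel coordinates can be cheaper than a radius search over points.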


Figs. 1-6 (images not reproduced here)



Funding

This work was supported by the National Natural Science Foundation of China (62072025).

Author information

Corresponding author

Correspondence to Shuang Ren.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Huo, W., Jing, T. & Ren, S. NV2P-RCNN: Feature Aggregation Based on Voxel Neighborhood for 3D Object Detection. Neural Process Lett 55, 6925–6945 (2023). https://doi.org/10.1007/s11063-023-11244-x

  • DOI: https://doi.org/10.1007/s11063-023-11244-x
