NV2P-RCNN: Feature Aggregation Based on Voxel Neighborhood for 3D Object Detection

Huo, Weile; Jing, Tao; Ren, Shuang

doi:10.1007/s11063-023-11244-x

Published: 24 March 2023

Neural Processing Letters, Volume 55, pages 6925–6945 (2023)

Abstract

We propose Neighbor Voxels to Point-RCNN (NV2P-RCNN), a two-stage framework based on voxel neighborhood feature aggregation for 3D object detection in autonomous driving. The point representation of a point cloud can encode refined features, while the voxel representation provides an efficient processing framework, so NV2P-RCNN uses both. In the first stage, we add point density to the voxel feature encoding and extract voxel features with a 3D sparse convolutional network. In the second stage, features of the raw point cloud are extracted and fused with the voxel features. To aggregate voxel-to-point features quickly, we design a neighbor voxel query method, NV-Query, which finds neighbor voxels directly from the voxel spatial coordinates of the points. Results on the KITTI and ONCE datasets show that NV2P-RCNN achieves higher detection precision than other existing methods.
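
The two ideas named in the abstract, a density term in the voxel encoding and a neighbor lookup driven by a point's voxel coordinates, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`voxelize`, `encode_voxels`, `neighbor_voxels`), the density definition (point count divided by a hypothetical cap `max_points`), the mean-based feature, and the cubic `radius` neighborhood are all assumptions made for illustration. The full paper may define each of them differently.

```python
from collections import defaultdict
from itertools import product
from math import floor


def voxelize(points, voxel_size):
    """Group point indices by the integer coordinate of the voxel they fall in."""
    voxels = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        coord = (floor(x / voxel_size[0]),
                 floor(y / voxel_size[1]),
                 floor(z / voxel_size[2]))
        voxels[coord].append(idx)
    return dict(voxels)


def encode_voxels(points, voxels, max_points):
    """Toy voxel feature: mean xyz of the voxel's points plus a density term.

    Assumption: density is the point count divided by `max_points`,
    clipped to 1.0. The paper's exact definition is not given in the abstract.
    """
    features = {}
    for coord, indices in voxels.items():
        n = len(indices)
        mean = tuple(sum(points[i][d] for i in indices) / n for d in range(3))
        density = min(n / max_points, 1.0)
        features[coord] = mean + (density,)
    return features


def neighbor_voxels(point, voxels, voxel_size, radius=1):
    """Return the non-empty voxels within `radius` cells of the point's voxel.

    The point's own voxel coordinate is computed directly, then a fixed set of
    integer offsets is probed in a hash map, so no distance search is needed.
    """
    cx, cy, cz = (floor(point[d] / voxel_size[d]) for d in range(3))
    offsets = range(-radius, radius + 1)
    found = []
    for dx, dy, dz in product(offsets, repeat=3):
        coord = (cx + dx, cy + dy, cz + dz)
        if coord in voxels:
            found.append(coord)
    return found


# Small usage example with made-up points and a 1 m voxel grid.
points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1), (1.2, 0.1, 0.1), (5.0, 5.0, 5.0)]
voxel_size = (1.0, 1.0, 1.0)
voxels = voxelize(points, voxel_size)
features = encode_voxels(points, voxels, max_points=4)
print(neighbor_voxels((0.5, 0.5, 0.5), voxels, voxel_size))
```

The cost of one lookup in this sketch depends only on the neighborhood size, (2 * radius + 1)^3 hash probes, and not on the number of points in the scene, which is the kind of saving a coordinate-based query is meant to provide over a radius search.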

Funding

This work is supported by a project of the National Natural Science Foundation of China (62072025).

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
Weile Huo, Tao Jing & Shuang Ren

Corresponding author

Correspondence to Shuang Ren.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huo, W., Jing, T. & Ren, S. NV2P-RCNN: Feature Aggregation Based on Voxel Neighborhood for 3D Object Detection. Neural Process Lett 55, 6925–6945 (2023). https://doi.org/10.1007/s11063-023-11244-x

Accepted: 08 March 2023
Published: 24 March 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11063-023-11244-x
