ODSPC: deep learning-based 3D object detection using semantic point cloud

Song, Shuang; Huang, Tengchao; Zhu, Qingyuan; Hu, Huosheng

doi:10.1007/s00371-023-02820-2

Original article
Published: 18 March 2023

Volume 40, pages 849–863 (2024)

The Visual Computer

Shuang Song¹,
Tengchao Huang¹,
Qingyuan Zhu ORCID: orcid.org/0000-0001-5521-5023¹ &
…
Huosheng Hu²

Abstract

Three-dimensional object detection plays a key role in autonomous driving, and it becomes extremely challenging when objects are occluded. This paper presents a novel multimodal 3D object detection framework that fuses visual semantic information with depth point cloud information to accurately detect targets that are distant or occluded. The framework consists of four steps. First, an improved semantic segmentation network extracts semantic information for objects with similar features. Second, semantic images and point clouds are combined to generate pixel-level fusion data, which enriches the semantic information of sparse and distant point clouds and makes them easier to train on. Third, a deep learning-based point cloud classification network is trained on the fused data to output accurate bounding boxes. Fourth, an extended Kalman filter is incorporated into point cloud prediction for image-based object detection to further enhance the robustness of detection. Both the Cityscapes and KITTI datasets are used in an ablation study and in experiments to validate the effectiveness of the proposed framework.
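The abstract does not spell out how the pixel-level fusion of semantic images and point clouds is computed, so the following is only a minimal sketch of one common way such a fusion can be done: project each 3D point into the image and attach the semantic label found at that pixel. It assumes the points are already expressed in the camera frame and that a 3x4 projection matrix is known. The names `project_point`, `fuse_points_with_labels`, and the toy matrix `P` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]             # (x, y, z) in the camera frame, z pointing forward
FusedPoint = Tuple[float, float, float, int]   # (x, y, z, semantic_label)


def project_point(P: Sequence[Sequence[float]], point: Point) -> Optional[Tuple[float, float]]:
    """Project a camera-frame 3D point to pixel coordinates using a 3x4 matrix P.

    Returns None for points at or behind the image plane (non-positive depth).
    """
    x, y, z = point
    u = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
    v = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
    w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
    if w <= 0:
        return None
    return u / w, v / w


def fuse_points_with_labels(
    points: Sequence[Point],
    label_image: Sequence[Sequence[int]],
    P: Sequence[Sequence[float]],
) -> List[FusedPoint]:
    """Attach the semantic label of the pixel each point projects onto.

    Points that fall outside the image or behind the camera are dropped.
    """
    height = len(label_image)
    width = len(label_image[0])
    fused: List[FusedPoint] = []
    for point in points:
        pixel = project_point(P, point)
        if pixel is None:
            continue
        col = math.floor(pixel[0])
        row = math.floor(pixel[1])
        if 0 <= row < height and 0 <= col < width:
            fused.append((point[0], point[1], point[2], label_image[row][col]))
    return fused


# Toy setup (illustrative values only): focal length 8, principal point (2, 2).
P = [[8.0, 0.0, 2.0, 0.0],
     [0.0, 8.0, 2.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
label_image = [[0, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 0]]
points = [(0.0, 0.0, 10.0),    # projects to pixel (2, 2)
          (-0.5, -0.5, 4.0),   # projects to pixel (1, 1)
          (0.0, 0.0, -1.0),    # behind the camera, dropped
          (4.0, 0.0, 4.0)]     # outside the image, dropped
fused = fuse_points_with_labels(points, label_image, P)
```

In this sketch each surviving point carries one extra channel, the class label, which a downstream point cloud network could consume alongside the coordinates. A real pipeline would also need the LiDAR-to-camera extrinsic transform and might attach class scores rather than a single label.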

Acknowledgements

This work was funded by the National Natural Science Foundation of China (Grant No. 52075461), the Key Project in Science and Technology Plan of Xiamen, China (Grant No. 3502Z20201015), and the Innovation Method Special Project of the Ministry of Science and Technology of China (Grant No. 2020IM010100).

Author information

Authors and Affiliations

Department of Mechanical and Electrical Engineering, Xiamen University, Xiamen, 361005, China
Shuang Song, Tengchao Huang & Qingyuan Zhu
School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, England, UK
Huosheng Hu

Corresponding author

Correspondence to Qingyuan Zhu.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Song, S., Huang, T., Zhu, Q. et al. ODSPC: deep learning-based 3D object detection using semantic point cloud. Vis Comput 40, 849–863 (2024). https://doi.org/10.1007/s00371-023-02820-2

Accepted: 26 February 2023
Published: 18 March 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00371-023-02820-2
