Abstract
Three-dimensional object detection plays a key role in autonomous driving, but it becomes extremely challenging when objects are occluded. This paper presents a novel multimodal 3D object detection framework that fuses visual semantic information with depth information from point clouds to accurately detect distant and occluded targets. The framework consists of four steps. Firstly, an improved semantic segmentation network is used to extract semantic information for objects with similar features. Secondly, semantic images and point clouds are combined to generate pixel-level fusion data, which enriches the semantic information of sparse, distant point clouds and makes them easier to train on. Thirdly, a deep learning-based point cloud classification network is trained on the fused data to output accurate detection boxes. Fourthly, an extended Kalman filter is incorporated into point cloud prediction for image-based object detection to further enhance the robustness of object detection. The Cityscapes and KITTI datasets are used in ablation studies and experiments to validate the effectiveness of the proposed framework.
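The pixel-level fusion in the second step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the fusion is carried out, so the code assumes a common approach in which each LiDAR point is projected into the image and tagged with the class label of the pixel it lands on. The function name `fuse_semantics`, the one-hot encoding, and the assumption that the points are already expressed in the camera frame are all hypothetical choices made for illustration.

```python
import numpy as np


def fuse_semantics(points, seg_map, P, num_classes):
    """Attach per-pixel class labels to LiDAR points (hypothetical helper).

    points:  (N, 3) array of x, y, z, assumed to be in the camera frame
    seg_map: (H, W) integer array of class ids from a segmentation network
    P:       (3, 4) camera projection matrix
    Returns an (M, 3 + num_classes) array holding xyz plus a one-hot class
    vector for the M points that project inside the image.
    """
    h, w = seg_map.shape

    # Homogeneous coordinates, then perspective projection.
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    uvw = homo @ P.T

    # Discard points behind (or on) the image plane.
    in_front = uvw[:, 2] > 1e-6
    uvw = uvw[in_front]
    kept = points[in_front]

    # Pixel coordinates; floor so that negative values fall outside the image.
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Look up the class of the pixel each surviving point lands on.
    labels = seg_map[v[inside], u[inside]]
    one_hot = np.eye(num_classes)[labels]
    return np.hstack([kept[inside], one_hot])
```

With real data, the points would first have to be transformed from the LiDAR frame into the camera frame using the dataset's calibration, and the resulting array could then be fed to a point cloud network in place of raw xyz coordinates.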
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Grant No. 52075461), the Key Project in Science and Technology Plan of Xiamen, China (Grant No. 3502Z20201015), and the Innovation Method Special Project of the Ministry of Science and Technology of China (Grant No. 2020IM010100).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Song, S., Huang, T., Zhu, Q. et al. ODSPC: deep learning-based 3D object detection using semantic point cloud. Vis Comput 40, 849–863 (2024). https://doi.org/10.1007/s00371-023-02820-2
DOI: https://doi.org/10.1007/s00371-023-02820-2