Abstract
Reliable detection of road objects under diverse environmental conditions is a critical requirement for autonomous driving systems. Multi-modal sensor fusion is a promising way to improve perception, since it combines complementary information from multiple sensor streams. Within fully convolutional architectures, fusion operators combine features derived from the different modalities. In this work, we present a framework that uses early fusion to train and evaluate 2D object detectors. Our evaluation shows that sensor fusion outperforms RGB-only detection, yielding gains of +15.07% for car detection, +10.81% for pedestrian detection, and +19.86% for cyclist detection. In a comparative study, we evaluate three arithmetic fusion operators and two learnable fusion operators. We further compare early- and mid-level fusion techniques and investigate the effect of early fusion on state-of-the-art 3D object detectors. Finally, we provide a comprehensive analysis of the computational complexity of the proposed framework, along with an ablation study.
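To make the notion of a fusion operator concrete, the following is a minimal sketch of how arithmetic and learnable operators might combine an RGB feature map with a LiDAR-derived feature map of the same shape. The abstract does not specify the exact operators used in the paper; the functions below (element-wise addition, product, average, and a simple sigmoid-gated mix) are common choices and should be read as illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fuse_add(rgb, lidar):
    # Element-wise addition: cheap, keeps the channel count unchanged.
    return rgb + lidar

def fuse_mul(rgb, lidar):
    # Element-wise (Hadamard) product: emphasizes features that are
    # active in both modalities simultaneously.
    return rgb * lidar

def fuse_mean(rgb, lidar):
    # Element-wise average: addition normalized to the input scale.
    return 0.5 * (rgb + lidar)

def fuse_gated(rgb, lidar, w_rgb, w_lidar, b):
    # A minimal learnable "gated" fusion (hypothetical): a sigmoid gate
    # computed from both inputs weighs each modality's contribution.
    # w_rgb, w_lidar, b are parameters that would be learned end-to-end.
    gate = 1.0 / (1.0 + np.exp(-(w_rgb * rgb + w_lidar * lidar + b)))
    return gate * rgb + (1.0 - gate) * lidar

if __name__ == "__main__":
    # Toy feature maps of shape (channels, height, width).
    rgb = np.ones((2, 4, 4))
    lidar = 2.0 * np.ones((2, 4, 4))
    print(fuse_add(rgb, lidar)[0, 0, 0])   # 3.0
    print(fuse_gated(rgb, lidar, 0.0, 0.0, 0.0)[0, 0, 0])  # gate = 0.5 -> 1.5
```

Early fusion would apply such an operator near the network input (e.g. to low-level feature maps), whereas mid-level fusion applies it to deeper features; the learnable variants differ from the arithmetic ones only in carrying trainable parameters.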
Mousa-Pasandi, M., Liu, T., Massoud, Y. et al. RGB-LiDAR fusion for accurate 2D and 3D object detection. Machine Vision and Applications 34, 86 (2023). https://doi.org/10.1007/s00138-023-01435-w