GIoU-CLOCs: Import Generalized Intersection Over Union-Involved Object-Detection Tasks Based on Lidar and Camera

Published in: Journal of Russian Laser Research

In recent years, LIDAR has been applied far more widely, especially in object detection. While LIDAR-based detectors achieve encouraging performance, they typically rely on a single modality and cannot exploit information from other sensors. In this paper, we introduce a late-fusion approach that combines data from a LIDAR and an RGB camera. To cope with the unordered nature of point clouds, we introduce polynomial functions into the 3D network, enabling it to take higher-order moments of a given shape into account. Exploiting geometric and semantic consistency, we fuse point clouds and images to produce more accurate 3D and 2D detection results. Finally, we address the weaknesses of the intersection over union (IoU) in the fusion network, employing its generalized version (GIoU) as both a new loss and a new metric. Experimental evaluation on the challenging KITTI object detection benchmark shows significant improvements, especially in the bird's-eye view, demonstrating the feasibility and applicability of our work.
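
To make these ideas concrete, two brief sketches follow. First, a common way to expose higher-order moments to a point network is to append polynomial terms of the raw coordinates as extra input features, in the spirit of the moment-based scheme the abstract describes. This is a minimal illustration, not the paper's exact implementation; the function name and the restriction to second-order terms are our own assumptions.

```python
import numpy as np

def add_polynomial_features(points: np.ndarray) -> np.ndarray:
    """Append second-order polynomial terms to an (N, 3) array of xyz points.

    Feeding x^2, y^2, z^2, xy, xz, and yz alongside the raw coordinates
    lets a point-based network capture second-order moments of the shape.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.column_stack([points, x * x, y * y, z * z,
                            x * y, x * z, y * z])
```

Second, the generalized IoU used as the new loss and metric has a closed form for axis-aligned boxes: GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest box enclosing both boxes A and B, and the corresponding loss is 1 - GIoU. A minimal 2D sketch, assuming the usual (x1, y1, x2, y2) corner format:

```python
def giou_2d(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Smallest enclosing box C.
    area_c = ((max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])) *
              (max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])))

    iou = inter / union
    return iou - (area_c - union) / area_c

# Unlike plain IoU, GIoU remains informative for disjoint boxes,
# so the loss 1 - giou_2d(a, b) still provides a gradient signal.
print(giou_2d((0, 0, 2, 2), (3, 3, 5, 5)))  # -0.68: disjoint, IoU would be 0
print(giou_2d((0, 0, 2, 2), (1, 1, 3, 3)))  # ~-0.079: slight overlap
```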

Author information

Corresponding author

Correspondence to Shifeng Wang.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, T., Zhou, Y., Wang, S. et al. GIoU-CLOCs: Import Generalized Intersection Over Union-Involved Object-Detection Tasks Based on Lidar and Camera. J Russ Laser Res 44, 100–109 (2023). https://doi.org/10.1007/s10946-023-10113-1
