Abstract
Leveraging LiDAR-based detectors or real LiDAR point data to guide monocular 3D detection has brought significant improvements, e.g., in Pseudo-LiDAR methods. However, existing methods usually apply non-end-to-end training strategies and leverage the LiDAR information insufficiently, leaving the rich potential of the LiDAR data under-exploited. In this paper, we propose the Cross-Modality Knowledge Distillation (CMKD) network for monocular 3D detection, which efficiently and directly transfers knowledge from the LiDAR modality to the image modality on both features and responses. Moreover, we extend CMKD into a semi-supervised training framework by distilling knowledge from large-scale unlabeled data, which significantly boosts performance. As of submission, CMKD ranks \(1^{st}\) among published monocular 3D detectors on both the KITTI test set and the Waymo val set, with significant performance gains over previous state-of-the-art methods. Our code will be released at https://github.com/Cc-Hy/CMKD.
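To make the idea of distilling on "both features and responses" concrete, the following is a minimal sketch of a combined distillation loss. It assumes a generic formulation (MSE feature mimicry plus temperature-softened KL divergence on classification responses, in the style of Hinton et al.); the function names, temperature, and weighting are illustrative and are not the paper's exact losses or API.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def feature_distill_loss(f_student, f_teacher):
    # Feature-level distillation: mean squared error between the
    # image-branch (student) and LiDAR-branch (teacher) feature vectors.
    return sum((a - b) ** 2 for a, b in zip(f_student, f_teacher)) / len(f_student)

def response_distill_loss(logits_student, logits_teacher, T=2.0):
    # Response-level distillation: KL(teacher || student) on
    # temperature-softened class distributions, scaled by T^2 so the
    # gradient magnitude is roughly temperature-independent.
    p = softmax(logits_teacher, T)
    q = softmax(logits_student, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T

def combined_distill_loss(f_s, f_t, z_s, z_t, lam_feat=1.0, lam_resp=1.0):
    # Weighted sum of the two distillation terms.
    return (lam_feat * feature_distill_loss(f_s, f_t)
            + lam_resp * response_distill_loss(z_s, z_t))
```

In practice both terms would be computed over dense feature maps and detection heads with learned weighting, but the structure (a feature-mimicry term plus a soft-response term) is the core of this family of cross-modality distillation methods.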
Acknowledgement
This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFE0183900) and YUNJI Technology Co., Ltd.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hong, Y., Dai, H., Ding, Y. (2022). Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13670. Springer, Cham. https://doi.org/10.1007/978-3-031-20080-9_6
DOI: https://doi.org/10.1007/978-3-031-20080-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20079-3
Online ISBN: 978-3-031-20080-9
eBook Packages: Computer Science (R0)