Monocular 3D Object Detection via Feature Domain Adaptation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12354)

Abstract

Monocular 3D object detection is challenging because depth estimated from a single image is unreliable, which leaves a distinct performance gap between monocular and LiDAR-based approaches. In this paper, we propose DA-3Ddet, a novel monocular 3D object detection framework based on domain adaptation, which adapts features from the unreliable image-based pseudo-LiDAR domain to the accurate real-LiDAR domain to boost performance. To address the previously overlooked inconsistency between the foreground masks of pseudo and real LiDAR, caused by inaccurately estimated depth, we also introduce a context-aware foreground segmentation module that helps include relevant points in the foreground mask. Extensive experiments on the KITTI dataset demonstrate that our simple yet effective framework outperforms other state-of-the-art methods by a large margin.
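The abstract names two mechanisms without giving formulas: lifting an image into a pseudo-LiDAR point cloud through estimated depth, and pulling pseudo-LiDAR features toward real-LiDAR features on foreground points. The Python sketch below illustrates both under stated assumptions; it is not the DA-3Ddet implementation. The back-projection uses the standard pinhole model common to pseudo-LiDAR pipelines, x = (u - c_x) z / f_x and y = (v - c_y) z / f_y, and stays in the camera frame (the camera-to-LiDAR extrinsic transform is omitted). Because the abstract does not specify the adaptation objective, a foreground-masked squared L2 feature distance is assumed here as one common feature-mimicking choice. The function names `depth_to_pseudo_lidar` and `masked_adaptation_loss`, the intrinsics, and the mask input are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def depth_to_pseudo_lidar(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project an H x W depth map (metres) into camera-frame 3D points.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    The camera-to-LiDAR extrinsic transform is deliberately omitted.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):      # v: image row index
        for u, z in enumerate(row):      # u: image column index
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def masked_adaptation_loss(
    pseudo_feats: Sequence[Sequence[float]],
    real_feats: Sequence[Sequence[float]],
    fg_mask: Sequence[bool],
) -> float:
    """Assumed adaptation objective (hypothetical, not taken from the paper).

    For each entry where fg_mask is True, compute the squared L2 distance
    between the pseudo-LiDAR and real-LiDAR feature vectors, then average
    over those foreground entries. Returns 0.0 if no entry is foreground.
    """
    if not (len(pseudo_feats) == len(real_feats) == len(fg_mask)):
        raise ValueError("feature lists and mask must have the same length")
    total = 0.0
    count = 0
    for p, r, is_fg in zip(pseudo_feats, real_feats, fg_mask):
        if not is_fg:
            continue  # background entries do not contribute
        if len(p) != len(r):
            raise ValueError("feature vectors must have the same dimension")
        total += sum((a - b) ** 2 for a, b in zip(p, r))
        count += 1
    return total / count if count else 0.0
```

For example, `depth_to_pseudo_lidar([[0.0, 10.0], [20.0, 5.0]], fx=100.0, fy=100.0, cx=0.5, cy=0.5)` skips the zero-depth pixel and returns three points, the first approximately (0.05, -0.05, 10.0). Restricting the loss to foreground entries mirrors the abstract's concern that a mask derived from inaccurate depth may disagree with the real-LiDAR mask; how the paper's context-aware foreground segmentation module produces that mask is not modeled here.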

Keywords

Monocular 3D object detection · Domain adaptation · Pseudo-LiDAR

Notes

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2019YFA0709502), the 111 Project (No. B18015), the Key Project of Shanghai Science & Technology (No. 16JC1420402), the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01) and ZJLab, the National Key R&D Program of China (No. 2018YFC1312900), and the National Natural Science Foundation of China (NSFC 91630314).

Supplementary material

Supplementary material 1: 504446_1_En_2_MOESM1_ESM.pdf (PDF, 15.8 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Baidu Inc., Beijing, China
  2. Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  3. Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Ministry of Education, Shanghai, China
