Abstract
While expensive LiDAR and stereo camera rigs have enabled the development of successful 3D object detection methods, monocular RGB-only approaches lag far behind. This work advances the state of the art by introducing MoVi-3D, a novel, single-stage deep architecture for monocular 3D object detection. MoVi-3D builds upon an approach that leverages geometrical information to generate, both at training and test time, virtual views in which object appearance is normalized with respect to distance. These virtually generated views facilitate the detection task, as they significantly reduce the variability in visual appearance associated with objects placed at different distances from the camera. As a consequence, the deep model is relieved of learning depth-specific representations, and its complexity can be significantly reduced. In particular, we show that, thanks to our virtual view generation process, a lightweight, single-stage architecture suffices to set new state-of-the-art results on the popular KITTI3D benchmark.
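The idea of normalizing appearance with respect to distance can be illustrated with a minimal pinhole-camera sketch. This is not the MoVi-3D implementation: the function names, the focal length, the object height and the reference depth are all hypothetical values chosen for illustration. The only assumption is the standard pinhole relation, under which apparent size in pixels is proportional to focal length divided by depth, so rescaling a crop by the ratio of its depth to a reference depth makes objects at different distances appear the same size.

```python
def apparent_height_px(focal_px: float, height_m: float, depth_m: float) -> float:
    """Pinhole projection: pixel height of an object of metric height at a given depth."""
    return focal_px * height_m / depth_m


def virtual_view_scale(depth_m: float, ref_depth_m: float) -> float:
    """Resize factor that makes content at depth_m look as if it were at ref_depth_m."""
    return depth_m / ref_depth_m


# Hypothetical values, for illustration only.
focal_px = 700.0      # focal length in pixels
car_height_m = 1.5    # metric height of the object
ref_depth_m = 20.0    # reference depth of the virtual view

normalized_heights = []
for depth_m in (10.0, 20.0, 40.0):
    raw = apparent_height_px(focal_px, car_height_m, depth_m)
    scale = virtual_view_scale(depth_m, ref_depth_m)
    # After rescaling, the pixel height no longer depends on the original depth.
    normalized_heights.append(raw * scale)

print(normalized_heights)
```

In this toy setting the raw pixel heights differ by a factor of four between the nearest and farthest object, while the rescaled heights coincide, which is the kind of appearance variability a detector would otherwise have to learn to handle.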
Notes
- 1. Official KITTI3D benchmark: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
Acknowledgements
We acknowledge that the University of Trento received financial support from H2020 EU project SPRING – Socially Pertinent Robots in Gerontological Healthcare. This work was carried out under the Vision and Learning joint Laboratory between FBK and UNITN.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Simonelli, A., Buló, S.R., Porzi, L., Ricci, E., Kontschieder, P. (2020). Towards Generalization Across Depth for Monocular 3D Object Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_46
DOI: https://doi.org/10.1007/978-3-030-58542-6_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58541-9
Online ISBN: 978-3-030-58542-6
eBook Packages: Computer Science, Computer Science (R0)