Towards Generalization Across Depth for Monocular 3D Object Detection

Simonelli, Andrea; Buló, Samuel Rota; Porzi, Lorenzo; Ricci, Elisa; Kontschieder, Peter

doi:10.1007/978-3-030-58542-6_46


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12367)

Included in the following conference series:

European Conference on Computer Vision


Abstract

While expensive LiDAR and stereo camera rigs have enabled the development of successful 3D object detection methods, monocular RGB-only approaches still lag far behind. This work advances the state of the art by introducing MoVi-3D, a novel single-stage deep architecture for monocular 3D object detection. MoVi-3D builds upon a novel approach that leverages geometric information to generate, at both training and test time, virtual views in which object appearance is normalized with respect to distance. These virtual views facilitate the detection task because they significantly reduce the variability in visual appearance of objects placed at different distances from the camera. As a consequence, the deep model is relieved from learning depth-specific representations, and its complexity can be significantly reduced. In particular, we show that, thanks to our virtual view generation process, a lightweight single-stage architecture suffices to set new state-of-the-art results on the popular KITTI3D benchmark.
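The distance normalization described in the abstract can be illustrated with an ideal pinhole camera. This is a minimal sketch under stated assumptions, not the MoVi-3D implementation. It assumes a focal length f given in pixels. The helper names (`PinholeCamera`, `virtual_view_scale`, `size_in_virtual_view`) are hypothetical, as are the parameters (a metric window width W and a virtual-view width V in pixels).

An object of metric size H at depth Z projects to f·H/Z pixels. Suppose the image crop covering a window W metres wide at that depth is resized to a fixed V pixels. The object then ends up H·V/W pixels in size, independent of both Z and f.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinholeCamera:
    """Ideal pinhole camera (assumption); focal length f is in pixels."""
    f: float

    def projected_size(self, metric_size: float, depth: float) -> float:
        """Image-plane size in pixels of an object metric_size metres across at depth metres."""
        return self.f * metric_size / depth


def virtual_view_scale(cam: PinholeCamera, depth: float,
                       window_width_m: float, view_width_px: int) -> float:
    """Hypothetical helper: resize factor that maps the image crop covering a
    window of fixed metric width (window_width_m) at the given depth onto a
    virtual view that is view_width_px pixels wide."""
    crop_width_px = cam.projected_size(window_width_m, depth)
    return view_width_px / crop_width_px


def size_in_virtual_view(cam: PinholeCamera, metric_size: float, depth: float,
                         window_width_m: float, view_width_px: int) -> float:
    """Hypothetical helper: pixel size of an object after its crop is resized
    into the virtual view. Algebraically this equals
    metric_size * view_width_px / window_width_m, so depth cancels out."""
    original_px = cam.projected_size(metric_size, depth)
    return original_px * virtual_view_scale(cam, depth, window_width_m, view_width_px)


if __name__ == "__main__":
    cam = PinholeCamera(f=721.5)  # illustrative focal length in pixels
    object_height_m = 1.5         # illustrative object height in metres
    for z in (10.0, 20.0, 40.0):
        raw = cam.projected_size(object_height_m, z)
        virt = size_in_virtual_view(cam, object_height_m, z,
                                    window_width_m=20.0, view_width_px=400)
        print(f"depth {z:5.1f} m: raw {raw:6.1f} px, virtual view {virt:5.1f} px")
```

Running the script shows raw pixel sizes that shrink as depth grows. The virtual-view sizes, in contrast, stay constant at H·V/W (30 px with these illustrative numbers). This is the kind of appearance variability across depth that the abstract says the network no longer has to model.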


Notes

1. Official KITTI3D benchmark: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
2. Official KITTI3D benchmark: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d


Acknowledgements

We acknowledge that the University of Trento received financial support from H2020 EU project SPRING – Socially Pertinent Robots in Gerontological Healthcare. This work was carried out under the Vision and Learning joint Laboratory between FBK and UNITN.

Author information

Authors and Affiliations

Facebook, Cambridge, USA
Samuel Rota Buló, Lorenzo Porzi & Peter Kontschieder
University of Trento, Trento, Italy
Andrea Simonelli & Elisa Ricci
Fondazione Bruno Kessler, Trento, Italy
Andrea Simonelli & Elisa Ricci


Corresponding author

Correspondence to Andrea Simonelli.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Supplementary material 1 (PDF, 1883 KB)


About this paper

Cite this paper

Simonelli, A., Buló, S.R., Porzi, L., Ricci, E., Kontschieder, P. (2020). Towards Generalization Across Depth for Monocular 3D Object Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_46


DOI: https://doi.org/10.1007/978-3-030-58542-6_46
Published: 17 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58541-9
Online ISBN: 978-3-030-58542-6
eBook Packages: Computer Science, Computer Science (R0)
