Skip to main content

Monocular Differentiable Rendering for Self-supervised 3D Object Detection

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12366))

Included in the following conference series:

Abstract

3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale. To overcome this ambiguity, we present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks. Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective derived from a pretrained monocular depth estimation network. We use the KITTI 3D object detection dataset to evaluate the accuracy of the method. Experiments demonstrate that we can effectively use noisy monocular depth and differentiable rendering as an alternative to expensive 3D ground-truth labels or LiDAR information.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. In: CoRR (2015)

    Google Scholar 

  2. Chen, W., et al.: Learning to predict 3D objects with an interpolation-based differentiable renderer. In: NeurIPS (2019)

    Google Scholar 

  3. Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3D object detection for autonomous driving. In: CVPR (2016)

    Google Scholar 

  4. Chen, X., et al.: 3D object proposals for accurate object class detection. In: NIPS (2015)

    Google Scholar 

  5. Engelmann, F., Stückler, J., Leibe, B.: Joint object pose estimation and shape reconstruction in urban street scenes using 3D shape priors. In: Rosenhahn, B., Andres, B. (eds.) GCPR 2016. LNCS, vol. 9796, pp. 219–230. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45886-1_18

    Chapter  Google Scholar 

  6. Engelmann, F., Stückler, J., Leibe, B.: SAMP: shape and motion priors for 4D vehicle reconstruction. In: WACV (2017)

    Google Scholar 

  7. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for Autonomous Driving? The KITTI vision benchmark suite. In: CVPR (2012)

    Google Scholar 

  8. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR (2017)

    Google Scholar 

  9. Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., Gaidon, A.: PackNet-SfM: 3D packing for self-supervised monocular depth estimation. In: CoRR (2019)

    Google Scholar 

  10. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)

    Google Scholar 

  11. Ushiku, Y., Kato, H., Harada, T.: Neural 3D mesh renderer. In: CVPR (2018)

    Google Scholar 

  12. Insafutdinov, E., Dosovitskiy, A.: Unsupervised learning of shape and pose with differentiable point clouds. In: NIPS (2018)

    Google Scholar 

  13. Kato, H., Harada, T.: Learning view priors for single-view 3D reconstruction. In: CVPR (2019)

    Google Scholar 

  14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)

    Google Scholar 

  15. Kulkarni, N., Gupta, A., Tulsiani, S.: Canonical surface mapping via geometric cycle consistency. In: ICCV (2019)

    Google Scholar 

  16. Kundu, A., Li, Y., Rehg, J.M.: 3D-RCNN: instance-level 3D object reconstruction via render-and-compare. In: CVPR (2018)

    Google Scholar 

  17. Li, T.-M., Aittala, M., Durand, F., Lehtinen, J.: Differentiable Monte Carlo ray tracing through edge sampling. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 37(6), 222:1–222:11 (2018)

    Google Scholar 

  18. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

    Chapter  Google Scholar 

  19. Liu, S., Li, T., Chen, W., Li, H.: Soft rasterizer: a differentiable renderer for image-based 3D reasoning. In: ICCV (2019)

    Google Scholar 

  20. Loper, M.M., Black, M.J.: OpenDR: an approximate differentiable renderer. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 154–169. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10584-0_11

    Chapter  Google Scholar 

  21. Ma, X., Wang, Z., Li, H., Ouyang, W., Zhang, P.: Accurate monocular 3D object detection via color-embedded 3D reconstruction for autonomous driving. In: ICCV (2019)

    Google Scholar 

  22. Mahjourian, R., Wicke, M., Angelova, A.: Unsupervised learning of depth and egomotion from monocular video using 3D geometric constraints. In: CVPR (2018)

    Google Scholar 

  23. Manhardt, F., Kehl, W., Gaidon, A.: ROI-10D: monocular lifting of 2D detection to 6D pose and metric shape. In: CVPR (2019)

    Google Scholar 

  24. Choi, H.M., Kang, H., Hyun, Y.: Multi-view reprojection architecture for orientation estimation. In: ICCVW (2019)

    Google Scholar 

  25. Mousavian, A., Anguelov, D., Flynn, J., Kosecka, J.: 3D bounding box estimation using deep learning and geometry. In: CVPR (2017)

    Google Scholar 

  26. Pillai, S., Ambruş, R., Gaidon, A.: SuperDepth: self-supervised, super-resolved monocular depth estimation. In: ICRA (2019)

    Google Scholar 

  27. Simonelli, A., Bulo, S.R., Porzi, L., López-Antequera, M., Kontschieder, P.: Disentangling monocular 3D object detection. In: ICCV (2019)

    Google Scholar 

  28. Simonelli, A., Bulò, S.R., Porzi, L., Ricci, E., Kontschieder,P.: Single-stage monocular 3D object detection with virtual cameras. In: CoRR (2019)

    Google Scholar 

  29. Stutz, D., Geiger, A.: Learning 3D shape completion under weak supervision. In: IJCV (2018)

    Google Scholar 

  30. Tulsiani, S., Efros, A.A., Malik, J.: Multi-view consistency as supervisory signal for learning shape and pose prediction. In: CVPR (2018)

    Google Scholar 

  31. Wang, R., Yang, N., Stueckler, J., Cremers, D.: DirectShape: photometric alignment of shape priors for visual vehicle pose and shape estimation. In: ICRA (2020)

    Google Scholar 

  32. Wang, Y., Chao, W.-L., Garg, D., Hariharan, B., Campbell, M., Weinberger, K.Q.: Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving. In: CVPR (2019)

    Google Scholar 

  33. Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2

  34. Xu, B., Chen, Z.: Multi-level fusion based 3D object detection from monocular images. In: CVPR (2018)

    Google Scholar 

  35. You, Y., et al.: Pseudo-LiDAR++: accurate depth for 3D object detection in autonomous driving. In: ICLR (2020)

    Google Scholar 

  36. Zakharov, S., Kehl, W., Bhargava, A., Gaidon, A.: Autolabeling 3D objects with differentiable rendering of SDF shape priors. In: CVPR (2020)

    Google Scholar 

  37. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)

    Google Scholar 

  38. Zuffi, S., Kanazawa, A., Berger-Wolf, T., Black, M.J.: Three-D safari: learning to estimate zebra pose, shape, and texture from images “in the wild”. In: ICCV (2019)

    Google Scholar 

Download references

Acknowledgements

This work was supported by Toyota Research Institute Advanced Development, Inc. The authors would like to thank Richard Calland, Karim Hamzaoui, Rares Ambrus, Vitor Guizilini for their helpful comments and suggestions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Deniz Beker .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 57518 KB)

Supplementary material 1 (pdf 2846 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Beker, D. et al. (2020). Monocular Differentiable Rendering for Self-supervised 3D Object Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58589-1_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58588-4

  • Online ISBN: 978-3-030-58589-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics