
Mapillary Planet-Scale Depth Dataset

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)

Abstract

Learning-based methods produce remarkable results on single image depth tasks when trained on well-established benchmarks; however, there is a large gap between benchmark and real-world performance that is usually obscured by the common practice of fine-tuning on the target dataset. We introduce a new depth dataset that is an order of magnitude larger than previous datasets, but more importantly, contains an unprecedented gamut of locations, camera models, and scene types while offering metric depth (not just up-to-scale). Additionally, we investigate the problem of training single image depth networks using images captured with many different cameras, validating an existing approach and proposing a simpler alternative. With our contributions we achieve excellent results on challenging benchmarks before fine-tuning, and set the state of the art on the popular KITTI dataset after fine-tuning.

The dataset is available at mapillary.com/dataset/depth.
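The abstract distinguishes metric depth from up-to-scale depth. This distinction is commonly illustrated by the standard monocular-depth evaluation protocol, in which up-to-scale predictions are aligned to ground truth by median scaling before computing errors. The sketch below (illustrative values only, not from the paper) shows how a prediction that is correct only up to a global scale scores poorly on a metric error but perfectly after median alignment:

```python
import numpy as np

def abs_rel(pred, gt):
    # Mean absolute relative error, a standard depth-evaluation metric.
    return float(np.mean(np.abs(pred - gt) / gt))

# Hypothetical example: ground-truth depths in meters, and a prediction
# that is correct only up to an unknown global scale factor of 0.5.
gt = np.array([2.0, 5.0, 10.0, 20.0])
pred = 0.5 * gt

metric_err = abs_rel(pred, gt)          # large: the absolute scale is wrong

# Median scaling, a common protocol for evaluating up-to-scale methods:
s = np.median(gt) / np.median(pred)
scaled_err = abs_rel(s * pred, gt)      # zero: perfect once scale is aligned
```

A dataset offering metric depth lets networks be trained and evaluated without this alignment step, which matters for applications where absolute distances are needed.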


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Facebook, Menlo Park, USA
  2. Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria
