Mapillary Planet-Scale Depth Dataset

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)


Learning-based methods produce remarkable results on single-image depth tasks when trained on well-established benchmarks; however, there is a large gap between these benchmarks and real-world performance that is usually obscured by the common practice of fine-tuning on the target dataset. We introduce a new depth dataset that is an order of magnitude larger than previous datasets and, more importantly, contains an unprecedented gamut of locations, camera models and scene types while offering metric depth (not just depth up to an unknown scale). Additionally, we investigate the problem of training single-image depth networks on images captured with many different cameras, validating an existing approach and proposing a simpler alternative. With these contributions we achieve excellent results on challenging benchmarks before fine-tuning, and set the state of the art on the popular KITTI dataset after fine-tuning.
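To make the multi-camera training problem concrete, the sketch below shows one common way to decouple depth targets from camera intrinsics when mixing images from heterogeneous cameras: normalizing metric depth by focal length so the network regresses a camera-agnostic quantity. This is a minimal illustration of the general idea, not necessarily the specific approach proposed in the paper; the function names and the canonical focal length are assumptions introduced here.

```python
# Hedged sketch: focal-length normalization of depth targets when training
# a single-image depth network on images from many different cameras.
# The canonical focal length (in pixels) is an arbitrary choice made here
# for illustration; it is not taken from the paper.

CANONICAL_FOCAL_PX = 1000.0

def normalize_depth(depth_m: float, focal_px: float,
                    canonical_focal_px: float = CANONICAL_FOCAL_PX) -> float:
    """Scale metric depth (meters) so that all cameras behave as if they
    shared one canonical focal length. The network is trained to predict
    this normalized quantity."""
    return depth_m * canonical_focal_px / focal_px

def denormalize_depth(pred_norm: float, focal_px: float,
                      canonical_focal_px: float = CANONICAL_FOCAL_PX) -> float:
    """Invert the normalization at inference time, using the focal length
    of the test camera to recover metric depth in meters."""
    return pred_norm * focal_px / canonical_focal_px
```

At inference, metric depth is recovered by re-applying the test camera's focal length, so a single network can serve cameras with very different fields of view.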

The dataset is available at



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Facebook, Menlo Park, USA
  2. Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria
