
Does it work outside this benchmark? Introducing the rigid depth constructor tool

Depth validation dataset construction in rigid scenes for the masses

  • Published in: Multimedia Tools and Applications

Abstract

A new framework called Rigid Depth Constructor (RDC) is proposed, allowing users to create their own datasets for the validation of depth map estimation algorithms in the context of autonomous navigation. Compared to existing tools that rely on a high-quality fixed LiDAR sensor, RDC is usable in low-cost setups requiring only a camera and any (e.g. handheld or UAV-carried) LiDAR sensor, which enables more flexible and much faster scene scanning. Furthermore, unlike photogrammetry tools that use sparse RGB views, it can be applied to smooth videos while remaining computationally tractable. The framework includes a test suite to extract insightful information about the evaluated algorithm. As examples, validation videos made from UAV footage are provided to evaluate two depth prediction algorithms initially tested on in-car driving video datasets, which shows that the drone context is dramatically different. This supports the need to benchmark depth estimation algorithms on a dataset that fits one’s particular context, which often means creating a brand new one. An open-source implementation accompanies the paper, designed to be as user-friendly as possible, to make depth dataset creation feasible even for small teams. The key contributions are the following: (1) a complete, open-source and almost fully automatic software application for creating validation datasets with densely annotated depth, adaptable to a wide variety of image, video and range data; (2) selection tools to adapt the dataset to specific validation needs, and conversion tools to other dataset formats; (3) as use-case examples, two new real datasets, outdoor and indoor, readily usable in a UAV navigation context, which are used as test sets in the evaluation of two depth prediction algorithms with a collection of comprehensive (e.g. distribution-based) metrics.
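The "comprehensive (e.g. distribution-based) metrics" mentioned above can be illustrated with a short sketch. The snippet below computes the usual scalar depth metrics (absolute relative error, RMSE, δ < 1.25 accuracy) alongside percentiles of the per-pixel relative error, i.e. a distribution-based summary rather than a single mean. Function name, thresholds and return format are illustrative assumptions, not the RDC evaluation toolkit API.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3):
    """Compare a predicted depth map against dense ground truth.

    Returns scalar metrics plus error percentiles (a distribution-based
    summary). Illustrative only; not the RDC evaluation toolkit API.
    """
    valid = gt > min_depth                 # ignore invalid ground-truth pixels
    p, g = pred[valid], gt[valid]

    rel_err = np.abs(p - g) / g            # per-pixel relative error
    abs_rel = np.mean(rel_err)             # mean absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))  # root mean squared error
    ratio = np.maximum(p / g, g / p)       # per-pixel accuracy ratio
    delta1 = np.mean(ratio < 1.25)         # fraction within 25% of truth

    # Distribution view: percentiles of the relative error instead of one mean
    p50, p90, p99 = np.percentile(rel_err, [50, 90, 99])
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1,
            "rel_err_p50": p50, "rel_err_p90": p90, "rel_err_p99": p99}

# Example on synthetic data: a prediction uniformly 10% too deep
gt = np.full((4, 4), 10.0)
pred = gt * 1.1
m = depth_metrics(pred, gt)
```

On this synthetic example every pixel has a 10% relative error, so the mean and all percentiles coincide; on real predictions the p90/p99 tails reveal failure modes that a single mean hides.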


Notes

  1. I.e. when the 3D scene and/or the camera position produces an image in which the perception of distance or object size is ambiguous, or even deceptive (famous extreme examples are the Ames room and the corridor illusion).

  2. https://www.meshlab.net/

  3. http://www.cloudcompare.org/

  4. https://github.com/ClementPinard/depth-dataset-builder/

  5. https://github.com/ClementPinard/depth-dataset-builder/tree/master/evaluation_toolkit

  6. https://youtube.com/playlist?list=PLMeM2q87QjqhYA_LfJY925ZAGyD5cS6Q-

  7. https://github.com/ClementPinard/depth-dataset-builder#depth-algorithm-evaluation

  8. https://www.airdeco-drone.com

  9. https://www.parrot.com

  10. https://www.geomesure.fr/



Acknowledgements

Acquisitions for the Manoir dataset were made in collaboration with the AIRD’ECO-Drone company (note 8), thanks to the financial support of the Parrot company (note 9). Acquisitions for the University hall dataset were made entirely by Clément Pinard, thanks to the equipment and training provided by the Geomesure company (note 10).

Author information

Corresponding author

Correspondence to Antoine Manzanera.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest or competing interests related to this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Pinard, C., Manzanera, A. Does it work outside this benchmark? Introducing the rigid depth constructor tool. Multimed Tools Appl 82, 41641–41667 (2023). https://doi.org/10.1007/s11042-023-14743-0


Keywords

Navigation