
Does it work outside this benchmark? Introducing the rigid depth constructor tool

Depth validation dataset construction in rigid scenes for the masses

  • Published in: Multimedia Tools and Applications

Abstract

A new framework called Rigid Depth Constructor (RDC) is proposed, allowing users to create their own datasets for the validation of depth map estimation algorithms in the context of autonomous navigation. Compared to existing tools that rely on a high-quality fixed LiDAR sensor, RDC is usable in low-cost setups requiring only a camera and any (e.g. handheld or UAV-carried) LiDAR sensor, which enables more flexible and much faster scene scanning. Furthermore, unlike photogrammetry tools that use sparse RGB views, it can be applied to smooth videos while remaining computationally tractable. The framework includes a test suite to extract insightful information about the evaluated algorithm. As examples, validation videos made from UAV footage are provided to evaluate two depth prediction algorithms initially tested on in-car driving video datasets, which shows that the drone context is dramatically different. This supports the need to benchmark depth estimation algorithms on a dataset that fits one’s particular context, which often means creating a brand new one. An open-source implementation accompanies the paper, designed to be as user-friendly as possible, to make depth dataset creation feasible even for small teams. The key contributions are the following: (1) a complete, open-source and almost fully automatic software application for creating validation datasets with densely annotated depth, adaptable to a wide variety of image, video and range data; (2) selection tools to adapt the dataset to specific validation needs, and conversion tools to other dataset formats; (3) as use-case examples, two new real datasets, outdoor and indoor, readily usable in a UAV navigation context, which are used as test sets in the evaluation of two depth prediction algorithms with a collection of comprehensive (e.g. distribution-based) metrics.
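The "comprehensive (e.g. distribution-based) metrics" mentioned above can be illustrated with a short sketch. The snippet below computes the usual scalar depth metrics (absolute relative error, RMSE, δ < 1.25 accuracy) alongside percentiles of the per-pixel relative error, i.e. a distribution-based summary rather than a single mean. Function name, thresholds and return format are illustrative assumptions, not the RDC evaluation toolkit API.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3):
    """Compare a predicted depth map against dense ground truth.

    Returns scalar metrics plus error percentiles (a distribution-based
    summary). Illustrative only; not the RDC evaluation toolkit API.
    """
    valid = gt > min_depth                 # ignore invalid ground-truth pixels
    p, g = pred[valid], gt[valid]

    rel_err = np.abs(p - g) / g            # per-pixel relative error
    abs_rel = np.mean(rel_err)             # mean absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))  # root mean squared error
    ratio = np.maximum(p / g, g / p)       # per-pixel accuracy ratio
    delta1 = np.mean(ratio < 1.25)         # fraction within 25% of truth

    # Distribution view: percentiles of the relative error instead of one mean
    p50, p90, p99 = np.percentile(rel_err, [50, 90, 99])
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1,
            "rel_err_p50": p50, "rel_err_p90": p90, "rel_err_p99": p99}

# Example on synthetic data: a prediction uniformly 10% too deep
gt = np.full((4, 4), 10.0)
pred = gt * 1.1
m = depth_metrics(pred, gt)
```

On this synthetic example every pixel has a 10% relative error, so the mean and all percentiles coincide; on real predictions the p90/p99 tails reveal failure modes that a single mean hides.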


Notes

  1. I.e. when the 3D scene and/or the camera position produces an image in which the perception of distance or object size is ambiguous, or even deceptive (famous extreme examples are the Ames room and the corridor illusion).

  2. https://www.meshlab.net/

  3. http://www.cloudcompare.org/

  4. https://github.com/ClementPinard/depth-dataset-builder/

  5. https://github.com/ClementPinard/depth-dataset-builder/tree/master/evaluation_toolkit

  6. https://youtube.com/playlist?list=PLMeM2q87QjqhYA_LfJY925ZAGyD5cS6Q-

  7. https://github.com/ClementPinard/depth-dataset-builder#depth-algorithm-evaluation

  8. https://www.airdeco-drone.com

  9. https://www.parrot.com

  10. https://www.geomesure.fr/



Acknowledgements

Acquisitions for the Manoir dataset were made in collaboration with the AIRD’ECO-Drone company (note 8), thanks to the financial support of the Parrot company (note 9). Acquisitions for the University hall dataset were made entirely by Clément Pinard, thanks to the equipment and training provided by the Geomesure company (note 10).

Author information

Corresponding author

Correspondence to Antoine Manzanera.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest or competing interests related to this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Pinard, C., Manzanera, A. Does it work outside this benchmark? Introducing the rigid depth constructor tool. Multimed Tools Appl 82, 41641–41667 (2023). https://doi.org/10.1007/s11042-023-14743-0


Keywords

Navigation