
Single image depth estimation using improved U-Net and edge-guide loss

Published in: Multimedia Tools and Applications

Abstract

Monocular depth estimation is regarded as a critical link in context-aware scene comprehension: it takes an image captured from a single viewpoint as input and directly predicts the depth value of each pixel. However, predicting accurate object borders without replicating texture is difficult, which leads to missing tiny objects and blurry object edges in the predicted depth maps. In this paper, we propose a monocular depth estimation method built on an improved U-Net encoder-decoder network. We introduce a new training loss term, called edge-guide loss, which pushes the network to focus on object edges and thereby improves depth accuracy at edges and for tiny objects. In the network, we build the encoder on DenseNet-169 and the decoder from 2× bilinear up-sampling, skip-connections and hybrid dilated convolution; the skip-connections pass multi-scale feature maps from the encoder to the decoder. The full loss function combines the new edge-guide loss with three basic loss terms. We evaluate our algorithm on the NYU Depth V2 dataset. The experimental results show that the proposed network produces depth maps from a single RGB image with unambiguous borders and more detail on tiny objects, and that it outperforms state-of-the-art approaches in both visual quality and objective measurement.
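To make the described architecture concrete, the following is a minimal PyTorch sketch of one plausible realization: a DenseNet-169 encoder [20], a decoder built from 2× bilinear up-sampling blocks fed by skip-connections, and a hybrid-dilated-convolution (HDC) block [36] between encoder and decoder. The skip points, channel widths, and the placement of the HDC block are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch (assumed details) of the encoder-decoder described above.
# Skip points, channel widths, and HDC placement are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import densenet169

class UpBlock(nn.Module):
    """2x bilinear up-sampling; the encoder skip feature map is
    concatenated before two 3x3 convolutions."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, x, skip):
        # Resize to the skip's spatial size (a 2x step at every stage here).
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=True)
        x = torch.cat([x, skip], dim=1)
        return F.relu(self.conv2(F.relu(self.conv1(x))))

class HDCBlock(nn.Module):
    """Hybrid dilated convolution: stacked 3x3 convolutions with dilation
    rates 1, 2, 3 to enlarge the receptive field without gridding [36]."""
    def __init__(self, ch):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 3))

    def forward(self, x):
        for conv in self.convs:
            x = F.relu(conv(x))
        return x

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained encoder [44]; final features: 1664 ch at 1/32.
        self.encoder = densenet169(weights="DEFAULT").features
        self.hdc = HDCBlock(1664)
        self.up1 = UpBlock(1664, 256, 832)   # skip: transition2 (1/16)
        self.up2 = UpBlock(832, 128, 416)    # skip: transition1 (1/8)
        self.up3 = UpBlock(416, 64, 208)     # skip: pool0 (1/4)
        self.up4 = UpBlock(208, 64, 104)     # skip: relu0 (1/2)
        self.out = nn.Conv2d(104, 1, 3, padding=1)

    def forward(self, x):
        skips, feats = [], x
        for name, layer in self.encoder.named_children():
            feats = layer(feats)
            if name in ("relu0", "pool0", "transition1", "transition2"):
                skips.append(feats)
        d = self.hdc(feats)
        for up, skip in zip((self.up1, self.up2, self.up3, self.up4),
                            reversed(skips)):
            d = up(d, skip)
        # Prediction at half the input resolution, as is common in this line
        # of work (e.g. [14]); upsample to full size if needed.
        return self.out(d)
```

The decoder mirrors the up-sample-then-concatenate pattern of U-Net-style depth networks such as [14], and the dilation schedule 1-2-3 follows the anti-gridding recipe of [36].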

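The three basic loss terms and the exact form of the edge-guide loss are not spelled out on this page. As a hedged reading based on related work, depth networks in this line (e.g. [14, 29]) commonly combine a point-wise L1 term, an image-gradient term, and an SSIM term [38], and an edge-guide term can up-weight the depth error on pixels that an edge detector marks as object boundaries (Sobel below for simplicity; the references suggest Canny [40]). The sketch implements that assumed combination; `lambda_edge` and the edge threshold are placeholders.

```python
# A hedged sketch of a combined depth loss with an edge-guided term.
# The three "basic" terms and the edge weighting are assumptions drawn
# from related work, not the paper's exact definition.
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party: pip install pytorch-msssim

def image_gradients(x):
    """Finite-difference gradients along height and width."""
    dy = x[:, :, 1:, :] - x[:, :, :-1, :]
    dx = x[:, :, :, 1:] - x[:, :, :, :-1]
    return dy, dx

def sobel_edges(depth, thresh=0.1):
    """Binary edge map from Sobel responses on the ground-truth depth."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=depth.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(depth, kx, padding=1)
    gy = F.conv2d(depth, ky, padding=1)
    return (torch.sqrt(gx ** 2 + gy ** 2) > thresh).float()

def depth_loss(pred, gt, lambda_edge=1.0):
    # Basic term 1: point-wise L1 depth error.
    l_depth = F.l1_loss(pred, gt)
    # Basic term 2: L1 error of image gradients (sharpens discontinuities).
    (pdy, pdx), (gdy, gdx) = image_gradients(pred), image_gradients(gt)
    l_grad = F.l1_loss(pdy, gdy) + F.l1_loss(pdx, gdx)
    # Basic term 3: structural similarity [38], rescaled to a loss in [0, 1].
    l_ssim = (1.0 - ssim(pred, gt, data_range=gt.max().item())) / 2.0
    # Edge-guide term: up-weight depth error on ground-truth edge pixels.
    edges = sobel_edges(gt)
    l_edge = (edges * (pred - gt).abs()).sum() / edges.sum().clamp(min=1.0)
    return l_depth + l_grad + l_ssim + lambda_edge * l_edge
```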


Data availability

Our experiments use the publicly available NYU Depth V2 dataset [41].

References

  1. Huang C-H, Tsung W-N, Yang W-J, Chen C-H (2019) Unsupervised monocular depth estimation for autonomous driving. In: Proceedings of the international display workshops (IDW), pp 128–131

  2. Lai C, Su K (2018) Development of an intelligent mobile robot localization system using Kinect RGB-D mapping and neural network. Comput Electr Eng 67:620–628

  3. Lee J, Joo S (2021) Three-dimensional depth estimation of virtual objects in augmented reality. J Vision 21(9):2485. https://doi.org/10.1167/jov.21.9.2485

  4. Smisek J, Jancosek M, Pajdla T (2011) 3D with kinect. In: IEEE international conference on computer vision workshops (ICCV Workshops), pp 1154–1160. https://doi.org/10.1109/ICCVW.2011.6130380

  5. Dubayah RO, Drake JB (2000) Lidar remote sensing for forestry. J Forest 98(6):44–46

  6. Godard C, Mac Aodha O, Firman M, Brostow GJ (2019) Digging into self-supervised monocular depth estimation. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 3827–3837. https://doi.org/10.1109/ICCV.2019.00393

  7. Chen KY, Chien CC, Tseng CT (2013) Improving the accuracy of depth estimation in binocular vision for robotic applications. Appl Mech Mater 284–287:1862–1866. https://doi.org/10.4028/www.scientific.net/AMM.284-287.1862

  8. Allison RS, Gillam BJ, Vecellio E (2009) Binocular depth discrimination and estimation beyond interaction space. J Vision 9(1):1–14. https://doi.org/10.1167/9.1.10

  9. Zhou T, Brown M, Snavely N, Lowe D (2017) Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6612–6621. https://doi.org/10.1109/CVPR.2017.700

  10. Wang C, Buenaposada JM, Rui Z, Lucey S (2018) Learning depth from monocular videos using direct methods. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 2022–2030. https://doi.org/10.1109/CVPR.2018.00216

  11. Ranjan A, Jampani V, Balles L, Kim K, Sun D, Wulff J, Black MJ (2019) Competitive collaboration: joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 12232–12241. https://doi.org/10.1109/CVPR.2019.01252

  12. Eigen D, Puhrsch C, Fergus R (2014) Depth map prediction from a single image using a multi-scale deep network. Adv Neural Inf Process Syst 3:2366–2374

  13. Eigen D, Fergus R (2015) Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2650–2658. https://doi.org/10.1109/ICCV.2015.304

  14. Alhashim I, Wonka P (2018) High quality monocular depth estimation via transfer learning. arXiv:1812.11941

  15. Laga H, Jospin L, Boussaid F, Bennamoun M (2022) A survey on deep learning techniques for stereo-based depth estimation. IEEE T Pattern Anal 44(4):1738–1764

  16. Bolles RC, Baker HH, Marimont DH (1987) Epipolar-plane image analysis: An approach to determining structure from motion. Int J Comput Vis 1(1):7–55

  17. Prados E, Faugeras O (2005) A generic and provably convergent shape-from-shading method for orthographic and pinhole cameras. Int J Comput Vis 65(1–2):97–125

  18. Nayar SK, Nakagawa Y (1994) Shape from focus. IEEE T Pattern Anal 16(8):824–831

  19. Favaro P, Soatto S (2005) A geometric approach to shape from defocus. IEEE T Pattern Anal 27(3):406–417

  20. Huang G, Liu Z, Laurens V, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2261–2269. https://doi.org/10.1109/CVPR.2017.243

  21. Hao Z, Li Y, You S, Lu F (2018) Detail preserving depth estimation from a single image using attention guided networks. In: Proceedings of the international conference on 3D vision (3DV), pp 304–313. https://doi.org/10.1109/3DV.2018.00043

  22. Lee J, Kim C (2019) Monocular depth estimation using relative depth maps. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 9721–9730. https://doi.org/10.1109/CVPR.2019.00996

  23. Xue F, Cao J, Zhou Y, Sheng F, Wang Y, Ming A (2021) Boundary-induced and scene-aggregated network for monocular depth prediction. Pattern Recogn 115:107901. https://doi.org/10.1016/j.patcog.2021.107901

  24. Laina I, Rupprecht C, Belagiannis V, Tombari F, Navab N (2016) Deeper depth prediction with fully convolutional residual networks. In: Proceedings of the international conference on 3D vision (3DV), pp 239–248. https://doi.org/10.1109/3DV.2016.32

  25. Wang L, Zhang J, Wang O, Lin Z, Lu H (2020) SDC-depth: semantic divide-and-conquer network for monocular depth estimation. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 538–547. https://doi.org/10.1109/CVPR42600.2020.00062

  26. Lyu X, Liu L, Wang M, Kong X, Liu L, Liu Y, Chen X, Yuan Y (2021) HR-depth: high resolution self-supervised monocular depth estimation. In: 35th AAAI conference on artificial intelligence (AAAI), pp 2294–2301

  27. Li B, Shen C, Dai Y, van den Hengel A, He M (2015) Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 1119–1127. https://doi.org/10.1109/CVPR.2015.7298715

  28. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  29. Hu J, Ozay M, Zhang Y, Okatani T (2019) Revisiting single image depth estimation: toward higher resolution maps with accurate object boundaries. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV), pp 1043–1051. https://doi.org/10.1109/WACV.2019.00116

  30. Chen W, Fu Z, Yang D, Deng J (2016) Single-image depth perception in the wild. In: Proceedings of the annual conference on neural information processing systems (NIPS), pp 730–738

  31. Xian K, Zhang J, Wang O, Mai L, Lin Z, Cao Z (2020) Structure-guided ranking loss for single image depth prediction. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 608–617. https://doi.org/10.1109/CVPR42600.2020.00069

  32. Zeiler M, Taylor GW, Fergus R (2011) Adaptive deconvolutional networks for mid and high level feature learning. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2018–2025. https://doi.org/10.1109/ICCV.2011.6126474

  33. Zeiler M, Fergus R (2014) Visualizing and understanding convolutional networks. Lect Notes Comput Sci 8689:818–833

  34. Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation. IEEE T Pattern Anal 39(4):640–651

  35. Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: Proceedings of the international conference on learning representations (ICLR)

  36. Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2018) Understanding convolution for semantic segmentation. In: Proceedings of the IEEE winter conference on applications of computer vision (WACV), pp 1451–1460. https://doi.org/10.1109/WACV.2018.00163

  37. Zhu S, Brazil G, Liu X (2020) The edge of depth: explicit constraints between segmentation and depth. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 13113–13122. https://doi.org/10.1109/CVPR42600.2020.01313

  38. Wang Z, Bovik A, Sheikh H, Simoncelli E (2004) Image quality assessment: from error visibility to structural similarity. IEEE T Image Process 13(4):600–612

  39. Godard C, Mac Aodha O, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 6602–6611. https://doi.org/10.1109/CVPR.2017.699

  40. Canny J (1986) A computational approach to edge detection. IEEE T Pattern Anal 8(6):679–698

  41. Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from RGBD images. In: Proceedings of the european conference on computer vision (ECCV), pp 746–760. https://doi.org/10.1007/978-3-642-33715-4_54

  42. Levin A, Lischinski D, Weiss Y (2004) Colorization using optimization. ACM T Graphic 23:689–694

  43. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. In: Proceedings of the international conference on learning representations (ICLR)

  44. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848

Download references

Acknowledgements

This work was supported by the Key R&D Program Project of Shaanxi Province, China (Grant Number 2020NY-144). The authors appreciate the funding organization for its financial support. The authors would also like to thank all the authors cited in this article and the anonymous reviewers for their helpful comments and suggestions.

Funding

The research leading to these results received funding from the Key R&D Program Project of Shaanxi Province, China (Grant Number 2020NY-144).

Author information


Corresponding author

Correspondence to Yan Long.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

He, M., Gao, Y. & Long, Y. Single image depth estimation using improved U-Net and edge-guide loss. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19235-3

