
Research on self-supervised depth estimation algorithm of driving scene based on monocular vision

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

A self-supervised deep-learning algorithm is designed to estimate the depth of driving scenes. A depth estimation network and a pose estimation network, both built on convolutional neural networks, take video captured by a monocular camera as input and output, respectively, a depth map for each input frame and the pose change between adjacent frames. View synthesis, i.e., the image reconstruction loss between adjacent frames, serves as the supervision signal for training the networks. A scale consistency loss resolves the scale inconsistency inherent in monocular depth estimation, and a weight mask derived from the per-pixel scale inconsistency suppresses the adverse effects of dynamic and occluded objects in the driving environment. Test results show that the proposed algorithm, trained on monocular video alone, achieves high accuracy on the KITTI dataset and nearly matches supervised methods.
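
The loss design summarised above (view synthesis as supervision, plus a scale-consistency term whose per-pixel inconsistency yields a weight mask) can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the SSIM window size, the weights alpha and w_geo, and the premise that the source image and source depth have already been warped into the target view by the standard inverse warp (backproject with predicted depth, transform by the predicted pose, project with the camera intrinsics, bilinear sampling) are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified per-pixel SSIM with a 3x3 average-pooling window.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sig_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sig_x + sig_y + c2)
    return (num / den).clamp(0, 1).mean(dim=1, keepdim=True)


def photometric_loss(target, source_warped, alpha=0.85):
    # Image reconstruction (view synthesis) loss: SSIM + L1 between the
    # target frame and the source frame warped into the target view.
    l1 = (target - source_warped).abs().mean(dim=1, keepdim=True)
    return alpha * (1.0 - ssim(target, source_warped)) / 2.0 + (1.0 - alpha) * l1


def total_loss(target, source_warped, depth_t, depth_s_warped, w_geo=0.5):
    # Scale (geometry) inconsistency between the depth predicted for the
    # target frame and the source-frame depth warped into the target view,
    # normalised so it lies in [0, 1).
    d_inc = (depth_t - depth_s_warped).abs() / (depth_t + depth_s_warped)
    # Weight mask derived from the inconsistency: pixels where the two depth
    # maps disagree (typically dynamic or occluded objects) contribute less
    # to the reconstruction term.
    mask = 1.0 - d_inc
    photo = photometric_loss(target, source_warped)
    return (mask * photo).mean() + w_geo * d_inc.mean()
```

Minimising the masked photometric term together with the mean inconsistency trains the depth and pose networks jointly; the same mask is what keeps dynamic and occluded regions from dominating the gradient.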

Author information

Correspondence to Fenglai Pei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, Z., Zhou, S., Zheng, M. et al. Research on self-supervised depth estimation algorithm of driving scene based on monocular vision. SIViP 17, 991–999 (2023). https://doi.org/10.1007/s11760-022-02303-2
