
Research on self-supervised depth estimation algorithm of driving scene based on monocular vision

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

A self-supervised deep-learning algorithm is designed to estimate the depth of driving scenes. A depth estimation network and a pose estimation network, both built on convolutional neural networks, take video captured by a monocular camera as input and output, respectively, a depth map for each input frame and the pose change between adjacent frames. View synthesis, i.e., the image reconstruction loss between adjacent frames, serves as the supervision signal for training the networks. A scale consistency loss resolves the scale inconsistency inherent in monocular depth estimation, and a weight mask derived from the per-pixel scale inconsistency suppresses the adverse effects of dynamic and occluded objects in the driving environment. Test results show that the proposed algorithm, trained on monocular video alone, achieves high accuracy on the KITTI dataset and nearly matches supervised methods.
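
The loss design summarised above (view synthesis as supervision, plus a scale-consistency term whose per-pixel inconsistency yields a weight mask) can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the SSIM window size, the weights alpha and w_geo, and the premise that the source image and source depth have already been warped into the target view by the standard inverse warp (backproject with predicted depth, transform by the predicted pose, project with the camera intrinsics, bilinear sampling) are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified per-pixel SSIM with a 3x3 average-pooling window.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sig_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sig_x + sig_y + c2)
    return (num / den).clamp(0, 1).mean(dim=1, keepdim=True)


def photometric_loss(target, source_warped, alpha=0.85):
    # Image reconstruction (view synthesis) loss: SSIM + L1 between the
    # target frame and the source frame warped into the target view.
    l1 = (target - source_warped).abs().mean(dim=1, keepdim=True)
    return alpha * (1.0 - ssim(target, source_warped)) / 2.0 + (1.0 - alpha) * l1


def total_loss(target, source_warped, depth_t, depth_s_warped, w_geo=0.5):
    # Scale (geometry) inconsistency between the depth predicted for the
    # target frame and the source-frame depth warped into the target view,
    # normalised so it lies in [0, 1).
    d_inc = (depth_t - depth_s_warped).abs() / (depth_t + depth_s_warped)
    # Weight mask derived from the inconsistency: pixels where the two depth
    # maps disagree (typically dynamic or occluded objects) contribute less
    # to the reconstruction term.
    mask = 1.0 - d_inc
    photo = photometric_loss(target, source_warped)
    return (mask * photo).mean() + w_geo * d_inc.mean()
```

Minimising the masked photometric term together with the mean inconsistency trains the depth and pose networks jointly; the same mask is what keeps dynamic and occluded regions from dominating the gradient.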

Author information

Correspondence to Fenglai Pei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, Z., Zhou, S., Zheng, M. et al. Research on self-supervised depth estimation algorithm of driving scene based on monocular vision. SIViP 17, 991–999 (2023). https://doi.org/10.1007/s11760-022-02303-2
