Semisupervised learning-based depth estimation with semantic inference guidance

Zhang, Yan; Fan, XiaoPeng; Zhao, DeBin

doi:10.1007/s11431-021-1948-3

Article
Published: 25 February 2022

Volume 65, pages 1098–1106 (2022)

Science China Technological Sciences

Abstract

Depth estimation is a fundamental computer vision problem that infers three-dimensional (3D) structures from a given scene. Because the problem is ill-posed, traditional methods generally require large amounts of annotated data to fit the projection function from a scene to its 3D structure. Such pixel-level annotation is highly labor-intensive, especially for reflective surfaces such as mirrors or water, and the widespread adoption of deep learning further increases the demand for annotated data. A framework that reduces the amount of annotated data required is therefore urgently needed. In this paper, we propose a novel semisupervised learning framework to infer the 3D structure of a given scene. First, semantic information is employed to make the depth inference more accurate. Second, both depth estimation and semantic segmentation are built as coarse-to-fine frameworks, so that depth estimation can be gradually guided by semantic segmentation. We compare our model with state-of-the-art methods, and the experimental results show that our method outperforms many supervised learning-based methods, demonstrating the effectiveness of the proposed approach.
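The abstract states only that depth estimation is gradually guided by semantic segmentation across coarse-to-fine scales; it does not describe the mechanism. As a toy illustration of that general idea only, and not the authors' learned network, the sketch below upsamples a coarse depth map by 2× per level and, at each level, pulls every pixel toward the mean depth of its semantic region. The region-mean blending rule, the blending weight `alpha`, nearest-neighbour upsampling, and all function names are hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List

Grid = List[List[float]]   # depth map, row-major
Labels = List[List[int]]   # semantic class id per pixel, same shape as the depth map


def upsample2x(depth: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling (hypothetical choice of upsampler)."""
    out: Grid = []
    for row in depth:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out


def semantic_smooth(depth: Grid, labels: Labels, alpha: float) -> Grid:
    """Blend each pixel toward the mean depth of its semantic region.

    alpha=0 leaves depth unchanged; alpha=1 makes each region constant.
    This fixed rule is an illustrative assumption, not a learned module.
    """
    if len(depth) != len(labels) or any(len(a) != len(b) for a, b in zip(depth, labels)):
        raise ValueError("depth and label maps must have the same shape")
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for drow, lrow in zip(depth, labels):
        for d, c in zip(drow, lrow):
            sums[c] = sums.get(c, 0.0) + d
            counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [[(1 - alpha) * d + alpha * means[c] for d, c in zip(drow, lrow)]
            for drow, lrow in zip(depth, labels)]


def coarse_to_fine(coarse: Grid, label_pyramid: List[Labels], alpha: float = 0.5) -> Grid:
    """Refine a coarse depth map level by level, guided by segmentation.

    label_pyramid[0] matches the coarse map's shape; each later level is
    assumed to be exactly twice as tall and wide as the previous one.
    """
    depth = semantic_smooth(coarse, label_pyramid[0], alpha)
    for labels in label_pyramid[1:]:
        depth = upsample2x(depth)
        depth = semantic_smooth(depth, labels, alpha)
    return depth


if __name__ == "__main__":
    coarse = [[1.0, 3.0]]
    pyramid = [[[0, 1]], [[0, 0, 0, 1], [0, 0, 0, 1]]]
    print(coarse_to_fine(coarse, pyramid, alpha=0.5))
```

In this usage example, the finer label map assigns the third column to class 0, so that column's upsampled depth of 3.0 is pulled toward the class-0 mean (5/3) and becomes about 2.33, while the class-1 column stays at 3.0. A trained model would replace the fixed averaging rule with learnable layers; the sketch only shows how a segmentation map can constrain depth more tightly at each finer scale.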

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Yan Zhang, XiaoPeng Fan & DeBin Zhao

Corresponding author

Correspondence to XiaoPeng Fan.

Additional information

This work was supported in part by the National High Technology Research and Development Program of China (Grant No. 2021YFF0900500), and the National Natural Science Foundation of China (Grant Nos. 61972115 and 61872116).

About this article

Cite this article

Zhang, Y., Fan, X. & Zhao, D. Semisupervised learning-based depth estimation with semantic inference guidance. Sci. China Technol. Sci. 65, 1098–1106 (2022). https://doi.org/10.1007/s11431-021-1948-3

Received: 31 May 2021
Accepted: 22 October 2021
Published: 25 February 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11431-021-1948-3