
Edge-Aware Spatial Propagation Network for Multi-view Depth Estimation

Published in: Neural Processing Letters

Abstract

Deep learning has brought great improvements to multi-view stereo. Recent approaches typically take raw images as input and estimate depth through deep networks. However, edge information, a primary geometric cue that captures scene structure well, is ignored by existing multi-view stereo networks. To this end, we present the Edge-aware Spatial Propagation Network (ESPDepth), a novel depth estimation network that uses edges to aid the understanding of scene structure. Specifically, we first generate a coarse initial depth map with a shallow network. We then design an Edge Information Encoding (EIE) module to encode edge-aware features from the initial depth. Subsequently, we apply the proposed Edge-Aware Spatial Propagation (EAP) module to guide iterative propagation on cost volumes. Finally, the edge-optimized cost volumes are used to obtain the final depth map, serving as a refinement step. By introducing edge information into the propagation of cost volumes, the proposed method captures geometric shapes well, alleviating the negative effects of abrupt depth changes at the edges of real scenes. Experiments on the ScanNet and 7-Scenes datasets demonstrate that our method produces precise depth estimates, with improvements in both global structure and fine detail.
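The pipeline described above ends with edge-guided iterative propagation over cost volumes. As a rough intuition for that step, here is a minimal toy sketch in plain Python; all names (`propagate`, `edge_map`, `argmin_depth`) and the 1-D layout are our own illustrative assumptions, not the paper's actual architecture. Each pixel's vector of per-depth-hypothesis costs is blended with its neighbours', and the blending weight is driven toward zero wherever an edge is detected, so depth hypotheses do not bleed across object boundaries.

```python
# Toy edge-aware cost-volume propagation on a 1-D "image" of W pixels,
# each with D depth hypotheses. Purely illustrative, not the paper's API.

def propagate(cost_volume, edge_map, alpha=0.5, iterations=3):
    """Iteratively blend each pixel's cost vector with its neighbours',
    down-weighting contributions that would cross a detected edge.

    cost_volume: list of W lists, each with D per-hypothesis costs.
    edge_map:    list of W edge strengths in [0, 1] (1 = strong edge).
    """
    W = len(cost_volume)
    D = len(cost_volume[0])
    costs = [row[:] for row in cost_volume]
    for _ in range(iterations):
        new = [row[:] for row in costs]
        for x in range(W):
            for dx in (-1, 1):
                nx = x + dx
                if not (0 <= nx < W):
                    continue
                # Suppress propagation across edges: weight -> 0 at strong edges.
                w = alpha * (1.0 - max(edge_map[x], edge_map[nx]))
                for d in range(D):
                    new[x][d] += w * (costs[nx][d] - costs[x][d])
        costs = new
    return costs

def argmin_depth(costs):
    """Winner-takes-all: pick the lowest-cost depth hypothesis per pixel."""
    return [min(range(len(c)), key=c.__getitem__) for c in costs]

# Two flat regions separated by an edge: costs smooth within each region
# but the preferred depth does not leak across the boundary.
cv = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
em = [0.0, 0.0, 1.0, 1.0]
print(argmin_depth(propagate(cv, em)))  # depth labels stay [0, 0, 1, 1]
```

In the actual network this weighting is learned from the EIE module's edge-aware features and applied over 2-D cost volumes, rather than hand-set from a binary edge map as in this sketch.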




Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wenbing Tao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xu, S., Xu, Q., Su, W. et al. Edge-Aware Spatial Propagation Network for Multi-view Depth Estimation. Neural Process Lett 55, 10905–10923 (2023). https://doi.org/10.1007/s11063-023-11356-4

