Learning Inverse Depth Regression for Pixelwise Visibility-Aware Multi-View Stereo Networks

Xu, Qingshan; Su, Wanjuan; Qi, Yuhang; Tao, Wenbing; Pollefeys, Marc

doi:10.1007/s11263-022-01628-2

Published: 19 June 2022

International Journal of Computer Vision, Volume 130, pages 2040–2059 (2022)

Qingshan Xu¹,
Wanjuan Su¹,
Yuhang Qi¹,
Wenbing Tao¹ (ORCID: orcid.org/0000-0003-3284-864X) &
Marc Pollefeys²,³

Abstract

Recently, learning-based multi-view stereo methods have achieved promising results. However, most of them overlook the visibility differences among views, which leads to an indiscriminate multi-view similarity definition and greatly limits their performance on datasets with strong viewpoint variations. To deal with this problem, a pixelwise visibility-aware multi-view stereo network is proposed for robust dense 3D reconstruction. We present a pixelwise visibility estimation network to learn the visibility information for different neighboring images before computing the multi-view similarity, and then construct an adaptive weighted cost volume with the visibility information. Unlike previous methods that treat multi-view depth inference as a depth regression problem or an inverse depth classification problem, we recast multi-view depth inference as an inverse depth regression task. This allows our network to achieve sub-pixel estimation and to be applicable to large-scale scenes. To achieve scalable high-resolution depth map estimation, we construct cost volumes by group-wise correlation and design an ordinal-based uncertainty estimation to progressively refine depth maps. Through extensive experiments on the DTU dataset, the Tanks and Temples dataset, and the ETH3D benchmark, we show that our method generalizes well to various datasets and achieves promising results, demonstrating its superior performance on robust dense 3D reconstruction.
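The three ingredients named in the abstract (inverse depth regression, group-wise correlation, and visibility-weighted aggregation) can be illustrated with a minimal single-pixel sketch. This is not the authors' implementation: the function names, the soft-argmax expectation taken in inverse depth space, the per-group mean of products, and the normalized weighted average are all assumptions chosen only to make the ideas concrete.

```python
import math

def inverse_depth_hypotheses(d_min, d_max, num_planes):
    """Sample depth hypotheses uniformly in inverse depth (assumed sampling scheme)."""
    inv_near, inv_far = 1.0 / d_min, 1.0 / d_max
    step = (inv_near - inv_far) / (num_planes - 1)
    # Ordered from the farthest plane (d_max) to the nearest plane (d_min).
    return [1.0 / (inv_far + k * step) for k in range(num_planes)]

def softmax(scores):
    """Numerically stable softmax over a list of matching scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def regress_inverse_depth(scores, depths):
    """Soft-argmax: take the expectation in inverse depth, then return it as a depth."""
    probs = softmax(scores)
    expected_inv = sum(p / d for p, d in zip(probs, depths))
    return 1.0 / expected_inv

def groupwise_correlation(ref_feat, src_feat, num_groups):
    """Split channels into groups and average the elementwise products per group."""
    size = len(ref_feat) // num_groups
    return [
        sum(r * s for r, s in zip(ref_feat[g * size:(g + 1) * size],
                                  src_feat[g * size:(g + 1) * size])) / size
        for g in range(num_groups)
    ]

def visibility_weighted_cost(per_view_costs, visibility):
    """Aggregate per-view similarities with per-pixel visibility weights (hypothetical form)."""
    total = sum(visibility)
    if total == 0:
        return 0.0  # no neighboring view is considered visible
    return sum(w * c for w, c in zip(visibility, per_view_costs)) / total

if __name__ == "__main__":
    depths = inverse_depth_hypotheses(1.0, 10.0, 4)
    print(depths)
    print(regress_inverse_depth([0.0, 0.0, 0.0, 0.0], depths))
    print(groupwise_correlation([1, 2, 3, 4], [1, 1, 2, 2], 2))
    print(visibility_weighted_cost([0.2, 0.8], [1.0, 0.0]))
```

Because the regressed value is a continuous expectation rather than the index of the best plane, it can fall between sampled hypotheses, which is one way to read the sub-pixel claim. Sampling in inverse depth places planes more densely near the camera, which is the usual motivation for applying it to large depth ranges.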

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 62176096 and 61991412.

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Qingshan Xu, Wanjuan Su, Yuhang Qi & Wenbing Tao
ETH Zürich, Zürich, Switzerland
Marc Pollefeys
Microsoft, Redmond, USA
Marc Pollefeys

Corresponding author

Correspondence to Wenbing Tao.

Additional information

Communicated by D. Scharstein.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, Q., Su, W., Qi, Y. et al. Learning Inverse Depth Regression for Pixelwise Visibility-Aware Multi-View Stereo Networks. Int J Comput Vis 130, 2040–2059 (2022). https://doi.org/10.1007/s11263-022-01628-2

Received: 27 August 2021
Accepted: 10 May 2022
Published: 19 June 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11263-022-01628-2
