Abstract
Learning-based multi-view stereo methods predict depth maps across multiple scales in a coarse-to-fine manner, effectively improving both reconstruction quality and efficiency. To address the challenge of obtaining adaptive depth refinement intervals and a lightweight design within this framework, we propose uncertainty awareness with adaptive propagation for multi-view stereo (AP-UCSNet). When sampling depth hypotheses, we use convolution operations to compute, for each pixel, a set of neighboring points that lie on the same physical surface. We then take a weighted average of the uncertainty-awareness results of each pixel and all its neighbors to obtain spatially correlated depth hypothesis samples, which mitigates the influence of noise in weakly textured regions. Furthermore, we extend the network to a four-scale structure to improve performance. The first three scales use a 3D U-Net structure to regularize the cost volume, whereas at the final scale the probability volume is constructed directly from the feature map to simplify regularization. Experimental results demonstrate that the proposed method delivers superior accuracy and efficiency. Compared with UCSNet, the completeness error and overall error are reduced by 0.051 mm and 0.021 mm, respectively. On a Quadro RTX 5000 GPU, predicting a depth map at a resolution of 1600 × 1184 requires only 0.57 s and 4398 MB of memory, decreases of approximately 19.7% and 34.2%, respectively.
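The hypothesis-sampling step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: in AP-UCSNet the neighbor offsets and weights are predicted by a convolution, whereas here they are passed in as fixed arguments; `lam` scales the uncertainty-aware interval around each depth estimate.

```python
import numpy as np

def propagate_hypotheses(depth, std, offsets, weights, lam=1.5):
    """Average the uncertainty-aware depth interval [d - lam*std, d + lam*std]
    of each pixel over a set of neighbor offsets, weighted per neighbor.

    depth, std : (H, W) arrays of per-pixel depth mean and std (uncertainty)
    offsets    : list of (dy, dx) neighbor displacements
    weights    : (K,) array, one weight per offset
    Returns the lower and upper bounds of the propagated sampling interval.
    """
    H, W = depth.shape
    lo = np.zeros((H, W))
    hi = np.zeros((H, W))
    for (dy, dx), w in zip(offsets, weights):
        # Clamp shifted coordinates so border pixels reuse edge values.
        ys = np.clip(np.arange(H) + dy, 0, H - 1)
        xs = np.clip(np.arange(W) + dx, 0, W - 1)
        d = depth[ys][:, xs]
        s = std[ys][:, xs]
        lo += w * (d - lam * s)
        hi += w * (d + lam * s)
    wsum = weights.sum()
    return lo / wsum, hi / wsum
```

Because each pixel's interval is blended with its neighbors' intervals, an isolated noisy estimate in a weakly textured region is pulled toward the surrounding surface, which is the intuition behind the propagation step.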
Data Availability
The raw data elaborated during the current research are publicly available.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 61971339 and 61471161, the Natural Science Basic Research Program of Shaanxi under Grant 2023-JC-YB-826, the Scientific Research Program Funded by Shaanxi Provincial Education Department under Grant 22JP028, and the Postgraduate Innovation Fund of Xi’an Polytechnic University under Grant chx2022019.
Author information
Contributions
Not applicable.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Ethical and informed consent for data used
Not applicable.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A. Feature extraction network
Our proposed method uses a four-layer Feature Pyramid Network (FPN) for feature extraction, with downsampling applied at layers 2, 5, and 8. The specific layer configuration of the 2D FPN is given in Table 9. Assuming an input image resolution of H×W, the resolutions of the output multi-scale feature maps are H/8×W/8, H/4×W/4, H/2×W/2, and H×W, with 64, 32, 16, and 8 channels, respectively.
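The scale and channel layout above can be sketched in PyTorch. This is a hedged sketch, not the configuration from Table 9: the kernel sizes, the use of three stride-2 convolutions for downsampling, and the 1×1 channel-reduction convolutions in the top-down pathway are assumptions; only the four output shapes (64, 32, 16, and 8 channels at 1/8, 1/4, 1/2, and full resolution) follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN4(nn.Module):
    """Four-scale FPN sketch producing feature maps of
    64@H/8, 32@H/4, 16@H/2 and 8@H channels/resolutions."""
    def __init__(self):
        super().__init__()
        chans = [8, 16, 32, 64]  # full resolution ... 1/8 resolution
        self.stem = nn.Conv2d(3, chans[0], 3, padding=1)
        # Bottom-up pathway: three stride-2 convolutions (assumed layout).
        self.down = nn.ModuleList([
            nn.Conv2d(ci, co, 3, stride=2, padding=1)
            for ci, co in zip(chans[:-1], chans[1:])])
        # Top-down pathway: 1x1 convs halving channels after upsampling.
        self.shrink = nn.ModuleList([
            nn.Conv2d(co, ci, 1) for ci, co in zip(chans[:-1], chans[1:])])

    def forward(self, x):
        feats = [self.stem(x)]
        for d in self.down:
            feats.append(d(torch.relu(feats[-1])))
        outs = [feats[-1]]  # coarsest level: 64 channels at H/8 x W/8
        for e, s in zip(feats[-2::-1], self.shrink[::-1]):
            up = F.interpolate(outs[-1], scale_factor=2,
                               mode="bilinear", align_corners=False)
            outs.append(e + s(up))  # lateral connection + top-down signal
        return outs  # coarse to fine, matching the order in the text
```

For an input divisible by 8 in both dimensions, the four outputs have exactly the channel counts and resolutions listed above.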
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, J., Yu, Z., Ma, L. et al. Uncertainty awareness with adaptive propagation for multi-view stereo. Appl Intell 53, 26230–26239 (2023). https://doi.org/10.1007/s10489-023-04910-z