
Multi-distribution fitting for multi-view stereo


Abstract

We propose a multi-view stereo network based on multi-distribution fitting (MDF-Net), which achieves high-resolution depth map prediction with low memory consumption and high efficiency. The method adopts a four-stage cascade structure and makes three main contributions. First, view cost regularization is proposed to weaken the influence of matching noise when building the cost volume. Second, the depth refinement interval is computed adaptively using multi-distribution fitting (MDF): Gaussian distribution fitting refines and corrects the depth within a large interval, and Laplace distribution fitting then estimates the depth accurately within a small interval. Third, a lightweight image super-resolution network upsamples the depth map in the fourth stage to reduce running time and memory requirements. Experimental results on the DTU dataset indicate that MDF-Net achieves state-of-the-art results, with the lowest memory consumption and running time among high-resolution reconstruction methods: predicting a depth map at a resolution of 1600 × 1184 requires only about 4.29 GB of memory. In addition, we validate the generalization ability on the Tanks and Temples dataset, achieving very competitive performance. The code has been released at https://github.com/zongh5a/MDF-Net.
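As a rough illustration of the multi-distribution fitting step described above (our own sketch reconstructed from this description, not the authors' released implementation at the link above; the function names, the interval rule, and the constant k are assumptions), a Gaussian fit to the per-pixel depth probabilities at a coarse stage yields a refined search interval, and a Laplace fit at the fine stage yields the final depth estimate:

```python
import torch

def fit_gaussian(prob, depth_hyps):
    """Fit a per-pixel Gaussian to a depth probability volume.
    prob, depth_hyps: (B, D, H, W) softmax probabilities and the
    corresponding sampled depth values. Returns mean and std."""
    mean = (prob * depth_hyps).sum(dim=1)                            # E[d]
    var = (prob * (depth_hyps - mean.unsqueeze(1)) ** 2).sum(dim=1)  # E[(d - E[d])^2]
    return mean, var.clamp(min=1e-8).sqrt()

def fit_laplace(prob, depth_hyps):
    """Fit a per-pixel Laplace distribution: location mu = E[d],
    scale b = E[|d - mu|]."""
    mu = (prob * depth_hyps).sum(dim=1)
    b = (prob * (depth_hyps - mu.unsqueeze(1)).abs()).sum(dim=1)
    return mu, b.clamp(min=1e-8)

def next_interval(center, spread, k=3.0):
    """Illustrative rule: sample the next stage within center +/- k * spread."""
    return center - k * spread, center + k * spread
```

In this reading, a coarse stage would pass next_interval(*fit_gaussian(prob, depth_hyps)) to the following cascade stage, while the final fine stage would report the Laplace location mu as the depth; the clamping floors and k = 3 are placeholders, not values from the paper.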



Funding

This work was supported by the National Natural Science Foundation of China under Grants 61971339 and 61471161, the Natural Science Basic Research Program of Shaanxi under Grant 2023-JC-YB-826, the Scientific Research Program Funded by Shaanxi Provincial Education Department under Grant 22JP028, and the Postgraduate Innovation Fund of Xi'an Polytechnic University under Grant chx2022019.

Author information

Correspondence to Jinguang Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Why use softmax to preprocess feature groups?

Ideally, the more similar the features of different views on the same depth plane, the closer that plane is to the true depth and the higher its probability should be. We therefore believe a good cost metric should satisfy three conditions. First, it should measure similarity well. Second, the cost should grow with the similarity of the features on a depth plane, i.e., be proportional to it. Third, the value range of the cost metric should match the probability range, i.e., lie in [0, 1]. We choose the inner product as the main cost metric (satisfying the first condition). In addition, compared with vector normalization, the gradient of softmax normalization is simpler to compute, and it keeps the inner product within [0, 1]: each softmax-normalized feature group is a probability vector, and the inner product of two probability vectors always lies in [0, 1]. Preprocessing the feature groups with the softmax function thus reduces the fitting burden, making VCR-Net and the 3D CNN more efficient.
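To make the normalization argument concrete, the following minimal sketch (our own illustration, not the released MDF-Net code; the tensor layout, group count, and function name are assumptions) applies softmax over the channels of each feature group before the group-wise inner product, so every cost value stays in [0, 1]:

```python
import torch
import torch.nn.functional as F

def group_cost(ref_feat, src_feat, groups=8):
    """Group-wise inner-product cost between reference and source features.
    ref_feat, src_feat: (B, C, D, H, W) features sampled on D depth planes.
    Softmax over each group's channels turns it into a probability vector,
    and the inner product of two probability vectors lies in [0, 1]."""
    B, C, D, H, W = ref_feat.shape
    ref = F.softmax(ref_feat.view(B, groups, C // groups, D, H, W), dim=2)
    src = F.softmax(src_feat.view(B, groups, C // groups, D, H, W), dim=2)
    return (ref * src).sum(dim=2)  # (B, groups, D, H, W) cost volume
```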

1.2 Why can VCR-Net improve the cost volume quality?

The functions of VCR-Net and the 3D CNN regularization network are similar: both regularize the cost volume to obtain a probability value for each depth plane. The difference is that VCR-Net processes the cost volume of each view separately, uses a sigmoid activation function, and takes the resulting probability volume as a weight. Features at noisy locations are mismatched and have low similarity, so VCR-Net assigns them small weights; in the weighted average, this suppresses the matching cost contributed by the noise. The VCR-Net network structure is shown in Table 7.

Table 7 Details of VCR-Net, where G is the number of feature groups, D is the number of depth samples, and H and W are the height and width of the feature map, respectively
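The weighting idea can be sketched as follows (a toy stand-in only, with illustrative layer sizes; the actual VCR-Net configuration is the one in Table 7). A small 3D network maps each view's cost volume to per-pixel, per-depth weights through a sigmoid, and the per-view costs are averaged with these weights, so mismatched locations contribute little:

```python
import torch
import torch.nn as nn

class ViewWeight(nn.Module):
    """Toy stand-in for VCR-Net: maps a per-view cost volume
    (B, G, D, H, W) to weights in (0, 1) via a sigmoid."""
    def __init__(self, groups=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(groups, 8, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, cost):
        return self.net(cost)  # (B, 1, D, H, W)

def weighted_cost(per_view_costs, weight_net):
    """Weighted average of per-view cost volumes: noisy views receive
    small weights, which suppresses their matching cost."""
    weights = [weight_net(c) for c in per_view_costs]
    num = sum(w * c for w, c in zip(weights, per_view_costs))
    return num / (sum(weights) + 1e-6)
```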

1.3 Visualization of point cloud results

All qualitative results of our method are shown in Figs. 6 and 7.

Fig. 6 All qualitative results on the DTU test set. Our method achieves excellent completeness and the best overall quality

Fig. 7 All qualitative results on the Tanks and Temples dataset. The results are comparable to those of advanced methods

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, J., Yu, Z., Ma, L. et al. Multi-distribution fitting for multi-view stereo. Machine Vision and Applications 34, 93 (2023). https://doi.org/10.1007/s00138-023-01449-4

