Multi-view stereo network with point attention

Applied Intelligence

Abstract

In recent years, learning-based multi-view stereo (MVS) reconstruction has outperformed traditional methods. In this paper, we introduce a novel point-attention network that applies an attention mechanism to a point cloud representation of the scene. During reconstruction, the attention mechanism guides the network to focus on challenging regions such as thin structures and low-texture surfaces. We first infer a coarse depth map using a modified classical deep MVS framework and convert it into the corresponding point cloud. Then, we augment the point cloud with high-frequency features and multi-resolution features extracted from the raw images. Finally, the network uses the attention mechanism to guide the weight distribution of the points across different dimensions and iteratively computes a depth displacement for each point; this depth residual is added to the coarse depth prediction to obtain the final high-resolution depth map. Experimental results on the DTU and Tanks and Temples datasets show that the proposed point-attention architecture achieves a significant improvement in some scenes where no reasonable geometric assumptions can be made, suggesting that our method has strong generalization ability.
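The coarse-to-fine refinement described in the abstract (unprojecting the coarse depth map into a point cloud, then iteratively adding a predicted depth residual) can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, and it replaces the learned point-attention network with a hypothetical scoring function `score_fn` that rates depth hypotheses around a point. The residual is taken as the softmax-weighted expected offset over those hypotheses, and the search step shrinks after each iteration; both choices are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def unproject(depth: Sequence[Sequence[float]], fx: float, fy: float,
              cx: float, cy: float) -> List[Point3D]:
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    Pixel (u, v) with depth d maps to d * K^-1 [u, v, 1]^T, i.e.
    ((u - cx) * d / fx, (v - cy) * d / fy, d).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_depth(coarse: float, score_fn: Callable[[float], float],
                 step: float, num_hyp: int = 2, iterations: int = 3,
                 shrink: float = 0.5) -> float:
    """Iteratively add a predicted depth residual to a coarse depth value.

    score_fn is a HYPOTHETICAL stand-in for the learned point-attention
    network: it returns a higher score for more plausible depths.
    """
    depth = coarse
    offsets = list(range(-num_hyp, num_hyp + 1))
    for _ in range(iterations):
        # Depth hypotheses placed symmetrically around the current estimate.
        hyps = [depth + k * step for k in offsets]
        probs = softmax([score_fn(h) for h in hyps])
        # Residual = expected displacement under the softmax distribution.
        residual = sum(p * k * step for p, k in zip(probs, offsets))
        depth += residual
        # Narrow the search range for the next, finer iteration.
        step *= shrink
    return depth


if __name__ == "__main__":
    cloud = unproject([[1.0, 2.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.0)
    print(cloud)
    # Toy score that peaks at a "true" depth of 2.0 (illustration only).
    refined = refine_depth(1.8, lambda h: -100.0 * (h - 2.0) ** 2, step=0.1)
    print(round(refined, 4))
```

Taking the expected offset rather than the single best hypothesis keeps the residual a smooth function of the scores, which is what lets a network of this kind be trained end to end; the shrinking step mirrors the idea of refining the depth iteratively at progressively finer resolution.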

Data Availability

The datasets analyzed during the current study are available at http://roboimagedata.compute.dtu.dk/ and https://www.tanksandtemples.org/.

Funding

This work was supported by the National Key R&D Program of China (No. 2018YFB2101504), the Key R&D Program of Shanxi Province of China (No. 201903D121147), the Natural Science Foundation of Shanxi Province of China (No. 201901D111150), and the Research Project Supported by Shanxi Scholarship Council of China (No. 2020–113).

Author information

Corresponding author

Correspondence to Zhuoer Gu.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, R., Gu, Z., Han, X. et al. Multi-view stereo network with point attention. Appl Intell 53, 26622–26636 (2023). https://doi.org/10.1007/s10489-023-04806-y

  • DOI: https://doi.org/10.1007/s10489-023-04806-y
