Multi-view stereo network with point attention

Zhao, Rong; Gu, Zhuoer; Han, Xie; He, Ligang; Sun, Fusheng; Jiao, Shichao

doi:10.1007/s10489-023-04806-y

Multi-view stereo network with point attention

Published: 26 August 2023

Volume 53, pages 26622–26636, (2023)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Rong Zhao¹,
Zhuoer Gu ORCID: orcid.org/0000-0002-2912-8660²,
Xie Han¹,
Ligang He³,
Fusheng Sun¹ &
…
Shichao Jiao¹

389 Accesses
Explore all metrics

Abstract

In recent years, learning-based multi-view stereo (MVS) reconstruction has gained superiority when compared with traditional methods. In this paper, we introduce a novel point-attention network, with an attention mechanism, based on the point cloud structure. During the reconstruction process, our method with an attention mechanism can guide the network to pay more attention to complex areas such as thin structures and low-texture surfaces. We first infer a coarse depth map using a modified classical MVS deep framework and convert it into the corresponding point cloud. Then, we add the high-frequency features and different-resolution features of the raw images to the point cloud. Finally, our network guides the weight distribution of points in different dimensions through the attention mechanism and computes the depth displacement of each point iteratively as the depth residual, which is added to the coarse depth prediction to obtain the final high-resolution depth map. Experimental results show that our proposed point-attention architecture can achieve a significant improvement in some scenes without reasonable geometrical assumptions on the DTU dataset and the Tanks and Temples dataset, suggesting that our method has a strong generalization ability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-view Stereo Network with Attention Thin Volume

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

A robust framework for multi-view stereopsis

Article 18 March 2021

Data Availability

The datasets during the current study are available in websites http://roboimagedata.compute.dtu.dk/ and https://www.tanksandtemples.org/.

References

Furukawa Y, Hernandez C (2013) Multi-view stereo: A tutorial. Found Trends Comput Graph Vis 9(1):1–148
Google Scholar
Seitz SM, Curless B, James D, Daniel S, Richard S (2006) A comparison and evaluation of multi-view stereo reconstruction algorithms. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 519–528
Strecha C, Von Hansen W, Van Gool L, Fua P, Thoennessen U (2008) On benchmarking camera calibration and multi-view stereo for high resolution imagery. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1–8
Goesele M, Snavely N, Curless B, Hoppe H, Seitz SM (2007) Multi-view stereo for community photo collections. In: IEEE International Conference on Computer Vision (ICCV), pp 1–8
Furukawa Y, Ponce J (2010) Accurate, dense, and robust multi-view stereopsis. IEEE Trans Pattern Anal Mach Intell (TPAMI) 32(8):1362–1376
Article Google Scholar
Galliani S, Lasinger K, Schindler K (2015) Massively parallel multiview stereopsis by surface normal diffusion. In: IEEE International Conference on Computer Vision (ICCV), pp 873–881
Shan Q, Adams R, Curless B, Furukawa Y, Seitz SM (2013) The visual turning test for scene reconstruction, In: International Conference on 3D Vision (3DV), pp 25–32
Shan Q, Curless B, Furukawa Y, Hernandez C, Seitz SM (2014) Occluding contours for multi-view stereo. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4002–4009
Shen S (2013) Accurate multiple view 3d reconstruction using patch-based stereo for large-scale scenes. IEEE Trans Image Process (TIP) 22(5):1901–1914
Article MathSciNet MATH Google Scholar
Schonberger JL, Zheng E, Frahm JM, Pollefeys M (2016) Pixelwise view selection for unstructured multi-view stereo. In: European Conference on Computer Vision (ECCV), pp 501–518
Shi W, Liu S, Jiang F, Zhao D (2021) Video Compressed Sensing Using a Convolutional Neural Network. IEEE Trans Circ Syst Video Technol (TCSVT) 31(2):425–438
Article Google Scholar
Xu K, Zhang Z, Ren F (2018) LAPRAN: A scalable Laplacian pyramid reconstructive adversarial network for flexible compressive sensing reconstruction. In: European Conference on Computer Vision (ECCV), pp 491–507
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell (TPAMI) 39(6):1137–1149
Article Google Scholar
Redmon J, Farhadi A (2018) YOLOv3: An Incremental Improvement. arXiv: 1804.02767.[Online]. Available: https://doi.org/10.48550/arXiv.1804.02767
Yang C, Wu W, Wang Y et al (2021) A novel feature-based model for zero-shot object detection with simulated attributes. Appl Intell 52:6905–6914
Article Google Scholar
Jing L, Chen Y, Tian Y (2020) Coarse-to-fifine semantic segmentation from image-level labels. IEEE Trans Image Process (TIP) 29:225–236
Article MATH Google Scholar
Tong Z, Xu P, Denoeux T (2021) Evidential fully convolutional network for semantic segmentation. Appl Intell 51:6376–6399
Article Google Scholar
Wang L, Huang Y, Hou Y, Zhang S, Shan J (2019) Graph Attention Convolution for Point Cloud Semantic Segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10296–10305
Laga H, Jospin LV, Boussaid F, Bennamoun M (2020) A survey on deep learning techniques for stereo-based depth estimation. IEEE Trans Pattern Anal Mach Intell (TPAMI) 44(4):1738–1764
Article Google Scholar
Cheng S, Xu Z, Zhu S, Li Z, Li LE, Ramamoorthi R, Su H (2019) Deep stereo using adaptive thin volume representation with uncertainty awareness. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2524–2534
Song M, Lim S, Kim W (2021) Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals. IEEE Trans Circ Syst Video Technol (TCSVT) 1(1):99
Google Scholar
Yao Y, Luo Z, Li S, Fang T, Quan L (2018) Mvsnet: Depth inference for unstructured multi-view stereo. In: European Conference on Computer Vision (ECCV), pp 767–783
Chen R, Han S, Xu J, Su H (2019) Point-based multi-view stereo network. In: IEEE International Conference on Computer Vision (ICCV), pp 1538–1547
Aanæs H, Jensen RR, Vogiatzis G, Tola E, Dahl AB (2016) Large-scale data for multiple-view stereopsis. Int J Comput Vis (IJCV) 120(2):153–168
Article MathSciNet Google Scholar
Knapitsch A, Park J, Zhou QY, Koltun V (2017) Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Trans Graph (TOG) 36(4):1–13
Article Google Scholar
Simonovsky M, Komodakis N (2017) Dynamic edge conditioned filters in convolutional neural networks on graphs. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 29–38
Xie CW, Zhou HY, Wu JX (2018) Vortex Pooling: Improving Context Representation in Semantic Segmentation. arXiv: 1804.06242.[Online]. Available: https://doi.org/10.48550/arXiv.1804.06242
Xu QS, Tao WB (2019) Multi-scale geometric consistency guided multi-view stereo. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5478–5487
Xu QS, Tao WB (2020) Planar prior assisted patchmatch multi-view stereo. In: AAAI Conference on Artificial Intelligence (AAAI), pp 12516–12523
Vogiatzis G, Hernndez Esteban C, Torr PHS, Cipolla R (2007) Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency. IEEE Trans Pattern Anal Mach Intell (TPAMI) 29(12):2241–2246
Article Google Scholar
Furukawa Y, Ponce J (2006) Carved visual hulls for image-based modeling. Int J Comput Vis (IJCV) 81:53–67
Article Google Scholar
Pons JP, Keriven R, Faugeras OD (2005) Modelling dynamic scenes by registering multi-view image sequences. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 822–827
Li Z, Wang K, Zuo W, Meng D, Zhang L (2016) Detail-preserving and content-aware variational multi-view stereo reconstruction. IEEE Trans Image Process (TIP) 25(2):864–877
Article MathSciNet MATH Google Scholar
Cremers D, Kolev K (2011) Multiview stereo and silhouette consistency via convex functionals over convex domains. IEEE Trans Pattern Anal Mach Int (TPAMI) 33(6):1161–1174
Article Google Scholar
Hiep VH, Keriven R, Labatut P, Pons J (2009) Towards high-resolution large-scale multi-view stereo. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1430–1437
Zheng E, Dunn E, Jojic V, Frahm JM (2014) Patchmatch based joint view selection and depthmap estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1510–1517
Hane C, Zach C, Cohen A, Pollefeys M (2017) Dense Semantic 3D Reconstruction. IEEE Trans Pattern Anal Mach Intell (TPAMI) 39(9):1730–1743
Article Google Scholar
Choy CB, Xu D, Gwak J, Chen K, Savarese S (2016) 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In: European Conference on Computer Vision (ECCV), pp 628–644
Kar A, Hane C, Malik J (2017) Learning a multi-view stereo machine. In: Neural Information Processing Systems (NIPS), pp 365–376
Ji M, Gall J, Zheng H, Liu Y, Fang L (2017) SurfaceNet: An End-to-End 3D neural network for multiview stereopsis. In: IEEE International Conference on Computer Vision (ICCV), pp 2307–2315
Paschalidou D, Ulusoy O, Schmitt C, Gool LV, Geiger A (2018) Raynet: Learning volumetric 3d reconstruction with ray potentials. In: IEEE Conference on Computer Vision and Pattern Recognition (ICCV), pp 3897–3906
Xie H, Yao H, Zhang S et al (2020) Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images. Int J Comput Vision (IJCV) 128(12):2919–2935
Article Google Scholar
Huang P-H, Matzen K, Kopf J, Ahuja N, Huang J-B (2018) Deepmvs: Learning multi-view stereopsis. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2820–2830
Gu XD, Fan ZW, Zhu SY, Dai ZZ, Tan FT, Tan P (2020) Cascade cost volume for high-resolution multi-view stereo and stereo matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2495–2504
Yao Y, Luo Z, Li S, Shen T, Fang T, Quan L (2019) Recurrent MVSNet for high-resolution multi-view stereo depth inference. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5525–5534
Xue Y, Chen J, Wan W, Huang Y, Yu C, Li T, Bao J (2019) MVSCRF: Learning multi-view stereo with conditional random fields. In: IEEE International Conference on Computer Vision (ICCV), pp 4312–4321
Yang JY, Mao W, Alvarez JM, Liu MM (2020) Cost volume pyramid based depth inference for multi-view stereo. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4877–4886
Yu ZH, Gao SH (2020) Fast-MVSNet: Sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1949–1958
Yi H, Wei Z, Ding M et al (2020) Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation. In: European Conference on Computer Vision (ECCV), pp 766–782
Luo KY, Guan T, Ju LL, Wang YS, Chen Z, Luo YW (2020) Attention-aware multi-view stereo. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1590–1599
Chen PH, Yang HC, Chen KW, Chen YS (2020) Mvsnet++: learning depth-based attention pyramid features for multi-view stereo. IEEE Trans Image Process (TIP)29:7261–7263
Yang ZP, Ren ZL, Shan Q, Huang QX (2018) MVS2D: Efficient multi-view stereo via attention-driven 2D convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 8564–8574
Charles RQ, Su H, Kaichun M, Guibas LJ (2017) Pointnet: Deep learning on point sets for 3d classification and segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 77–85
Charles RQ, Li Y, Hao S, Guibas LJ (2017) PointNet++: Deep hierarchical feature learning on point sets in a metric space. In: Neural Information Processing Systems (NIPS), pp 5105–5114
Hu M, Ye H, Cao F (2021) Convolutional neural networks with hybrid weights for 3D point cloud classification. Appl Intell 51:6983–6996
Article Google Scholar
Wang L, Huang Y, Hou Y, Zhang S, Shan J (2019) Graph attention convolution for point cloud semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10288–10297
Xiao M, Zheng S, Liu C, Wang Y, He D, Ke G, Bian J, Lin Z, Liu TY (2020) Invertible Image Rescaling. In: European Conference on Computer Vision (ECCV), pp 126–144
Campbell NDF, Vogiatzis G, Hernández C, Cipolla R (2008) Using multiple hypotheses to improve depth-maps for multi-view stereo. In: European Conference on Computer Vision (ECCV), pp 766–799
Tola E, Strecha C, Fua P (2012) Efficient Large-scale Multi-view Stereo for Ultra High-resolution Image Sets. Mach Vis Appl (MVA) 23(5):903–920
Article Google Scholar
Luo K, Guan T, Ju L et al (2019) P-mvsnet: Learning patch-wise matching confidence aggregation for multi-view stereo. In: IEEE International Conference on Computer Vision (ICCV), pp 10452–10461
Fujitomi T, Ito S, Kaneko N, Sumi K (2021) Bi-directional recurrent MVSNet for high-resolution multi-view stereo. In: International Conference on Machine Vision Applications (MVA), pp 1–5
Lin K, Li L, Zhang J, Zheng X, Wu S (2021) High-resolution multi-view stereo with dynamic depth edge flow. In: IEEE International Conference on Multimedia and Expo (ICME), pp 1-6
Wang F, Galiani S, Vogel C et al (2021) IterMVS: Iterative Probability Estimation for Effificient Multi-View Stereo. arXiv: 2112.05126.[Online]. Available: https://doi.org/10.48550/arXiv.2112.05126

Download references

Funding

This work was supported by the National Key R \(\&\) D Program of China (No. 2018YFB2101504), Key R \(\&\) D Program of Shanxi Province of China (No. 201903D121147), Natural Science Foundation of Shanxi Province of China (No. 201901D111150), and Research Project Supported by Shanxi Scholarship Council of China (No. 2020–113).

Author information

Authors and Affiliations

School of Computer Science and Techology, North University of China, Taiyuan, China
Rong Zhao, Xie Han, Fusheng Sun & Shichao Jiao
National Innovation Center for Digital Fishery, China Agricultural University, Beijing, China
Zhuoer Gu
Department of Computer, University of Warwick, Coventry, UK
Ligang He

Authors

Rong Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Zhuoer Gu
View author publications
You can also search for this author in PubMed Google Scholar
Xie Han
View author publications
You can also search for this author in PubMed Google Scholar
Ligang He
View author publications
You can also search for this author in PubMed Google Scholar
Fusheng Sun
View author publications
You can also search for this author in PubMed Google Scholar
Shichao Jiao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhuoer Gu.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Zhao, R., Gu, Z., Han, X. et al. Multi-view stereo network with point attention. Appl Intell 53, 26622–26636 (2023). https://doi.org/10.1007/s10489-023-04806-y

Download citation

Accepted: 16 June 2023
Published: 26 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10489-023-04806-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-view stereo network with point attention

Abstract

Access this article

Similar content being viewed by others

Multi-view Stereo Network with Attention Thin Volume

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

A robust framework for multi-view stereopsis

Data Availability

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interests

Additional information

Publisher's note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multi-view stereo network with point attention

Abstract

Access this article

Similar content being viewed by others

Multi-view Stereo Network with Attention Thin Volume

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

A robust framework for multi-view stereopsis

Data Availability

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interests

Additional information

Publisher's note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation