Abstract
In recent years, learning-based multi-view stereo (MVS) reconstruction has outperformed traditional methods. In this paper, we introduce a novel point-attention network that applies an attention mechanism to the point cloud structure. During reconstruction, the attention mechanism guides the network to focus on complex areas such as thin structures and low-texture surfaces. We first infer a coarse depth map using a modified classical deep MVS framework and convert it into the corresponding point cloud. Then, we add the high-frequency features and multi-resolution features of the raw images to the point cloud. Finally, our network uses the attention mechanism to guide the weight distribution of points across different dimensions and iteratively computes the depth displacement of each point as a depth residual, which is added to the coarse depth prediction to obtain the final high-resolution depth map. Experimental results on the DTU dataset and the Tanks and Temples dataset show that the proposed point-attention architecture achieves a significant improvement in some scenes that lack reasonable geometrical assumptions, suggesting that our method has strong generalization ability.
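The coarse-to-fine steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole intrinsics (fx, fy, cx, cy), the function names, and the `predict_residual` callable are all hypothetical, and the callable merely stands in for the point-attention network, whose details the abstract does not give.

```python
from typing import Callable, List, Tuple

DepthMap = List[List[float]]
Point = Tuple[float, float, float]


def unproject_depth(depth: DepthMap, fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Lift every pixel (u, v) with depth d to a camera-space 3D point.

    Assumes a simple pinhole camera model (an assumption for illustration).
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


def refine_depth(coarse: DepthMap,
                 predict_residual: Callable[[DepthMap], DepthMap],
                 iterations: int = 2) -> DepthMap:
    """Iteratively add a predicted per-pixel depth residual to the depth map."""
    refined = [row[:] for row in coarse]  # copy so the coarse map is untouched
    for _ in range(iterations):
        residual = predict_residual(refined)
        refined = [[d + r for d, r in zip(d_row, r_row)]
                   for d_row, r_row in zip(refined, residual)]
    return refined


def toy_predictor(depth: DepthMap) -> DepthMap:
    """Hypothetical stand-in for the network: move halfway toward depth 2.0."""
    return [[0.5 * (2.0 - d) for d in row] for row in depth]


coarse_depth = [[1.0, 1.0], [1.0, 1.0]]
cloud = unproject_depth(coarse_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
final_depth = refine_depth(coarse_depth, toy_predictor, iterations=2)
```

In this toy setup each iteration halves the remaining error, so the depth moves from 1.0 to 1.5 and then to 1.75. In the method itself, the residual would instead come from the learned network operating on the feature-augmented point cloud.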
Data Availability
The datasets used in the current study are available at http://roboimagedata.compute.dtu.dk/ and https://www.tanksandtemples.org/.
Funding
This work was supported by the National Key R&D Program of China (No. 2018YFB2101504), the Key R&D Program of Shanxi Province of China (No. 201903D121147), the Natural Science Foundation of Shanxi Province of China (No. 201901D111150), and the Research Project Supported by Shanxi Scholarship Council of China (No. 2020–113).
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhao, R., Gu, Z., Han, X. et al. Multi-view stereo network with point attention. Appl Intell 53, 26622–26636 (2023). https://doi.org/10.1007/s10489-023-04806-y
DOI: https://doi.org/10.1007/s10489-023-04806-y