
Few-Shot Stereo Matching with High Domain Adaptability Based on Adaptive Recursive Network

Published in: International Journal of Computer Vision

Abstract

Deep-learning-based stereo matching algorithms have been extensively researched in areas such as robot vision and autonomous driving due to their promising performance. However, these algorithms require a large amount of labeled data for training and exhibit inadequate domain adaptability, which degrades their applicability and flexibility. This work addresses these two deficiencies and proposes a few-shot-trained stereo matching model with high domain adaptability. In the model, stereo matching is formulated as a dynamic optimization problem over a space of possible solutions, and a multi-scale matching cost computation method is proposed to obtain the possible solution space for the application scenes. Moreover, an adaptive recurrent 3D convolutional neural network is designed to determine the optimal solution within this space. Experimental results demonstrate that the proposed model outperforms state-of-the-art stereo matching algorithms in terms of training data requirements and domain adaptability.
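For readers who want a concrete picture of the kind of pipeline the abstract describes, the sketch below shows a generic correlation-based cost volume (the per-pixel "possible solution space"), a weight-shared, i.e. recurrent, 3D convolutional refinement loop, and a soft-argmin disparity readout in PyTorch. It is a minimal illustration under assumptions, not the authors' implementation: the function and module names (correlation_cost_volume, RecurrentCostRefiner, soft_argmin_disparity), the correlation measure, and all channel sizes and iteration counts are placeholders, and the paper's multi-scale cost computation and adaptive recursion scheme are not reproduced; the sketch is single-scale for brevity.

```python
# Illustrative sketch only: a generic cost-volume + recurrent 3D-conv refinement
# pipeline in PyTorch. Names and hyperparameters are assumptions for exposition,
# not the architecture proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Build a cost volume of shape (B, max_disp, H, W) by correlating the left
    feature map with the right feature map shifted over candidate disparities
    (the 'possible solution space' for each pixel)."""
    b, c, h, w = feat_l.shape
    volume = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            volume[:, d, :, d:] = (feat_l[:, :, :, d:] * feat_r[:, :, :, :-d]).mean(dim=1)
    return volume


class RecurrentCostRefiner(nn.Module):
    """Apply the same lightweight 3D convolution repeatedly (weights shared
    across iterations) to regularize the cost volume -- a generic stand-in for
    a recurrent aggregation step."""

    def __init__(self, hidden=8, steps=3):
        super().__init__()
        self.steps = steps
        self.inp = nn.Conv3d(1, hidden, kernel_size=3, padding=1)
        self.rec = nn.Conv3d(hidden, hidden, kernel_size=3, padding=1)
        self.out = nn.Conv3d(hidden, 1, kernel_size=3, padding=1)

    def forward(self, cost):                 # cost: (B, D, H, W)
        x = self.inp(cost.unsqueeze(1))      # -> (B, hidden, D, H, W)
        for _ in range(self.steps):          # recurrent refinement
            x = F.relu(self.rec(x)) + x      # residual update, shared weights
        return self.out(x).squeeze(1)        # -> (B, D, H, W)


def soft_argmin_disparity(cost):
    """Select the per-pixel 'optimal solution' as a differentiable expectation
    over the disparity candidates (lower cost = higher probability)."""
    prob = F.softmax(-cost, dim=1)
    disp = torch.arange(cost.size(1), device=cost.device, dtype=cost.dtype)
    return (prob * disp.view(1, -1, 1, 1)).sum(dim=1)    # (B, H, W)


if __name__ == "__main__":
    fl = torch.randn(1, 32, 64, 128)   # toy left/right feature maps
    fr = torch.randn(1, 32, 64, 128)
    cost = correlation_cost_volume(fl, fr, max_disp=24)
    cost = RecurrentCostRefiner()(cost)
    print(soft_argmin_disparity(cost).shape)   # torch.Size([1, 64, 128])
```

The appeal of weight sharing across refinement steps, as in this sketch, is that the number of trainable parameters stays small, which is one generic way a model can remain trainable from few labeled samples.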


Data Availability Statements

The datasets analyzed in this paper are available from the following sources:

  1. Middlebury Stereo Dataset, Middlebury College: https://vision.middlebury.edu/stereo/data/
  2. KITTI Stereo Dataset, University of Tübingen: https://www.cvlibs.net/datasets/kitti/
  3. WHU Stereo 2020 Dataset, Wuhan University: http://gpcv.whu.edu.cn/data/WHU_MVS_Stereo_dataset.html
  4. WHU Stereo 2023 Dataset, Wuhan University: https://github.com/Sheng029/WHU-Stereo
  5. SceneFlow Dataset, University of Freiburg: https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html
  6. ETH3D Dataset, ETH Zürich: https://www.eth3d.net/



Acknowledgements

This research was funded by the Digital Finance CRC, supported by the Cooperative Research Centres program, an Australian Government initiative, and by the National Natural Science Foundation of China (No. 62306223). We thank the anonymous reviewers and the editor for their constructive comments, which helped to further improve the quality of this paper.

Author information


Corresponding author

Correspondence to Mingzhe Wang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Communicated by Bernhard Egger.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, R., Wang, M., Li, Z. et al. Few-Shot Stereo Matching with High Domain Adaptability Based on Adaptive Recursive Network. Int J Comput Vis 132, 1484–1501 (2024). https://doi.org/10.1007/s11263-023-01953-0
