
Few-Shot Stereo Matching with High Domain Adaptability Based on Adaptive Recursive Network

Published in: International Journal of Computer Vision

Abstract

Deep-learning-based stereo matching algorithms have been extensively researched in areas such as robot vision and autonomous driving due to their promising performance. However, these algorithms require a large amount of labeled data for training and exhibit inadequate domain adaptability, which degrades their applicability and flexibility. This work addresses these two deficiencies and proposes a few-shot-trained stereo matching model with high domain adaptability. In the model, stereo matching is formulated as a dynamic optimization problem over a space of possible solutions, and a multi-scale matching cost computation method is proposed to obtain the possible solution space for the application scenes. Moreover, an adaptive recurrent 3D convolutional neural network is designed to determine the optimal solution within this space. Experimental results demonstrate that the proposed model outperforms state-of-the-art stereo matching algorithms in terms of training data requirements and domain adaptability.
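For readers who want a concrete picture of the kind of pipeline the abstract describes, the sketch below shows a generic correlation-based cost volume (the per-pixel "possible solution space"), a weight-shared, i.e. recurrent, 3D convolutional refinement loop, and a soft-argmin disparity readout in PyTorch. It is a minimal illustration under assumptions, not the authors' implementation: the function and module names (correlation_cost_volume, RecurrentCostRefiner, soft_argmin_disparity), the correlation measure, and all channel sizes and iteration counts are placeholders, and the paper's multi-scale cost computation and adaptive recursion scheme are not reproduced; the sketch is single-scale for brevity.

```python
# Illustrative sketch only: a generic cost-volume + recurrent 3D-conv refinement
# pipeline in PyTorch. Names and hyperparameters are assumptions for exposition,
# not the architecture proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Build a cost volume of shape (B, max_disp, H, W) by correlating the left
    feature map with the right feature map shifted over candidate disparities
    (the 'possible solution space' for each pixel)."""
    b, c, h, w = feat_l.shape
    volume = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            volume[:, d, :, d:] = (feat_l[:, :, :, d:] * feat_r[:, :, :, :-d]).mean(dim=1)
    return volume


class RecurrentCostRefiner(nn.Module):
    """Apply the same lightweight 3D convolution repeatedly (weights shared
    across iterations) to regularize the cost volume -- a generic stand-in for
    a recurrent aggregation step."""

    def __init__(self, hidden=8, steps=3):
        super().__init__()
        self.steps = steps
        self.inp = nn.Conv3d(1, hidden, kernel_size=3, padding=1)
        self.rec = nn.Conv3d(hidden, hidden, kernel_size=3, padding=1)
        self.out = nn.Conv3d(hidden, 1, kernel_size=3, padding=1)

    def forward(self, cost):                 # cost: (B, D, H, W)
        x = self.inp(cost.unsqueeze(1))      # -> (B, hidden, D, H, W)
        for _ in range(self.steps):          # recurrent refinement
            x = F.relu(self.rec(x)) + x      # residual update, shared weights
        return self.out(x).squeeze(1)        # -> (B, D, H, W)


def soft_argmin_disparity(cost):
    """Select the per-pixel 'optimal solution' as a differentiable expectation
    over the disparity candidates (lower cost = higher probability)."""
    prob = F.softmax(-cost, dim=1)
    disp = torch.arange(cost.size(1), device=cost.device, dtype=cost.dtype)
    return (prob * disp.view(1, -1, 1, 1)).sum(dim=1)    # (B, H, W)


if __name__ == "__main__":
    fl = torch.randn(1, 32, 64, 128)   # toy left/right feature maps
    fr = torch.randn(1, 32, 64, 128)
    cost = correlation_cost_volume(fl, fr, max_disp=24)
    cost = RecurrentCostRefiner()(cost)
    print(soft_argmin_disparity(cost).shape)   # torch.Size([1, 64, 128])
```

The appeal of weight sharing across refinement steps, as in this sketch, is that the number of trainable parameters stays small, which is one generic way a model can remain trainable from few labeled samples.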


Data Availability Statements

The datasets analyzed in this paper are available from the following sources:

  1. Middlebury Stereo Dataset, Middlebury College: https://vision.middlebury.edu/stereo/data/
  2. KITTI Stereo Dataset, University of Tübingen: https://www.cvlibs.net/datasets/kitti/
  3. WHU Stereo 2020 Dataset, Wuhan University: http://gpcv.whu.edu.cn/data/WHU_MVS_Stereo_dataset.html
  4. WHU Stereo 2023 Dataset, Wuhan University: https://github.com/Sheng029/WHU-Stereo
  5. SceneFlow Dataset, University of Freiburg: https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html
  6. ETH3D Dataset, ETH Zürich: https://www.eth3d.net/



Acknowledgements

This research was funded by the Digital Finance CRC, supported by the Cooperative Research Centres program, an Australian Government initiative, and by the National Natural Science Foundation of China (No. 62306223). We thank the anonymous reviewers and the editor for their constructive comments, which helped to further improve the quality of this paper.

Author information


Corresponding author

Correspondence to Mingzhe Wang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Communicated by Bernhard Egger.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, R., Wang, M., Li, Z. et al. Few-Shot Stereo Matching with High Domain Adaptability Based on Adaptive Recursive Network. Int J Comput Vis 132, 1484–1501 (2024). https://doi.org/10.1007/s11263-023-01953-0
