
SA-Net: Scene-Aware Network for Cross-domain Stereo Matching

Published in: Applied Intelligence

Abstract

Although recent deep-learning-based stereo matching methods achieve unprecedented state-of-the-art performance, their accuracy drops drastically when they encounter environments whose context differs greatly from that observed at training time. In this paper, we propose a novel Scene-Aware Network (SA-Net) that integrates scene information to achieve cross-domain stereo matching. Specifically, we design a Scene-Aware Module (SAM) that extracts rich scene details, giving the network better generalization ability across different domains. To let this rich scene information guide shallow features during cost aggregation, we introduce a new Multi-element Feature Fusion Strategy (MFFS). Extensive quantitative and qualitative evaluations across different domains show that SA-Net achieves competitive performance and, in particular, better domain generalization.
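The abstract does not give enough architectural detail to reproduce SAM or MFFS, but the "cost aggregation" it refers to builds on a standard pipeline: construct a per-disparity matching cost volume, then select the lowest-cost disparity per pixel. The following is a minimal NumPy sketch of that classical scaffold, which deep stereo networks such as SA-Net learn to improve; all names and parameters here are illustrative, not taken from the paper's released code.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Build an (H, W, max_disp) volume of absolute-difference matching
    costs between a rectified left/right grayscale image pair."""
    h, w = left.shape
    vol = np.full((h, w, max_disp), np.inf)
    for d in range(max_disp):
        # A pixel at column x in the left image matches column x - d
        # in the right image, so compare the d-shifted slices.
        vol[:, d:, d] = np.abs(left[:, d:] - right[:, : w - d])
    return vol

def winner_take_all(vol):
    """Select, per pixel, the disparity with the lowest matching cost."""
    return np.argmin(vol, axis=2)

# Tiny synthetic pair: the right view is the left view shifted by 2 px,
# so the recovered disparity should be 2 wherever the match is valid.
rng = np.random.default_rng(0)
left = rng.random((8, 16))
right = np.zeros_like(left)
right[:, :-2] = left[:, 2:]
disp = winner_take_all(cost_volume(left, right, max_disp=4))
```

Deep methods replace the hand-crafted absolute-difference cost with learned features and replace winner-take-all with learned (here, scene-guided) aggregation, but the volume-then-select structure is the same.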


Data Availability

The datasets used in this study can be downloaded from https://lmb.informatik.uni-freiburg.de/index.php and http://www.cvlibs.net/datasets/kitti/index.php. Our code has been implemented using PyTorch and is available at https://github.com/cax515/SANet-main.



Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (2020YJS029), the National Natural Science Foundation of China (51827813, 61472029), and the R&D Program of the Beijing Municipal Education Commission (KJZD20191000402).

Author information


Corresponding author

Correspondence to Hui Yin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chong, AX., Yin, H., Wan, J. et al. SA-Net: Scene-Aware Network for Cross-domain Stereo Matching. Appl Intell 53, 9978–9991 (2023). https://doi.org/10.1007/s10489-022-04003-3

