Abstract
Stereo matching is the process of establishing dense correspondences between a pair of stereo images in order to create a disparity map for depth perception. It differs from the optical flow estimation task because of stereo rectification, which ensures that corresponding points in a stereo image pair are always collinear. Stereo vision has become increasingly popular on mobile platforms, such as autonomous cars and unmanned aerial vehicles, thanks to recent advances in full-feature embedded microcomputers. However, because computing resources on such platforms are limited, there is a growing need for stereo matching algorithms that balance disparity estimation accuracy against efficiency. Challenges in this field include the lack of disparity ground truth, domain adaptation, and intractable areas such as occlusions. This chapter covers the fundamentals of stereopsis, including the perspective camera model and epipolar geometry, and reviews state-of-the-art stereo matching algorithms. It also explores disparity confidence measures, disparity estimation evaluation metrics, and publicly available datasets and benchmarks, before summarizing the outstanding challenges in this field.
References
Zhou K et al (2020) Review of stereo matching algorithms based on deep learning. Comput Intell Neurosci 2020
Fan R et al (2022) Learning collision-free space detection from stereo images: homography matrix brings better data augmentation. IEEE/ASME Trans Mechatron 27(1):225–233
Wang H et al (2022) UnDAF: a general unsupervised domain adaptation framework for disparity or optical flow estimation. In: 2022 international conference on robotics and automation (ICRA). IEEE, pp 01–07
Ozgunalp U et al (2017) Multiple lane detection algorithm based on novel dense vanishing point estimation. IEEE Trans Intell Transp Syst 18(3):621–632
Duan R et al (2022) Stereo orientation prior for UAV robust and accurate visual odometry. IEEE/ASME Trans Mechatron 27(5):3440–3450
Fan R et al (2020) Pothole detection based on disparity transformation and road surface modeling. IEEE Trans Image Process 29:897–908
Ma N et al (2022) Computer vision for road imaging and pothole detection: a state-of-the-art review of systems and algorithms. Transp Safety Environ 4(4):tdac026
Fan R et al (2021) Graph attention layer evolves semantic segmentation for road pothole detection: a benchmark and algorithms. IEEE Trans Image Process 30:8144–8154
Fan R, Liu M (2020) Road damage detection based on unsupervised disparity map segmentation. IEEE Trans Intell Transp Syst 21(11):4906–4911
Sicen G et al (2023) Road environment perception for safe and comfortable driving. Springer submitted for publication
Wang H et al (2022) Dynamic fusion module evolves drivable area and road anomaly detection: a benchmark and algorithms. IEEE Trans Cybern 52(10):10750–10760
Fan R et al (2020) SNE-RoadSeg: incorporating surface normal information into semantic segmentation for accurate freespace detection. In: Computer vision-ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, proceedings, part XXX. Springer, pp 340–356
Shean DE et al (2016) An automated, open-source pipeline for mass production of digital elevation models (DEMs) from very-high-resolution commercial stereo satellite imagery. ISPRS J Photogramm Remote Sens 116:101–117
Menze M et al (2015) Object scene flow for autonomous vehicles. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3061–3070
Scharstein D et al (2014) High-resolution stereo datasets with subpixel-accurate ground truth. In: German conference on pattern recognition (GCPR). Springer, pp 31–42
Schops T et al (2017) A multi-view stereo benchmark with high-resolution images and multi-camera videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3260–3269
Mayer N et al (2016) A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4040–4048
Hamid MS et al (2022) Stereo matching algorithm based on deep learning: a survey. J King Saud Univ-Comput Inf Sci 34(5):1663–1673
Bendig K et al (2022) Self-superflow: self-supervised scene flow prediction in stereo sequences. In: Proceedings of the IEEE international conference on image processing (ICIP). IEEE, pp 481–485
Lee M-J et al (2022) Refinement of inverse depth plane in textureless and occluded regions in a multiview stereo matching scheme. J Sens 2022
Gidaris S et al (2018) Unsupervised representation learning by predicting image rotations. arXiv:1803.07728
Trucco E et al (1998) Introductory techniques for 3-D computer vision, vol 201. Prentice Hall, Englewood Cliffs
Fan R et al (2023) Computer stereo vision for autonomous driving: theory and algorithms. In: Recent advances in computer vision applications using parallel processing. Springer, pp 41–70
Loop C et al (1999) Computing rectifying homographies for stereo vision. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition (CVPR), vol 1. IEEE, pp 125–131
Tippetts B et al (2016) Review of stereo vision algorithms and their suitability for resource-limited systems. J Real-Time Image Proc 11(1):5–25
Ding J et al (2021) High-accuracy recognition and localization of moving targets in an indoor environment using binocular stereo vision. ISPRS Int J Geo Inf 10(4):234
Fan R et al (2018) Road surface 3D reconstruction based on dense subpixel disparity map estimation. IEEE Trans Image Process 27(6):3025–3035
Luo W et al (2016) Efficient deep learning for stereo matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5695–5703
Hamzah RA et al (2016) Literature survey on stereo vision disparity map algorithms. J Sens 2016
Yamaguchi K et al (2012) Continuous Markov random fields for robust stereo estimation. In: European conference on computer vision (ECCV). Springer, pp 45–58
Boykov Y et al (2001) Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 23(11):1222–1239
Brown MZ et al (2003) Advances in computational stereo. IEEE Trans Pattern Anal Mach Intell 25(8):993–1008
Hirschmuller H (2007) Stereo processing by semiglobal matching and mutual information. IEEE Trans Pattern Anal Mach Intell 30(2):328–341
Seki A et al (2017) SGM-Nets: semi-global matching with neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 231–240
Spangenberg R et al (2013) Weighted semi-global matching and center-symmetric census transform for robust driver assistance. In: International conference on computer analysis of images and patterns (CAIP). Springer, pp 34–41
Žbontar J et al (2016) Stereo matching by training a convolutional neural network to compare image patches. J Mach Learn Res 17(1):2287–2318
Hirschmuller H et al (2008) Evaluation of stereo matching costs on images with radiometric differences. IEEE Trans Pattern Anal Mach Intell 31(9):1582–1599
Scharstein D et al (2003) High-accuracy stereo depth maps using structured light. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), vol 1. IEEE, pp 195–202
Yang Q et al (2008) Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling. IEEE Trans Pattern Anal Mach Intell 31(3):492–504
Fan R et al (2019) Real-time dense stereo embedded in a UAV for road inspection. In: IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 535–543
Mattoccia S (2011) Stereo vision: algorithms and applications, vol 22. University of Bologna
Scharstein D et al (2002) A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vision 47(1):7–42
Zbontar J et al (2015) Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1592–1599
Chang J-R et al (2018) Pyramid stereo matching network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5410–5418
Wang H et al (2021) PVStereo: pyramid voting module for end-to-end self-supervised stereo matching. IEEE Robot Autom Lett 6(3):4353–4360
Zhong Y et al (2017) Self-supervised learning for stereo matching with self-improving ability. arXiv:1709.00930
Wang H et al (2021) Co-teaching: an ark to unsupervised stereo matching. In: Proceedings of the IEEE international conference on image processing (ICIP). IEEE, pp 3328–3332
Zhang F et al (2019) GA-Net: guided aggregation net for end-to-end stereo matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 185–194
Kendall A et al (2017) End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 66–75
Guo X et al (2019) Group-wise correlation stereo network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3273–3282
Cheng X et al (2020) Hierarchical neural architecture search for deep stereo matching. Adv Neural Inf Process Syst 33:22158–22169
Xu H et al (2020) AANet: adaptive aggregation network for efficient stereo matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1959–1968
Teed Z et al (2020) RAFT: recurrent all-pairs field transforms for optical flow. In: European conference on computer vision (ECCV). Springer, pp 402–419
Lipson L et al (2021) RAFT-stereo: multilevel recurrent field transforms for stereo matching. In: Proceedings of the IEEE international conference on 3D vision (3DV). IEEE, pp 218–227
Li J et al (2022) Practical stereo matching via cascaded recurrent network with adaptive correlation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 16263–16272
Zhu X et al (2019) Deformable ConvNets V2: more deformable, better results. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9308–9316
Wang Y et al (2019) Anytime stereo image depth estimation on mobile devices. In: Proceedings of the IEEE international conference on robotics and automation (ICRA). IEEE, pp 5893–5900
Yee K et al (2020) Fast deep stereo with 2D convolutional processing of cost signatures. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV), pp 183–191
Tankovich V et al (2021) HitNet: hierarchical iterative tile refinement network for real-time stereo matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14362–14372
Chen S et al (2023) Feature enhancement network for stereo matching. Image Vis Comput 131:104614
Wang Z et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Zhou C et al (2017) Unsupervised learning of stereo matching. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1567–1575
Li A et al (2018) Occlusion aware stereo matching via cooperative unsupervised learning. In: Asian conference on computer vision (ACCV). Springer, pp 197–213
Joung S et al (2019) Unsupervised stereo matching using confidential correspondence consistency. IEEE Trans Intell Transp Syst 21(5):2190–2203
Liu P et al (2020) Flow2Stereo: effective self-supervised learning of optical flow and stereo matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6648–6657
Liu L et al (2020) Learning by analogy: reliable supervision from transformations for unsupervised optical flow estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6489–6498
Khamis S et al (2018) StereoNet: guided hierarchical refinement for real-time edge-aware depth prediction. In: European conference on computer vision (ECCV). Springer, pp 573–590
Fan X et al (2022) Occlusion-aware self-supervised stereo matching with confidence guided raw disparity fusion. In: Proceedings of the conference on robots and vision (CRV). IEEE, pp 132–139
Poggi M et al (2021) On the confidence of stereo matching in a deep-learning era: a quantitative evaluation. arXiv:2101.00431
Egnal G et al (2004) A stereo confidence metric using single view imagery with comparison to five alternative approaches. Image Vis Comput 22(12):943–957
Haeusler R et al (2013) Ensemble learning for confidence measures in stereo vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 305–312
Matthies L (1992) Stereo vision for planetary rovers: stochastic modeling to near real-time implementation. Int J Comput Vision 8(1):71–91
Spyropoulos A et al (2014) Learning to detect ground control points for improving the accuracy of stereo matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1621–1628
Ding L et al (2001) On the Canny edge detector. Pattern Recogn 34(3):721–725
Egnal G et al (2002) Detecting binocular half-occlusions: empirical comparisons of five approaches. IEEE Trans Pattern Anal Mach Intell 24(8):1127–1133
Hu X et al (2012) A quantitative evaluation of confidence measures for stereo vision. IEEE Trans Pattern Anal Mach Intell 34(11):2121–2133
Barron JL et al (1994) Performance of optical flow techniques. Int J Comput Vision 12:43–77
Szeliski R (1999) Prediction error as a quality metric for motion and stereo. In: Proceedings of the IEEE international conference on computer vision (ICCV), vol 2. IEEE, pp 781–788
Baker S et al (2011) A database and evaluation methodology for optical flow. Int J Comput Vis 92:1–31
Kalarot R et al (2011) Analysis of real-time stereo vision algorithms on GPU. In: International conference on image and vision computing New Zealand (IVCNZ), p 1
Geiger A et al (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on computer vision and pattern recognition (CVPR)
Butler DJ et al (2012) A naturalistic open source movie for optical flow evaluation. In: European conference on computer vision (ECCV). Springer, pp 611–625
Silberman N et al (2012) Indoor segmentation and support inference from RGBD images. In: European conference on computer vision (ECCV). Springer, pp 746–760
Gaidon A et al (2016) Virtual worlds as proxy for multi-object tracking analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4340–4349
Cabon Y et al (2020) Virtual KITTI 2. arXiv:2001.10773
Tremblay J et al (2018) Falling things: a synthetic dataset for 3D object detection and pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 2038–2041
Yang G et al (2019) DrivingStereo: a large-scale dataset for stereo matching in autonomous driving scenarios. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 899–908
Hirschmuller H et al (2007) Evaluation of cost functions for stereo matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1–8
Song X et al (2021) AdaStereo: a simple and efficient approach for adaptive stereo matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 10328–10337
Jingwei Y et al (2023) Semantic segmentation for autonomous driving. Springer submitted for publication
Wang H et al (2021) SCV-Stereo: learning stereo matching from a sparse cost volume. In: Proceedings of the IEEE international conference on image processing (ICIP). IEEE, pp 3203–3207
Dosovitskiy A et al (2015) FlowNet: learning optical flow with convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2758–2766
Kuznietsov Y et al (2017) Semi-supervised deep learning for monocular depth map prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6647–6655
Zhang F et al (2020) Domain-invariant stereo matching networks. In: European conference on computer vision (ECCV) 2020. Springer, pp 420–439
Yang G et al (2019) Hierarchical deep stereo matching on high-resolution images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5515–5524
Zhang J et al (2022) Revisiting domain generalized stereo matching networks from a feature consistency perspective. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 13001–13011
Watson J et al (2021) The temporal opportunist: self-supervised multi-frame monocular depth. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1164–1174
Shu C et al (2020) Feature-metric loss for self-supervised learning of depth and egomotion. In: European conference on computer vision (ECCV). Springer, pp 572–588
Godard C et al (2019) Digging into self-supervised monocular depth estimation. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 3828–3838
Kim S et al (2020) Adversarial confidence estimation networks for robust stereo matching. IEEE Trans Intell Transp Syst 22(11):6875–6889
Wu C-Y et al (2022) Toward practical monocular indoor depth estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3814–3824
Kanopoulos N et al (1988) Design of an image edge detection filter using the Sobel operator. IEEE J Solid-State Circuits 23(2):358–367
Liu H et al (2021) Pseudo supervised monocular depth estimation with teacher-student network. arXiv:2110.11545
Hartley R et al (2003) Multiple view geometry in computer vision. Cambridge University Press
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant 2020AAA0108100, the National Natural Science Foundation of China under Grant 62233013, and the Science and Technology Commission of Shanghai Municipality under Grant 22511104500.
Appendices
Appendix
Lie Group SO(3)
In three-dimensional space, the coordinates of a 3D point in two coordinate systems are \(\boldsymbol{x}_{1}=[x_{1},y_{1},z_{1}]^\top \) and \(\boldsymbol{x}_{2}=[x_{2},y_{2},z_{2}]^\top \in \mathbb {R}^{3\times 1}\), respectively. \(\boldsymbol{x}_{1}\) can be transformed into \(\boldsymbol{x}_{2}\) using a rotation matrix \(\boldsymbol{R}\in \mathbb {R}^{3\times 3}\) and a translation vector \(\boldsymbol{t}\in \mathbb {R}^{3\times 1}\):
\[\boldsymbol{x}_{2}=\boldsymbol{R}\boldsymbol{x}_{1}+\boldsymbol{t},\]
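where \(\boldsymbol{R}\) satisfies orthogonality:
\[\boldsymbol{R}\boldsymbol{R}^\top =\boldsymbol{R}^\top \boldsymbol{R}=\boldsymbol{I}\quad \text {and}\quad \text {det}(\boldsymbol{R})=\pm 1,\]
where \(\boldsymbol{I}\) is the \(3\times 3\) identity matrix and \(\text {det}(\boldsymbol{R})\) represents the determinant of \(\boldsymbol{R}\). The subgroup of orthogonal matrices with \(\text {det}(\boldsymbol{R})=+1\) is referred to as the special orthogonal group and is denoted as SO(3).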
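As an illustration of the definition above, the following minimal sketch checks SO(3) membership numerically and applies the rigid transformation \(\boldsymbol{x}_{2}=\boldsymbol{R}\boldsymbol{x}_{1}+\boldsymbol{t}\). It uses only the Python standard library. The helper names (`rot_z`, `is_so3`, `transform`), the tolerance, and the example values are assumptions made for illustration and are not taken from the chapter.

```python
import math

def rot_z(theta):
    """Rotation by theta radians about the z-axis (one simple element of SO(3))."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_so3(m, tol=1e-9):
    """True if m is orthogonal (m m^T = I) and det(m) = +1, up to tol."""
    mmt = matmul(m, transpose(m))
    orthogonal = all(abs(mmt[i][j] - (1.0 if i == j else 0.0)) < tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(m) - 1.0) < tol

def transform(R, t, x1):
    """Apply x2 = R x1 + t to a 3D point given as a list of three floats."""
    return [sum(R[i][k] * x1[k] for k in range(3)) + t[i] for i in range(3)]

# Example values (assumed): a 90-degree rotation about z and a unit shift along x.
R = rot_z(math.pi / 2)
t = [1.0, 0.0, 0.0]
x2 = transform(R, t, [1.0, 0.0, 0.0])  # approximately [1.0, 1.0, 0.0]

# A reflection is orthogonal but has determinant -1, so it is not in SO(3).
reflection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
print(is_so3(R), is_so3(reflection))  # True False
```

The reflection case shows why the determinant condition is needed in addition to orthogonality: both matrices preserve lengths, but only the rotation preserves handedness.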
Skew-Symmetric Matrix
In linear algebra, a skew-symmetric matrix \(\boldsymbol{A}\) satisfies the following property:
\[\boldsymbol{A}^\top =-\boldsymbol{A}.\]
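In 3D computer vision, the skew-symmetric matrix \([\boldsymbol{a}]_{\times }\) of a vector \(\boldsymbol{a}=[a_{1},a_{2},a_{3}]^\top \) is defined as [104]:
\[[\boldsymbol{a}]_{\times }=\begin{bmatrix} 0 & -a_{3} & a_{2}\\ a_{3} & 0 & -a_{1}\\ -a_{2} & a_{1} & 0 \end{bmatrix}.\]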
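The two properties of a skew-symmetric matrix are as follows:
\[\boldsymbol{a}^\top [\boldsymbol{a}]_{\times }=\boldsymbol{0}^\top ,\qquad [\boldsymbol{a}]_{\times }\boldsymbol{a}=\boldsymbol{0},\]
where \(\boldsymbol{0}=[0,0,0]^\top \) is a zero vector. Furthermore, a skew-symmetric matrix can also represent the cross product as a matrix multiplication. Specifically, for two vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\), their cross product can be expressed as [104]:
\[\boldsymbol{a}\times \boldsymbol{b}=[\boldsymbol{a}]_{\times }\boldsymbol{b}.\]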
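The relationship between the skew-symmetric matrix and the cross product can be checked with a short sketch. It builds \([\boldsymbol{a}]_{\times }\) using the standard definition and compares \([\boldsymbol{a}]_{\times }\boldsymbol{b}\) with a component-wise cross product. The function names (`skew`, `matvec`, `cross`) and the example vectors are assumptions chosen for illustration.

```python
def skew(a):
    """Skew-symmetric matrix [a]_x of a 3-vector a (standard definition)."""
    a1, a2, a3 = a
    return [[0.0, -a3, a2],
            [a3, 0.0, -a1],
            [-a2, a1, 0.0]]

def matvec(m, v):
    """Product of a 3x3 matrix (nested lists) and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def cross(a, b):
    """Component-wise cross product a x b, used as the reference result."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# Example vectors (assumed values).
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
A = skew(a)

print(matvec(A, b))  # [-3.0, 6.0, -3.0]
print(cross(a, b))   # [-3.0, 6.0, -3.0]
print(matvec(A, a))  # [0.0, 0.0, 0.0], since [a]_x a is the zero vector

# Skew-symmetry: the transpose equals the negated matrix.
A_T = [list(row) for row in zip(*A)]
print(A_T == [[-x for x in row] for row in A])  # True
```

Writing the cross product as a matrix multiplication is convenient because it lets expressions that involve cross products be handled with ordinary linear algebra.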
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Liu, CW., Wang, H., Guo, S., Bocus, M.J., Chen, Q., Fan, R. (2023). Stereo Matching: Fundamentals, State-of-the-Art, and Existing Challenges. In: Fan, R., Guo, S., Bocus, M.J. (eds) Autonomous Driving Perception. Advances in Computer Vision and Pattern Recognition. Springer, Singapore. https://doi.org/10.1007/978-981-99-4287-9_3
DOI: https://doi.org/10.1007/978-981-99-4287-9_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4286-2
Online ISBN: 978-981-99-4287-9
eBook Packages: Computer Science, Computer Science (R0)