Abstract
In end-to-end deep learning networks for stereo matching, how to capture sufficient image detail at an acceptable computational cost, and thereby improve the final matching accuracy, has become an active research topic. In this paper, we design a cascaded disparity correction network based on DispNet that refines the initial disparity image using intermediate outputs of the trunk network. The proposed method is named ETE–DC–CNN (end to end–disparity correction–convolutional neural network). In the trunk network, a mixed dilated convolution module extracts features from the original images, the feature images are assembled into a 3D disparity space volume through depth-separable convolution, and the initial disparity image is then obtained through a 2D encoder–decoder module. In the correction network, gradient information from the initial feature image is combined to reconstruct a modified matching cost volume guided by the initial disparity image, and a smaller encoder–decoder module is then trained to bring the final result to the sub-pixel level. The algorithm has been verified on the SceneFlow and KITTI 2012/2015 data sets. On the KITTI 2012 test set, the proposed method achieves 3.46, 2.12, 1.19, and 0.6 in the non-occluded area and 4.04, 2.55, 1.58, and 0.7 in the global area on the Er (> 2 px), Er (> 3 px), Er (> 5 px), and Epe indicators, respectively, improving on the results of the backbone network DispNet.
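The indicators quoted in the abstract can be illustrated with a short sketch. It assumes that Er (> t px) is the percentage of evaluated pixels whose absolute disparity error exceeds t pixels, and that Epe is the mean absolute disparity error in pixels. These are the commonly used definitions, not ones confirmed by the abstract itself. The function name `disparity_metrics`, the flattened-list inputs, and the `valid` mask (for example, to restrict scoring to non-occluded pixels) are hypothetical choices made for illustration. The reported figures come from the benchmark's own evaluation, not from this code.

```python
from typing import Dict, Optional, Sequence, Tuple


def disparity_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    valid: Optional[Sequence[bool]] = None,
    thresholds: Tuple[float, ...] = (2.0, 3.0, 5.0),
) -> Dict[str, float]:
    """Return Epe (px) and outlier rates (%) over the evaluated pixels.

    pred, gt: flattened disparity maps of equal length, in pixels.
    valid:    optional mask; entries that are False are skipped
              (assumed use: pixels without ground truth, or occluded
              pixels when scoring the non-occluded area only).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if valid is None:
        valid = [True] * len(gt)
    elif len(valid) != len(gt):
        raise ValueError("valid must have the same length as gt")

    # Absolute disparity error for every evaluated pixel.
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")

    # Epe: mean absolute error over the evaluated pixels.
    metrics = {"epe": sum(errors) / len(errors)}

    # Er(> t px): share of evaluated pixels with error strictly above t.
    for t in thresholds:
        outliers = sum(1 for e in errors if e > t)
        metrics[f"er>{t:g}px"] = 100.0 * outliers / len(errors)
    return metrics


if __name__ == "__main__":
    # Toy data: the last pixel is masked out and does not affect the scores.
    pred = [10.0, 20.5, 33.0, 47.0, 5.0]
    gt = [10.0, 20.0, 30.0, 41.0, 9.0]
    mask = [True, True, True, True, False]
    print(disparity_metrics(pred, gt, mask))
```

Under these assumptions, lower values are better for every indicator, which is the sense in which the reported numbers are compared with DispNet.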
Acknowledgements
We appreciate the support of the National Key R&D Program of China (No. 2022YFC2803903) and the Key R&D Program of Zhejiang Province (No. 2021C03013).
Ethics declarations
Conflict of interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
About this article
Cite this article
Zhou, Z., Liu, M., Guo, J. et al. End-to-End Deep Learning Method with Disparity Correction for Stereo Matching. Arab J Sci Eng 49, 3331–3345 (2024). https://doi.org/10.1007/s13369-023-07985-5
DOI: https://doi.org/10.1007/s13369-023-07985-5