
End-to-End Deep Learning Method with Disparity Correction for Stereo Matching

  • Research Article - Computer Engineering and Computer Science
  • Published in: Arabian Journal for Science and Engineering

Abstract

In end-to-end deep learning networks for stereo matching, a central research question is how to perceive rich image detail at an acceptable computational cost and thereby improve the final matching accuracy. In this paper, we design a cascaded disparity-correction network based on DispNet, which refines the initial disparity map by combining intermediate outputs of the trunk network. We name the proposed method ETE-DC-CNN (end-to-end disparity-correction convolutional neural network). In the trunk network, a mixed dilated convolution module extracts features from the original images, depthwise-separable convolutions assemble the feature maps into a 3D disparity space volume, and a 2D encoder-decoder module then produces the initial disparity map. In the correction network, gradient information from the initial feature maps is combined to reconstruct a corrected matching cost volume guided by the initial disparity map, and a smaller encoder-decoder module is trained to refine the result to sub-pixel accuracy. The algorithm is validated on the SceneFlow and KITTI 2012/2015 datasets. On the KITTI 2012 test set, the proposed method achieves 3.46, 2.12, 1.19, and 0.6 in non-occluded regions, and 4.04, 2.55, 1.58, and 0.7 over all regions, on the Er (> 2 px), Er (> 3 px), Er (> 5 px), and Epe metrics respectively, outperforming the backbone network DispNet.
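The trunk-network pipeline described above (features, then a 3D disparity space volume, then a disparity map refined to sub-pixel precision) can be illustrated with a deliberately simplified NumPy sketch: a single-channel absolute-difference cost volume with winner-take-all disparity selection and parabolic sub-pixel interpolation. This is not the authors' learned ETE-DC-CNN (which replaces these hand-crafted steps with convolutional feature extraction and encoder-decoder cost aggregation); the function names `cost_volume` and `wta_subpixel` are illustrative only.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Build a (max_disp, H, W) disparity space volume from two
    single-channel feature maps using absolute-difference cost."""
    H, W = left.shape
    vol = np.full((max_disp, H, W), np.inf)
    for d in range(max_disp):
        # left pixel x is compared against right pixel x - d
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :W - d])
    return vol

def wta_subpixel(vol):
    """Winner-take-all disparity with parabolic sub-pixel refinement,
    fitting a parabola through the cost at d-1, d, d+1."""
    D = vol.shape[0]
    d0 = vol.argmin(axis=0)              # integer winner-take-all disparity
    disp = d0.astype(float)
    di = np.clip(d0, 1, D - 2)           # need both neighbours for the fit
    rows, cols = np.indices(d0.shape)
    c0 = vol[di, rows, cols]
    cm = vol[di - 1, rows, cols]
    cp = vol[di + 1, rows, cols]
    denom = cm - 2 * c0 + cp
    ok = (d0 == di) & np.isfinite(cm) & np.isfinite(cp) & (denom > 1e-12)
    # vertex of the parabola gives an offset in (-0.5, 0.5)
    disp[ok] = di[ok] + 0.5 * (cm[ok] - cp[ok]) / denom[ok]
    return disp

# demo on a synthetic pair with a constant 2-px shift
left = np.random.default_rng(0).random((4, 16))
right = np.zeros_like(left)
right[:, :-2] = left[:, 2:]
disp = wta_subpixel(cost_volume(left, right, 8))
# disp[:, 2:] lies within 0.5 px of the true disparity of 2
```

In a learned pipeline such as the one in this paper, the hand-crafted absolute-difference cost is replaced by features from the dilated-convolution module, and the winner-take-all step by an encoder-decoder that regresses disparity; the volume construction and sub-pixel refinement shown here correspond to the same two conceptual stages.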



Acknowledgements

We appreciate the support of the National Key R&D Program of China (No. 2022YFC2803903) and the Key R&D Program of Zhejiang Province (No. 2021C03013).

Author information

Corresponding author

Correspondence to Zhiyu Zhou.

Ethics declarations

Conflict of interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Zhou, Z., Liu, M., Guo, J. et al. End-to-End Deep Learning Method with Disparity Correction for Stereo Matching. Arab J Sci Eng 49, 3331–3345 (2024). https://doi.org/10.1007/s13369-023-07985-5
