CVE-Net: cost volume enhanced network guided by sparse features for stereo matching

Abstract

Deep learning based on convolutional neural networks (CNNs) has been successfully applied to stereo matching, as it can accelerate training and improve matching accuracy. However, existing CNN-based stereo matching frameworks often suffer from two problems. The first is the limited generalization ability of the trained model: frameworks are usually pre-trained on the large synthetic Scene Flow dataset and then fine-tuned on an evaluation dataset, but the evaluation dataset may contain only trivial training data, or may even lack disparity labels for certain tasks, which adversely affects the generality of the trained model. The second is poor matching performance in ill-posed regions, which are difficult to distinguish and include weakly textured areas, repeated-texture areas, occluded areas, reflective structures, and fine structures. To ameliorate these problems, we propose the cost volume enhancement network (CVE-Net), guided by sparse features, for stereo matching. CVE-Net uses edge information and saliency information to sparsely sample precise disparity labels during training, and it enhances the cost volume by leveraging this precise sparse label information to guide training. Experiments show that the generalization ability is significantly improved and the domain-transfer problem on new datasets is markedly alleviated. In addition, introducing sparse multiple semantic features improves matching performance in ill-posed regions; even without fine-tuning, the matching requirements can be met. These results demonstrate the effectiveness of CVE-Net.
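As a minimal illustration of the idea described above, the following sketch shows one plausible way a cost volume could be modulated by sparsely sampled ground-truth disparities. The function name, signature, and the additive Gaussian penalty are our assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def enhance_cost_volume(cost_volume, sparse_disp, mask, sigma=1.0, lam=1.0):
    """Sparse-label-guided cost volume enhancement (illustrative sketch).

    cost_volume : (D, H, W) array, matching cost per disparity hypothesis
                  (lower cost = better match)
    sparse_disp : (H, W) array of disparity labels, valid only where mask is True
    mask        : (H, W) boolean array marking the sparsely sampled pixels
                  (e.g. selected from edge/saliency maps)

    At sampled pixels, a penalty is added to hypotheses far from the known
    disparity, steering the volume toward the precise sparse labels.
    """
    D, H, W = cost_volume.shape
    disps = np.arange(D).reshape(D, 1, 1)              # disparity hypotheses
    # Gaussian weight centred on the known disparity at each pixel
    weight = np.exp(-((disps - sparse_disp[None, :, :]) ** 2) / (2 * sigma ** 2))
    enhanced = cost_volume.copy()
    # Penalise hypotheses away from the label, only at labelled pixels
    enhanced[:, mask] += lam * (1.0 - weight[:, mask])
    return enhanced
```

After this modulation, the minimum-cost disparity at a labelled pixel coincides with its sparse label, while unlabelled pixels are left untouched; a differentiable variant of the same idea could act on the cost volume inside a network.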


Notes

  1. http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo.

  2. http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo.


Acknowledgements

The work was supported by Guangdong Basic and Applied Basic Research Foundation Grant No. 2019A1515011078, and Guangzhou Scientific and Technological Plan Project No. 201904010228.

Author information

Contributions

All authors contributed to the research, the experiments, and the manuscript. Guangyi Huang and Yongyi Gong were responsible for the design of the algorithm and the preparation of the experiments. The experiments and related discussion were carried out by Qingzhen Xu, Shuang Liu, Guangyi Huang, Kun Zeng, Yongyi Gong, and Xiaonan Luo. Qingzhen Xu, Shuang Liu, and Guangyi Huang wrote the manuscript. Kun Zeng, Yongyi Gong, and Xiaonan Luo were responsible for the final optimization. All authors commented on previous versions of the manuscript, and all authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yongyi Gong or Xiaonan Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.



About this article

Cite this article

Xu, Q., Liu, S., Huang, G. et al. CVE-Net: cost volume enhanced network guided by sparse features for stereo matching. Soft Comput 25, 15183–15199 (2021). https://doi.org/10.1007/s00500-021-06257-4

Keywords

  • Attention module
  • Cost volume
  • Sparse feature
  • Stereo matching