Domain-Invariant Stereo Matching Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)

Abstract

State-of-the-art stereo matching networks have difficulty generalizing to new, unseen environments due to significant domain differences, such as color, illumination, contrast, and texture. In this paper, we aim to design a domain-invariant stereo matching network (DSMNet) that generalizes well to unseen scenes. To achieve this goal, we propose i) a novel "domain normalization" approach that regularizes the distribution of learned representations to make them invariant to domain differences, and ii) an end-to-end trainable structure-preserving graph-based filter for extracting robust structural and geometric representations that further enhance domain-invariant generalization. When trained on synthetic data and evaluated on real test sets, our model performs significantly better than all state-of-the-art models. It even outperforms some deep neural network models (e.g. MC-CNN [61]) fine-tuned with test-domain data. The code is available at https://github.com/feihuzhang/DSMNet.
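To make the "domain normalization" idea above concrete, the following is a minimal PyTorch sketch of a normalization layer that removes per-image spatial statistics (as instance normalization does) and then normalizes each pixel's feature vector along the channel dimension. The layer name `DomainNorm2d`, the affine parameters, and the self-test are illustrative assumptions, not the authors' released implementation; see the GitHub repository above for the actual code.

```python
import torch
import torch.nn as nn


class DomainNorm2d(nn.Module):
    """Sketch of a domain-normalization layer: instance normalization over the
    spatial dimensions removes image-level (domain-specific) statistics such as
    color and contrast, and an L2 normalization along the channel dimension at
    each pixel further suppresses local amplitude differences between domains."""

    def __init__(self, channels: int, l2: bool = True, eps: float = 1e-5):
        super().__init__()
        self.instance_norm = nn.InstanceNorm2d(channels, affine=False)
        self.l2 = l2
        self.eps = eps
        # Learnable per-channel scale and shift (an assumption here, analogous
        # to the affine terms of BatchNorm/InstanceNorm).
        self.weight = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) feature map from the stereo feature extractor.
        x = self.instance_norm(x)  # normalize each image's spatial statistics
        if self.l2:
            # Normalize each pixel's feature vector across channels.
            x = x / (x.norm(p=2, dim=1, keepdim=True) + self.eps)
        return x * self.weight + self.bias


if __name__ == "__main__":
    # Quick shape check on a dummy feature map.
    feats = torch.randn(2, 32, 64, 128)
    print(DomainNorm2d(32)(feats).shape)  # torch.Size([2, 32, 64, 128])
```

In a generalization setting, such a layer would typically replace the batch-normalization layers of the feature extractor, so that learned features do not depend on dataset-wide statistics of the training domain.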

Acknowledgement

Research is supported by Baidu, the ERC grant ERC-2012-AdG 321162-HELIOS, EPSRC grant Seebibyte EP/M013774/1 and EPSRC/MURI grant EP/N019474/1. We would also like to acknowledge the Royal Academy of Engineering.

Supplementary material

504434_1_En_25_MOESM1_ESM.pdf (2.7 MB)
Supplementary material 1 (PDF 2714 KB)

References

  1. Balaji, Y., Sankaranarayanan, S., Chellappa, R.: MetaReg: towards domain generalization using meta-regularization. In: Advances in Neural Information Processing Systems, pp. 998–1008 (2018)
  2. Bleyer, M., Rhemann, C., Rother, C.: PatchMatch stereo - stereo matching with slanted support windows. In: British Machine Vision Conference (BMVC), pp. 1–11 (2011)
  3. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3722–3731 (2017)
  4. Butler, D.J., Wulff, J., Stanley, G.B., Black, M.J.: A naturalistic open source movie for optical flow evaluation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 611–625. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33783-3_44
  5. Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410–5418 (2018)
  6. Chen, X., Kang, S.B., Yang, J., Yu, J.: Fast patch-based denoising using approximated patch geodesic paths. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1211–1218 (2013)
  7. Choy, C.B., Gwak, J., Savarese, S., Chandraker, M.: Universal correspondence network. In: Advances in Neural Information Processing Systems, pp. 2414–2422 (2016)
  8. Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3213–3223 (2016)
  9. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: an open urban driving simulator. arXiv preprint arXiv:1711.03938 (2017)
  10. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3354–3361. IEEE (2012)
  11. Ghifary, M., Bastiaan Kleijn, W., Zhang, M., Balduzzi, D.: Domain generalization for object recognition with multi-task autoencoders. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2551–2559 (2015)
  12. Guo, X., Li, H., Yi, S., Ren, J., Wang, X.: Learning monocular depth by distilling cross-domain stereo networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 484–500 (2018)
  13. Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3273–3282 (2019)
  14. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)
  15. Hosni, A., Rhemann, C., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual correspondence and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 504–511 (2013)
  16. Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: CCNet: criss-cross attention for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 603–612 (2019)
  17. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2462–2470 (2017)
  18. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
  19. Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. arXiv preprint arXiv:1703.04309 (2017)
  20. Khamis, S., Fanello, S.R., Rhemann, C., Kowdle, A., Valentin, J.P.C., Izadi, S.: StereoNet: guided hierarchical refinement for real-time edge-aware depth prediction. arXiv preprint arXiv:1807.08865 (2018)
  21. Li, A., Yuan, Z.: Occlusion aware stereo matching via cooperative unsupervised learning. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11366, pp. 197–213. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20876-9_13
  22. Li, D., Yang, Y., Song, Y.Z., Hospedales, T.M.: Deeper, broader and artier domain generalization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5542–5550 (2017)
  23. Li, D., Yang, Y., Song, Y.Z., Hospedales, T.M.: Learning to generalize: meta-learning for domain generalization. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (2018)
  24. Li, H., Pan, S.J., Wang, S., Kot, A.C.: Domain generalization with adversarial feature learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5400–5409 (2018)
  25. Li, Y., et al.: Deep domain generalization via conditional invariant adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 624–639 (2018)
  26. Li, Y., Wang, N., Shi, J., Hou, X., Liu, J.: Adaptive batch normalization for practical domain adaptation. Pattern Recogn. 80, 109–117 (2018)
  27. Li, Y., Wang, N., Shi, J., Liu, J., Hou, X.: Revisiting batch normalization for practical domain adaptation. arXiv preprint arXiv:1603.04779 (2016)
  28. Liang, Z., et al.: Learning for disparity estimation through feature constancy. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2811–2820 (2018)
  29. Liu, M.Y., Tuzel, O., Taguchi, Y.: Joint geodesic upsampling of depth images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 169–176 (2013)
  30. Liu, S., De Mello, S., Gu, J., Zhong, G., Yang, M.H., Kautz, J.: Learning affinity via spatial propagation networks. In: Advances in Neural Information Processing Systems, pp. 1520–1530 (2017)
  31. Luo, W., Schwing, A.G., Urtasun, R.: Efficient deep learning for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5695–5703 (2016)
  32. Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4040–4048 (2016)
  33. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3061–3070 (2015)
  34. Motiian, S., Piccirilli, M., Adjeroh, D.A., Doretto, G.: Unified deep supervised domain adaptation and generalization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5715–5725 (2017)
  35. Nam, H., Kim, H.E.: Batch-instance normalization for adaptively style-invariant neural networks. In: Advances in Neural Information Processing Systems, pp. 2558–2567 (2018)
  36. Nie, G.Y., et al.: Multi-level context ultra-aggregation for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3283–3291 (2019)
  37. Pan, X., Luo, P., Shi, J., Tang, X.: Two at once: enhancing learning and generalization capacities via IBN-Net. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 464–479 (2018)
  38. Pang, J., Sun, W., Ren, J.S., Yang, C., Yan, Q.: Cascade residual learning: a two-stage convolutional neural network for stereo matching. In: IEEE International Conference on Computer Vision Workshops (ICCVW) (2017)
  39. Pang, J., et al.: Zoom and learn: generalizing deep stereo matching to novel domains. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2070–2079 (2018)
  40. Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2337–2346 (2019)
  41. Poggi, M., Pallotti, D., Tosi, F., Mattoccia, S.: Guided stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 979–988 (2019)
  42. Scharstein, D., et al.: High-resolution stereo datasets with subpixel-accurate ground truth. In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 31–42. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11752-2_3
  43. Schops, T., et al.: A multi-view stereo benchmark with high-resolution images and multi-camera videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3260–3269 (2017)
  44. Seki, A., Pollefeys, M.: SGM-Nets: semi-global matching with neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6640–6649 (2017)
  45. Shi, L., Zhang, Y., Cheng, J., Lu, H.: Non-local graph convolutional networks for skeleton-based action recognition. arXiv preprint arXiv:1805.07694 (2018)
  46. Song, L., et al.: Learnable tree filter for structure-preserving feature transform. In: Advances in Neural Information Processing Systems, pp. 1709–1719 (2019)
  47. Song, X., Zhao, X., Fang, L., Hu, H.: EdgeStereo: an effective multi-task learning network for stereo matching and edge detection. arXiv preprint arXiv:1903.01700 (2019)
  48. Sun, D., Yang, X., Liu, M.Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8934–8943 (2018)
  49. Tonioni, A., Poggi, M., Mattoccia, S., Di Stefano, L.: Unsupervised adaptation for deep stereo. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)
  50. Tonioni, A., Rahnama, O., Joy, T., Stefano, L.D., Ajanthan, T., Torr, P.H.: Learning to adapt for stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9661–9670 (2019)
  51. Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., Stefano, L.D.: Real-time self-adaptive deep stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 195–204 (2019)
  52. Tulyakov, S., Ivanov, A., Fleuret, F.: Weakly supervised learning of deep metrics for stereo reconstruction. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1339–1348 (2017)
  53. Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016)
  54. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7794–7803 (2018)
  55. Wang, Y., et al.: Anytime stereo image depth estimation on mobile devices. arXiv preprint arXiv:1810.11408 (2018)
  56. Xie, C., Wu, Y., van der Maaten, L., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 501–509 (2019)
  57. Yang, G., Zhao, H., Shi, J., Deng, Z., Jia, J.: SegStereo: exploiting semantic information for disparity estimation. arXiv preprint arXiv:1807.11699 (2018)
  58. Yang, Q.: A non-local cost aggregation method for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1402–1409. IEEE (2012)
  59. Yang, Q.: Stereo matching using tree filtering. IEEE Trans. Pattern Anal. Mach. Intell. 37(4), 834–846 (2014)
  60. Yin, Z., Darrell, T., Yu, F.: Hierarchical discrete distribution decomposition for match density estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6044–6053 (2019)
  61. Zbontar, J., LeCun, Y.: Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1592–1599 (2015)
  62. Zhang, F., Dai, L., Xiang, S., Zhang, X.: Segment graph based image filtering: fast structure-preserving smoothing. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 361–369 (2015)
  63. Zhang, F., Prisacariu, V., Yang, R., Torr, P.H.: GA-Net: guided aggregation net for end-to-end stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 185–194 (2019)
  64. Zhang, F., Wah, B.W.: Fundamental principles on learning new features for effective dense matching. IEEE Trans. Image Process. 27(2), 822–836 (2018)
  65. Zhang, S., Yan, S., He, X.: LatentGNN: learning efficient non-local relations for visual recognition. arXiv preprint arXiv:1905.11634 (2019)
  66. Zhang, Y., et al.: Adaptive unimodal cost volume filtering for deep stereo matching. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (2020)
  67. Zhong, Y., Dai, Y., Li, H.: Self-supervised learning for stereo matching with self-improving ability. arXiv preprint arXiv:1709.00930 (2017)
  68. Zhou, C., Zhang, H., Shen, X., Jia, J.: Unsupervised learning of stereo matching. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1567–1575 (2017)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Oxford, Oxford, England
  2. University of Hong Kong, Pok Fu Lam, Hong Kong
  3. Baidu Research, Beijing, China
  4. Chinese University of Hong Kong, Sha Tin, China