
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach

Published in: International Journal of Computer Vision

A Correction to this article was published on 09 February 2022


Abstract

Recently, records on stereo matching benchmarks have been constantly broken by end-to-end disparity networks. However, the domain adaptation ability of these deep models is quite limited. To address this problem, we present a novel domain-adaptive approach called AdaStereo that aims to align multi-level representations for deep stereo matching networks. Compared to previous methods, our AdaStereo realizes a more standard, complete and effective domain adaptation pipeline. Firstly, we propose a non-adversarial progressive color transfer algorithm for input image-level alignment. Secondly, we design an efficient parameter-free cost normalization layer for internal feature-level alignment. Lastly, a highly related auxiliary task, self-supervised occlusion-aware reconstruction, is presented to narrow the gaps in the output space. We perform intensive ablation studies and break-down comparisons to validate the effectiveness of each proposed module. With no extra inference overhead and only a slight increase in training complexity, our AdaStereo models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo, even outperforming some state-of-the-art disparity networks finetuned with target-domain ground truths. Moreover, based on two additional evaluation metrics, the superiority of our domain-adaptive stereo matching pipeline is further revealed from additional perspectives. Finally, we demonstrate that our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
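
To make the parameter-free cost normalization idea above more concrete, the short PyTorch sketch below shows one plausible form of such a layer: channel-wise L2 normalization followed by a per-channel spatial rescaling, applied to the left and right feature maps before a correlation cost volume is built. This is a minimal illustration under assumed tensor shapes; the function names (cost_normalization, correlation_cost_volume) and implementation details are assumptions for this example, not the authors' released code.

```python
# Illustrative sketch of a parameter-free normalization applied before
# cost-volume construction, in the spirit of the cost normalization layer
# described in the abstract. Names and details are assumptions, not the
# paper's reference implementation.
import torch
import torch.nn.functional as F


def cost_normalization(feat_left, feat_right):
    """Normalize left/right feature maps (N, C, H, W) with no learnable parameters."""
    def _normalize(feat):
        # Channel-wise L2 normalization at every pixel, so matching costs depend
        # on feature direction rather than domain-specific magnitudes.
        feat = F.normalize(feat, p=2, dim=1, eps=1e-6)
        # Per-channel rescaling by the spatial RMS of activations.
        rms = feat.pow(2).mean(dim=(2, 3), keepdim=True).sqrt().clamp(min=1e-6)
        return feat / rms

    return _normalize(feat_left), _normalize(feat_right)


def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a simple correlation-based cost volume of shape (N, max_disp, H, W)."""
    n, c, h, w = feat_left.shape
    volume = feat_left.new_zeros(n, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_left * feat_right).mean(dim=1)
        else:
            # Left pixel x is correlated with right pixel x - d.
            volume[:, d, :, d:] = (feat_left[:, :, :, d:] *
                                   feat_right[:, :, :, :-d]).mean(dim=1)
    return volume


if __name__ == "__main__":
    fl, fr = torch.randn(1, 32, 64, 128), torch.randn(1, 32, 64, 128)
    fl, fr = cost_normalization(fl, fr)
    cost = correlation_cost_volume(fl, fr, max_disp=24)
    print(cost.shape)  # torch.Size([1, 24, 64, 128])
```

Because this normalization introduces no learnable parameters, it adds no inference overhead, consistent with the abstract's claim that the adaptation modules leave test-time cost unchanged.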





Additional information

Communicated by A. Hilton.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online published version of the article has been updated due to Open Access cancellation.


About this article


Cite this article

Song, X., Yang, G., Zhu, X. et al. AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach. Int J Comput Vis 130, 226–245 (2022). https://doi.org/10.1007/s11263-021-01549-6

