Du2Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12346)


Computational stereo has reached a high level of accuracy, but it degrades in the presence of occlusions, repeated textures, and correspondence errors along edges. We present a novel neural-network approach to depth estimation that combines stereo from dual cameras with stereo from a dual-pixel sensor, a component that is increasingly common on consumer cameras. Our network uses a novel architecture to fuse these two sources of information and can overcome the above-mentioned limitations of pure binocular stereo matching. Our method provides a dense depth map with sharp edges, which is crucial for computational photography applications such as synthetic shallow depth of field or 3D Photos. Additionally, we avoid the inherent ambiguity due to the aperture problem in stereo cameras by designing the stereo baseline to be orthogonal to the dual-pixel baseline. We present experiments and comparisons with state-of-the-art approaches to show that our method offers a substantial improvement over previous work.
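The aperture problem mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's network or data: it is a minimal pure-Python illustration, with hypothetical helper names (`make_edge_image`, `shift`, `matching_costs`), of why an edge parallel to one baseline gives an ambiguous matching cost along that baseline, while a second, orthogonal baseline resolves it. It assumes a synthetic image containing a single horizontal edge and a plain sum-of-absolute-differences cost.

```python
def make_edge_image(height, width, edge_row):
    """Synthetic image whose intensity depends only on the row (one horizontal edge)."""
    return [[1.0 if r >= edge_row else 0.0 for _ in range(width)]
            for r in range(height)]


def shift(image, dx, dy):
    """Shift an image by (dx, dy) pixels, replicating border values."""
    h, w = len(image), len(image[0])
    return [[image[min(max(r - dy, 0), h - 1)][min(max(c - dx, 0), w - 1)]
             for c in range(w)]
            for r in range(h)]


def matching_costs(ref, other, axis, max_disp):
    """Sum of absolute differences for each candidate shift along one axis."""
    costs = []
    for d in range(max_disp + 1):
        dx, dy = (d, 0) if axis == "x" else (0, d)
        candidate = shift(ref, dx, dy)
        costs.append(sum(abs(a - b)
                         for row_a, row_b in zip(candidate, other)
                         for a, b in zip(row_a, row_b)))
    return costs


# Toy setup (an assumption for illustration): the true shift is 2 pixels
# along each baseline direction.
ref = make_edge_image(16, 16, 8)
view_x = shift(ref, 2, 0)  # second view along a horizontal baseline
view_y = shift(ref, 0, 2)  # second view along a vertical baseline

costs_x = matching_costs(ref, view_x, "x", 4)
costs_y = matching_costs(ref, view_y, "y", 4)
print("costs along x:", costs_x)
print("costs along y:", costs_y)
```

In this toy case every candidate shift along x has the same cost, so the horizontal baseline alone cannot recover the shift, whereas the costs along y have a single minimum at the true shift of 2. This is only meant to convey the intuition behind orthogonal baselines; the method described in the abstract fuses the two cues with a learned network rather than with a hand-written cost.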


Keywords: Dual-pixels, Stereo matching, Depth estimation, Computational photography



We thank photographers Michael Milne and Andrew Radin for collecting the data, Jon Barron and Marc Levoy for their comments on the text, and Yael Pritch, Sameer Ansari, Christoph Rhemann, and Shahram Izadi for their help, useful discussions, and support for this work.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Google Research, Mountain View, USA
