
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12354)

Abstract

We present a new learning-based method for multi-frame depth estimation from a color video, a fundamental problem in scene understanding, robot navigation, and handheld 3D reconstruction. While recent learning-based methods estimate depth with high accuracy, the 3D point clouds exported from their depth maps often fail to preserve important geometric features (e.g., corners, edges, planes) of man-made scenes, and widely used pixel-wise depth error metrics do not specifically penalize inconsistency on these features. These inaccuracies become particularly severe when successive depth reconstructions are accumulated to scan a full environment containing man-made objects with such features. Our depth estimation algorithm therefore introduces a Combined Normal Map (CNM) constraint, designed to better preserve high-curvature features and global planar regions. To further improve depth estimation accuracy, we introduce a new occlusion-aware strategy that aggregates initial depth predictions from multiple adjacent views into one final depth map and one occlusion probability map for the current reference view. Our method outperforms the state of the art in depth estimation accuracy and preserves the essential geometric features of man-made indoor scenes much better than other algorithms.
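To make the two ideas concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the functions `normals_from_depth`, `cnm_loss`, and `fuse_depths`, the tensor layouts, and the intrinsics handling are all illustrative assumptions. It shows (a) a cosine penalty between normals derived from a predicted depth map and a precomputed Combined Normal Map, and (b) a schematic occlusion-weighted fusion of per-view depth predictions for the reference view.

```python
# Illustrative sketch only: names, shapes, and intrinsics handling are
# assumptions, not the paper's released code.
import torch
import torch.nn.functional as F


def normals_from_depth(depth: torch.Tensor, fx: float, fy: float) -> torch.Tensor:
    """Back-project a depth map (B, 1, H, W) to camera space and estimate
    per-pixel normals from finite-difference tangent vectors."""
    _, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    # Assume the principal point at the image center for simplicity.
    x = (xs - w / 2.0) / fx * depth[:, 0]
    y = (ys - h / 2.0) / fy * depth[:, 0]
    pts = torch.stack([x, y, depth[:, 0]], dim=1)            # (B, 3, H, W)
    du = pts[..., :, 1:] - pts[..., :, :-1]                  # horizontal tangents
    dv = pts[..., 1:, :] - pts[..., :-1, :]                  # vertical tangents
    n = torch.cross(du[..., 1:, :], dv[..., :, 1:], dim=1)   # (B, 3, H-1, W-1)
    return F.normalize(n, dim=1)


def cnm_loss(pred_depth, cnm, fx, fy):
    """Cosine loss between depth-derived normals and a Combined Normal Map
    (local normals on high-curvature regions, plane normals on planar ones)."""
    n_pred = normals_from_depth(pred_depth, fx, fy)
    n_ref = F.normalize(cnm[..., 1:, 1:], dim=1)             # crop to match
    return (1.0 - (n_pred * n_ref).sum(dim=1)).mean()


def fuse_depths(depths, occ_probs):
    """Schematic occlusion-aware aggregation: average per-view depth
    hypotheses (B, V, H, W) for the reference view, down-weighting pixels
    that are likely occluded in each source view (occ_probs in [0, 1])."""
    w = (1.0 - occ_probs).clamp(min=1e-6)
    return (w * depths).sum(dim=1) / w.sum(dim=1)
```

In the paper, the CNM combines locally estimated normals on high-curvature regions with fitted plane normals on global planar regions; the sketch above only illustrates how such a map could supervise depth prediction through a normal-consistency term, and how an occlusion probability map could weight the aggregation of per-view depth estimates.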

Keywords

Multi-view depth estimation · Normal constraint · Occlusion-aware strategy · Deep learning

Supplementary material

Supplementary material 1 (PDF 1829 KB)

Supplementary material 2 (MP4 28310 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The University of Hong Kong, Pok Fu Lam, Hong Kong
  2. Max Planck Institute for Informatics, Saarbrücken, Germany
