Building Scene Models by Completing and Hallucinating Depth and Semantics

  • Miaomiao Liu
  • Xuming He
  • Mathieu Salzmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9910)


Building 3D scene models has been a longstanding goal of computer vision. Great progress in depth sensors brings us one step closer to achieving this in a single shot. However, depth sensors still produce imperfect measurements that are sparse and contain holes. While depth completion aims to tackle this issue, it ignores the fact that some regions of the scene are occluded by foreground objects. Building a scene model therefore requires hallucinating the depth behind these objects. In contrast with existing methods that either rely on manual input or focus on indoor scenarios, we introduce a fully automatic method to jointly complete and hallucinate depth and semantics in challenging outdoor scenes. To this end, we develop a two-layer model representing both the visible and the hidden information. At the heart of our approach lies a formulation based on the Mumford-Shah functional, for which we derive an effective optimization strategy. Our experiments show that our approach can accurately fill large holes in the input depth maps, segment the different kinds of objects in the scene, and hallucinate the depth and semantics behind the foreground objects.




Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Data61, CSIRO, and ANU, Canberra, Australia
  2. CVLab, EPFL, Lausanne, Switzerland
