Depth Extraction from Video Using Non-parametric Sampling

  • Kevin Karsch
  • Ce Liu
  • Sing Bing Kang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7576)


We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling. We demonstrate our technique in cases where past methods fail (non-translating cameras and dynamic scenes). Our technique is applicable to single images as well as videos. For videos, we use local motion cues to improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we use a Kinect-based system to collect a large dataset containing stereoscopic videos with known depths. We show that our depth estimation technique outperforms the state of the art on benchmark databases. Our technique can be used to automatically convert a monoscopic video into stereo for 3D visualization, and we demonstrate this through a variety of visually pleasing results for indoor and outdoor scenes, including results from the feature film Charade.
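
As a rough illustration of the non-parametric sampling idea, and not the authors' implementation, the sketch below retrieves the k training images most similar to a query and fuses their known depth maps with a per-pixel median. The grid-mean descriptor, the function names, and the choice of k are assumptions made for this example; the abstract does not specify the features or the fusion step, and the motion cues and optical-flow temporal consistency it mentions are omitted here.

```python
import numpy as np


def global_descriptor(image, grid=4):
    """Hypothetical scene descriptor: mean intensity over a grid x grid layout."""
    h, w = image.shape
    # Crop so both dimensions divide evenly into grid cells.
    cropped = image[: h - h % grid, : w - w % grid]
    cells = cropped.reshape(grid, cropped.shape[0] // grid,
                            grid, cropped.shape[1] // grid)
    return cells.mean(axis=(1, 3)).ravel()


def sample_depth(query, train_images, train_depths, k=3):
    """Estimate depth for `query` from its k nearest training examples.

    Assumes all images and depth maps are 2D arrays of the same shape.
    """
    q = global_descriptor(query)
    dists = [np.linalg.norm(q - global_descriptor(img)) for img in train_images]
    nearest = np.argsort(dists)[:k]
    # Stack the candidate depth maps and take the per-pixel median.
    candidates = np.stack([train_depths[i] for i in nearest])
    return np.median(candidates, axis=0)
```

Calling `sample_depth(query, train_images, train_depths)` returns a depth map with the same shape as the training depth maps. The median is used in this sketch because it is less sensitive to a single poorly matched candidate than the mean.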


Input Image, Depth Estimation, Motion Parallax, Dynamic Scene, Motion Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kevin Karsch (1)
  • Ce Liu (2)
  • Sing Bing Kang (3)
  1. University of Illinois at Urbana-Champaign, USA
  2. Microsoft Research, New England, USA
  3. Microsoft Research, USA
