Recovering Depth Map from Video with Moving Objects

  • Hsiao-Wei Chen
  • Shang-Hong Lai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7088)


In this paper, we propose a novel approach to reconstructing a depth map from a video sequence that considers not only geometric coherence but also temporal coherence. Most previous methods for reconstructing depth maps from video assume rigid motion, so they cannot provide satisfactory depth estimates for regions containing moving objects. In this work, we develop a depth estimation algorithm that detects regions of moving objects and recovers the depth map in a Markov random field (MRF) framework. We first apply SIFT matching across the frames of the video sequence and compute the camera parameters for all frames and the 3D positions of the SIFT feature points via structure from motion. The depths at these SIFT points are then propagated to the whole image, based on image over-segmentation, to construct an initial depth map. Next, the depth values of segments with large re-projection errors are refined by minimizing the corresponding re-projection errors. In addition, we detect the areas of moving objects from the remaining pixels with large re-projection errors. In the final step, we optimize the depth map estimate in an MRF framework. Experimental results demonstrate the improved depth estimation of the proposed algorithm.
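
Two of the steps described above, propagating sparse feature depths to over-segmented regions and measuring re-projection error, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the propagation rule, so the per-segment median used here is an assumption, as is the standard pinhole camera model, and the function names `propagate_depth` and `reprojection_error` are hypothetical.

```python
import numpy as np


def propagate_depth(labels, pts_xy, pts_depth):
    """Fill each segment with the median depth of the sparse points inside it.

    labels:    (H, W) integer segment ids from any over-segmentation.
    pts_xy:    (N, 2) integer pixel coordinates (x, y) of feature points.
    pts_depth: (N,) depths of those points (e.g. from structure from motion).
    Segments containing no feature point are left as NaN.
    The median rule is an assumption made for this sketch.
    """
    depth = np.full(labels.shape, np.nan)
    seg_of_pt = labels[pts_xy[:, 1], pts_xy[:, 0]]  # segment id under each point
    for seg in np.unique(seg_of_pt):
        depth[labels == seg] = np.median(pts_depth[seg_of_pt == seg])
    return depth


def reprojection_error(K, R, t, X, x_obs):
    """Pixel distance between observed points and projected 3D points.

    Assumes a pinhole camera: x ~ K (R X + t).
    K: (3, 3) intrinsics, R: (3, 3) rotation, t: (3,) translation,
    X: (N, 3) world points, x_obs: (N, 2) observed pixel positions.
    """
    Xc = X @ R.T + t                  # world -> camera coordinates
    uvw = Xc @ K.T                    # camera -> homogeneous pixel coordinates
    proj = uvw[:, :2] / uvw[:, 2:3]   # perspective division
    return np.linalg.norm(proj - x_obs, axis=1)
```

In a pipeline of this kind, segments or pixels whose error stays above some threshold after refinement would be the candidates for moving-object regions; the choice of threshold is not given in the abstract and would have to be tuned.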


Depth map recovery, structure from motion, Markov random field



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hsiao-Wei Chen (1)
  • Shang-Hong Lai (1)
  1. Computer Science, National Tsing Hua University, Hsinchu, R.O.C.
