Consistent Binocular Depth and Scene Flow with Chained Temporal Profiles


We propose a method that estimates depth and image scene flow from a binocular video. Its key component is the preservation of motion-depth temporal consistency, which makes computation over long sequences reliable. We tackle several fundamental technical issues: establishing the connection between motion and depth, preserving structure consistency across multiple frames, and employing long-range temporal constraints for error correction. All of them are addressed in a unified depth and scene flow estimation framework. Our main contributions are the development of motion trajectories that robustly link frame correspondences in a voting manner, the rejection of depth/motion outliers through temporal robust regression, a novel edge occurrence map estimation, and the introduction of anisotropic smoothing priors for proper regularization.


  1. 2D-Plus-Depth (2009). Stereoscopic video coding format.


  1. Álvarez, L., Deriche, R., Papadopoulo, T., & Sánchez, J. (2007). Symmetrical dense optical flow estimation with occlusions detection. International Journal of Computer Vision, 75, 371–385.

  2. Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., & Szeliski, R. (2011). A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92, 1–31.

  3. Basha, T., Moses, Y., & Kiryati, N. (2010). Multi-view scene flow estimation: a view centered variational approach. In CVPR (pp. 1506–1513).

  4. Black, M. J. (1994). Recursive non-linear estimation of discontinuous flow fields. In ECCV (Vol. 1, pp. 138–145).

  5. Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In ECCV (Vol. 4, pp. 25–36).

  6. Brox, T., Bregler, C., & Malik, J. (2009). Large displacement optical flow. In CVPR (pp. 41–48).

  7. Bruhn, A., & Weickert, J. (2005). Towards ultimate motion estimation: combining highest accuracy with real-time performance. In ICCV (pp. 749–755).

  8. Bruhn, A., Weickert, J., & Schnörr, C. (2005). Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. International Journal of Computer Vision, 61, 211–231.

  9. Cech, J., Sanchez-Riera, J., & Horaud, R. (2011). Scene flow estimation by growing correspondence seeds. In CVPR (pp. 3129–3136).

  10. Furukawa, Y., & Ponce, J. (2007). Accurate, dense, and robust multi-view stereopsis. In CVPR (pp. 1362–1376).

  11. Hadfield, S., & Bowden, R. (2011). Kinecting the dots: particle based scene flow from depth sensors. In ICCV (pp. 2290–2295).

  12. Huguet, F., & Devernay, F. (2007). A variational method for scene flow estimation from stereo sequences. In ICCV (pp. 1–7).

  13. Irani, M. (2002). Multi-frame correspondence estimation using subspace constraints. International Journal of Computer Vision, 48, 173–194.

  14. Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 147–159.

  15. Min, D. B., & Sohn, K. (2006). Edge-preserving simultaneous joint motion-disparity estimation. In ICPR (Vol. 2, pp. 74–77).

  16. OpenMP ARB (2012). Open multi-processing.

  17. Patras, I., Alvertos, N., & Tziritas, G. (1996). Joint disparity and motion field estimation in stereoscopic image sequences. In International conference on pattern recognition (Vol. 1, pp. 359–363).

  18. Rabe, C., Müller, T., Wedel, A., & Franke, U. (2010). Dense, robust, and accurate motion field estimation from stereo image sequences in real-time. In ECCV (Vol. 4, pp. 582–595).

  19. Richardt, C., Orr, D., Davies, I., Criminisi, A., & Dodgson, N. A. (2010). Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In ECCV (Vol. 3, pp. 510–523).

  20. Sand, P., & Teller, S. J. (2006). Particle video: long-range motion estimation using point trajectories. In CVPR (Vol. 2, pp. 2195–2202).

  21. Sand, P., & Teller, S. J. (2008). Particle video: long-range motion estimation using point trajectories. International Journal of Computer Vision, 80, 72–91.

  22. Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47, 7–42.

  23. Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics, 25, 835–846.

  24. Sun, D., Roth, S., Lewis, J. P., & Black, M. J. (2008). Learning optical flow. In ECCV (Vol. 3, pp. 83–97).

  25. Sundaram, N., Brox, T., & Keutzer, K. (2010). Dense point trajectories by GPU-accelerated large displacement optical flow. In ECCV (Vol. 1, pp. 438–451).

  26. Tomasi, C., & Manduchi, R. (1998). Bilateral filtering for gray and color images. In ICCV (pp. 839–846).

  27. University of Auckland (2008). Enpeda. Image sequence analysis test site (eisats).

  28. Valgaerts, L., Bruhn, A., Zimmer, H., Weickert, J., Stoll, C., & Theobalt, C. (2010). Joint estimation of motion, structure and geometry from stereo sequences. In ECCV (Vol. 4, pp. 568–581).

  29. Vaudrey, T., Rabe, C., Klette, R., & Milburn, J. (2008). Differences between stereo and motion behavior on synthetic and real-world stereo sequences. In International conference of image and vision computing New Zealand (IVCNZ) (pp. 1–6).

  30. Vedula, S., Baker, S., Rander, P., Collins, R. T., & Kanade, T. (2005). Three-dimensional scene flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 475–480.

  31. Vogel, C., Schindler, K., & Roth, S. (2011). 3D scene flow estimation with a rigid motion prior. In ICCV (pp. 1291–1298).

  32. Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., & Cremers, D. (2008). Efficient dense scene flow from sparse or dense stereo data. In ECCV (Vol. 1, pp. 739–751).

  33. Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U., & Cremers, D. (2011). Stereoscopic scene flow computation for 3D motion understanding. International Journal of Computer Vision, 95, 29–51.

  34. Xiao, J., Cheng, H., Sawhney, H. S., Rao, C., & Isnardi, M. A. (2006). Bilateral filtering-based optical flow estimation with occlusion detection. In ECCV (Vol. 1, pp. 211–224).

  35. Xu, L., Chen, J., & Jia, J. (2008). A segmentation based variational model for accurate optical flow estimation. In ECCV (Vol. 1, pp. 671–684).

  36. Xu, L., Jia, J., & Matsushita, Y. (2010). Motion detail preserving optical flow estimation. In CVPR (pp. 1293–1300).

  37. Yoon, K. J., & Kweon, I. S. (2006). Adaptive support-weight approach for correspondence search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 650–656.

  38. Zhang, Z., & Faugeras, O. D. (1992). Estimation of displacements from two 3-D frames obtained from stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 1141–1156.

  39. Zhang, Y., & Kambhamettu, C. (2001). On 3D scene flow and structure estimation. In CVPR (Vol. 2, pp. 778–785).

  40. Zhang, L., Curless, B., & Seitz, S. M. (2003). Spacetime stereo: shape recovery for dynamic scenes. In CVPR (Vol. 2, pp. 367–374).

  41. Zhang, G., Jia, J., Wong, T. T., & Bao, H. (2009). Consistent depth maps recovery from a video sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 974–988.

  42. Zimmer, H., Bruhn, A., Weickert, J., Valgaerts, L., Salgado, B. R. A., & Seidel, H. P. (2009). Complementary optic flow. In EMMCVPR (pp. 207–220).



The authors would like to thank the associate editor and all the anonymous reviewers for their time and effort. This work is supported by a grant from the Research Grants Council of the Hong Kong SAR (Project No. 413110).

Author information

Correspondence to Jiaya Jia.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

(MP4 121.5 MB)



Appendix A

We give the details of solving the Euler-Lagrange equation (21):

With the anisotropic diffusion tensor applied, the smoothness term involves \(d_{hh}\), \(d_{vv}\), and \(d_{hv}\), which couple several neighboring points. We use the indices in Fig. 20 to represent the 2D coordinates, e.g., \(d_{1}=d(i+1,j+1)\), and \(q\) indexes the current point \((i,j)\). We apply central differences to compute the second-order derivatives. Specifically, we introduce a function \(\zeta(\cdot)\) expressed as

\((\zeta d_{v})_{v}\) and \((\zeta d_{h})_{v}\) are defined similarly. We then discretize on a grid of size \(h_{h}\times h_{v}\) to apply Gauss-Seidel relaxation. By defining

we represent the anisotropic factors in simpler forms. The increment Δd can be computed using the following iterations:

where \(\mathcal{N}\) is the set of neighboring pixels, \(\mathcal{N}_{h}(q)=\{2,6\}\), and \(\mathcal{N}_{v}(q)=\{0,4\}\). Further, \(g_{1}\) is defined as

and b can be derived as


where \(\overline{p}=p \bmod 8\). To accelerate convergence, we adopt a standard non-linear multi-grid numerical scheme (Bruhn and Weickert 2005). Gauss-Seidel relaxation serves as the pre- and post-smoother and is applied twice at each level.
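As a rough illustration of the numerical scheme described above, the following Python sketch uses Gauss-Seidel relaxation as the pre- and post-smoother, applied twice per level, inside a multi-grid V-cycle. It is not the solver used in this paper: for brevity it swaps in a linear correction scheme on a toy 1D Poisson problem in place of the non-linear scheme and the anisotropic disparity equations, and all names (`gs_sweep`, `residual`, `restrict`, `prolong`, `v_cycle`) are hypothetical.

```python
# Toy problem (an assumption, not the paper's equations):
# (2*u[i] - u[i-1] - u[i+1]) / h^2 = f[i], with zero boundary values.

def gs_sweep(u, f, h):
    """One in-place Gauss-Seidel sweep over all interior points."""
    n = len(u)
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] = 0.5 * (left + right + h * h * f[i])


def residual(u, f, h):
    """Residual f - A u of the discretized equation."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def restrict(r):
    """Full-weighting restriction: 2m+1 fine points -> m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]


def prolong(e, n):
    """Linear interpolation: m coarse points -> n = 2m+1 fine points."""
    fine = [0.0] * n
    for j, v in enumerate(e):
        fine[2 * j] += 0.5 * v
        fine[2 * j + 1] += v
        fine[2 * j + 2] += 0.5 * v
    return fine


def v_cycle(u, f, h, sweeps=2):
    """One V-cycle; the grid size must be 2^k - 1."""
    n = len(u)
    if n == 1:
        u[0] = 0.5 * h * h * f[0]          # coarsest level: solve exactly
        return u
    for _ in range(sweeps):                 # pre-smoothing (twice by default)
        gs_sweep(u, f, h)
    coarse_rhs = restrict(residual(u, f, h))
    coarse_err = v_cycle([0.0] * len(coarse_rhs), coarse_rhs, 2.0 * h, sweeps)
    correction = prolong(coarse_err, n)
    for i in range(n):                      # coarse-grid correction
        u[i] += correction[i]
    for _ in range(sweeps):                 # post-smoothing (twice by default)
        gs_sweep(u, f, h)
    return u


n = 7
h = 1.0 / (n + 1)
f = [1.0] * n
u = [0.0] * n
for _ in range(20):
    v_cycle(u, f, h)

# For f = 1 the discrete solution equals x * (1 - x) / 2 at the grid points.
exact = [((i + 1) * h) * (1.0 - (i + 1) * h) / 2.0 for i in range(n)]
max_err = max(abs(a - b) for a, b in zip(u, exact))
```

The smoother removes high-frequency error on each level, while the coarse-grid correction handles the smooth error that plain Gauss-Seidel reduces only slowly.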

Fig. 20

Indices for the 2D coordinates

Appendix B

After discretization, the linear equations that approximate Eq. (20) can be easily derived. Δu, Δv, and Δδd are refined iteratively, each updated while the other two are held fixed. This leads to the Gauss-Seidel relaxation, written as


\(g_{1}\) and \(g_{2}\) are the functions defined in Appendix A. The Gauss-Seidel iteration is accelerated by a non-linear multi-grid numerical scheme similar to the one used to compute disparities in Appendix A.
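The alternating update of Δu, Δv, and Δδd can be pictured with a minimal Python sketch for a single pixel, assuming the three increments are coupled through a small linear system. The matrix `M`, the right-hand side `r`, and the function `alternate_updates` are hypothetical placeholders, not the coefficients derived from Eq. (20), and the neighbor terms of the full scheme are omitted.

```python
def alternate_updates(M, r, iters=50):
    """Cyclically solve each row of M x = r for its own unknown,
    holding the other two unknowns fixed at their latest values."""
    du, dv, dd = 0.0, 0.0, 0.0
    for _ in range(iters):
        du = (r[0] - M[0][1] * dv - M[0][2] * dd) / M[0][0]  # fix dv, dd
        dv = (r[1] - M[1][0] * du - M[1][2] * dd) / M[1][1]  # fix du, dd
        dd = (r[2] - M[2][0] * du - M[2][1] * dv) / M[2][2]  # fix du, dv
    return du, dv, dd


# Hypothetical, diagonally dominant coefficients for one pixel.
M = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.5],
     [0.5, 0.5, 2.0]]
r = [1.0, 2.0, 0.5]
du, dv, dd = alternate_updates(M, r)
```

Because each update immediately uses the newest values of the other two unknowns, this cycle is a Gauss-Seidel iteration on the coupled system; it converges here because the assumed matrix is diagonally dominant.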

About this article

Cite this article

Hung, C.H., Xu, L. & Jia, J. Consistent Binocular Depth and Scene Flow with Chained Temporal Profiles. Int J Comput Vis 102, 271–292 (2013) doi:10.1007/s11263-012-0559-y



  • Video depth estimation
  • Consistent scene flow
  • Chained temporal profiles
  • Stereo matching