A Comparison of Scene Flow Estimation Paradigms

  • Iraklis Tsekourakis
  • Philippos Mordohai
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 842)


This paper presents a comparison between two core paradigms for computing scene flow from multi-view videos of dynamic scenes. In both approaches, shape and motion estimation are decoupled, in accordance with a large segment of the relevant literature. The first approach is faster and considers only one optical flow field and the depth difference between pixels in consecutive frames to generate a dense scene flow estimate. The second approach is more robust to outliers because it considers multiple optical flow fields to generate scene flow. Our goal is to compare the isolated fundamental scene flow estimation methods, without any post-processing or optimization. We assess the accuracy of the two methods with two tests: optical flow prediction and future image prediction, both on a novel view. This is the first quantitative evaluation of scene flow estimation on real imagery of dynamic scenes in the absence of ground truth data.
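As a rough illustration of the first paradigm, the sketch below combines a single optical flow field with the depth maps of two consecutive frames from the same view. It assumes a static, calibrated pinhole camera and a nearest-neighbour depth lookup at the flow target. The function names, the intrinsics tuple, and the list-of-lists image layout are all hypothetical choices made for this example and are not taken from the paper.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth Z to a 3D point in the camera frame
    (pinhole model; the intrinsics are assumed to be known)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def scene_flow_at_pixel(u, v, flow, depth_t, depth_t1, intrinsics):
    """Estimate the 3D motion of the surface point seen at pixel (u, v).

    flow[v][u]     -> (du, dv) optical flow from frame t to frame t+1
    depth_t[v][u]  -> depth at frame t
    depth_t1[v][u] -> depth at frame t+1
    Returns None when the flow target falls outside the image.
    """
    fx, fy, cx, cy = intrinsics
    du, dv = flow[v][u]
    u1, v1 = u + du, v + dv

    # Nearest-neighbour depth lookup at the flow target (an assumption;
    # bilinear interpolation would be a common alternative).
    ui, vi = int(round(u1)), int(round(v1))
    if not (0 <= vi < len(depth_t1) and 0 <= ui < len(depth_t1[0])):
        return None

    p0 = backproject(u, v, depth_t[v][u], fx, fy, cx, cy)
    p1 = backproject(u1, v1, depth_t1[vi][ui], fx, fy, cx, cy)

    # Scene flow is the displacement between the two 3D points.
    return tuple(b - a for a, b in zip(p0, p1))
```

Applying `scene_flow_at_pixel` to every pixel that has a valid depth value would give a dense estimate. Because each vector depends on one flow value and two depth values, an error in any of them propagates directly to the result, which is consistent with the abstract describing the multi-flow approach as the more robust one.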



This research has been supported in part by the National Science Foundation awards #1217797 and #1527294.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Stevens Institute of Technology, Hoboken, USA
