International Journal of Computer Vision, Volume 115, Issue 1, pp 1–28

3D Scene Flow Estimation with a Piecewise Rigid Scene Model

Abstract

3D scene flow estimation aims to jointly recover dense geometry and 3D motion from stereoscopic image sequences, and thus generalizes classical disparity and 2D optical flow estimation. To realize its conceptual benefits and overcome limitations of many existing methods, we propose to represent the dynamic scene as a collection of rigidly moving planes, into which the input images are segmented. Geometry and 3D motion are then jointly recovered alongside an over-segmentation of the scene. This piecewise rigid scene model is significantly more parsimonious than conventional pixel-based representations, yet retains the ability to represent real-world scenes with independent object motion. Furthermore, it enables us to define suitable scene priors, perform occlusion reasoning, and leverage discrete optimization schemes to obtain stable and accurate results. Assuming that the rigid motion persists approximately over time additionally allows us to incorporate multiple frames into the inference. To that end, each view holds its own representation, which is encouraged to be consistent with all other viewpoints and frames in a temporal window. We show that such a view-consistent multi-frame scheme significantly improves accuracy, especially in the presence of occlusions, and increases robustness against adverse imaging conditions. Our method currently achieves leading performance on the KITTI benchmark for both flow and stereo.
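To make the piecewise rigid representation concrete, the following minimal NumPy sketch illustrates how a single segment, modeled as a plane n^T X = d that undergoes a rigid motion (R, t), induces a homography mapping its pixels from one frame to the next; the segment's 2D optical flow and stereo disparity then follow in closed form. This is only an illustrative sketch, not the authors' implementation: all intrinsics, plane parameters, baseline, and motion values below are assumed for the example, and the actual method jointly optimizes the segmentation, plane, and motion parameters over multiple views.

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """Homography induced by a rigidly moving plane.

    For a 3D point X on the plane n^T X = d that moves to R X + t,
    the image mapping is x' ~ K (R + t n^T / d) K^{-1} x.
    (Illustrative convention; signs depend on the chosen plane/camera setup.)
    """
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def warp_pixels(H, uv):
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    x = np.hstack([uv, np.ones((uv.shape[0], 1))]) @ H.T   # homogeneous coords
    return x[:, :2] / x[:, 2:3]

# --- toy example with assumed numbers ----------------------------------
K = np.array([[700.0,   0.0, 320.0],     # assumed camera intrinsics
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
n = np.array([0.0, 0.0, 1.0])            # fronto-parallel plane Z = d
d = 10.0                                 # plane distance [m]

# small assumed rigid motion of the segment between frames t and t+1
angle = np.deg2rad(1.0)
R = np.array([[ np.cos(angle), 0.0, np.sin(angle)],
              [           0.0, 1.0,           0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.05, 0.0, -0.2])          # translation [m]

uv = np.array([[320.0, 240.0], [400.0, 260.0]])   # pixels inside the segment

# optical flow of the segment = displacement induced by its homography
H_flow = plane_homography(K, R, t, n, d)
flow = warp_pixels(H_flow, uv) - uv

# disparity for a rectified stereo pair with baseline b:
# Z(u, v) = d / (n^T K^{-1} (u, v, 1)^T),  disparity = f * b / Z
b = 0.54                                 # assumed baseline [m] (KITTI-like)
rays = np.hstack([uv, np.ones((uv.shape[0], 1))]) @ np.linalg.inv(K).T
Z = d / (rays @ n)
disparity = K[0, 0] * b / Z

print("flow:", flow)
print("disparity:", disparity)
```

Because every pixel of a segment shares the same nine parameters (plane plus rigid motion), the per-segment homography above is what makes the representation far more parsimonious than estimating flow and disparity independently at each pixel.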

Keywords

3D scene flow · Stereo · Motion estimation · Piecewise planarity · Piecewise rigidity · Segmentation

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Christoph Vogel (1)
  • Konrad Schindler (1)
  • Stefan Roth (2)

  1. Photogrammetry & Remote Sensing, ETH Zurich, Zurich, Switzerland
  2. Department of Computer Science, TU Darmstadt, Darmstadt, Germany