
Monocular scene flow estimation via variational method


Abstract

Scene flow provides the 3D motion field of the point cloud that corresponds to the image pixels. Most existing algorithms require complex stereo calibration before estimating flow, which imposes strong restrictions on camera placement. This paper proposes a scene flow estimation algorithm for a monocular camera. First, an energy functional is constructed whose data terms are derived from three key assumptions: brightness constancy, gradient constancy, and short-time constancy of object velocity. Two smoothness operators are used as regularization terms. Then, an occlusion map is computed so that scene flow is estimated only at un-occluded points. Finally, the energy functional is minimized with a coarse-to-fine variational scheme on a Gaussian pyramid, which helps prevent the iteration from converging to a poor local minimum. Experimental results show that the algorithm can recover scene flow in world coordinates from as few as three sequential frames, without requiring optical flow or disparity as input.
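For orientation, the following is a minimal sketch of the kind of energy functional described here, written in the style of the classic variational warping formulation; the weights γ, λ, α, the robust penalizer Ψ, and the single smoothness term shown are illustrative assumptions (the paper itself uses two smoothness operators), not the paper's exact formulation:

```latex
E(\mathbf{w}) = \int_{\Omega} \Big[
    \Psi\!\big(|I(\mathbf{x}+\mathbf{w}) - I(\mathbf{x})|^2\big)                        % brightness constancy
  + \gamma\,\Psi\!\big(|\nabla I(\mathbf{x}+\mathbf{w}) - \nabla I(\mathbf{x})|^2\big)  % gradient constancy
  + \lambda\,\Psi\!\big(|\mathbf{w}_t - \mathbf{w}_{t-1}|^2\big)                        % short-time velocity constancy
  + \alpha\,\Psi\!\big(|\nabla \mathbf{w}|^2\big)                                       % smoothness regularization
\Big]\, d\mathbf{x},
\qquad
\Psi(s^2) = \sqrt{s^2 + \varepsilon^2}
```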




Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions. This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61272062 and 61300036) and by the National Science & Technology Pillar Program (Grant No. 2013BAH38F01).

Author information

Corresponding author

Correspondence to Bing Yang.

Appendix

1.1 Iterative process

Our iterative process is divided into two levels. The outer level traverses the Gaussian pyramid from coarse to fine; iterations at this level recover the unknowns themselves. Within each pyramid layer, the inner iteration obtains increments of the unknowns via SOR (successive over-relaxation). As Fig. 12 shows, the Gaussian pyramid is built according to the specified number of outer iterations, and a scaling factor is computed for each layer during construction. To preserve the correct correspondence between image space and world space, this scaling factor must be applied not only to the image resolution but also to the camera's focal length and optical center position. In the inner iteration, the scene flow increments are initialized to zero. Starting from the lowest-resolution level of the pyramid, SOR iterates until convergence or until the iteration limit is reached. The final value of each inner iteration is added to the current outer-level estimate, which then serves as the initial value for the next outer level. Algorithm 2 shows the whole iteration process.
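As a hedged illustration of the intrinsics scaling just described, the sketch below builds a Gaussian pyramid and rescales a pinhole model's focal lengths and optical center per layer; the (fx, fy, cx, cy) parametrization and the use of OpenCV's pyrDown are assumptions for illustration, not the paper's implementation:

```python
import cv2

def build_pyramid(image, fx, fy, cx, cy, n_levels):
    """Gaussian pyramid whose camera intrinsics are rescaled per layer,
    keeping image space and world space in correspondence (sketch)."""
    levels = [(image, (fx, fy, cx, cy))]
    for _ in range(n_levels - 1):
        w = image.shape[1]
        image = cv2.pyrDown(image)        # Gaussian blur + ~2x downsampling
        s = image.shape[1] / w            # actual per-layer scaling factor
        fx, fy, cx, cy = fx * s, fy * s, cx * s, cy * s
        levels.append((image, (fx, fy, cx, cy)))
    return levels[::-1]                   # coarsest level first
```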

Fig. 12 The two-level iteration: the outer iteration proceeds over the Gaussian pyramid layers, and the inner iteration runs within each outer layer using the SOR method

The numbers of inner and outer iterations must be determined for the iterative process. The number of outer iterations determines the number of pyramid layers; our experiments set it to 10 due to memory limitations. Figure 13 shows the relationship between the error percentage and the number of inner iterations, measured at the last outer layer. We likewise set the number of inner iterations to 10, since the curve shows that more than 10 iterations cause over-smoothing.

Fig. 13 Error percentages for different numbers of inner iterations at the last layer of the outer iteration

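The algorithm figure is omitted here; in its place, a minimal Python sketch of the two-level control flow described above. The helper sor_step is a smoothing-only stand-in for the real SOR sweep over the linearized Euler-Lagrange equations, and upsample_flow is a nearest-neighbor stand-in for prolongation; both are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def upsample_flow(flow, shape):
    """Prolong a flow field to a finer grid (nearest-neighbor stand-in)."""
    ys = (np.arange(shape[0]) * flow.shape[0]) // shape[0]
    xs = (np.arange(shape[1]) * flow.shape[1]) // shape[1]
    return flow[np.ix_(ys, xs)]

def sor_step(d_flow, omega=1.0):
    """One relaxation sweep; here it only averages neighbors (Jacobi-style
    stand-in). The real step couples the data and smoothness terms."""
    nbr = (np.roll(d_flow, 1, 0) + np.roll(d_flow, -1, 0)
         + np.roll(d_flow, 1, 1) + np.roll(d_flow, -1, 1)) / 4.0
    return (1.0 - omega) * d_flow + omega * nbr

def estimate_scene_flow(pyramid, n_inner=10, tol=1e-4):
    """Two-level iteration: outer loop over pyramid layers (coarse to fine),
    inner SOR loop computing increments. len(pyramid) plays the role of the
    outer iteration count (10 in the paper's experiments)."""
    flow = None
    for image, intrinsics in pyramid:                    # outer iteration
        if flow is None:
            flow = np.zeros(image.shape[:2] + (3,))      # init at coarsest layer
        else:
            flow = upsample_flow(flow, image.shape[:2])  # pass estimate upward
        d_flow = np.zeros_like(flow)                     # increments start at zero
        for _ in range(n_inner):                         # inner SOR iteration
            prev = d_flow
            d_flow = sor_step(d_flow)
            if np.abs(d_flow - prev).max() < tol:        # converged early
                break
        flow += d_flow               # final inner value added to outer estimate
    return flow
```

Combined with the build_pyramid sketch above, a call might look like: pyramid = build_pyramid(img, fx, fy, cx, cy, n_levels=10); flow = estimate_scene_flow(pyramid).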

About this article


Cite this article

Xiao, D., Yang, Q., Yang, B. et al. Monocular scene flow estimation via variational method. Multimed Tools Appl 76, 10575–10597 (2017). https://doi.org/10.1007/s11042-015-3091-6
