Spatio-Temporally Consistent Correspondence for Dense Dynamic Scene Modeling

  • Dinghuang Ji
  • Enrique Dunn
  • Jan-Michael Frahm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9910)


We address the problem of robust two-view correspondence estimation within the context of dynamic scene modeling. To this end, we investigate the use of local spatio-temporal assumptions to both identify and refine dense low-level data associations in the absence of prior dynamic content models. By developing a strictly data-driven approach to correspondence search, based on bottom-up 3D motion cues of local rigidity and non-local coherence, we robustly address the higher-order problems of video synchronization and dynamic surface modeling. Our findings suggest an important relationship between these two tasks: maximizing the spatial coherence of surface points serves as a direct metric for the temporal alignment of local image sequences. Results for both problems on multiple publicly available dynamic reconstruction datasets illustrate the effectiveness and generality of the proposed approach.
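The abstract's core idea, that spatial coherence of reconstructed surface points can score candidate temporal alignments, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the nearest-neighbor coherence measure, and the caller-supplied `reconstruct` callback are all assumptions introduced here for clarity.

```python
import numpy as np

def local_rigidity_score(pts_t0, pts_t1):
    """Local rigidity cue (illustrative): pairwise distances within a small
    neighborhood of 3D points should be preserved across consecutive time
    steps. Returns the mean absolute distance change; 0 means perfectly rigid."""
    d0 = np.linalg.norm(pts_t0[:, None] - pts_t0[None, :], axis=-1)
    d1 = np.linalg.norm(pts_t1[:, None] - pts_t1[None, :], axis=-1)
    return float(np.mean(np.abs(d0 - d1)))

def spatial_coherence(points):
    """Non-local coherence cue (illustrative): mean distance of each 3D point
    to its nearest neighbor. A compact, surface-like cloud scores low
    (more coherent); a scattered cloud scores high."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore self-distances
    return float(np.mean(d.min(axis=1)))

def best_temporal_offset(reconstruct, offsets):
    """Pick the temporal offset whose two-view reconstruction is most
    spatially coherent. `reconstruct(offset)` is a stand-in for triangulating
    correspondences under a hypothesized frame alignment."""
    scores = {o: spatial_coherence(reconstruct(o)) for o in offsets}
    return min(scores, key=scores.get)
```

In this reading, synchronization reduces to a one-dimensional search: misaligned frame pairs triangulate to scattered, incoherent 3D points, while the correct offset yields the tightest surface.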


Keywords: Two-view correspondence · Motion consistency

Supplementary material

Supplementary material 1 (mp4 43172 KB)



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. The University of North Carolina at Chapel Hill, Chapel Hill, USA
