4D Match Trees for Non-rigid Surface Alignment

  • Armin Mustafa
  • Hansung Kim
  • Adrian Hilton
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9905)


This paper presents a method for dense 4D temporal alignment of partial reconstructions of non-rigid surfaces in complex scenes observed from single or multiple moving cameras. 4D Match Trees are introduced for robust global alignment of non-rigid shape based on the similarity between images across sequences and views. Wide-timeframe sparse correspondence between arbitrary pairs of images is established using a segmentation-based feature detector (SFD), which is demonstrated to give improved matching of non-rigid shape. Sparse SFD correspondence allows the similarity between any pair of image frames to be estimated for moving cameras and multiple views. This enables construction of the 4D Match Tree, which minimises the observed change in non-rigid shape for global alignment across all images. Dense 4D temporal correspondence across all frames is then estimated by traversing the 4D Match Tree using optical flow initialised from the sparse feature matches. The approach is evaluated on single and multiple view image sequences for alignment of partial surface reconstructions of dynamic objects in complex indoor and outdoor scenes to obtain a temporally consistent 4D representation. Comparison to previous 2D and 3D scene flow methods demonstrates that 4D Match Trees achieve reduced errors due to drift and improved robustness to large non-rigid deformations.
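The tree construction and traversal described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the tree is a minimum spanning tree over a pairwise frame dissimilarity matrix and builds it with Prim's algorithm. The matrix `D`, the function names `build_match_tree` and `traversal_order`, and the choice of frame 0 as root are all hypothetical. In the paper the dissimilarity would come from sparse SFD matches, and dense optical flow would be run along each parent-to-child edge; neither step is modelled here.

```python
import heapq


def build_match_tree(dissimilarity, root=0):
    """Prim's algorithm: return parent[i] for each frame in a minimum spanning tree.

    `dissimilarity` is a symmetric n x n matrix (hypothetical input); lower
    values mean the two frames show more similar non-rigid shape.
    """
    n = len(dissimilarity)
    parent = [None] * n
    in_tree = [False] * n
    heap = [(0.0, root, root)]  # (edge cost, frame, candidate parent)
    while heap:
        _cost, frame, par = heapq.heappop(heap)
        if in_tree[frame]:
            continue  # already attached through a cheaper edge
        in_tree[frame] = True
        parent[frame] = None if frame == root else par
        for other in range(n):
            if not in_tree[other]:
                heapq.heappush(heap, (dissimilarity[frame][other], other, frame))
    return parent


def traversal_order(parent, root=0):
    """Breadth-first order from the root, so every frame follows its parent."""
    children = {i: [] for i in range(len(parent))}
    for frame, par in enumerate(parent):
        if par is not None:
            children[par].append(frame)
    order, queue = [], [root]
    while queue:
        frame = queue.pop(0)
        order.append(frame)
        queue.extend(children[frame])
    return order


# Hypothetical dissimilarities for 4 frames: frame 3 resembles frame 0
# (for example, a recurring pose) far more than its temporal neighbour, frame 2.
D = [
    [0.0, 1.0, 4.0, 1.5],
    [1.0, 0.0, 2.0, 5.0],
    [4.0, 2.0, 0.0, 6.0],
    [1.5, 5.0, 6.0, 0.0],
]

parent = build_match_tree(D)     # [None, 0, 1, 0]
order = traversal_order(parent)  # [0, 1, 3, 2]
```

In this toy input, frame 3 attaches directly to frame 0 instead of being reached through frame 2. That is the non-sequential behaviour the abstract credits with reducing drift, since alignment errors no longer accumulate along a long chain of consecutive frames.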


Keywords: Non-sequential tracking, Surface alignment, Temporal coherence, Dynamic scene reconstruction, 4D modeling

Supplementary material

419956_1_En_13_MOESM1_ESM.wmv (27.1 MB)



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. CVSSP, University of Surrey, Guildford, UK
