Towards Optimal Non-rigid Surface Tracking

  • Martin Klaudiny
  • Chris Budd
  • Adrian Hilton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7575)


This paper addresses the problem of optimal alignment of non-rigid surfaces from multi-view video observations to obtain a temporally consistent representation. Conventional non-rigid surface tracking performs frame-to-frame alignment, which accumulates errors and therefore drifts over time. Recently, non-sequential tracking approaches have been introduced which re-order the input data based on a dissimilarity measure. One or more input sequences are represented as a tree, which reduces the length of the alignment paths. This limits drift and increases robustness to large non-rigid deformations. However, because errors accumulate independently along each branch, jumps may occur in the aligned mesh sequence where tree branches meet. This paper proposes optimising the tree used for non-sequential tracking to minimise the errors in temporal consistency caused by both drift and jumps. A novel cluster tree enforces sequential tracking within local segments of the sequence while allowing global non-sequential traversal among these segments. This provides a mechanism for creating a tree structure that reduces the number of jumps between branches and limits the length of branches. Comprehensive evaluation is performed on a variety of challenging non-rigid surfaces including faces, cloth and people, and demonstrates that the proposed cluster tree achieves better temporal consistency than previous sequential and non-sequential tracking approaches. Quantitative ground-truth comparison on a synthetic facial performance shows reduced error with the cluster tree.
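
The idea of re-ordering frames into a tree from a dissimilarity measure can be illustrated with a small sketch. It assumes the tree is a minimum spanning tree (the keywords mention one, but the abstract does not spell out the construction), and it uses a hypothetical scalar descriptor per frame with absolute difference as the dissimilarity; a real system would compare surface shapes. It does not implement the proposed cluster tree, whose details are not given in the abstract.

```python
import math


def minimum_spanning_tree(dissimilarity, root=0):
    """Prim's algorithm on a dense, symmetric dissimilarity matrix.

    Returns parent[i] for every frame i; parent[root] is None.
    """
    n = len(dissimilarity)
    in_tree = [False] * n
    best_cost = [math.inf] * n
    parent = [None] * n
    best_cost[root] = 0.0
    for _ in range(n):
        # Pick the cheapest frame that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_cost[i])
        in_tree[u] = True
        # Offer u as a cheaper parent to every frame still outside the tree.
        for v in range(n):
            if not in_tree[v] and dissimilarity[u][v] < best_cost[v]:
                best_cost[v] = dissimilarity[u][v]
                parent[v] = u
    return parent


def path_length(parent, frame):
    """Number of alignment steps from the root to the given frame."""
    steps = 0
    while parent[frame] is not None:
        frame = parent[frame]
        steps += 1
    return steps


# Hypothetical per-frame descriptors for a roughly periodic motion
# (an assumption made purely for illustration).
descriptors = [0, 3, 6, 3, 0, 3, 6]
dissimilarity = [[abs(a - b) for b in descriptors] for a in descriptors]

parent = minimum_spanning_tree(dissimilarity, root=0)
depths = [path_length(parent, f) for f in range(len(descriptors))]
total_cost = sum(dissimilarity[f][p] for f, p in enumerate(parent)
                 if p is not None)

print("parent of each frame:", parent)
print("longest tree path:", max(depths),
      "vs sequential:", len(descriptors) - 1)
```

In this toy case similar frames that are far apart in time become neighbours in the tree, so the longest alignment path is shorter than the sequential chain. Frames that are adjacent in time can still end up on different branches, which is the source of the jumps the abstract describes.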


Keywords: dense motion capture, non-rigid surface alignment, non-sequential tracking, minimum spanning tree, cluster tree, dissimilarity





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Martin Klaudiny (1)
  • Chris Budd (1)
  • Adrian Hilton (1)
  1. Centre for Vision, Speech and Signal Processing, University of Surrey, UK
