International Journal of Computer Vision, Volume 102, Issue 1–3, pp 239–255

Direct Model-Based Tracking of 3D Object Deformations in Depth and Color Video

  • Andreas Jordt
  • Reinhard Koch


The tracking of deformable objects using video data is a demanding research topic because of inherent ambiguities, which can only be resolved by making additional assumptions about the deformation. Image feature points, commonly used to approach the deformation problem, provide only sparse information about the scene. In this paper, a tracking approach for deformable objects in color and depth video is introduced that relies on neither feature points nor optical flow data, but instead employs all available input image information to find a suitable deformation for the data at hand. A versatile NURBS-based deformation space is defined for arbitrarily complex triangle meshes, decoupling the complexity of the object surface from the complexity of the deformation. An efficient optimization scheme is introduced that computes results in real time (25 Hz). Extensive tests of the algorithm and its features on synthetic and real data show the reliability of this approach.
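The decoupling idea behind a NURBS-based deformation space can be illustrated with a minimal sketch. It assumes that every mesh vertex is assigned fixed surface parameters (u, v), that a small control grid carries 3D displacements, and that each vertex moves by the rational (NURBS) blend of those displacements. The function names (`clamped_uniform_knots`, `bspline_basis`, `nurbs_weights`, `deform`), the clamped uniform knot vectors, the degree-2 default and the displacement-blending scheme are all illustrative assumptions, not the paper's implementation. What the sketch shows is that the deformation is described by 3·n_u·n_v numbers regardless of how many vertices the mesh has.

```python
# Hypothetical sketch of a NURBS-driven deformation space for a triangle mesh.
# Each vertex keeps fixed surface parameters (u, v); only the control-grid
# displacements change between frames, so the number of deformation
# parameters is independent of the number of mesh vertices.

def clamped_uniform_knots(n_ctrl, degree):
    """Clamped knot vector on [0, 1] with uniformly spaced interior knots."""
    n_inner = n_ctrl - degree - 1
    inner = [(k + 1) / (n_inner + 1) for k in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u == knots[-1] is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    denom = knots[i + p] - knots[i]
    if denom > 0.0:
        value += (u - knots[i]) / denom * bspline_basis(i, p - 1, u, knots)
    denom = knots[i + p + 1] - knots[i + 1]
    if denom > 0.0:
        value += (knots[i + p + 1] - u) / denom * bspline_basis(i + 1, p - 1, u, knots)
    return value


def nurbs_weights(u, v, weights, degree_u=2, degree_v=2):
    """Rational basis values R_ij(u, v); they sum to 1 for positive weights."""
    n_u, n_v = len(weights), len(weights[0])
    ku = clamped_uniform_knots(n_u, degree_u)
    kv = clamped_uniform_knots(n_v, degree_v)
    bu = [bspline_basis(i, degree_u, u, ku) for i in range(n_u)]
    bv = [bspline_basis(j, degree_v, v, kv) for j in range(n_v)]
    raw = [[bu[i] * bv[j] * weights[i][j] for j in range(n_v)] for i in range(n_u)]
    total = sum(sum(row) for row in raw)
    return [[r / total for r in row] for row in raw]


def deform(rest_vertices, uv_params, ctrl_disp, weights):
    """Displace each rest vertex by the NURBS blend of control displacements."""
    deformed = []
    for (x, y, z), (u, v) in zip(rest_vertices, uv_params):
        r = nurbs_weights(u, v, weights)
        dx = dy = dz = 0.0
        for i, row in enumerate(r):
            for j, w in enumerate(row):
                cx, cy, cz = ctrl_disp[i][j]
                dx += w * cx
                dy += w * cy
                dz += w * cz
        deformed.append((x + dx, y + dy, z + dz))
    return deformed


# Usage: a 4 x 4 control grid (48 parameters) driving a few vertices.
n_u = n_v = 4
weights = [[1.0] * n_v for _ in range(n_u)]           # unit weights: plain B-spline
uv = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (0.25, 0.8)]
rest = [(u, v, 0.0) for u, v in uv]                   # a flat patch for simplicity

lift = [[(0.0, 0.0, 0.1)] * n_v for _ in range(n_u)]  # same offset at every control point
print(deform(rest, uv, lift, weights))                # each vertex rises by 0.1 in z (up to rounding)

corner = [[(0.0, 0.0, 0.0)] * n_v for _ in range(n_u)]
corner[0][0] = (0.0, 0.0, 1.0)                        # pull only one corner control point
print(deform(rest, uv, corner, weights))              # only the vertex at (u, v) = (0, 0) moves
```

Because the rational basis values sum to one, moving all control points by the same offset translates the whole mesh, while moving a single control point only affects vertices inside its local support. Both properties hold no matter how finely the mesh itself is triangulated.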


Keywords: Deformation, Tracking, Range video

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Multimedia Information Processing Group, University of Kiel, Kiel, Germany