# Direct Model-Based Tracking of 3D Object Deformations in Depth and Color Video

## Abstract

Tracking deformable objects in video data is a demanding research topic because of its inherent ambiguities, which can only be resolved with additional assumptions about the deformation. Image feature points, commonly used to approach the deformation problem, provide only sparse information about the scene. This paper introduces a tracking approach for deformable objects in color and depth video that does not rely on feature points or optical flow data but instead uses all of the available input image information to find a suitable deformation. A versatile NURBS-based deformation space is defined for arbitrarily complex triangle meshes, decoupling the complexity of the object surface from the complexity of the deformation. An efficient optimization scheme is introduced that calculates results in real time (25 Hz). Extensive tests of the algorithm and its features on synthetic and real data show the reliability of this approach.
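The decoupling of surface complexity from deformation complexity can be illustrated with a small sketch. It is not the paper's implementation: it assumes a single Bézier patch (a special case of a NURBS surface with unit weights and no interior knots), assumes each mesh vertex already has surface parameters (u, v) in [0, 1], and uses hypothetical names such as `binding_matrix` and `deform`. The point is only that thousands of vertices can be moved by a handful of control-point offsets.

```python
import numpy as np
from math import comb


def bernstein_basis(t, degree):
    """Bernstein polynomials of the given degree, shape (len(t), degree + 1)."""
    t = np.asarray(t, dtype=float)[:, None]
    i = np.arange(degree + 1)[None, :]
    coeffs = np.array([comb(degree, k) for k in range(degree + 1)], dtype=float)
    return coeffs[None, :] * t**i * (1.0 - t) ** (degree - i)


def binding_matrix(uv, degree):
    """Weights tying every mesh vertex to the (degree + 1)^2 control points."""
    bu = bernstein_basis(uv[:, 0], degree)
    bv = bernstein_basis(uv[:, 1], degree)
    # Tensor product: weight of control point (i, j) for each vertex.
    return np.einsum("ni,nj->nij", bu, bv).reshape(len(uv), -1)


def deform(vertices, weights, control_offsets):
    """Move vertices by a weighted sum of control-point offsets."""
    return vertices + weights @ control_offsets


# Assumed toy data: 5000 vertices on a flat sheet, parameterized by (u, v).
rng = np.random.default_rng(0)
uv = rng.random((5000, 2))
vertices = np.column_stack([uv, np.zeros(len(uv))])

degree = 3  # bicubic patch: 16 control points, 48 deformation parameters
weights = binding_matrix(uv, degree)

# Shifting every control point by the same offset shifts the whole mesh rigidly,
# because the basis weights of each vertex sum to one.
shift = np.array([0.0, 0.0, 0.1])
control_offsets = np.tile(shift, ((degree + 1) ** 2, 1))
deformed = deform(vertices, weights, control_offsets)
```

In this sketch the binding weights are computed once per mesh, so an optimizer would only search over the 48 control-point coordinates regardless of how many vertices the mesh has.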

## Keywords

Deformation, Tracking, Range video