Autonomous Robots, Volume 39, Issue 3, pp 239–258

DART: dense articulated real-time tracking with consumer depth cameras

Abstract

This paper introduces DART, a general framework for tracking articulated objects composed of rigid bodies connected through a kinematic tree. DART covers a broad set of objects encountered in indoor environments, including furniture and tools as well as human and robot bodies, hands, and manipulators. To achieve efficient and robust tracking, DART extends the signed distance function representation to articulated objects and takes full advantage of highly parallel GPU algorithms for data association and pose optimization. We demonstrate the capabilities of DART on different types of objects that have each required dedicated tracking techniques in the past.
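To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of evaluating a signed distance function for an articulated model: each rigid body is assumed to carry an SDF expressed in its own local frame, a camera-frame query point is mapped into each part's frame via transforms composed along the kinematic tree, and the model's signed distance is the minimum over the parts. The `RigidPart` class and its sphere SDF are hypothetical stand-ins for illustration; in practice per-part SDFs would be precomputed grids, and the per-pixel queries would run in parallel on the GPU.

```python
import numpy as np

class RigidPart:
    """Hypothetical stand-in for a rigid body with a local-frame SDF.

    A sphere centered at the local-frame origin is used as a placeholder;
    a real system would store a precomputed, discretized SDF per part.
    """
    def __init__(self, radius):
        self.radius = radius

    def sdf(self, p_local):
        # Signed distance to the sphere: negative inside, positive outside.
        return np.linalg.norm(p_local) - self.radius


def articulated_sdf(p_camera, parts, transforms):
    """Signed distance from a camera-frame point to an articulated model.

    transforms[i] is a 4x4 matrix mapping camera-frame points into the
    local frame of part i, composed along the kinematic tree from the
    current pose estimate. The articulated SDF is the minimum of the
    per-part signed distances.
    """
    p_h = np.append(p_camera, 1.0)  # homogeneous coordinates
    return min(part.sdf((T @ p_h)[:3]) for part, T in zip(parts, transforms))


# Toy two-part "model": a base sphere and a child sphere 1 m along +x.
parts = [RigidPart(0.5), RigidPart(0.3)]
base_to_cam = np.eye(4)
child_to_cam = np.eye(4)
child_to_cam[0, 3] = -1.0  # maps camera frame into the child's local frame
transforms = [base_to_cam, child_to_cam]

# Negative result: the query point lies inside the child part.
print(articulated_sdf(np.array([1.0, 0.0, 0.2]), parts, transforms))
```

Because the articulated SDF gives every observed depth point a distance to the full model (and, by differentiating through the kinematic chain, a gradient with respect to the joint parameters), data association and pose optimization reduce to independent per-point computations, which is what makes the GPU parallelization effective.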

Keywords

Articulated model tracking · Signed distance function · Real-time vision · RGB-D

Supplementary material

Supplementary material 1 (MP4 42,056 KB)

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. University of Washington, Seattle, USA
