This paper introduces DART, a general framework for tracking articulated objects composed of rigid bodies connected through a kinematic tree. DART covers a broad set of objects encountered in indoor environments, including furniture, tools, human and robot bodies, hands, and manipulators. To achieve efficient and robust tracking, DART extends the signed distance function representation to articulated objects and takes full advantage of highly parallel GPU algorithms for data association and pose optimization. We demonstrate the capabilities of DART on several types of objects, each of which has required a dedicated tracking technique in the past.
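To make the idea of a signed distance function over an articulated object more concrete, the following is a minimal sketch and not DART's implementation. It assumes a planar (2D) model in which every link is a box attached to its parent by a revolute joint, evaluates the distance analytically on the CPU, and takes the minimum over links; the link with the smallest distance serves as a simple stand-in for data association. The names `Link`, `forward_kinematics` and `articulated_sdf` are hypothetical and chosen only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Link:
    parent: int     # index of the parent link, -1 for the root (parents listed first)
    offset: tuple   # joint position (x, y) expressed in the parent's frame
    half_len: float # box half-length along the link's local x axis
    half_wid: float # box half-width along the link's local y axis


def forward_kinematics(links, angles):
    """Return the world pose (x, y, theta) of every link frame."""
    poses = []
    for link, angle in zip(links, angles):
        px, py, pt = (0.0, 0.0, 0.0) if link.parent < 0 else poses[link.parent]
        ox, oy = link.offset
        x = px + math.cos(pt) * ox - math.sin(pt) * oy
        y = py + math.sin(pt) * ox + math.cos(pt) * oy
        poses.append((x, y, pt + angle))
    return poses


def box_sdf(lx, ly, hx, hy):
    """Signed distance to an axis-aligned box centred at the origin."""
    qx, qy = abs(lx) - hx, abs(ly) - hy
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))
    inside = min(max(qx, qy), 0.0)
    return outside + inside


def articulated_sdf(links, angles, point):
    """Return (signed distance, index of the closest link) for a world point."""
    best_d, best_i = math.inf, -1
    poses = forward_kinematics(links, angles)
    for i, (link, (x, y, theta)) in enumerate(zip(links, poses)):
        # Express the point in the link's local frame (inverse rigid transform).
        dx, dy = point[0] - x, point[1] - y
        lx = math.cos(theta) * dx + math.sin(theta) * dy
        ly = -math.sin(theta) * dx + math.cos(theta) * dy
        # The box starts at the joint and extends along local +x.
        d = box_sdf(lx - link.half_len, ly, link.half_len, link.half_wid)
        if d < best_d:
            best_d, best_i = d, i
    return best_d, best_i


# Hypothetical two-link arm: unit-length links, elbow bent by 90 degrees.
arm = [Link(-1, (0.0, 0.0), 0.5, 0.1), Link(0, (1.0, 0.0), 0.5, 0.1)]
pose = [0.0, math.pi / 2]
print(articulated_sdf(arm, pose, (0.5, 0.3)))  # a point just above the first link
print(articulated_sdf(arm, pose, (1.0, 0.5)))  # a point inside the second link
```

In this sketch the distance is positive outside the model and negative inside it, and changing an entry of `pose` moves the corresponding link and all of its descendants. A real system would work in 3D and would likely evaluate many observed points in parallel, as the abstract indicates for DART's GPU algorithms.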
This work was funded in part by the Intel Science and Technology Center for Pervasive Computing (ISTC-PC) and by ONR grant N00014-13-1-0720.
This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.
Cite this article
Schmidt, T., Newcombe, R. & Fox, D. DART: dense articulated real-time tracking with consumer depth cameras. Auton Robot 39, 239–258 (2015). https://doi.org/10.1007/s10514-015-9462-z
DOI: https://doi.org/10.1007/s10514-015-9462-z