Autonomous Robots, Volume 37, Issue 3, pp. 227–242

Fast RGB-D people tracking for service robots

  • Matteo Munaro
  • Emanuele Menegatti


Service robots have to robustly follow and interact with humans. In this paper, we propose a very fast multi-person tracking algorithm designed for mobile service robots. Our approach exploits RGB-D data and can run in real time at a very high frame rate on a standard laptop, without the need for a GPU implementation. It also features a novel depth-based sub-clustering method that makes it possible to detect people within groups or even standing near walls. Moreover, to limit drift and track ID switches, we propose an online-learning appearance classifier featuring a three-term joint likelihood. We compared the performance of our system with a number of state-of-the-art tracking algorithms on two public datasets, acquired with three static Kinects and a moving stereo pair, respectively. To validate the 3D accuracy of our system, we created a new dataset in which RGB-D data are acquired by a moving robot. We have made this dataset publicly available: it is not only annotated by hand, but the ground-truth positions of people and robot are also acquired with a motion capture system, so that tracking accuracy and precision can be evaluated in 3D coordinates. Results of experiments on these datasets are presented, showing that, even without a GPU, our approach achieves state-of-the-art accuracy and superior speed.
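The abstract only names the "three-term joint likelihood" used for data association. As an illustration, the sketch below assumes the three terms are a motion term (a Gaussian likelihood of the detection under the track's predicted ground-plane position), an appearance term (the output of the online appearance classifier, which is not implemented here and is passed in as a number), and a detection-confidence term, combined as a weighted sum of log-likelihoods. The `Track`/`Detection` classes, the weights `w_motion`, `w_app`, `w_det`, the `gate` threshold, and the greedy assignment are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    x: float    # predicted ground-plane position in metres (e.g. from a Kalman filter)
    y: float
    var: float  # variance of the prediction in m^2 (assumed isotropic for brevity)


@dataclass
class Detection:
    x: float
    y: float
    confidence: float  # detector score, assumed already mapped into (0, 1]


def motion_log_likelihood(track: Track, det: Detection) -> float:
    """Log of a 2D Gaussian N(det; track position, var * I)."""
    dx, dy = det.x - track.x, det.y - track.y
    d2 = (dx * dx + dy * dy) / track.var  # squared Mahalanobis distance
    return -0.5 * d2 - math.log(2.0 * math.pi * track.var)


def joint_log_likelihood(track, det, appearance_prob,
                         w_motion=1.0, w_app=1.0, w_det=1.0):
    """Hypothetical three-term score: motion + appearance + detection confidence."""
    return (w_motion * motion_log_likelihood(track, det)
            # appearance_prob stands in for the online classifier's output, in (0, 1]
            + w_app * math.log(appearance_prob)
            + w_det * math.log(det.confidence))


def greedy_associate(tracks, dets, appearance, gate=-10.0):
    """Return {track_index: detection_index}.

    appearance[i][j] is the (assumed) probability, in (0, 1], that detection j
    matches the appearance model of track i. Pairs scoring below `gate` are discarded.
    """
    candidates = []
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            score = joint_log_likelihood(t, d, appearance[i][j])
            if score > gate:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)  # best-scoring pairs first
    pairs, used_dets = {}, set()
    for _, i, j in candidates:
        # each track and each detection is used at most once
        if i not in pairs and j not in used_dets:
            pairs[i] = j
            used_dets.add(j)
    return pairs


if __name__ == "__main__":
    tracks = [Track(0.0, 0.0, 0.1), Track(2.0, 0.0, 0.1)]
    dets = [Detection(2.1, 0.0, 0.9), Detection(0.05, 0.0, 0.9)]
    print(greedy_associate(tracks, dets, [[0.2, 0.8], [0.8, 0.2]]))  # {0: 1, 1: 0}
```

Working in log space turns the product of the three likelihoods into a sum, which avoids numerical underflow and makes the relative weighting of the terms explicit. Greedy matching is used here only for brevity; a global nearest neighbour or Hungarian assignment over the same scores would be the more principled choice.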


Keywords: People tracking · Service robots · RGB-D · Kinect tracking precision dataset · Microsoft Kinect



Acknowledgements

We wish to thank the Bioengineering of Movement Laboratory of the University of Padova for providing the motion capture facility, in particular Martina Negretto and Annamaria Guiotto for their help with data acquisition, and all the people who took part in the KTP Dataset. We also wish to thank Filippo Basso and Stefano Michieletto, co-authors of the previous publications related to this work, and Mauro Antonello for advice on the disparity computation for the ETH dataset.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Information Engineering, University of Padova, Padova, Italy
