Follow Me: Real-Time in the Wild Person Tracking Application for Autonomous Robotics

  • Thomas Weber
  • Sergey Triputen
  • Michael Danner
  • Sascha Braun
  • Kristiaan Schreve
  • Matthias Rätsch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11175)


In the last 20 years there have been major advances in autonomous robotics. In IoT (Industry 4.0), mobile robots require more intuitive ways of interacting with humans in order to expand their field of applications. This paper describes a user-friendly setup that enables a person to lead the robot through an unknown environment. The environment has to be perceived by means of sensory input. To realize a cost- and resource-efficient Follow Me application, we use a single monocular camera as a low-cost sensor. For efficient scaling of our Simultaneous Localization and Mapping (SLAM) algorithm, we integrate an inertial measurement unit (IMU) sensor. With the camera input we detect and track a person. We propose combining state-of-the-art deep learning based on Convolutional Neural Networks (CNNs) with SLAM functionality on the same input camera image. Based on the output, robot navigation is possible. This work presents the specification and workflow for an efficient development of the Follow Me application. The point clouds delivered by our application are also used for surface construction. For demonstration, we use our platform SCITOS G5 equipped with the aforementioned sensors. Preliminary tests show that the system works robustly in the wild (This work is partially supported by a grant of the BMBF FHprofUnt program, no. 03FH049PX5).
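As a rough illustration of how an IMU can help a monocular SLAM system, and not the authors' implementation, the sketch below assumes that the SLAM point cloud is correct only up to an unknown scale factor and that the IMU provides metric displacement estimates for the same motion segments. The function names (`estimate_scale`, `scale_points`, `follow_command`), the robot frame convention (x forward, y left), and the 1.5 m following distance are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _norm(v: Vec3) -> float:
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))


def estimate_scale(slam_steps: Sequence[Vec3], imu_steps: Sequence[Vec3]) -> float:
    """Least-squares scale s minimising sum_i (|imu_i| - s * |slam_i|)^2.

    Assumption: slam_steps are camera displacements in unscaled SLAM units,
    imu_steps are the matching displacements in metres.
    """
    num = 0.0
    den = 0.0
    for slam, imu in zip(slam_steps, imu_steps):
        ls, lm = _norm(slam), _norm(imu)
        num += ls * lm
        den += ls * ls
    if den == 0.0:
        raise ValueError("SLAM displacements are all zero; scale is undefined")
    return num / den


def scale_points(points: Sequence[Vec3], scale: float) -> List[Vec3]:
    """Convert unscaled SLAM points to metric coordinates."""
    return [(x * scale, y * scale, z * scale) for x, y, z in points]


def follow_command(target: Vec3, keep_distance: float = 1.5) -> Tuple[float, float]:
    """Return (distance error in metres, heading in radians) toward a person.

    Assumption: target is the tracked person's metric position in the
    robot frame, with x pointing forward and y pointing left.
    """
    x, y, _ = target
    heading = math.atan2(y, x)
    distance_error = math.hypot(x, y) - keep_distance
    return distance_error, heading
```

In such a setup, a person detector would supply the image region of the target, the scaled point cloud would supply its metric position, and the resulting distance error and heading could feed a simple motion controller.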


Mobile robotics, 3D perception, Navigation, Human-robot interaction, Person tracking, Machine learning, CNN, SLAM



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Thomas Weber (1)
  • Sergey Triputen (1)
  • Michael Danner (1)
  • Sascha Braun (1)
  • Kristiaan Schreve (1)
  • Matthias Rätsch (1)

  1. Reutlingen University, Reutlingen, Germany
