Autonomous Robots, Volume 43, Issue 6, pp 1309–1325

Multi-users online recognition of technical gestures for natural human–robot collaboration in manufacturing

  • Eva Coupeté
  • Fabien Moutarde
  • Sotiris Manitsaris
Part of the following topical collections:
  1. Special Issue: Learning for Human-Robot Collaboration


Human–robot collaboration in an industrial context requires smooth, natural and efficient coordination between robots and human operators. The approach we propose to achieve this goal is online recognition of technical gestures. In this paper, we bring together three findings that we previously reported separately in several conference presentations, and we analyze, parameterize and evaluate them much more thoroughly: (1) we show on a real prototype that continuous, real-time, multi-user recognition of technical gestures on an assembly line is feasible (\(\approx\) 90% recall and precision in our case study), using only non-intrusive sensors (a top-view depth camera, plus inertial sensors placed on tools); (2) we formulate an end-to-end methodology for designing and developing such a system; (3) we propose a method for adapting our gesture recognition to new users. Furthermore, we present two new findings: (1) by comparing recognition performance across several sets of features, we highlight the importance of choosing features that focus on the effective part of gestures, i.e. usually hand movements; (2) we obtain new results suggesting that enriching a multi-user training set can lead to higher precision than using a separate training dataset for each operator.


Real-time online gesture recognition; Human–robot collaboration; Multi-users technical gesture recognition; Collaborative robotics; Gesture recognition from depth-video



This research benefited from the support of the Chair ‘PSA Peugeot Citroën Robotics and Virtual Reality’, led by MINES ParisTech and supported by PEUGEOT S.A. The partners of the Chair cannot be held accountable for the content of this paper, which engages the authors’ responsibility only.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Center for Robotics, MINES ParisTech, PSL Research University, Paris, France
