Context-Aware Human-Robot Collaborative Assembly

  • Lihui Wang
  • Xi Vincent Wang


In human-robot collaborative manufacturing, industrial robots work alongside human workers, and together they perform the assigned tasks. Recent research has shown that recognised human motions can be used as input for industrial robot control. However, a human-robot team still cannot work symbiotically. To address this gap, this chapter explores the potential of establishing context awareness between a human worker and an industrial robot for human-robot collaborative assembly. Context awareness is established by applying gesture recognition, human motion recognition and Augmented Reality (AR) based worker instruction technologies. The resulting system works in a cyber-physical environment and is demonstrated through case studies.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Production Engineering, KTH Royal Institute of Technology, Stockholm, Sweden
