Skeleton-Based Action and Gesture Recognition for Human-Robot Collaboration

  • Conference paper
  • Intelligent Autonomous Systems 17 (IAS 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 577)

Abstract

Human action recognition plays a major role in enabling effective and safe collaboration between humans and robots. In a collaborative assembly task, for example, the human worker can use gestures to communicate with the robot, while the robot can exploit the recognized actions to anticipate the next steps of the assembly, improving both safety and overall productivity. In this work, we propose a novel framework for human action recognition based on 3D pose estimation and ensemble techniques. In this framework, we first estimate the 3D coordinates of the human body and hand joints by means of OpenPose and RGB-D data. The estimated joints are then fed to a set of graph convolutional networks derived from Shift-GCN, one network for each set of joints (i.e., body, left hand and right hand). Finally, following an ensemble approach, we average the output scores of all the networks to predict the final human action. The proposed framework was evaluated on a dedicated dataset, named the IAS-Lab Collaborative HAR dataset, which includes both actions and gestures commonly used in human-robot collaboration tasks. The experimental results show that the ensemble of the different action recognition models improves both the accuracy and the robustness of the overall system.
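To make the pipeline concrete, the sketch below illustrates the two key steps described in the abstract in simplified form: lifting 2D keypoints to 3D camera coordinates using an aligned depth image and pinhole intrinsics, and averaging the class scores of the per-joint-set classifiers (score-level fusion). This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the NumPy-based interface, and the assumption that each network is a callable returning class logits are ours.

import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D vector of class logits."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def lift_to_3d(keypoints_2d, depth_map, fx, fy, cx, cy):
    """Back-project 2D keypoints (u, v) in pixels to 3D camera coordinates,
    reading each keypoint's depth from an aligned depth image."""
    joints_3d = []
    for u, v in keypoints_2d:
        z = depth_map[int(v), int(u)]   # depth (metres) at the keypoint pixel
        x = (u - cx) * z / fx           # pinhole back-projection
        y = (v - cy) * z / fy
        joints_3d.append((x, y, z))
    return np.asarray(joints_3d)

def ensemble_predict(models, joint_streams):
    """Score-level fusion: run each skeleton classifier (body, left hand,
    right hand) on its own joint stream and average the softmax scores."""
    scores = [softmax(model(stream))
              for model, stream in zip(models, joint_streams)]
    return int(np.mean(scores, axis=0).argmax())

In practice, the intrinsics (fx, fy, cx, cy) would come from the RGB-D camera calibration, and each entry of models would be a trained Shift-GCN-style network operating on a temporal sequence of the corresponding joint set.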


Notes

  1. Available at http://robotics.dei.unipd.it/.

  2. https://www.intelrealsense.com/lidar-camera-l515/.

  3. https://github.com/kchengiva/Shift-GCN.

References

  1. Villani, V., Pini, F., Leali, F., Secchi, C.: Survey on human-robot collaboration in industrial settings: safety, intuitive interfaces and applications. Mechatronics 55, 248–266 (2018)

  2. Matheson, E., Minto, R., Zampieri, E.G., Faccio, M., Rosati, G.: Human-robot collaboration in manufacturing applications: a review. Robotics 8(4), 100 (2019)

  3. Kim, W., Peternel, L., Lorenzini, M., Babič, J., Ajoudani, A.: A human-robot collaboration framework for improving ergonomics during dexterous operation of power tools. Robot. Comput.-Integr. Manuf. 68, 102084 (2021)

  4. Liu, H., Fang, T., Zhou, T., Wang, L.: Towards robust human-robot collaborative manufacturing: multimodal fusion. IEEE Access 6, 74762–74771 (2018)

  5. Mohammadi Amin, F., Rezayati, M., van de Venn, H.W., Karimpour, H.: A mixed-perception approach for safe human-robot collaboration in industrial automation. Sensors 20(21), 6347 (2020)

  6. Kobayashi, T., Aoki, Y., Shimizu, S., Kusano, K., Okumura, S.: Fine-grained action recognition in assembly work scenes by drawing attention to the hands. In: 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 440–446. IEEE (2019)

  7. Liu, K., Zhu, M., Fu, H., Ma, H., Chua, T.S.: Enhancing anomaly detection in surveillance videos with transfer learning from action recognition. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 4664–4668 (2020)

  8. Prati, A., Shan, C., Wang, K.I.K.: Sensors, vision and networks: from video surveillance to activity recognition and health monitoring. J. Ambient Intell. Smart Environ. 11(1), 5–22 (2019)

  9. Ranieri, C.M., MacLeod, S., Dragone, M., Vargas, P.A., Romero, R.A.F.: Activity recognition for ambient assisted living with videos, inertial units and ambient sensors. Sensors 21(3), 768 (2021)

  10. Al-Amin, M., Tao, W., Doell, D., Lingard, R., Yin, Z., Leu, M.C., Qin, R.: Action recognition in manufacturing assembly using multimodal sensor fusion. Procedia Manuf. 39, 158–167 (2019)

  11. Bo, W., Fuqi, M., Rong, J., Peng, L., Xuzhu, D.: Skeleton-based violation action recognition method for safety supervision in the operation field of distribution network based on graph convolutional network. CSEE J. Power Energy Syst. (2021)

  12. Chen, C., Jafari, R., Kehtarnavaz, N.: UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 168–172. IEEE (2015)

  13. Yu, J., Gao, H., Yang, W., Jiang, Y., Chin, W., Kubota, N., Ju, Z.: A discriminative deep model with feature fusion and temporal attention for human action recognition. IEEE Access 8, 43243–43255 (2020)

  14. Ullah, A., Ahmad, J., Muhammad, K., Sajjad, M., Baik, S.W.: Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access 6, 1155–1166 (2017)

  15. Feichtenhofer, C.: X3D: expanding architectures for efficient video recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 203–213 (2020)

  16. Wen, X., Chen, H., Hong, Q.: Human assembly task recognition in human-robot collaboration based on 3D CNN. In: 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 1230–1234. IEEE (2019)

  17. Xiong, Q., Zhang, J., Wang, P., Liu, D., Gao, R.X.: Transferable two-stream convolutional neural network for human action recognition. J. Manuf. Syst. 56, 605–614 (2020)

  18. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199 (2014)

  19. Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2019)

  20. Cheng, K., Zhang, Y., He, X., Chen, W., Cheng, J., Lu, H.: Skeleton-based action recognition with shift graph convolutional network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)

  21. Liu, Z., Zhang, H., Chen, Z., Wang, Z., Ouyang, W.: Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 143–152 (2020)

  22. Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., Hu, W.: Channel-wise topology refinement graph convolution for skeleton-based action recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

  23. Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1010–1019 (2016)

  24. Wang, J., Nie, X., Xia, Y., Wu, Y., Zhu, S.C.: Cross-view action modeling, learning and recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2649–2656 (2014)

  25. Martins, G.S., Santos, L., Dias, J.: The GrowMeUp project and the applicability of action recognition techniques. In: Third Workshop on Recognition and Action for Scene Understanding (REACTS), Ruiz de Aloza (2015)


Acknowledgment

The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 101006732. Part of this work was supported by MIUR (the Italian Ministry of Education) under the initiative “Departments of Excellence” (Law 232/2016).

Author information

Correspondence to Matteo Terreran.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Terreran, M., Lazzaretto, M., Ghidoni, S. (2023). Skeleton-Based Action and Gesture Recognition for Human-Robot Collaboration. In: Petrovic, I., Menegatti, E., Marković, I. (eds) Intelligent Autonomous Systems 17. IAS 2022. Lecture Notes in Networks and Systems, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-031-22216-0_3
