Skeleton-Based Action and Gesture Recognition for Human-Robot Collaboration

  • Conference paper
  • Intelligent Autonomous Systems 17 (IAS 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 577)

Abstract

Human action recognition plays a major role in enabling effective and safe collaboration between humans and robots. In a collaborative assembly task, for example, the human worker can use gestures to communicate with the robot, while the robot can exploit the recognized actions to anticipate the next steps of the assembly, improving both safety and overall productivity. In this work, we propose a novel framework for human action recognition based on 3D pose estimation and ensemble techniques. In this framework, we first estimate the 3D coordinates of the human body and hand joints by means of OpenPose and RGB-D data. The estimated joints are then fed to a set of graph convolutional networks derived from Shift-GCN, one network for each set of joints (i.e., body, left hand and right hand). Finally, following an ensemble approach, we average the output scores of all the networks to predict the final human action. The proposed framework was evaluated on a dedicated dataset, named the IAS-Lab Collaborative HAR dataset, which includes both actions and gestures commonly used in human-robot collaboration tasks. The experimental results show that the ensemble of the different action recognition models improves both the accuracy and the robustness of the overall system.
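To make the pipeline concrete, the sketch below illustrates the two key steps described in the abstract in simplified form: lifting 2D keypoints to 3D camera coordinates using an aligned depth image and pinhole intrinsics, and averaging the class scores of the per-joint-set classifiers (score-level fusion). This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the NumPy-based interface, and the assumption that each network is a callable returning class logits are ours.

import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D vector of class logits."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def lift_to_3d(keypoints_2d, depth_map, fx, fy, cx, cy):
    """Back-project 2D keypoints (u, v) in pixels to 3D camera coordinates,
    reading each keypoint's depth from an aligned depth image."""
    joints_3d = []
    for u, v in keypoints_2d:
        z = depth_map[int(v), int(u)]   # depth (metres) at the keypoint pixel
        x = (u - cx) * z / fx           # pinhole back-projection
        y = (v - cy) * z / fy
        joints_3d.append((x, y, z))
    return np.asarray(joints_3d)

def ensemble_predict(models, joint_streams):
    """Score-level fusion: run each skeleton classifier (body, left hand,
    right hand) on its own joint stream and average the softmax scores."""
    scores = [softmax(model(stream))
              for model, stream in zip(models, joint_streams)]
    return int(np.mean(scores, axis=0).argmax())

In practice, the intrinsics (fx, fy, cx, cy) would come from the RGB-D camera calibration, and each entry of models would be a trained Shift-GCN-style network operating on a temporal sequence of the corresponding joint set.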


Notes

  1. Available at http://robotics.dei.unipd.it/.

  2. https://www.intelrealsense.com/lidar-camera-l515/.

  3. https://github.com/kchengiva/Shift-GCN.

References

  1. Villani, V., Pini, F., Leali, F., Secchi, C.: Survey on human-robot collaboration in industrial settings: safety, intuitive interfaces and applications. Mechatronics 55, 248–266 (2018)

  2. Matheson, E., Minto, R., Zampieri, E.G., Faccio, M., Rosati, G.: Human-robot collaboration in manufacturing applications: a review. Robotics 8(4), 100 (2019)

  3. Kim, W., Peternel, L., Lorenzini, M., Babič, J., Ajoudani, A.: A human-robot collaboration framework for improving ergonomics during dexterous operation of power tools. Robot. Comput.-Integr. Manuf. 68, 102084 (2021)

  4. Liu, H., Fang, T., Zhou, T., Wang, L.: Towards robust human-robot collaborative manufacturing: multimodal fusion. IEEE Access 6, 74762–74771 (2018)

  5. Mohammadi Amin, F., Rezayati, M., van de Venn, H.W., Karimpour, H.: A mixed-perception approach for safe human-robot collaboration in industrial automation. Sensors 20(21), 6347 (2020)

  6. Kobayashi, T., Aoki, Y., Shimizu, S., Kusano, K., Okumura, S.: Fine-grained action recognition in assembly work scenes by drawing attention to the hands. In: 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 440–446. IEEE (2019)

  7. Liu, K., Zhu, M., Fu, H., Ma, H., Chua, T.S.: Enhancing anomaly detection in surveillance videos with transfer learning from action recognition. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 4664–4668 (2020)

  8. Prati, A., Shan, C., Wang, K.I.K.: Sensors, vision and networks: from video surveillance to activity recognition and health monitoring. J. Ambient Intell. Smart Environ. 11(1), 5–22 (2019)

  9. Ranieri, C.M., MacLeod, S., Dragone, M., Vargas, P.A., Romero, R.A.F.: Activity recognition for ambient assisted living with videos, inertial units and ambient sensors. Sensors 21(3), 768 (2021)

  10. Al-Amin, M., Tao, W., Doell, D., Lingard, R., Yin, Z., Leu, M.C., Qin, R.: Action recognition in manufacturing assembly using multimodal sensor fusion. Procedia Manuf. 39, 158–167 (2019)

  11. Bo, W., Fuqi, M., Rong, J., Peng, L., Xuzhu, D.: Skeleton-based violation action recognition method for safety supervision in the operation field of distribution network based on graph convolutional network. CSEE J. Power Energy Syst. (2021)

  12. Chen, C., Jafari, R., Kehtarnavaz, N.: UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 168–172. IEEE (2015)

  13. Yu, J., Gao, H., Yang, W., Jiang, Y., Chin, W., Kubota, N., Ju, Z.: A discriminative deep model with feature fusion and temporal attention for human action recognition. IEEE Access 8, 43243–43255 (2020)

  14. Ullah, A., Ahmad, J., Muhammad, K., Sajjad, M., Baik, S.W.: Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access 6, 1155–1166 (2017)

  15. Feichtenhofer, C.: X3D: expanding architectures for efficient video recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 203–213 (2020)

  16. Wen, X., Chen, H., Hong, Q.: Human assembly task recognition in human-robot collaboration based on 3D CNN. In: 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 1230–1234. IEEE (2019)

  17. Xiong, Q., Zhang, J., Wang, P., Liu, D., Gao, R.X.: Transferable two-stream convolutional neural network for human action recognition. J. Manuf. Syst. 56, 605–614 (2020)

  18. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199 (2014)

  19. Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2019)

  20. Cheng, K., Zhang, Y., He, X., Chen, W., Cheng, J., Lu, H.: Skeleton-based action recognition with shift graph convolutional network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)

  21. Liu, Z., Zhang, H., Chen, Z., Wang, Z., Ouyang, W.: Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 143–152 (2020)

  22. Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., Hu, W.: Channel-wise topology refinement graph convolution for skeleton-based action recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2021)

  23. Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1010–1019 (2016)

  24. Wang, J., Nie, X., Xia, Y., Wu, Y., Zhu, S.C.: Cross-view action modeling, learning and recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2649–2656 (2014)

  25. Martins, G.S., Santos, L., Dias, J.: The GrowMeUp project and the applicability of action recognition techniques. In: Third Workshop on Recognition and Action for Scene Understanding (REACTS), Ruiz de Aloza (2015)


Acknowledgment

The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 101006732. Part of this work was supported by MIUR (the Italian Ministry of Education) under the initiative “Departments of Excellence” (Law 232/2016).

Author information

Correspondence to Matteo Terreran.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Terreran, M., Lazzaretto, M., Ghidoni, S. (2023). Skeleton-Based Action and Gesture Recognition for Human-Robot Collaboration. In: Petrovic, I., Menegatti, E., Marković, I. (eds) Intelligent Autonomous Systems 17. IAS 2022. Lecture Notes in Networks and Systems, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-031-22216-0_3
