Method for Multimodal Recognition of One-Handed Sign Language Gestures Through 3D Convolution and LSTM Neural Networks

  • Conference paper
Speech and Computer (SPECOM 2019)

Abstract

The paper presents an approach to the multimodal recognition of dynamic and static Russian sign language gestures using 3D convolutional and LSTM neural networks. A dataset of 48 one-handed Russian sign language gestures, recorded in both color and depth-map formats, is presented as well. The dataset was collected with the Kinect v2 sensor and contains recordings of 13 different native signers of Russian sign language. The obtained results are compared with those of other methods. The classification experiment demonstrated the strong potential of neural networks for this task: the achieved recognition accuracy of 73.25% is the best result among the compared approaches.
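
The abstract does not specify the exact network configuration, so the following Keras sketch only illustrates the general pattern it describes: 3D convolutions extract short-range spatio-temporal features, and an LSTM then models the long-range dynamics of the gesture. The class count of 48 comes from the paper; the clip length, frame resolution, filter counts, and the early fusion of RGB and depth into a four-channel input are illustrative assumptions, not the authors' settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical dimensions: 48 gesture classes are from the paper; clip
# length, frame size, and the 4-channel RGB+depth fusion are assumptions.
NUM_CLASSES = 48
FRAMES, HEIGHT, WIDTH, CHANNELS = 32, 64, 64, 4

inputs = layers.Input(shape=(FRAMES, HEIGHT, WIDTH, CHANNELS))

# 3D convolutions capture local spatio-temporal patterns across frames.
x = layers.Conv3D(32, (3, 3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
x = layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling3D(pool_size=(2, 2, 2))(x)

# Flatten each remaining time step into a feature vector, then let an
# LSTM model the gesture's temporal evolution over the whole clip.
x = layers.Reshape((x.shape[1], -1))(x)
x = layers.LSTM(256)(x)

outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```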



Acknowledgments

This research is financially supported by the Ministry of Science and Higher Education of the Russian Federation, agreement No. 14.616.21.0095 (reference RFMEFI61618X0095).

Author information


Corresponding author

Correspondence to Ildar Kagirov.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Kagirov, I., Ryumin, D., Axyonov, A. (2019). Method for Multimodal Recognition of One-Handed Sign Language Gestures Through 3D Convolution and LSTM Neural Networks. In: Salah, A., Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol. 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_20

  • DOI: https://doi.org/10.1007/978-3-030-26061-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26060-6

  • Online ISBN: 978-3-030-26061-3

  • eBook Packages: Computer Science, Computer Science (R0)
