Abstract
The paper presents an approach to multimodal recognition of dynamic and static gestures of Russian sign language based on 3D convolutional and LSTM neural networks. A dataset of 48 one-handed gestures of Russian sign language, recorded in both color and depth-map formats, is presented as well. The dataset was collected with the Kinect v2 sensor and contains recordings of 13 different native signers of Russian sign language. The obtained results are compared with those of other methods. The classification experiment demonstrated the great potential of neural networks for this problem: the achieved recognition accuracy of 73.25% is the best result among the compared approaches.
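To make the described approach concrete, the following minimal Python sketch outlines a two-stream 3D-CNN + LSTM classifier of the kind summarized above, with one stream per modality (color and depth map) and late fusion before classification. It uses the Keras API of TensorFlow; the clip length, frame resolution, layer sizes, and fusion scheme are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a two-stream 3D-CNN + LSTM gesture classifier.
# All sizes below are illustrative assumptions, not the paper's exact network.
from tensorflow.keras import layers, models

NUM_CLASSES = 48           # one-handed RSL gestures in the described dataset
FRAMES, H, W = 16, 64, 64  # assumed clip length and frame resolution

def make_stream(channels, name):
    """One modality stream: 3D convolutions extract local spatiotemporal
    features; an LSTM then models the longer-range gesture dynamics."""
    inp = layers.Input(shape=(FRAMES, H, W, channels), name=name)
    x = layers.Conv3D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    x = layers.Conv3D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling3D(pool_size=(2, 2, 2))(x)
    # Pool each remaining frame into a feature vector, keeping the time axis.
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    x = layers.LSTM(128)(x)
    return inp, x

rgb_in, rgb_feat = make_stream(3, "rgb_clip")        # color modality
depth_in, depth_feat = make_stream(1, "depth_clip")  # depth-map modality

# Late fusion of the two modality streams, then gesture classification.
fused = layers.concatenate([rgb_feat, depth_feat])
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = models.Model(inputs=[rgb_in, depth_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Under this layout, each training example is a pair of synchronized color and depth clips labeled with one of the 48 gesture classes; fusing the streams only after the recurrent layers lets each modality learn its own spatiotemporal representation before the classifier combines them.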
Acknowledgments
This research is financially supported by the Ministry of Science and Higher Education of the Russian Federation, agreement No. 14.616.21.0095 (reference RFMEFI61618X0095).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kagirov, I., Ryumin, D., Axyonov, A. (2019). Method for Multimodal Recognition of One-Handed Sign Language Gestures Through 3D Convolution and LSTM Neural Networks. In: Salah, A., Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol. 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_20
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3