Method for Multimodal Recognition of One-Handed Sign Language Gestures Through 3D Convolution and LSTM Neural Networks

  • Conference paper
Speech and Computer (SPECOM 2019)

Abstract

The paper presents an approach to the multimodal recognition of dynamic and static Russian sign language gestures using 3D convolutional and LSTM neural networks. A dataset of 48 one-handed Russian sign language gestures, recorded in both color and depth-map formats, is presented as well. The dataset was collected with the Kinect v2 sensor and contains recordings of 13 different native signers of Russian sign language. The obtained results are compared with those of other methods. The classification experiment demonstrated the strong potential of neural networks for this task: the achieved recognition accuracy of 73.25% is the best result among the compared approaches.
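
The abstract does not specify the exact network configuration, so the following Keras sketch only illustrates the general pattern it describes: 3D convolutions extract short-range spatio-temporal features, and an LSTM then models the long-range dynamics of the gesture. The class count of 48 comes from the paper; the clip length, frame resolution, filter counts, and the early fusion of RGB and depth into a four-channel input are illustrative assumptions, not the authors' settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical dimensions: 48 gesture classes are from the paper; clip
# length, frame size, and the 4-channel RGB+depth fusion are assumptions.
NUM_CLASSES = 48
FRAMES, HEIGHT, WIDTH, CHANNELS = 32, 64, 64, 4

inputs = layers.Input(shape=(FRAMES, HEIGHT, WIDTH, CHANNELS))

# 3D convolutions capture local spatio-temporal patterns across frames.
x = layers.Conv3D(32, (3, 3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
x = layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling3D(pool_size=(2, 2, 2))(x)

# Flatten each remaining time step into a feature vector, then let an
# LSTM model the gesture's temporal evolution over the whole clip.
x = layers.Reshape((x.shape[1], -1))(x)
x = layers.LSTM(256)(x)

outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```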



Acknowledgments

This research is financially supported by the Ministry of Science and Higher Education of the Russian Federation, agreement No. 14.616.21.0095 (reference RFMEFI61618X0095).

Author information


Corresponding author

Correspondence to Ildar Kagirov.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Kagirov, I., Ryumin, D., Axyonov, A. (2019). Method for Multimodal Recognition of One-Handed Sign Language Gestures Through 3D Convolution and LSTM Neural Networks. In: Salah, A., Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol. 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_20

  • DOI: https://doi.org/10.1007/978-3-030-26061-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26060-6

  • Online ISBN: 978-3-030-26061-3

  • eBook Packages: Computer Science, Computer Science (R0)
