Dyhand: dynamic hand gesture recognition using BiLSTM and soft attention methods

  • Original article
  • The Visual Computer

Abstract

Hand gesture recognition is an essential task in computer vision. Hand gestures are the most intuitive and natural medium for communication with computers. With the advent of innovative technologies and high-performing computer systems, research on gesture recognition has surged. Traditional approaches to modelling skeletons are typically based on hand-crafted components or traversal algorithms, which limits their expressive capacity and makes generalisation difficult. In this work, we present DyHand, a novel dynamic skeleton model based on BiLSTM and soft attention that substantially mitigates the challenges of intra-class and inter-class variability among gesture classes. We compare our model with state-of-the-art approaches on two benchmark data sets under various data augmentation techniques. The proposed approach yields the best results on the DHG-14/28 data set, achieving 97.14% and 96.42% recognition accuracy on the 14 and 28 gesture categories, respectively, and comparable results on the SHREC’17 data set, with 93.98% on 14 gesture classes and 87.86% on 28 gesture classes.
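
To illustrate the general idea of soft attention over a sequence of recurrent outputs, the following is a minimal sketch in plain Python. It is not the DyHand implementation: the abstract does not specify the attention formulation, so a generic additive scoring function is assumed here, and the names `soft_attention_pool`, `w`, and `v` are hypothetical. The per-frame vectors stand in for BiLSTM outputs (for example, concatenated forward and backward states), which are simply passed in as lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_pool(hidden_states, w, v):
    """Pool T per-frame vectors into one context vector.

    hidden_states: list of T vectors of size D (assumed BiLSTM outputs).
    w: D x A projection matrix given as a list of D rows (hypothetical).
    v: scoring vector of size A (hypothetical).
    The score of frame t is v . tanh(h_t W); weights are softmax(scores).
    """
    scores = []
    for h in hidden_states:
        # Project the frame into the attention space and squash with tanh.
        proj = [
            math.tanh(sum(h[i] * w[i][a] for i in range(len(h))))
            for a in range(len(v))
        ]
        scores.append(sum(p * va for p, va in zip(proj, v)))
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the frames.
    context = [
        sum(wt * h[d] for wt, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Toy input: three frames, only the middle one carries signal.
states = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
weights, context = soft_attention_pool(states, w=[[1.0], [1.0]], v=[1.0])
```

In this toy input the middle frame receives the largest weight, so the context vector is dominated by it. In a trained model, `w` and `v` would be learned jointly with the recurrent layers and the context vector would feed a classifier over the gesture classes.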

Availability of data and materials

The data sets used in this research are publicly available. The DHG-14/28 data set is available at http://www-rech.telecom-lille.fr/DHGdataset/. The SHREC’17 data set is available at http://www-rech.telecom-lille.fr/shrec2017-hand/ and http://dx.doi.org/10.2312/3dor.20171049.

Author information

Corresponding author

Correspondence to Rohit Pratap Singh.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

About this article

Cite this article

Singh, R.P., Singh, L.D. Dyhand: dynamic hand gesture recognition using BiLSTM and soft attention methods. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03307-4

  • DOI: https://doi.org/10.1007/s00371-024-03307-4
