Dyhand: dynamic hand gesture recognition using BiLSTM and soft attention methods

Singh, Rohit Pratap; Singh, Laiphrakpam Dolendro

doi:10.1007/s00371-024-03307-4

Original article
Published: 18 March 2024

The Visual Computer (2024)

Abstract

Hand gesture recognition is an essential task in computer vision. Hand gestures are the most intuitive and natural medium for communicating with computers. Recently, with the advent of innovative technologies and high-performing computer systems, research on gesture recognition has surged. Traditional approaches to modelling skeletons typically rely on hand-crafted components or traversal algorithms, which limits their expressive capacity and generalisation. In this work, we present DyHand, a novel dynamic skeleton model based on BiLSTM and soft attention that mitigates the challenges of intra-class and inter-class variability of gesture classes to a great extent. We compare our model with state-of-the-art approaches on two benchmark data sets under various data augmentation techniques. The proposed approach yields the best results on the DHG-14/28 data set, achieving 97.14% and 96.42% recognition accuracy on the 14 and 28 gesture categories, respectively, and comparable recognition accuracy on the SHREC’17 data set, with 93.98% on 14 gesture classes and 87.86% on 28 gesture classes.
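
This preview does not include the model equations, so the following is only an illustrative sketch of generic additive soft attention pooling over a sequence of recurrent hidden states (such as the per-frame outputs of a BiLSTM). It is not the authors' implementation. The function names, the scoring form v·tanh(W h_t), and the toy dimensions are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_pool(hidden_states, w, v):
    """Pool T hidden vectors (each of length D) into one context vector.

    hidden_states: list of T vectors, e.g. per-frame BiLSTM outputs (assumed).
    w: A x D projection matrix, v: length-A scoring vector (both hypothetical
    learned parameters; here they are just passed in).
    Returns (context, weights), where weights sum to 1 over the T time steps.
    """
    scores = []
    for h in hidden_states:
        # Additive scoring: e_t = v . tanh(W h_t)
        proj = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)))
                for row in w]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, proj)))
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights


# Toy usage with random values standing in for real hidden states.
rng = random.Random(0)
T, D, A = 5, 4, 3  # time steps, hidden size, attention size (arbitrary)
hidden_states = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
w = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(A)]
v = [rng.uniform(-1, 1) for _ in range(A)]
context, weights = soft_attention_pool(hidden_states, w, v)
```

In a classifier of this general kind, the context vector would typically be passed to a dense softmax layer over the gesture classes. Because the weights are a softmax over time steps, every frame contributes, with more informative frames weighted more heavily.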

Availability of data and materials

The data sets used in this research are publicly available. The DHG-14/28 data set is available at http://www-rech.telecom-lille.fr/DHGdataset/. The SHREC’17 data set is available at http://www-rech.telecom-lille.fr/shrec2017-hand/ and http://dx.doi.org/10.2312/3dor.20171049.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Silchar, Fakiratilla, Silchar, Assam, 788010, India
Rohit Pratap Singh & Laiphrakpam Dolendro Singh

Corresponding author

Correspondence to Rohit Pratap Singh.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Singh, R.P., Singh, L.D. Dyhand: dynamic hand gesture recognition using BiLSTM and soft attention methods. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03307-4

Accepted: 10 February 2024
Published: 18 March 2024
DOI: https://doi.org/10.1007/s00371-024-03307-4
