Abstract
Hand gesture recognition is an essential task in computer vision. It is the most intuitive and natural medium for communication when dealing with computers. Recently, with the advent of innovative technologies and high performing computer systems, there has been a surge in the research of Gesture Recognition. Traditional approaches to modelling skeletons are typically based on hand-crafted components or traversal algorithms, leading to limited expressive capacity and generalisation challenges. In this work, we present a novel dynamic skeleton model based on BiLSTM and soft attention named DyHand that mitigates the challenges of intra-class and inter-class variability of gesture classes to a great extent. The comparison of our model with state-of-the-art approaches on the two benchmark data sets with various data augmentation techniques is reported. The proposed approach yields the best results, achieving 97.14 and 96.42% recognition accuracy in the 14 and 28 gesture categories, respectively, for the DHG-14/28 data set and comparable recognition accuracy of 93.98% on 14 gesture classes and 87.86% on 28 gesture classes, respectively, in case of SHREC’17 data set.
Similar content being viewed by others
Availability of data and materials
The data sets used in this research is publicly available. The DHG-14/28 data set is available at http://www-rech.telecom-lille.fr/DHGdataset/. The SHREC’17 data set is available at http://www-rech.telecom-lille.fr/shrec2017-hand/ and http://dx.doi.org/10.2312/3dor.20171049.
References
De Smedt, Q., Wannous, H., Vandeborre, J.-P.: Heterogeneous hand gesture recognition using 3d dynamic skeletal data. Comput. Vis. Image Underst. 181, 60–72 (2019)
Freeman, W.T., Roth, M.: Orientation histograms for hand gesture recognition. In: International Workshop on Automatic Face and Gesture Recognition, vol. 12, pp. 296–301 (1995). IEEE Computer Society, Washington
Wang, C., Liu, Z., Chan, S.-C.: Superpixel-based hand gesture recognition with kinect depth camera. IEEE Trans. Multimedia 17(1), 29–39 (2014)
Molchanov, P., Gupta, S., Kim, K., Kautz, J.: Hand gesture recognition with 3d convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–7 (2015)
De Smedt, Q., Wannous, H., Vandeborre, J.-P.: Skeleton-based dynamic hand gesture recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–9 (2016)
De Smedt, Q., Wannous, H., Vandeborre, J.-P., Guerry, J., Le Saux, B., Filliat, D.: Shrec’17 track: 3d hand gesture recognition using a depth and skeletal dataset. In: 3DOR-10th Eurographics Workshop on 3D Object Retrieval, pp. 1–6 (2017)
Chen, X., Guo, H., Wang, G., Zhang, L.: Motion feature augmented recurrent neural network for skeleton-based dynamic hand gesture recognition. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 2881–2885 (2017). IEEE
Nunez, J.C., Cabido, R., Pantrigo, J.J., Montemayor, A.S., Velez, J.F.: Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recogn. 76, 80–94 (2018)
Thang, N.D., Kim, T.-S., Lee, Y.-K., Lee, S.: Estimation of 3-d human body posture via co-registration of 3-d human model and sequential stereo information. Appl. Intell. 35(2), 163–177 (2011)
Oberweger, M., Wohlhart, P., Lepetit, V.: Hands deep in deep learning for hand pose estimation. arXiv preprint arXiv:1502.06807 (2015)
Oberweger, M., Lepetit, V.: Deepprior++: Improving fast and accurate 3d hand pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 585–594 (2017)
Biswas, K.K., Basu, S.K.: Gesture recognition using microsoft kinect®. In: The 5th International Conference on Automation, Robotics and Applications, pp. 100–103. IEEE (2011)
Garcia-Hernando, G., Yuan, S., Baek, S., Kim, T.-K.: First-person hand action benchmark with rgb-d videos and 3d hand pose annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 409–419 (2018)
Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimedia Tools Appl. 75(22), 14991–15015 (2016)
Yang, X., Tian, Y.L.: Eigenjoints-based action recognition using Naive-Bayes-nearest-neighbor. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 14–19 (2012). IEEE
Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290–1297 (2012). IEEE
Ohn-Bar, E., Trivedi, M.: Joint angles similarities and hog2 for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 465–470 (2013)
Luo, J., Wang, W., Qi, H.: Group sparsity and geometry constrained dictionary learning for action recognition from depth maps. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1809–1816 (2013)
Evangelidis, G., Singh, G., Horaud, R.: Skeletal quads: Human action recognition using joint quadruples. In: 2014 22nd International Conference on Pattern Recognition, pp. 4513–4518 (2014). IEEE
Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3d skeletons as points in a lie group. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 588–595 (2014)
Zhang, X., Wang, Y., Gou, M., Sznaier, M., Camps, O.: Efficient temporal sequence comparison and classification using gram matrix embeddings on a Riemannian manifold. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4498–4507 (2016)
Hou, J., Wang, G., Chen, X., Xue, J.-H., Zhu, R., Yang, H.: Spatial-temporal attention res-TCN for skeleton-based dynamic hand gesture recognition. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)
Devineau, G., Moutarde, F., Xi, W., Yang, J.: Deep learning for hand gesture recognition on skeletal data. In: 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018), pp. 106–113. IEEE (2018)
Liu, J., Wang, G., Duan, L.-Y., Abdiyeva, K., Kot, A.C.: Skeleton-based human action recognition with global context-aware attention LSTM networks. IEEE Trans. Image Process. 27(4), 1586–1599 (2017)
Nguyen, X.S., Brun, L., Lézoray, O., Bougleux, S.: A neural network based on SPD manifold learning for skeleton-based hand gesture recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12036–12045 (2019)
Ionescu, B., Coquin, D., Lambert, P., Buzuloiu, V.: Dynamic hand gesture recognition using the skeleton of the hand. EURASIP J. Adv. Signal Process. 2005(13), 1–9 (2005)
Reddy, K.S., Latha, P.S., Babu, M.R.: Hand gesture recognition using skeleton of hand and distance based metric. In: International Conference on Advances in Computing and Information Technology, pp. 346–354. Springer (2011)
Oreifej, O., Liu, Z.: Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 716–723 (2013)
Wang, C., Chan, S.: A new hand gesture recognition algorithm based on joint color-depth superpixel earth mover’s distance. In: 2014 4th International Workshop on Cognitive Information Processing (CIP), pp. 1–6. IEEE (2014)
Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., Del Bimbo, A.: 3-d human action recognition by shape analysis of motion trajectories on Riemannian manifold. IEEE Trans. Cybern. 45(7), 1340–1352 (2014)
Boulahia, S.Y., Anquetil, E., Multon, F., Kulpa, R.: Dynamic hand gesture recognition based on 3d pattern assembled trajectories. In: 2017 7th International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1–6. IEEE (2017)
Boulahia, S.Y., Anquetil, E., Kulpa, R., Multon, F.: Hif3d: Handwriting-inspired features for 3d skeleton-based action recognition. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 985–990. IEEE (2016)
Cippitelli, E., Gasparrini, S., Gambi, E., Spinsante, S.: A human activity recognition system using skeleton data from RGBD sensors. Comput. Intell. Neurosci. 2016 (2016). https://doi.org/10.1155/2016/4351435
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pp. 249–256. JMLR Workshop and Conference Proceedings (2010)
Neverova, N., Wolf, C., Paci, G., Sommavilla, G., Taylor, G., Nebout, F.: A multi-scale approach to gesture detection and recognition. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 484–491 (2013)
Neverova, N., Wolf, C., Taylor, G., Nebout, F.: Moddrop: adaptive multi-modal gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 38(8), 1692–1706 (2015)
Lai, K., Yanushkevich, S.N.: CNN+ RNN depth and skeleton based dynamic hand gesture recognition. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3451–3456. IEEE (2018)
Jain, R., Karsh, R.K., Barbhuiya, A.A.: Encoded motion image-based dynamic hand gesture recognition. Vis. Comput. 38(6), 1957–1974 (2022)
Wang, S., Zhang, S., Zhang, X., Geng, Q.: A two-branch hand gesture recognition approach combining Atrous convolution and attention mechanism. Vis. Comput. 39(10), 4487–4500 (2023)
Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: 32nd AAAI Conference on Artificial Intelligence (2018)
Caputo, F.M., Prebianca, P., Carcangiu, A., Spano, L.D., Giachetti, A.: Comparing 3d trajectories for simple mid-air gesture recognition. Comput. Graph. 73, 17–25 (2018)
Chen, X., Wang, G., Guo, H., Zhang, C., Wang, H., Zhang, L.: Mfa-net: motion feature augmented network for dynamic hand gesture recognition from skeletal data. Sensors 19(2), 239 (2019)
Liu, J., Liu, Y., Wang, Y., Prinet, V., Xiang, S., Pan, C.: Decoupled representation learning for skeleton-based gesture recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5751–5760 (2020)
Tai, D.N., Na, I.S., Kim, S.H.: Hsfe network and fusion model based dynamic hand gesture recognition. KSII Trans. Internet Inf. Syst. (TIIS) 14(9), 3924–3940 (2020)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
Laurent, T., von Brecht, J.: A recurrent neural network without chaos. arXiv preprint arXiv:1612.06212 (2016)
Oord, A.V.D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., Kavukcuoglu, K.: Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning, pp. 2048–2057. PMLR (2015)
Corbetta, M., Shulman, G.L.: Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3(3), 201–215 (2002)
Song, S., Lan, C., Xing, J., Zeng, W., Liu, J.: An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)
Chen, Y., Zhao, L., Peng, X., Yuan, J., Metaxas, D.N.: Construct dynamic graphs for hand gesture recognition via spatial-temporal attention. arXiv preprint arXiv:1907.08871 (2019)
Maghoumi, M., LaViola, J.J.: Deepgru: Deep gesture recognition utility. In: International Symposium on Visual Computing, pp. 16–31. Springer (2019)
Zhang, W., Lin, Z., Cheng, J., Ma, C., Deng, X., Wang, H.: Sta-gcn: two-stream graph convolutional network with spatial-temporal attention for hand gesture recognition. Vis. Comput. 36(10), 2433–2444 (2020)
Yang, H., Yan, D., Zhang, L., Sun, Y., Li, D., Maybank, S.J.: Feedback graph convolutional network for skeleton-based action recognition. IEEE Trans. Image Process. 31, 164–175 (2021)
Song, J.-H., Kong, K., Kang, S.-J.: Dynamic hand gesture recognition using improved spatio-temporal graph convolutional network. IEEE Trans. Circuits Syst. Video Technol. 32(9), 6227–6239 (2022)
Liu, J., Wang, X., Wang, C., Gao, Y., Liu, M.: Temporal decoupling graph convolutional network for skeleton-based gesture recognition. IEEE Trans. Multimedia 26, 811–823 (2024)
Miah, A.S.M., Hasan, M.A.M., Shin, J.: Dynamic hand gesture recognition using multi-branch attention based graph and general deep learning model. IEEE Access 11, 4703–4716 (2023)
Mahmud, H., Morshed, M.M., Hasan, M.K.: Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition. Vis. Comput. 40(1), 11–25 (2024)
Singh, A., Singh, T.D., Bandyopadhyay, S.: Attention based video captioning framework for Hindi. Multimedia Syst. 28(1), 195–207 (2022)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Ma, C., Wang, A., Chen, G., Xu, C.: Hand joints-based gesture recognition for noisy dataset using nested interval unscented Kalman filter with LSTM network. Vis. Comput. 34(6), 1053–1063 (2018)
De Smedt, Q.: Dynamic hand gesture recognition-from traditional handcrafted to recent deep learning approaches. PhD thesis, Université de Lille 1, Sciences et Technologies; CRIStAL UMR 9189 (2017)
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Singh, R.P., Singh, L.D. Dyhand: dynamic hand gesture recognition using BiLSTM and soft attention methods. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03307-4
Accepted:
Published:
DOI: https://doi.org/10.1007/s00371-024-03307-4