mXception and dynamic image for hand gesture recognition

  • Original Article
Neural Computing and Applications

Abstract

Gesture detection has recently attracted considerable attention because of its wide range of applications, notably in human–computer interaction (HCI). In video-based gesture recognition, however, background elements unrelated to the gesture reduce the system's classification rate. This paper presents an algorithm for large-scale gesture recognition. Training uses RGB-D videos in which the depth-modality videos are estimated from the RGB-modality videos with a U-Net; the estimated depth videos are also used for testing. As a result, real-time use of the proposed dynamic hand gesture recognition (DHGR) system requires only RGB video. The algorithm first creates two dynamic images, one from the estimated depth video and one from the RGB video. Dynamic images generated from RGB video excel at capturing spatial information, whereas those derived from depth video excel at encoding temporal information. The two dynamic images are merged into an RGB-D dynamic image (RDDI), which is then fed to a modified Xception-based CNN model for gesture classification. We evaluated the system on the EgoGesture and MSR Gesture datasets, obtaining classification accuracies of 91.64% on EgoGesture and 99.41% on MSR Gesture. These results show that the proposed system outperforms some existing techniques.
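The abstract does not state how the dynamic images are computed or how the RGB and depth images are merged. As a rough illustration only, the sketch below assumes one common construction, approximate rank pooling (Bilen et al.), in which frame t of a T-frame video is weighted by 2t − T − 1 and the weighted frames are summed. The merge step (appending the depth value to each RGB pixel) is a hypothetical placeholder, not the paper's method. All function names are invented for this example, and frames are flat lists of pixel values to keep the sketch dependency-free.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one video frame flattened to a sequence of pixel values


def rank_pooling_coefficients(num_frames: int) -> List[float]:
    """Approximate rank pooling weights alpha_t = 2t - T - 1 for t = 1..T (assumed scheme)."""
    return [2.0 * t - num_frames - 1.0 for t in range(1, num_frames + 1)]


def dynamic_image(frames: Sequence[Frame]) -> List[float]:
    """Collapse a video into one image: weighted sum of frames, rescaled to [0, 255]."""
    if not frames:
        raise ValueError("need at least one frame")
    alphas = rank_pooling_coefficients(len(frames))
    num_pixels = len(frames[0])
    summed = [
        sum(a * frame[i] for a, frame in zip(alphas, frames))
        for i in range(num_pixels)
    ]
    lo, hi = min(summed), max(summed)
    if hi == lo:
        # No temporal change anywhere: return a blank image.
        return [0.0] * num_pixels
    return [255.0 * (v - lo) / (hi - lo) for v in summed]


def merge_rgbd(rgb_di: Sequence[float],
               depth_di: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Hypothetical merge: attach each depth value to its interleaved (r, g, b) pixel."""
    if len(rgb_di) != 3 * len(depth_di):
        raise ValueError("rgb_di must hold three values per depth pixel")
    return [
        (rgb_di[3 * i], rgb_di[3 * i + 1], rgb_di[3 * i + 2], d)
        for i, d in enumerate(depth_di)
    ]


# Toy video: 3 frames of 2 pixels each; both pixels change over time.
video = [[0.0, 0.0], [1.0, 0.0], [2.0, 4.0]]
print(rank_pooling_coefficients(3))  # [-2.0, 0.0, 2.0]
print(dynamic_image(video))          # [0.0, 255.0]
```

Because these weights sum to zero, a pixel that stays constant across the video contributes nothing to the weighted sum before rescaling, which is one reason dynamic images tend to suppress static background.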

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Bhumika Karsh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Karsh, B., Laskar, R.H. & Karsh, R.K. mXception and dynamic image for hand gesture recognition. Neural Comput & Applic 36, 8281–8300 (2024). https://doi.org/10.1007/s00521-024-09509-0

  • DOI: https://doi.org/10.1007/s00521-024-09509-0
