
Deep learning models beyond temporal frame-wise features for hand gesture video recognition

The Journal of Supercomputing

Abstract

Recurrent neural networks (RNNs) are widely used to capture spatiotemporal features in video data. However, their effectiveness relies heavily on the spatial features on which they are trained. This paper introduces ensembles of features for constructing frame-wise structures, obtained from neural network models with dedicated training pipelines. These features are designed to improve RNN-based recognition of hand gesture videos by leveraging temporal information. Recognizing hand gestures from videos is a challenging task. One notable challenge is the overlap in gesture motion, where different gesture categories exhibit similar hand poses within a single video clip. To overcome this issue, we develop extensive and diverse features that describe gesture video clips more comprehensively, thereby mitigating recognition problems caused by overlapping images. Generating diverse features yields promising results for recognizing hand gestures from videos, particularly where overlap poses a significant challenge. We combine features extracted from a deep neural network trained from scratch with features obtained from standard neural networks (Self-Organizing Map, Radial Basis Function) that are trained to enhance the deep-trained features. Combining these shared features yields new frame-wise image features. Furthermore, we compare the performance of the newly constructed frame-wise features when they are used through time-sharing to train an RNN for recognition. The proposed models are evaluated on two hand gesture video datasets, where preserving the gesture sequence is crucial due to overlapping motions. Our work demonstrates a significant improvement in performance on both datasets.
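The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: per-frame deep features are augmented with features derived from a Radial Basis Function layer and a Self-Organizing Map, and the fused frame-wise sequence is then consumed by a recurrent network. Every name, dimension, and parameter here (rbf_features, som_features, fuse_frame_features, rnn_final_state, gamma, the random prototypes and weights) is a hypothetical placeholder, not the authors' method. In particular, the prototypes and weights are random rather than trained, and the one-hot best-matching-unit encoding is just one plausible way to turn a SOM into a feature vector.

```python
import numpy as np


def rbf_features(x, centers, gamma=1.0):
    """Gaussian RBF activations of each frame vector against each center."""
    # x: (T, D), centers: (K, D) -> squared distances: (T, K)
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def som_features(x, codebook):
    """One-hot encoding of each frame's best-matching unit (BMU) in a SOM codebook."""
    sq_dist = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    bmu = sq_dist.argmin(axis=1)              # index of the nearest prototype per frame
    return np.eye(codebook.shape[0])[bmu]     # (T, M) one-hot rows


def fuse_frame_features(deep, centers, codebook, gamma=1.0):
    """Concatenate deep, RBF-derived, and SOM-derived features for every frame."""
    return np.concatenate(
        [deep, rbf_features(deep, centers, gamma), som_features(deep, codebook)],
        axis=1,
    )


def rnn_final_state(seq, w_in, w_rec, bias):
    """Run a plain tanh (Elman-style) recurrence over the frames, in order."""
    h = np.zeros(w_rec.shape[0])
    for frame in seq:                         # frame order is preserved
        h = np.tanh(frame @ w_in + h @ w_rec + bias)
    return h


# Assumed toy sizes: T frames, D deep features, K RBF centers, M SOM units, H hidden units.
rng = np.random.default_rng(0)
T, D, K, M, H = 12, 16, 4, 6, 8
deep = rng.normal(size=(T, D))                # stand-in for CNN features of one clip
centers = rng.normal(size=(K, D))             # stand-in for trained RBF centers
codebook = rng.normal(size=(M, D))            # stand-in for a trained SOM codebook

fused = fuse_frame_features(deep, centers, codebook, gamma=0.1)
w_in = rng.normal(scale=0.1, size=(D + K + M, H))
w_rec = rng.normal(scale=0.1, size=(H, H))
state = rnn_final_state(fused, w_in, w_rec, np.zeros(H))
print(fused.shape, state.shape)               # (12, 26) (8,)
```

In a real pipeline the final hidden state would feed a classifier over gesture categories, and the recurrent weights, RBF centers, and SOM codebook would all be learned from data. The sketch only shows how the frame-wise fusion and the sequential pass fit together.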

Data availability

The original data utilized to support the findings of this study is accessible through Kaggle (www.kaggle.com), a reputable online platform for data scientists and machine learning engineers. Those interested in accessing the data can submit a request to the community via the platform. Kaggle provides a convenient and reliable avenue for obtaining the necessary data for further analysis and research.

Acknowledgements

We express our deep gratitude to Prof. Dr. Eman Salih Al-Shamery from the University of Babylon for her outstanding support and unwavering commitment during the analysis of the data set and the development of proposals. Her exceptional expertise and valuable insights have played a vital role in ensuring the alignment and coherence of our work. We sincerely appreciate her guidance and contributions, which have greatly enriched the quality and success of this project.

Author information

Contributions

AM, under the supervision of OH, conceived, designed, and programmed the experiments, performed the experiments, analyzed and interpreted the data, and wrote the paper.

Corresponding author

Correspondence to Anwar Mira.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mira, A., Hellwich, O. Deep learning models beyond temporal frame-wise features for hand gesture video recognition. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05910-7

  • DOI: https://doi.org/10.1007/s11227-024-05910-7
