Deep learning models beyond temporal frame-wise features for hand gesture video recognition

Mira, Anwar; Hellwich, Olaf

doi:10.1007/s11227-024-05910-7

Published: 14 February 2024

The Journal of Supercomputing

Anwar Mira¹,² & Olaf Hellwich¹

Abstract

Recurrent neural networks (RNNs) are widely used to capture spatiotemporal features in video data. However, their effectiveness depends heavily on the spatial features on which they are trained. This paper introduces ensembles of features for constructing frame-wise structures, built from neural network models with new training pipelines. These features are designed to improve RNN-based recognition of hand gesture videos by leveraging temporal information. Recognizing hand gestures from videos is a challenging task. One notable challenge is the overlap in gesture motion, where different gesture categories exhibit similar hand poses within a single video clip. To overcome this issue, we develop extensive and diverse features that describe gesture video clips more comprehensively, thereby mitigating recognition problems caused by overlapping images. These diverse features yield promising results in recognizing hand gestures from videos, particularly where overlap poses a significant challenge. We combine features extracted from a deep neural network trained from scratch with features obtained from standard neural networks (Self-Organizing Map, Radial Basis Function) that are trained to enhance the deep-trained features. Combining these shared features yields new frame-wise image features. We also compare the performance of the newly constructed frame-wise features when they are used through time-sharing to train an RNN for recognition. The proposed models are evaluated on two hand gesture video datasets in which preserving the gesture sequence is crucial because of overlapping motions. Our work demonstrates a significant improvement in performance on both datasets.
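
To make the idea of combined frame-wise features concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that each frame already has a deep feature vector, that the Radial Basis Function and Self-Organizing Map stages are applied to those deep features, and that the three feature groups are simply concatenated per frame before a plain recurrent pass. All function names, sizes, the `gamma` value and the random (untrained) weights are hypothetical and serve only to show the data flow.

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    """Gaussian RBF activations of every frame against every center."""
    # Squared Euclidean distance between each frame and each center.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def som_features(x, codebook):
    """One-hot code of the best-matching SOM unit for every frame."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    bmu = d2.argmin(axis=1)  # index of the closest prototype per frame
    return np.eye(codebook.shape[0])[bmu]

def combine_frame_features(deep, centers, codebook, gamma=1.0):
    """Concatenate deep, RBF and SOM features frame by frame (assumed fusion)."""
    return np.concatenate(
        [deep, rbf_features(deep, centers, gamma), som_features(deep, codebook)],
        axis=1,
    )

def rnn_classify(frames, w_in, w_rec, w_out):
    """Run a plain Elman-style RNN over the frames and return class scores."""
    h = np.zeros(w_rec.shape[0])
    for f in frames:  # frames are visited in temporal order
        h = np.tanh(w_in @ f + w_rec @ h)
    return w_out @ h

# Hypothetical sizes: frames, deep dim, RBF centers, SOM units, hidden, classes.
T, D, K, M, H, C = 16, 32, 8, 9, 24, 5
rng = np.random.default_rng(0)
deep = rng.normal(size=(T, D))      # stand-in for per-frame deep features
centers = rng.normal(size=(K, D))   # stand-in for trained RBF centers
codebook = rng.normal(size=(M, D))  # stand-in for a trained SOM codebook

frames = combine_frame_features(deep, centers, codebook, gamma=0.01)

# Random, untrained weights: the scores only demonstrate the shapes involved.
w_in = rng.normal(scale=0.1, size=(H, D + K + M))
w_rec = rng.normal(scale=0.1, size=(H, H))
w_out = rng.normal(scale=0.1, size=(C, H))
scores = rnn_classify(frames, w_in, w_rec, w_out)
print(frames.shape, scores.shape)
```

Because the frames are processed strictly in order, the hidden state depends on the gesture sequence, which is the property the abstract describes as crucial when different gestures share similar hand poses.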

Data availability

The original data utilized to support the findings of this study is accessible through Kaggle (www.kaggle.com), a reputable online platform for data scientists and machine learning engineers. Those interested in accessing the data can submit a request to the community via the platform. Kaggle provides a convenient and reliable avenue for obtaining the necessary data for further analysis and research.

Acknowledgements

We express our deep gratitude to Prof. Dr. Eman Salih Al-Shamery from the University of Babylon for her outstanding support and unwavering commitment during the analysis of the data set and the development of proposals. Her exceptional expertise and valuable insights have played a vital role in ensuring the alignment and coherence of our work. We sincerely appreciate her guidance and contributions, which have greatly enriched the quality and success of this project.

Author information

Authors and Affiliations

Computer Vision and Remote Sensing, Technische Universität Berlin, Berlin, Germany
Anwar Mira & Olaf Hellwich
College of Information Technology, University of Babylon, Hillah, Iraq
Anwar Mira

Contributions

AM, under the supervision of OH, conceived, designed, and programmed the experiments, performed the experiments, analyzed and interpreted the data, and wrote the paper.

Corresponding author

Correspondence to Anwar Mira.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mira, A., Hellwich, O. Deep learning models beyond temporal frame-wise features for hand gesture video recognition. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05910-7

Accepted: 07 January 2024
Published: 14 February 2024
DOI: https://doi.org/10.1007/s11227-024-05910-7
