Abstract
Human gesture recognition, which aims to have machines analyze human gestures, is one of the most challenging problems in computer vision. Most of the literature on gesture recognition, however, classifies isolated data in which each image or video contains only one gesture. This work targets the identification of human gestures in a continuous stream of input from a live camera feed, with no pre-defined gesture boundaries. The task becomes more complex still given diverse lighting conditions, varying backgrounds and different gesture positions within the same input stream. This work presents an effective deep learning architecture to classify gestures captured from multiple viewpoints and at varying object sizes. To perform the classification, we have compiled a real-world dataset of 4500 images collected from people aged 10 to 50. The dataset covers a wide variety of characteristics to address the complexities of the gesture recognition process. A real-time system is developed that captures, analyzes and classifies live gesture videos frame by frame. To validate our approach, we have compared our results with multiple deep learning architectures and on other benchmark datasets. The results show that our approach outperforms existing works and is able to detect gestures under deteriorating lighting conditions and with unclear gesture positions, achieving an accuracy of 99.63%.
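To make the frame-by-frame idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes frames arrive as an iterable and that some trained model is available as a callable returning a label; here a hypothetical `toy_classifier` that thresholds brightness stands in for the network. The names `classify_stream`, `segment_labels`, `open_palm` and `fist` are illustrative and do not come from the paper. Grouping consecutive identical labels into runs is likewise an assumption, shown only as one simple way to recover gesture segments from a stream that has no pre-defined boundaries.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an image: rows of grayscale pixel intensities.
Frame = List[List[int]]


def classify_stream(frames: Iterable[Frame],
                    classify: Callable[[Frame], str]) -> Iterator[str]:
    """Yield one predicted label per frame, in arrival order."""
    for frame in frames:
        yield classify(frame)


def segment_labels(labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive identical labels into (label, frame_count) runs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


def toy_classifier(frame: Frame) -> str:
    """Placeholder for a trained network (assumption): thresholds mean brightness."""
    pixels = [p for row in frame for p in row]
    return "open_palm" if sum(pixels) / len(pixels) > 127 else "fist"


if __name__ == "__main__":
    dark = [[10, 20], [30, 40]]
    bright = [[200, 210], [220, 230]]
    # A short simulated stream; a live system would read frames from a camera.
    stream = [dark, dark, bright, bright, bright, dark]
    print(segment_labels(classify_stream(stream, toy_classifier)))
```

In a real deployment the frame source would be a camera capture loop and `classify` would wrap the trained model's prediction call; the generator structure keeps the per-frame logic independent of both.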
Availability of data and materials
Not applicable.
Acknowledgements
This work was conducted as part of an internship project at AI-Shala Pvt. Limited.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.
Author information
Contributions
All authors read and approved the final manuscript. NB and RK: methodology, software, validation, formal analysis, investigation, resources, data curation. AA: conceptualization, writing – original draft, review & editing, visualization, project administration, supervision. KS and GD: writing – review.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aggarwal, A., Bhutani, N., Kapur, R. et al. Real-time hand gesture recognition using multiple deep learning architectures. SIViP 17, 3963–3971 (2023). https://doi.org/10.1007/s11760-023-02626-8
DOI: https://doi.org/10.1007/s11760-023-02626-8