Multi-model ensemble gesture recognition network for high-accuracy dynamic hand gesture recognition

Mohammed, Adam A. Q.; Lv, Jiancheng; Islam, Md. Sajjatul; Sang, Yongsheng

doi:10.1007/s12652-021-03546-6

Original Research
Published: 08 February 2022

Volume 14, pages 6829–6842, (2023)

Journal of Ambient Intelligence and Humanized Computing

Adam A. Q. Mohammed¹, Jiancheng Lv¹, Md. Sajjatul Islam¹ & Yongsheng Sang¹

Abstract

Hand gesture and action recognition have been extensively researched over the past two decades, driven by advances in acquisition and interaction technologies that open up a wide range of potential applications. In particular, many spatial–temporal feature extractors have been proposed for modeling long-term dependencies in sequential data, such as RNN-based models, temporal convolutional networks (TCNs), and 3D convolutional neural networks (3DCNNs). However, achieving a high recognition rate remains challenging because it is difficult to extract spatial–temporal features effectively and to classify them efficiently when skeleton sequences are noisy and complex. This paper therefore proposes a deep ensemble framework, the multi-model ensemble gesture recognition network (MMEGRN), for skeleton-based hand gesture recognition. Specifically, to achieve effective feature extraction and accurate gesture recognition, we propose an architecture consisting of four sub-networks, three of which are spatial–temporal feature classifiers, to leverage their different capabilities for extracting and classifying skeleton sequences. Through late feature fusion, the features produced by the feature extractors of each sub-network are fused into a new fusion classifier. Each sub-network is trained independently to perform gesture recognition using only skeleton joints. Training uses a cyclic annealing learning rate to generate a series of models, which are then combined into an ensemble using the optimized weighted ensemble (OWE) method. The proposed framework combines the strengths of deep learning and ensembling to establish a new deep-learning network architecture for more accurate and efficient hand gesture recognition. Extensive experiments on three skeleton-based hand gesture recognition datasets demonstrate the effectiveness of the proposed framework and its superior recognition accuracy over other models.
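
The abstract names two training-side ingredients, a cyclic annealing learning rate and a weighted ensemble whose weights are optimized, without giving their exact form. The following is a minimal sketch under stated assumptions, not the authors' implementation: the cosine shape of the schedule, the coarse grid search used in place of the paper's OWE optimizer, and every function name and toy number below are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def cyclic_cosine_lr(step, total_steps, cycles, lr_max):
    """Assumed schedule: cosine decay from lr_max that restarts every cycle."""
    steps_per_cycle = math.ceil(total_steps / cycles)
    position = (step % steps_per_cycle) / steps_per_cycle  # in [0, 1)
    return 0.5 * lr_max * (math.cos(math.pi * position) + 1.0)


def weighted_average(prob_sets, weights):
    """Blend per-model class probabilities: prob_sets[m][n][c] -> blended[n][c]."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [sum(w * p[n][c] for w, p in zip(weights, prob_sets))
         for c in range(n_classes)]
        for n in range(n_samples)
    ]


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    hits = sum(1 for row, y in zip(probs, labels) if row.index(max(row)) == y)
    return hits / len(labels)


def search_weights(prob_sets, labels, grid=10):
    """Stand-in optimizer: grid search over non-negative weights summing to 1."""
    best_w, best_acc = None, -1.0
    for combo in product(range(grid + 1), repeat=len(prob_sets)):
        if sum(combo) != grid:
            continue
        w = [k / grid for k in combo]
        acc = accuracy(weighted_average(prob_sets, w), labels)
        if acc > best_acc:  # strict '>' keeps the first best weighting found
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy validation data (made up): two models, four samples, two classes.
labels = [0, 1, 0, 1]
model_a = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55], [0.2, 0.8]]  # errs on sample 2
model_b = [[0.6, 0.4], [0.7, 0.3], [0.8, 0.2], [0.3, 0.7]]    # errs on sample 1

equal_acc = accuracy(weighted_average([model_a, model_b], [0.5, 0.5]), labels)
best_w, best_acc = search_weights([model_a, model_b], labels)
```

On this toy data each model alone reaches 75% accuracy, as does the equal-weight average, while the search settles on weights of 0.7 and 0.3 and classifies all four samples correctly. With more than a handful of models a grid search becomes impractical, and a constrained numerical optimizer would be the more natural choice.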

Funding

This work is supported by the National Natural Science Foundation of China (Grant No. U1831121) and the Science and Technology Major Project of Sichuan Province (Grant Nos. 2019ZDZX0006 and 2020YFG0478).

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, 610065, China
Adam A. Q. Mohammed, Jiancheng Lv, Md. Sajjatul Islam & Yongsheng Sang

Corresponding author

Correspondence to Yongsheng Sang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mohammed, A.A.Q., Lv, J., Islam, M.S. et al. Multi-model ensemble gesture recognition network for high-accuracy dynamic hand gesture recognition. J Ambient Intell Human Comput 14, 6829–6842 (2023). https://doi.org/10.1007/s12652-021-03546-6

Received: 12 January 2021
Accepted: 11 October 2021
Published: 08 February 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s12652-021-03546-6
