Multi-model ensemble gesture recognition network for high-accuracy dynamic hand gesture recognition

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Hand gesture and action recognition have been extensively researched over the past two decades, driven by advances in acquisition and interaction technologies that open up a vast range of potential applications. In particular, many spatio-temporal feature extractors have been proposed for modeling long-term dependencies in sequential data, such as RNN-based models, temporal convolutional networks (TCNs), and 3D convolutional neural networks (3DCNNs). However, achieving a high recognition rate remains challenging because it is difficult to extract spatio-temporal features effectively and to classify them efficiently when the skeleton sequences are noisy and complex. This paper therefore proposes a deep ensemble framework for skeleton-based hand gesture recognition, called the multi-model ensemble gesture recognition network (MMEGRN). Specifically, to achieve effective feature extraction and accurate gesture recognition, we propose an architecture consisting of four sub-networks: three spatio-temporal feature classifiers, which contribute different capabilities for extracting and classifying skeleton sequences, and a fusion classifier. Through late feature fusion, the features produced by the feature extractors of the three classifiers are fused and passed to the fusion classifier. Each sub-network is trained independently to perform gesture recognition using only skeleton joints. Training uses a cyclic annealing learning rate to generate a series of models, which are then combined into an ensemble using the optimized weighted ensemble (OWE) method. The proposed framework combines the strengths of deep learning and ensemble methods to establish a new deep-learning network architecture for more accurate and efficient hand gesture recognition. Extensive experiments on three skeleton-based hand gesture recognition datasets demonstrate the effectiveness of the proposed framework and its superiority over other models in terms of recognition accuracy.
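The two training-side ideas named in the abstract, a cyclic annealing learning rate that yields a series of models and a weighted combination of those models, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The schedule below is the standard cyclic cosine annealing form, which is assumed here because the abstract does not give the exact schedule. The abstract also does not say how OWE optimizes its weights, so a coarse grid search over non-negative weights summing to one is swapped in as a stand-in. All function and parameter names (`cyclic_cosine_lr`, `weighted_vote`, `search_weights`, `grid`) are hypothetical.

```python
import math
from itertools import product


def cyclic_cosine_lr(step, total_steps, n_cycles, lr_max):
    """Cyclic cosine annealing (assumed form): the rate restarts at lr_max
    at the start of every cycle and decays towards zero within the cycle.
    `step` is 0-based."""
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = step % cycle_len  # position inside the current cycle
    return lr_max / 2 * (math.cos(math.pi * pos / cycle_len) + 1)


def weighted_vote(probs_per_model, weights):
    """Combine class probabilities from several models with fixed weights.
    probs_per_model[m][i][c] is model m's probability of class c for sample i."""
    n_samples = len(probs_per_model[0])
    n_classes = len(probs_per_model[0][0])
    preds = []
    for i in range(n_samples):
        combined = [
            sum(w * model[i][c] for w, model in zip(weights, probs_per_model))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=combined.__getitem__))
    return preds


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def search_weights(probs_per_model, labels, grid=10):
    """Stand-in for the OWE optimizer (an assumption, not the paper's method):
    try every weight vector on a coarse grid of the probability simplex and
    keep the one with the best accuracy on held-out data."""
    n_models = len(probs_per_model)
    best_w, best_acc = None, -1.0
    for combo in product(range(grid + 1), repeat=n_models):
        if sum(combo) != grid:
            continue  # keep only weights that sum to one
        w = [c / grid for c in combo]
        acc = accuracy(weighted_vote(probs_per_model, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

In a setup like this, one model snapshot would typically be saved at the end of each learning-rate cycle, and the weights would be chosen on a validation split rather than on the test data. A grid search scales poorly with the number of models, so a constrained numerical optimizer would be the more practical choice for larger ensembles.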

Notes

  1. https://www.intelrealsense.com/.

  2. https://www.ultraleap.com/product/leap-motion-controller/.

  3. http://www-rech.telecom-lille.fr/DHGdataset/.

  4. http://www-rech.telecom-lille.fr/shrec2017-hand/?#gestures.

  5. https://www-intuidoc.irisa.fr/en/english-leap-motion-dynamic-hand-gesture-lmdhg-database/.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. U1831121) and the Science and Technology Major Project of Sichuan Province (Grant Nos. 2019ZDZX0006 and 2020YFG0478).

Author information

Corresponding author

Correspondence to Yongsheng Sang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mohammed, A.A.Q., Lv, J., Islam, M.S. et al. Multi-model ensemble gesture recognition network for high-accuracy dynamic hand gesture recognition. J Ambient Intell Human Comput 14, 6829–6842 (2023). https://doi.org/10.1007/s12652-021-03546-6

  • DOI: https://doi.org/10.1007/s12652-021-03546-6
