
Evaluation of hidden Markov models using deep CNN features in isolated sign recognition

Multimedia Tools and Applications

Abstract

Isolated sign recognition from video streams is a challenging problem due to the multi-modal nature of signs, where local and global hand features and facial gestures need to be attended to simultaneously. This problem has recently been studied widely using deep Convolutional Neural Network (CNN) based features and Long Short-Term Memory (LSTM) based deep sequence models. However, the current literature lacks an empirical analysis of Hidden Markov Models (HMMs) with deep features. In this study, we provide a framework composed of three modules that solves the isolated sign recognition problem using different sequence models. The dimensions of deep features are usually too large for HMMs to handle. To address this, we propose two alternative CNN-based architectures, forming the second module of our framework, that reduce deep feature dimensions effectively. Through extensive experiments, we show that, using pretrained ResNet50 features and one of our CNN-based dimension reduction models, HMMs can classify isolated signs with 90.15% accuracy on the Montalbano dataset using RGB and skeletal data. This performance is comparable to that of current LSTM-based models. HMMs have fewer parameters and can be trained and run quickly on commodity computers without requiring GPUs. Therefore, our analysis with deep features shows that HMMs can be used as an alternative to deep sequence models for the challenging isolated sign recognition problem.
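The abstract does not spell out how the HMM stage makes a decision. A common scheme for isolated recognition, assumed here and not confirmed as the authors' implementation, trains one HMM per sign class and labels a clip with the class whose model assigns the highest log-likelihood, computed with the forward algorithm. The minimal sketch below assumes diagonal-Gaussian emissions, a left-to-right topology, and short 2-D toy vectors standing in for the reduced CNN features. Training (for example Baum-Welch) is omitted, and the names `GaussianHMM`, `make_left_to_right` and `classify` are hypothetical.

```python
import math
from dataclasses import dataclass


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


@dataclass
class GaussianHMM:
    """Hypothetical toy HMM with diagonal-Gaussian emissions (one per sign class)."""
    log_start: list[float]          # log pi_i
    log_trans: list[list[float]]    # log a_ij
    means: list[list[float]]        # mean vector of each state
    variances: list[list[float]]    # per-dimension variance of each state

    def log_emission(self, state, x):
        # Log density of a diagonal Gaussian, summed over feature dimensions.
        return sum(
            -0.5 * (math.log(2 * math.pi * var) + (xd - mu) ** 2 / var)
            for xd, mu, var in zip(x, self.means[state], self.variances[state])
        )

    def log_likelihood(self, seq):
        """log P(seq | model) via the forward algorithm in log space."""
        n = len(self.log_start)
        alpha = [self.log_start[i] + self.log_emission(i, seq[0]) for i in range(n)]
        for x in seq[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emission(j, x)
                for j in range(n)
            ]
        return logsumexp(alpha)


def make_left_to_right(means, variance=1.0, p_stay=0.7):
    """Assumed topology: start in state 0, then stay or advance one state."""
    n = len(means)
    log_start = [0.0] + [-math.inf] * (n - 1)
    log_trans = []
    for i in range(n):
        row = [-math.inf] * n
        if i < n - 1:
            row[i] = math.log(p_stay)
            row[i + 1] = math.log(1 - p_stay)
        else:
            row[i] = 0.0  # last state is absorbing
        log_trans.append(row)
    variances = [[variance] * len(m) for m in means]
    return GaussianHMM(log_start, log_trans, [list(m) for m in means], variances)


def classify(seq, models):
    """Return the label whose HMM gives seq the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(seq))


models = {
    "sign_A": make_left_to_right([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]]),
    "sign_B": make_left_to_right([[0.0, 0.0], [0.0, -2.0], [-2.0, -2.0]]),
}
clip = [[0.1, -0.1], [0.2, 0.0], [1.9, 0.1], [2.1, 1.8], [2.0, 2.1]]
print(classify(clip, models))  # -> sign_A
```

With N states and D-dimensional features, each such model has only on the order of N² transition parameters plus 2ND emission parameters. This is one reason why reducing the deep feature dimension D matters for HMMs, and why the models stay small enough to run on a CPU.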

Acknowledgements

The research presented is part of a project funded by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under the grant number 217E022.

Author information

Corresponding author

Correspondence to Hacer Yalim Keles.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tur, A.O., Keles, H.Y. Evaluation of hidden Markov models using deep CNN features in isolated sign recognition. Multimed Tools Appl 80, 19137–19155 (2021). https://doi.org/10.1007/s11042-021-10593-w

  • DOI: https://doi.org/10.1007/s11042-021-10593-w
