Robust static hand gesture recognition: harnessing sparsity of deeply learned features

  • Original article
The Visual Computer

Abstract

Beyond verbal communication, non-verbal interaction plays a significant role in conveying meaning. Non-verbal cues mainly comprise gestures, body postures, and facial expressions. Hand gestures constitute the preferred mechanism for non-verbal communication, and they are now also used in human–computer interaction (HCI), gaming, virtual reality, robotics, and sign language. Although deep learning for hand gesture recognition has been studied extensively, few efforts have exploited the sparsity of deeply learned features to distinguish hand postures under challenges such as varying hand sizes, varying spatial positions within the image, and background clutter. We demonstrate the effect of data augmentation, transfer learning, and sparsity on the performance of the proposed algorithm using publicly available hand gesture datasets, and we provide a quantitative comparison with state-of-the-art algorithms for static hand gesture recognition. Notably, dictionary learning with LC-KSVD, applied to fine-tuned features extracted from a deep architecture, outperforms state-of-the-art architectures for hand gesture classification. The proposed method also improves substantially over a baseline convolutional model: on the EgoGesture dataset, for instance, exploiting sparsity in deep features yields an accuracy of \(94.9\%\), compared with a baseline accuracy of \(63.3\%\).
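
To make the idea of classifying through sparse codes of deep features concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dictionary `D` and a linear classifier `W` are already available (LC-KSVD would learn both jointly; here they are random or hand-built toy stand-ins), and it uses a small greedy orthogonal matching pursuit routine to compute the sparse code. The names `omp`, `classify`, `atom_labels`, and all dimensions are hypothetical and chosen only for illustration; the toy vector `x` stands in for a feature vector extracted from a fine-tuned network.

```python
import numpy as np

def omp(D, x, n_nonzero, tol=1e-10):
    """Greedy orthogonal matching pursuit: approximate x with at most n_nonzero atoms of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Stop early once x is already reconstructed to within tol.
        if np.linalg.norm(residual) < tol:
            break
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z

def classify(D, W, x, n_nonzero):
    """Sparse-code x over D, then apply the linear classifier W to the code."""
    z = omp(D, x, n_nonzero)
    return int(np.argmax(W @ z)), z

# Toy setup (assumed values): 8-dimensional "deep features", 16 atoms, 2 classes.
rng = np.random.default_rng(0)
n_features, n_atoms, n_classes = 8, 16, 2
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms
atom_labels = np.repeat(np.arange(n_classes), n_atoms // n_classes)
W = np.eye(n_classes)[atom_labels].T              # W[c, k] = 1 if atom k belongs to class c

x = 2.0 * D[:, 11]                                # feature built from one class-1 atom
label, z = classify(D, W, x, n_nonzero=3)
print(label, np.count_nonzero(z))
```

In this toy case the feature is an exact multiple of a single class-1 atom, so the pursuit selects that atom and the classifier returns class 1. Real deep features would need a learned dictionary and classifier, and the sparsity level would have to be tuned on validation data.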

Data Availability

The proposed approach reports results on publicly available hand gesture datasets [5, 13, 30, 34–37, 41, 42].

References

  1. Mohanty, A., Rambhatla, S.S., Sahay, R.R.: Deep gesture: static hand gesture recognition using cnn. In: Proceedings of International Conference on Computer Vision and Image Processing, pp. 449–461, Springer (2017)

  2. Chan, T.H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y.: Pcanet: a simple deep learning baseline for image classification? IEEE Trans. Image Process. 24(12), 5017–5032 (2015)

  3. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  4. Kumar, P.P., Vadakkepat, P., Loh, A.P.: Hand posture and face recognition using a fuzzy-rough approach. Int. J. Humanoid Rob. 7(3), 331–356 (2010)

  5. Pisharady, P.K., Vadakkepat, P., Loh, A.P.: Attention based detection and recognition of hand postures against complex backgrounds. Int. J. Comput. Vis. 101(3), 403–419 (2013)

  6. Ge, S.S., Yang, Y., Lee, T.H.: Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vis. Comput. 26(12), 1607–1620 (2008)

  7. Kumar, P.P., Vadakkepat, P., Poh, L.A.: Microstructure and its effect on toughness and wear resistance of laser surface melted and post heat treated high speed steel. In: 2010 11th International Conference on Control Automation Robotics Vision, pp. 1151–1156 (2010)

  8. El-Sawah, A., Georganas, N.D., Petriu, E.M.: A prototype for 3-d hand tracking and posture estimation. IEEE Trans. Instrum. Meas. 57(8), 1627–1636 (2008)

  9. Teng, X., Wu, B., Yu, W., Liu, C.: A hand gesture recognition system based on local linear embedding. J. Vis. Lang. Comput. 16(5), 442–454 (2005)

  10. Ge, S.S., Yang, Y., Lee, T.H.: Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vis. Comput. 26(12), 1607–1620 (2008)

  11. Lades, M., Vorbruggen, J.C., Buhmann, J., Lange, J., Von Der Malsburg, C., Wurtz, R.P., Konen, W.: Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Comput. 42(3), 300–311 (1993)

  12. Triesch, J., Von Der Malsburg, C.: A system for person-independent hand posture recognition against complex backgrounds. IEEE Trans. Patt. Anal. Mach. Intell. 23(12), 1449–1453 (2001)

  13. Triesch, J., von der Malsburg, C.: Robust classification of hand postures against complex backgrounds, pp. 170–175 (1996)

  14. Triesch, J., Von Der Malsburg, C.: A gesture interface for human-robot-interaction. In: Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference On, pp. 546–551, IEEE (1998)

  15. Li, Y.-T., Wachs, J.P.: Hierarchical elastic graph matching for hand gesture recognition. In: Iberoamerican Congress on Pattern Recognition, pp. 308–315, Springer (2012)

  16. Wiskott, L., Fellous, J.-M., Krüger, N., Von Der Malsburg, C.: Face recognition by elastic bunch graph matching. IEEE Trans. Patt. Anal. Mach. Intell. 19(7), 775–779 (1997)

  17. Ueda, E., Matsumoto, Y., Imai, M., Ogasawara, T.: A hand-pose estimation for vision-based human interfaces. IEEE Trans. Industr. Electron. 50(4), 676–684 (2003)

  18. Yin, X., Xie, M.: Estimation of the fundamental matrix from uncalibrated stereo hand images for 3d hand gesture recognition. Patt. Recogn. 36(3), 567–584 (2003)

  19. Keskin, C., Kiraç, F., Kara, Y.E., Akarun, L.: Randomized Decision Forests for Static and Dynamic Hand Shape Classification, pp. 31–36 (2012)

  20. Kim, S.Y., Han, H.G., Kim, J.W., Lee, S., Kim, T.W.: A hand gesture recognition sensor using reflected impulses. IEEE Sens. J. 17(10), 2975–2976 (2017)

  21. Xie, R., Cao, J.: Accelerometer-Based Hand Gesture Recognition by Neural Network and Similarity Matching. PhD thesis (2016)

  22. Lu, W., Tong, Z., Chu, J.: Dynamic hand gesture recognition with leap motion controller. IEEE Sign. Process. Lett. 23(9), 1188–1192 (2016)

  23. Yang, C., Ku, B., Han, D.K., Ko, H.: Alpha-numeric hand gesture recognition based on fusion of spatial feature modelling and temporal feature modelling. Electron. Lett. 52(20), 1679–1681 (2016)

  24. Li, G., Zhang, R., Ritchie, M., Griffiths, H.: Sparsity-Based Dynamic Hand Gesture Recognition Using Micro-Doppler Signatures, pp. 0928–0931 (2017)

  25. Sang, Y., Shi, L., Liu, Y.: Micro hand gesture recognition system using ultrasonic active sensing. arXiv preprint arXiv:1712.00216 (2017)

  26. Padhy, S.: A tensor-based approach using multilinear SVD for hand gesture recognition from SEMG signals. IEEE Sens. J. 21(5), 6634–6642 (2020)

  27. Jaramillo-Yánez, A., Benalcázar, M.E., Mena-Maldonado, E.: Real-time hand gesture recognition using surface electromyography and machine learning: a systematic literature review. Sensors 20(9), 2467 (2020)

  28. Oudah, M., Al-Naji, A., Chahl, J.: Hand gesture recognition based on computer vision: a review of techniques. J. Imag. 6(8), 73 (2020)

  29. Rastgoo, R., Kiani, K., Escalera, S.: Sign language recognition: a deep survey. Expert Syst. Appl. 164, 113794 (2021)

  30. Marcel, S., Bernier, O.: Hand posture recognition in a body-face centered space. In: Proceedings of the International Gesture Workshop on Gesture-Based Communication in Human–Computer Interaction, pp. 97–100, Springer (1999)

  31. Chen, D., Li, G., Sun, Y., Kong, J., Jiang, G., Tang, H., Ju, Z., Yu, H., Liu, H.: An interactive image segmentation method in hand gesture recognition. Sensors 17(2), 253 (2017)

  32. Ge, C., Gu, I.Y.-H., Yang, J.: Human fall detection using segment-level CNN features and sparse dictionary learning. In: Machine Learning for Signal Processing (MLSP), 2017 IEEE 27th International Workshop On, pp. 1–6, IEEE (2017)

  33. Min, Y., Zhang, Y., Chai, X., Chen, X.: An efficient pointlstm for point clouds based gesture recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5761–5770 (2020)

  34. Barczak, A., Reyes, N., Abastillas, M., Piccio, A., Susnjak, T.: A new 2d static hand gesture colour image dataset for asl gestures. Res. Lett. Inf. Math. Sci. 15, 12–20 (2011)

  35. Kawulok, M., Kawulok, J., Nalepa, J.: Spatial-based skin detection using discriminative skin-presence features. Patt. Recognit. Lett. 41, 3–13 (2014)

  36. Kawulok, M.: Fast Propagation-based Skin Regions Segmentation in Color Images, pp. 1–7 (2013)

  37. Nalepa, J., Grzejszczak, T., Kawulok, M.: Wrist Localization in Color Images for Hand Gesture Recognition, pp. 79–86 (2014)

  38. Garcia, B., Viesca, S.A.: Real-time american sign language recognition with convolutional neural networks. Convolut. Neural Netw. Vis. Recognit. 2 (2016)

  39. Kendon, A., Nespoulous, J.: The Biological Foundations of Gestures: Motor and Semiotic Aspects. Lawrence Erlbaum Associates, Hillsdale (1986)

  40. Mohanty, A., Roy, K., Sahay, R.R.: Nrityamanthan: unravelling the intent of the dancer using deep learning. Herit. Preservation: Comput. Approach (2018). https://doi.org/10.1007/978-981-10-7221-5_11

  41. Gupta, P., Kautz, K.: Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 3 (2016)

  42. Zhang, Y., Cao, C., Cheng, J., Lu, H.: Egogesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Trans. Multimed. 20(5), 1038–1050 (2018)

  43. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)

  44. LeCun, Y., Huang, F.J., Bottou, L.: Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting, vol. 2, pp. 97–104 (2004)

  45. Mohanty, A., Vaishnavi, P., Jana, P., Majumdar, A., Ahmed, A., Goswami, T., Sahay, R.R.: Nrityabodha: towards understanding Indian classical dance using a deep learning approach. Sign. Process.: Image Commun. 47, 529–548 (2016)

  46. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)

  47. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images 1 (2009)

  48. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)

  49. Hara, K., Kataoka, H., Satoh, Y.: Can spatiotemporal 3d CNNS retrace the history of 2d CNNS and imagenet? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546–6555 (2018)

  50. Materzynska, J., Berger, G., Bax, I., Memisevic, R.: The jester dataset: a large-scale video dataset of human gestures. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0 (2019)

  51. Köpüklü, O., Gunduz, A., Kose, N., Rigoll, G.: Real-time hand gesture detection and classification using convolutional neural networks. In: 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), pp. 1–8, IEEE (2019)

  52. Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Sign. Process. Mag. 25(2), 21–30 (2008)

  53. Candes, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math.: J. Issu. Courant Inst. Math. Sci. 59(8), 1207–1223 (2006)

  54. Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online Dictionary Learning for Sparse Coding, pp. 689–696 (2009)

  55. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Sign. Process. 54(11), 4311–4322 (2006)

  56. Zhang, Q., Li, B.: Discriminative K-SVD for dictionary learning in face recognition. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2691–2698, IEEE (2010)

  57. Jiang, Z., Lin, Z., Davis, L.S.: Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans. Patt. Anal. Mach. Intell. 35(11), 2651–2664 (2013)

  58. Davis, G., Mallat, S., Avellaneda, M.: Adaptive greedy approximations. Constr. Approx. 13(1), 57–98 (1997)

  59. Mallat, S.G., Zhang, Z.: Matching pursuits with time–frequency dictionaries. IEEE Trans. Sign. Process. 41(12), 3397–3415 (1993)

  60. Chen, S., Billings, S.A., Luo, W.: Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control 50(5), 1873–1896 (1989)

  61. Davis, G.M., Mallat, S.G., Zhang, Z.: Adaptive time–frequency decompositions. Opt. Eng. 33(7), 2183–2192 (1994)

  62. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Signals, Systems and Computers, 1993. 1993 Conference Record of The Twenty-Seventh Asilomar Conference On, pp. 40–44, IEEE (1993)

  63. Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)

  64. Elad, M., Starck, J.-L., Querre, P., Donoho, D.L.: Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal. 19(3), 340–358 (2005)

  65. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)

  66. Elad, M., Aharon, M.: Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006)

  67. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Patt. Anal. Mach. Intell. 31(2), 210–227 (2009)

  68. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Sign. Process. 54(11), 4311 (2006)

  69. Kviatkovsky, I., Gabel, M., Rivlin, E., Shimshoni, I.: On the equivalence of the LC-KSVD and the D-KSVD algorithms. IEEE Trans. Patt. Anal. Mach. Intell. 39(2), 411–416 (2017)

  70. Vedaldi, A., Lenc, K.: Matconvnet: convolutional neural networks for MATLAB. CoRR (2014) arXiv:1412.4564

  71. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database, pp. 248–255 (2009)

  72. Roy, K., Mohanty, A., Sahay, R.R.: Deep learning based hand detection in cluttered environment using skin segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 640–649 (2017)

  73. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR (2014) arXiv:1409.1556

  74. Chao, Y.-W., Yang, W., Xiang, Y., Molchanov, P., Handa, A., Tremblay, J., Narang, Y.S., Van Wyk, K., Iqbal, U., Birchfield, S., Kautz, J., Fox, D.: DexYCB: A benchmark for capturing hand grasping of objects. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

  75. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: Posecnn: a convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017)

Author information

Corresponding author

Correspondence to Aparna Mohanty.

Ethics declarations

Conflict of interest

The work was primarily carried out at the Indian Institute of Technology, Kharagpur, India, as part of a research thesis and was not funded by any external agency.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mohanty, A., Roy, K. & Sahay, R.R. Robust static hand gesture recognition: harnessing sparsity of deeply learned features. Vis Comput (2023). https://doi.org/10.1007/s00371-023-03179-0

  • DOI: https://doi.org/10.1007/s00371-023-03179-0
