Robust static hand gesture recognition: harnessing sparsity of deeply learned features

Mohanty, Aparna; Roy, Kankana; Sahay, Rajiv Ranjan

doi:10.1007/s00371-023-03179-0

Original article
Published: 14 December 2023

The Visual Computer

Aparna Mohanty^1,3,
Kankana Roy^2,3 &
Rajiv Ranjan Sahay^3

Abstract

Apart from verbal communication among humans, non-verbal interactions also play a significant role in conveying meaningful information. Non-verbal cues mainly comprise gestures, body postures, and facial expressions. Hand gestures constitute the preferred mechanism for non-verbal communication, and today they also find utility in human–computer interaction (HCI), gaming, virtual reality, robotics, sign language, etc. While extensive research has been conducted on deep learning for hand gesture recognition, few efforts have focused on leveraging the sparsity of deeply learned features to distinguish hand postures in the presence of challenges such as varying hand sizes, diverse spatial positions within images, and background clutter. We demonstrate the effect of data augmentation, transfer learning, and sparsity on the performance of the proposed algorithm using publicly available hand gesture datasets. We also provide a quantitative comparative analysis of the proposed approach with state-of-the-art algorithms for static hand gesture recognition. We report a noteworthy finding: dictionary learning through LC-KSVD, when applied to fine-tuned features extracted from a deep architecture, outperforms state-of-the-art architectures for hand gesture classification. The proposed methodology also yields substantial improvements over a baseline convolutional model. For instance, on the EgoGesture dataset, we attained an accuracy of 94.9%, as opposed to the baseline accuracy of 63.3%, by exploiting sparsity in deep features.
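The classification stage described in the abstract (sparse coding of deep features over a learned dictionary, followed by a linear classifier on the sparse codes) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary `D` and classifier `W` below are hand-built toy stand-ins for what LC-KSVD training would produce, the 4-dimensional vectors stand in for fine-tuned CNN features, and the names `omp` and `classify` are hypothetical. Orthogonal matching pursuit is used as the sparse coder, which is an assumption about the pursuit step.

```python
import numpy as np

def omp(D, y, sparsity):
    """Orthogonal matching pursuit: approximate y using at most `sparsity` atoms of D."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect an atom
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected atoms jointly by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

def classify(D, W, y, sparsity=2):
    """Sparse-code the feature y over D, then score each class with W."""
    x = omp(D, y, sparsity)
    return int(np.argmax(W @ x)), x

# Toy dictionary (assumption): unit-norm columns, atoms 0-1 for class 0, atoms 2-3 for class 1.
D = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.0, 0.6, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.0, 0.8]])
# Toy classifier (assumption): each row sums the coefficients of one class's atoms.
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

y0 = 0.5 * D[:, 0] + 1.0 * D[:, 1]  # stand-in feature built from class-0 atoms
y1 = 0.7 * D[:, 2] + 0.2 * D[:, 3]  # stand-in feature built from class-1 atoms
label0, x0 = classify(D, W, y0)
label1, x1 = classify(D, W, y1)
print(label0, label1)
```

In LC-KSVD the dictionary and the classifier are learned jointly so that the sparse codes are themselves discriminative; the sketch covers only the test-time step, where a new feature vector is coded and labeled.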

Data Availability

The proposed approach reports results on publicly available hand gesture datasets [5, 13, 30, 34–37, 41, 42].

References

1. Mohanty, A., Rambhatla, S.S., Sahay, R.R.: Deep gesture: static hand gesture recognition using CNN. In: Proceedings of International Conference on Computer Vision and Image Processing, pp. 449–461, Springer (2017)
2. Chan, T.H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y.: PCANet: a simple deep learning baseline for image classification? IEEE Trans. Image Process. 24(12), 5017–5032 (2015)
3. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
4. Kumar, P.P., Vadakkepat, P., Loh, A.P.: Hand posture and face recognition using a fuzzy-rough approach. Int. J. Humanoid Rob. 7(3), 331–356 (2010)
5. Pisharady, P.K., Vadakkepat, P., Loh, A.P.: Attention based detection and recognition of hand postures against complex backgrounds. Int. J. Comput. Vis. 101(3), 403–419 (2013)
6. Ge, S.S., Yang, Y., Lee, T.H.: Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vis. Comput. 26(12), 1607–1620 (2008)
7. Kumar, P.P., Vadakkepat, P., Poh, L.A.: Microstructure and its effect on toughness and wear resistance of laser surface melted and post heat treated high speed steel. In: 2010 11th International Conference on Control Automation Robotics Vision, pp. 1151–1156 (2010)
8. El-Sawah, A., Georganas, N.D., Petriu, E.M.: A prototype for 3-D hand tracking and posture estimation. IEEE Trans. Instrum. Meas. 57(8), 1627–1636 (2008)
9. Teng, X., Wu, B., Yu, W., Liu, C.: A hand gesture recognition system based on local linear embedding. J. Vis. Lang. Comput. 16(5), 442–454 (2005)
10. Ge, S.S., Yang, Y., Lee, T.H.: Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vis. Comput. 26(12), 1607–1620 (2008)
11. Lades, M., Vorbruggen, J.C., Buhmann, J., Lange, J., Von Der Malsburg, C., Wurtz, R.P., Konen, W.: Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Comput. 42(3), 300–311 (1993)
12. Triesch, J., Von Der Malsburg, C.: A system for person-independent hand posture recognition against complex backgrounds. IEEE Trans. Patt. Anal. Mach. Intell. 23(12), 1449–1453 (2001)
13. Triesch, J., Von Der Malsburg, C.: Robust classification of hand postures against complex backgrounds, pp. 170–175 (1996)
14. Triesch, J., Von Der Malsburg, C.: A gesture interface for human-robot-interaction. In: Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference On, pp. 546–551, IEEE (1998)
15. Li, Y.-T., Wachs, J.P.: Hierarchical elastic graph matching for hand gesture recognition. In: Iberoamerican Congress on Pattern Recognition, pp. 308–315, Springer (2012)
16. Wiskott, L., Fellous, J.-M., Krüger, N., Von Der Malsburg, C.: Face recognition by elastic bunch graph matching. IEEE Trans. Patt. Anal. Mach. Intell. 19(7), 775–779 (1997)
17. Ueda, E., Matsumoto, Y., Imai, M., Ogasawara, T.: A hand-pose estimation for vision-based human interfaces. IEEE Trans. Industr. Electron. 50(4), 676–684 (2003)
18. Yin, X., Xie, M.: Estimation of the fundamental matrix from uncalibrated stereo hand images for 3D hand gesture recognition. Patt. Recogn. 36(3), 567–584 (2003)
19. Keskin, C., Kiraç, F., Kara, Y.E., Akarun, L.: Randomized decision forests for static and dynamic hand shape classification, pp. 31–36 (2012)
20. Kim, S.Y., Han, H.G., Kim, J.W., Lee, S., Kim, T.W.: A hand gesture recognition sensor using reflected impulses. IEEE Sens. J. 17(10), 2975–2976 (2017)
21. Xie, R., Cao, J.: Accelerometer-based hand gesture recognition by neural network and similarity matching. PhD thesis (2016)
22. Lu, W., Tong, Z., Chu, J.: Dynamic hand gesture recognition with leap motion controller. IEEE Sign. Process. Lett. 23(9), 1188–1192 (2016)
23. Yang, C., Ku, B., Han, D.K., Ko, H.: Alpha-numeric hand gesture recognition based on fusion of spatial feature modelling and temporal feature modelling. Electron. Lett. 52(20), 1679–1681 (2016)
24. Li, G., Zhang, R., Ritchie, M., Griffiths, H.: Sparsity-based dynamic hand gesture recognition using micro-Doppler signatures, pp. 0928–0931 (2017)
25. Sang, Y., Shi, L., Liu, Y.: Micro hand gesture recognition system using ultrasonic active sensing. arXiv preprint arXiv:1712.00216 (2017)
26. Padhy, S.: A tensor-based approach using multilinear SVD for hand gesture recognition from SEMG signals. IEEE Sens. J. 21(5), 6634–6642 (2020)
27. Jaramillo-Yánez, A., Benalcázar, M.E., Mena-Maldonado, E.: Real-time hand gesture recognition using surface electromyography and machine learning: a systematic literature review. Sensors 20(9), 2467 (2020)
28. Oudah, M., Al-Naji, A., Chahl, J.: Hand gesture recognition based on computer vision: a review of techniques. J. Imag. 6(8), 73 (2020)
29. Rastgoo, R., Kiani, K., Escalera, S.: Sign language recognition: a deep survey. Expert Syst. Appl. 164, 113794 (2021)
30. Marcel, S., Bernier, O.: Hand posture recognition in a body-face centered space. In: Proceedings of the International Gesture Workshop on Gesture-Based Communication in Human–Computer Interaction, pp. 97–100, Springer (1999)
31. Chen, D., Li, G., Sun, Y., Kong, J., Jiang, G., Tang, H., Ju, Z., Yu, H., Liu, H.: An interactive image segmentation method in hand gesture recognition. Sensors 17(2), 253 (2017)
32. Ge, C., Gu, I.Y.-H., Yang, J.: Human fall detection using segment-level CNN features and sparse dictionary learning. In: Machine Learning for Signal Processing (MLSP), 2017 IEEE 27th International Workshop On, pp. 1–6, IEEE (2017)
33. Min, Y., Zhang, Y., Chai, X., Chen, X.: An efficient PointLSTM for point clouds based gesture recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5761–5770 (2020)
34. Barczak, A., Reyes, N., Abastillas, M., Piccio, A., Susnjak, T.: A new 2D static hand gesture colour image dataset for ASL gestures. Res. Lett. Inf. Math. Sci. 15, 12–20 (2011)
35. Kawulok, M., Kawulok, J., Nalepa, J.: Spatial-based skin detection using discriminative skin-presence features. Patt. Recognit. Lett. 41, 3–13 (2014)
36. Kawulok, M.: Fast propagation-based skin regions segmentation in color images, pp. 1–7 (2013)
37. Nalepa, J., Grzejszczak, T., Kawulok, M.: Wrist localization in color images for hand gesture recognition, pp. 79–86 (2014)
38. Garcia, B., Viesca, S.A.: Real-time American sign language recognition with convolutional neural networks. Convolut. Neural Netw. Vis. Recognit. 2 (2016)
39. Kendon, A., Nespoulous, J.: The Biological Foundations of Gestures: Motor and Semiotic Aspects. Lawrence Erlbaum Associates, Hillsdale (1986)
40. Mohanty, A., Roy, K., Sahay, R.R.: Nrityamanthan: unravelling the intent of the dancer using deep learning. Herit. Preservation: Comput. Approach (2018). https://doi.org/10.1007/978-981-10-7221-5_11
41. Gupta, P., Kautz, K.: Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 3 (2016)
42. Zhang, Y., Cao, C., Cheng, J., Lu, H.: EgoGesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Trans. Multimed. 20(5), 1038–1050 (2018)
43. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
44. LeCun, Y., Huang, F.J., Bottou, L.: Learning methods for generic object recognition with invariance to pose and lighting, vol. 2, pp. 97–104 (2004)
45. Mohanty, A., Vaishnavi, P., Jana, P., Majumdar, A., Ahmed, A., Goswami, T., Sahay, R.R.: Nrityabodha: towards understanding Indian classical dance using a deep learning approach. Sign. Process.: Image Commun. 47, 529–548 (2016)
46. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
47. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images 1 (2009)
48. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
49. Hara, K., Kataoka, H., Satoh, Y.: Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546–6555 (2018)
50. Materzynska, J., Berger, G., Bax, I., Memisevic, R.: The Jester dataset: a large-scale video dataset of human gestures. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0 (2019)
51. Köpüklü, O., Gunduz, A., Kose, N., Rigoll, G.: Real-time hand gesture detection and classification using convolutional neural networks. In: 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), pp. 1–8, IEEE (2019)
52. Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Sign. Process. Mag. 25(2), 21–30 (2008)
53. Candès, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math.: J. Issu. Courant Inst. Math. Sci. 59(8), 1207–1223 (2006)
54. Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding, pp. 689–696 (2009)
55. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Sign. Process. 54(11), 4311–4322 (2006)
56. Zhang, Q., Li, B.: Discriminative K-SVD for dictionary learning in face recognition. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2691–2698, IEEE (2010)
57. Jiang, Z., Lin, Z., Davis, L.S.: Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans. Patt. Anal. Mach. Intell. 35(11), 2651–2664 (2013)
58. Davis, G., Mallat, S., Avellaneda, M.: Adaptive greedy approximations. Constr. Approx. 13(1), 57–98 (1997)
59. Mallat, S.G., Zhang, Z.: Matching pursuits with time–frequency dictionaries. IEEE Trans. Sign. Process. 41(12), 3397–3415 (1993)
60. Chen, S., Billings, S.A., Luo, W.: Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control 50(5), 1873–1896 (1989)
61. Davis, G.M., Mallat, S.G., Zhang, Z.: Adaptive time–frequency decompositions. Opt. Eng. 33(7), 2183–2192 (1994)
62. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Signals, Systems and Computers, 1993. 1993 Conference Record of The Twenty-Seventh Asilomar Conference On, pp. 40–44, IEEE (1993)
63. Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)
64. Elad, M., Starck, J.-L., Querre, P., Donoho, D.L.: Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal. 19(3), 340–358 (2005)
65. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
66. Elad, M., Aharon, M.: Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006)
67. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Patt. Anal. Mach. Intell. 31(2), 210–227 (2009)
68. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Sign. Process. 54(11), 4311 (2006)
69. Kviatkovsky, I., Gabel, M., Rivlin, E., Shimshoni, I.: On the equivalence of the LC-KSVD and the D-KSVD algorithms. IEEE Trans. Patt. Anal. Mach. Intell. 39(2), 411–416 (2017)
70. Vedaldi, A., Lenc, K.: MatConvNet: convolutional neural networks for MATLAB. CoRR (2014) arXiv:1412.4564
71. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database, pp. 248–255 (2009)
72. Roy, K., Mohanty, A., Sahay, R.R.: Deep learning based hand detection in cluttered environment using skin segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 640–649 (2017)
73. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR (2014) arXiv:1409.1556
74. Chao, Y.-W., Yang, W., Xiang, Y., Molchanov, P., Handa, A., Tremblay, J., Narang, Y.S., Van Wyk, K., Iqbal, U., Birchfield, S., Kautz, J., Fox, D.: DexYCB: a benchmark for capturing hand grasping of objects. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
75. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017)

Author information

Authors and Affiliations

Vellore Institute of Technology, Vellore, Tamil Nadu, India
Aparna Mohanty
Helmholtz-Zentrum Hereon, Max-Planck-Strasse, Geesthacht, Germany
Kankana Roy
Computational Vision Lab, Electrical Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
Aparna Mohanty, Kankana Roy & Rajiv Ranjan Sahay

Corresponding author

Correspondence to Aparna Mohanty.

Ethics declarations

Conflict of interest

The work was primarily carried out at the Indian Institute of Technology, Kharagpur, India, as part of a research thesis and was not funded by any external agency.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mohanty, A., Roy, K. & Sahay, R.R. Robust static hand gesture recognition: harnessing sparsity of deeply learned features. Vis Comput (2023). https://doi.org/10.1007/s00371-023-03179-0

Accepted: 31 October 2023
Published: 14 December 2023
DOI: https://doi.org/10.1007/s00371-023-03179-0
