Abstract
Reliable hand gesture recognition is extremely relevant for automatic interpretation of sign languages used by people with hearing and speech disabilities. In this work, we present (i) new benchmark datasets of depth-sensor based, multi-oriented, isolated and static hand gestures of numerals and alphabets following the conventions of American Sign Language (ASL), (ii) an effective strategy for segmentation of hand region from depth data and appropriate preprocessing for feature extraction, and (iii) an effective statistical-geometrical feature set for recognition of multi-oriented hand gestures. Besides setting benchmark performances on the developed datasets, viz. 97.67%, 96.53% and 96.86% on numerals, alphabets and alpha-numerals respectively, the proposed pipeline is also implemented on two related public datasets and is found superior to state-of-the-art methods reported so far.
References
Bai X, Latecki LJ (2008) Path similarity skeleton graph matching. IEEE Trans Pattern Anal Mach Intell 30(7):1282–1292
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522
Debled-Rennesson I, Feschet F, Rouyer-Degli J (2006) Optimal blurred segments decomposition of noisy shapes in linear time. Comput Graph 30(1):30–36
Dewaele G, Devernay F, Horaud R (2004) Hand motion from 3d point trajectories and a smooth surface model. In: European Conference on Computer Vision. Springer, pp 495–507
Dinh DL, Lee S, Kim TS (2016) Hand number gesture recognition using recognized hand parts in depth images. Multimed Tools Appl 75(2):1333–1348
Geetha M, Manjusha C, Unnikrishnan P, Harikrishnan R (2013) A vision based dynamic gesture recognition of Indian sign language on Kinect based depth images. In: 2013 International Conference on Emerging Trends in Communication, Control, Signal Processing and Computing Applications (C2SPCA). IEEE, pp 1–7
Jadooki S, Mohamad D, Saba T, Almazyad AS, Rehman A (2017) Fused features mining for depth-based hand gesture recognition to classify blind human communication. Neural Comput Appl 28(11):3285–3294
Kapuscinski T, Oszust M, Wysocki M (2013) Recognition of signed dynamic expressions observed by tof camera. In: 2013 signal processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pp 291–296
Kerautret B, Lachaud J (2014) Meaningful scales detection: an unsupervised noise detection algorithm for digital contours. Image Process Line 4:98–115
Kerautret B, Lachaud J, Said M (2012) Meaningful thickness detection on polygonal curve. In: Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods: ICPRAM. vol 1. SciTePress, pp 372–379
Kry PG, Pai DK (2006) Interaction capture and synthesis. In: ACM Transactions on Graphics (TOG). vol 25. ACM, pp 872–880
Lachaud J (2010) Digital shape analysis with maximal segments. In: International Workshop on Applications of Discrete Geometry and Mathematical Morphology. Springer, pp 14–27
Le QK, Pham CH, Le TH (2012) Road traffic control gesture recognition using depth images. IEIE Trans Smart Process Comput 1(1):1–7
Liang H, Yuan J, Thalmann D (2014) Parsing the hand in depth images. IEEE Trans Multimed 16(5):1241–1253
Lv Z (2013) Wearable smartphone: Wearable hybrid framework for hand and foot gesture interaction on smartphone. In: Proceedings of the IEEE international conference on computer vision workshops, pp 436–443
Lv Z, Esteve C, Chirivella J, Gagliardo P (2017) Serious game based personalized healthcare system for dysphonia rehabilitation. Pervasive Mob Comput 41:504–519
Lv Z, Halawani A, Feng S, Li H, Réhman SU (2014) Multimodal hand and foot gesture interaction for handheld devices. ACM Trans Multimed Comput Commun Appl (TOMM) 11(1s):1–19
Lv Z, Halawani A, Feng S, Ur Réhman S, Li H (2015) Touch-less interactive augmented reality game on vision-based wearable device. Pers Ubiquit Comput 19(3):551–567
Mitra S, Acharya T (2007) Gesture recognition: A survey. IEEE Trans Syst Man Cybern Part C (Appl Rev) 37(3):311–324
Nasser H, Ngo P, Debled-Rennesson I (2018) Dominant point detection based on discrete curve structure and applications. J Comput Syst Sci 95(1):177–192
Ngo P, Debled-Rennesson I, Kerautret B, Nasser H (2017) Analysis of noisy digital contours with adaptive tangential cover. J Math Imaging Vis 59(1):123–135
Ngo P, Nasser H, Debled-Rennesson I (2015) Efficient dominant point detection based on discrete curve structure. In: International Workshop on Combinatorial Image Analysis (IWCIA), Kolkata, India. Volume 9448 of LNCS, pp 143–156
Ngo P, Nasser H, Debled-Rennesson I, Kerautret B (2016) Adaptive tangential cover for noisy digital contours. In: Discrete Geometry for Computer Imagery - 19th IAPR International Conference, DGCI 2016, Nantes, France. Volume 9647 of LNCS, pp 439–451
Nguyen TP, Debled-Rennesson I (2011) A discrete geometry approach for dominant point detection. Pattern Recogn 44(1):32–44
Paul S, Basu S, Nasipuri M (2015) Microsoft Kinect in gesture recognition: A short review. Int J Control Theory Appl 8(5):2071–2076
Paul S, Bhattacharyya A, Mollah AF, Basu S, Nasipuri M (2019) Hand segmentation from complex background for gesture recognition. In: Emerging Technology in Modelling and Graphics. Springer Singapore, pp 775–782
Paul S, Nasser H, Nasipuri M, Ngo P, Basu S, Debled-Rennesson I (2017) A statistical-topological feature combination for recognition of isolated hand gestures from kinect based depth images. In: 18th international workshop on combinatorial image analysis (IWCIA). Springer LNCS, pp 256–267
Plouffe G, Cretu AM (2015) Static and dynamic hand gesture recognition in depth data using dynamic time warping. IEEE Trans Instrum Meas 65(2):305–316
Qin S, Zhu X, Yang Y, Jiang Y (2014) Real-time hand gesture recognition from depth images using convex shape decomposition method. J Signal Process Syst 74(1):47–58
Ren Z, Meng J, Yuan J (2011) Depth camera based hand gesture recognition and its applications in human-computer-interaction. In: 2011 8th International Conference on Information, Communications and Signal Processing (ICICS). IEEE, pp 1–5
Ren Z, Yuan J, Meng J, Zhang Z (2013) Robust part-based hand gesture recognition using Kinect sensor. IEEE Trans Multimed 15(5):1110–1120
Reveillès JP (1991) Géométrie discrète, calculs en nombre entiers et algorithmique. Thèse d’état. Université Louis Pasteur, Strasbourg
She Y, Wang Q, Jia Y, Gu T, He Q, Yang B (2014) A real-time hand gesture recognition approach based on motion features of feature points. In: Proceedings of the 2014 IEEE 17th International Conference on Computational Science and Engineering. IEEE Computer Society, pp 1096–1102
Shotton J, Sharp T, Kipman A, Fitzgibbon A, Finocchio M, Blake A, Cook M, Moore R (2013) Real-time human pose recognition in parts from single depth images. Commun ACM 56(1):116–124
Suarez J, Murphy RR (2012) Hand gesture recognition with depth images: a review. In: 2012 IEEE RO-MAN. IEEE, pp 411–417
Wang C, Liu Z, Chan SC (2015) Superpixel-based hand gesture recognition with Kinect depth camera. IEEE Trans Multimed 17(1):29–39
Wu Y, Lin J, Huang TS (2005) Analyzing and capturing articulated hand motion in image sequences. IEEE Trans Pattern Anal Mach Intell 27(12):1910–1922
Yang MH, Ahuja N, Tabb M (2002) Extraction of 2d motion trajectories and its application to hand gesture recognition. IEEE Trans Pattern Anal Mach Intell 24(8):1061–1074
Zhang C, Yang X, Tian Y (2013) Histogram of 3d facets: A characteristic descriptor for hand gesture recognition. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE, pp 1–8
Acknowledgements
This work is partially supported by three grants from the Govt. of India, namely, grant no. SR/WOS-A/ET-1001/2015, grant no. EMR/2016/007213 of the Department of Science and Technology (DST) within the Ministry of Science and Technology, and grant no. BT/PR16356/BID/7/596/2016 of the Department of Biotechnology and Rashtriya Uchchatar Shiksha Abhiyan (RUSA) from the Department of Higher Education, Govt. of India.
Appendices
Appendix A: Theory of Discrete Contours
In this section, we recall a method of contour simplification based on selected dominant points. These are computed from a discrete structure, named the adaptive tangential cover (ATC), reported in [21, 24], which is well adapted to the analysis of irregular noisy contours.
A.1 Adaptive Tangential Cover [21, 24]
An adaptive tangential cover (ATC) is composed of a sequence of maximal straight segments, called maximal blurred segments, of the studied contour. The notion of maximal blurred segment was introduced in [3] as an extension of the arithmetical discrete line presented in [33], with a width parameter to handle noisy or disconnected digital contours.
Definition 1
An arithmetical discrete line \({\mathcal D}(a,b,\mu ,\omega )\), with a main vector (b,a), a lower bound μ and an arithmetic thickness ω (with \(a,b,\mu ,\omega \in \mathbb {Z}\) and gcd(a,b) = 1) is the set of integer points (x,y) verifying μ ≤ ax − by < μ + ω.
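Definition 1 can be checked directly from the inequality. The sketch below is illustrative only; the parameter values are examples, not taken from the paper.

```python
# Membership test for an arithmetical discrete line D(a, b, mu, omega),
# per Definition 1: (x, y) belongs to D iff mu <= a*x - b*y < mu + omega.
from math import gcd

def in_discrete_line(x, y, a, b, mu, omega):
    """Return True if the integer point (x, y) lies on D(a, b, mu, omega)."""
    assert gcd(a, b) == 1, "a and b must be coprime"
    r = a * x - b * y
    return mu <= r < mu + omega

# Example: points of an 8-connected line of slope a/b = 1/2
# (omega = |a| + |b| = 3) inside a small window.
points = [(x, y) for x in range(6) for y in range(4)
          if in_discrete_line(x, y, 1, 2, 0, 3)]
```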
Definition 2
A set Sf is a blurred segment of width ν if the discrete line \({\mathcal D}(a,b,\mu ,\omega )\) containing Sf has the vertical (or horizontal) distance \(d=\frac{\omega-1}{\max(|a|,|b|)}\) equal to the vertical (or horizontal) thickness of the convex hull of Sf, and d ≤ ν (see Fig. 9a).
Let C be a discrete curve and Ci,j a sequence of points of C indexed from i to j. Let us denote by BS(i,j,ν) the predicate “Ci,j is a blurred segment of width ν”.
Definition 3
Ci,j is called a maximal blurred segment (MBS) of width ν and denoted MBS(i,j,ν) iff BS(i,j,ν), ¬BS(i,j + 1,ν) and ¬BS(i − 1,j,ν) (see Fig. 9b).
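Definitions 2–3 can be sketched naively as follows, restricted to the vertical-distance case. This brute-force version replaces the incremental convex-hull algorithm used in the cited works and is cubic-time, for illustration only; it relies on the fact that the minimal vertical strip covering a point set is bounded by a line through two of the points.

```python
# Naive sketch (vertical case only): a point set is a blurred segment of
# width nu if the minimal vertical strip covering it has width <= nu; an MBS
# is a run of curve points that cannot be extended while staying one.
def vertical_thickness(pts):
    """Minimal vertical width of a strip covering pts (brute force O(n^3))."""
    best = float("inf")
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            (x1, y1), (x2, y2) = pts[i], pts[j]
            if x1 == x2:
                continue  # vertical support lines give infinite vertical width
            m = (y2 - y1) / (x2 - x1)
            devs = [y - (y1 + m * (x - x1)) for x, y in pts]
            best = min(best, max(devs) - min(devs))
    return 0.0 if best == float("inf") else best  # degenerate sets -> 0

def is_blurred_segment(pts, nu):
    return vertical_thickness(pts) <= nu

def maximal_right_extension(curve, i, nu):
    """Largest j such that curve[i..j] is still a blurred segment of width nu."""
    j = i
    while j + 1 < len(curve) and is_blurred_segment(curve[i:j + 2], nu):
        j += 1
    return j
```

For example, the run (0,0), (1,0), (2,0) stays within width 0.5, but appending the outlier (3,5) breaks the bound, so the maximal extension stops at index 2.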
An ATC is designed to capture the local noise on a curve by adjusting the thickness of the maximal blurred segments in accordance with the amount of noise present along the contour. To account for local perturbations, the meaningful thickness estimator [9, 10] is integrated into the construction of the ATC as a noise detector at each point of the curve. The meaningful thickness is used as an input parameter to compute the ATC with widths appropriate to the noise. A non-parametric algorithm is developed in [21, 24] to compute the ATC of a given discrete curve. In the resulting ATC, the MBS decomposition with its various widths reflects both the noise levels and the geometrical structure of the given discrete curve (see Fig. 10a, c).
A.2 Polygonal simplification [23,24,25]
Using the ATC, we detect the points of locally maximal curvature, called dominant points (DP), on the digital curve. Such points carry rich information that characterizes and describes the curve. Building on the dominant point detection proposed in [23, 25] and the notion of ATC, an algorithm is developed in [24] to determine the dominant points of a given noisy digital curve C. The main idea is that the candidate dominant points are localized in the common zones of successive MBS of the ATC of C.
Then, using a simple angle measure m, we identify the dominant point as the point having the smallest angle. More precisely, m is the angle at the considered point formed with the left endpoint of the left MBS and the right endpoint of the right MBS involved in the studied common zone. As the considered point varies, m becomes a function of it, and a dominant point is defined as a local minimum of m. ATCs are illustrated in Fig. 10a, c; dominant points are shown as red points in Fig. 10b, d, where red lines represent the polygonal representation of the shape.
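The angle measure m described above can be sketched as a small helper; the function and argument names here are illustrative, not from the cited works.

```python
# Sketch of the angle measure m: for a candidate point c in a common zone,
# m is the angle at c formed with the left endpoint of the left MBS and the
# right endpoint of the right MBS.
import math

def angle_measure(left_end, c, right_end):
    """Angle (radians) at c between left_end and right_end."""
    v1 = (left_end[0] - c[0], left_end[1] - c[1])
    v2 = (right_end[0] - c[0], right_end[1] - c[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # clamp guards against floating-point drift outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# The dominant point of a zone is the candidate with the smallest m, e.g.:
# dp = min(candidates, key=lambda c: angle_measure(l_end, c, r_end))
```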
The first goal of finding the dominant points is to obtain an approximate description of the input curve, called a polygonal simplification. However, due to the nature of the tangential cover, dominant points often lie very close to each other, which is undesirable, in particular for polygonal simplification. We therefore associate to each detected dominant point a weight, i.e., the ratio of the integral sum of squared errors to the angle with the two neighbouring dominant points, indicating its importance with respect to the approximating polygon of the curve. The polygonal simplification is illustrated with green lines in Fig. 10b, d.
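The weight described above can be sketched as follows, under the assumption that the ISSE is the sum of squared distances from the curve points between the two neighbouring dominant points to the two chords through the candidate; the exact formulation in [24] may differ in detail.

```python
# Hypothetical sketch: weight of a dominant point = ISSE / angle, where ISSE
# sums squared distances from curve points to the two chords through it.
import math

def _dist2_to_segment(p, a, b):
    """Squared Euclidean distance from point p to segment ab."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    if denom == 0:
        return (px - ax) ** 2 + (py - ay) ** 2
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / denom))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def dp_weight(curve, il, i, ir):
    """Weight of candidate dominant point curve[i] between neighbours il, ir."""
    l, c, r = curve[il], curve[i], curve[ir]
    isse = sum(_dist2_to_segment(curve[k], l, c) for k in range(il, i + 1))
    isse += sum(_dist2_to_segment(curve[k], c, r) for k in range(i, ir + 1))
    v1 = (l[0] - c[0], l[1] - c[1])
    v2 = (r[0] - c[0], r[1] - c[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    angle = math.acos(max(-1.0, min(1.0, cos)))
    return isse / angle
```

A candidate whose neighbouring chords already fit the curve exactly gets weight 0 and is a natural removal candidate during simplification.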
B Database Images
B.1 Image Database JU_V2_DIGIT
We have collected 1000 images of the digits 0 to 9 from 10 people, with 10 orientations per digit. To create this dataset, we first built a small dataset-collection tool. Once the Kinect is set up with the system, this tool saves the images with a single click of the save button.
Since the hand position is located at runtime, a click of the save button stores an RGB image of the whole scene, a depth image of the whole scene, an RGB image with the hand region annotated, a cropped RGB image containing only the hand, cropped depth values of only the hand, and the hand and wrist depth values.
We use the depth values for our further experimentation. First, we convert the depth values to a depth image. Then we threshold the depth image to extract the region of interest and extract the contours. Our further experimentation involves contour-based feature extraction; details are given in the main paper.
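The threshold-and-extract step can be sketched with NumPy alone. The depth band values below are illustrative, not the paper's calibration; in practice the contour would typically be traced with a routine such as OpenCV's `cv2.findContours`, while this sketch just marks boundary pixels.

```python
# Sketch: segment the hand as the pixels inside a depth band, then mark the
# 4-connected boundary (foreground pixels touching background) as the contour.
import numpy as np

def segment_hand(depth, near, far):
    """Binary mask of pixels whose depth (e.g. in mm) lies in [near, far]."""
    return (depth >= near) & (depth <= far)

def boundary_pixels(mask):
    """Foreground pixels with at least one background 4-neighbour."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

# Toy depth map: the hand sits roughly 0.5 m from the sensor.
depth = np.array([[900, 500, 500],
                  [900, 500, 900],
                  [900, 900, 900]])
mask = segment_hand(depth, 400, 600)
contour = boundary_pixels(mask)
```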
In Figs. 11–20, we present some sample images from this database. Image ID Pi_Gj_k means that it corresponds to the i-th person, the j-th gesture and the k-th orientation.
B.2 Image Database JU_V2_ALPHA
We have collected another set of alphabet images with the same tool mentioned above. Here we have captured static gestures of 24 letters, from a to z (excluding j and z, as they are dynamic), from 10 people, each in 10 orientations, giving 2400 images in the dataset altogether.
In Figs. 21–44, we present some sample images from this database. Image ID Pi_α_k means that it corresponds to the i-th person, the gesture α and the k-th orientation.
Cite this article
Paul, S., Nasser, H., Mollah, A.F. et al. Development of benchmark datasets of multioriented hand gestures for speech and hearing disabled. Multimed Tools Appl 81, 7285–7321 (2022). https://doi.org/10.1007/s11042-021-11745-8