
Development of benchmark datasets of multioriented hand gestures for speech and hearing disabled

Multimedia Tools and Applications

Abstract

Reliable hand gesture recognition is extremely relevant for the automatic interpretation of sign languages used by people with hearing and speech disabilities. In this work, we present (i) new benchmark datasets of depth-sensor-based, multi-oriented, isolated and static hand gestures of numerals and alphabets following the conventions of American Sign Language (ASL), (ii) an effective strategy for segmenting the hand region from depth data, with appropriate preprocessing for feature extraction, and (iii) an effective statistical-geometrical feature set for the recognition of multi-oriented hand gestures. Besides setting benchmark performances on the developed datasets, viz. 97.67%, 96.53% and 96.86% on numerals, alphabets and alpha-numerals respectively, the proposed pipeline is also applied to two related public datasets, on which it outperforms the state-of-the-art methods reported so far.


References

  1. Bai X, Latecki LJ (2008) Path similarity skeleton graph matching. IEEE Trans Pattern Anal Mach Intell 30(7):1282–1292

  2. Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522

  3. Debled-Rennesson I, Feschet F, Rouyer-Degli J (2006) Optimal blurred segments decomposition of noisy shapes in linear time. Comput Graph 30(1):30–36

  4. Dewaele G, Devernay F, Horaud R (2004) Hand motion from 3d point trajectories and a smooth surface model. In: European Conference on Computer Vision. Springer, pp 495–507

  5. Dinh DL, Lee S, Kim TS (2016) Hand number gesture recognition using recognized hand parts in depth images. Multimed Tools Appl 75(2):1333–1348

  6. Geetha M, Manjusha C, Unnikrishnan P, Harikrishnan R (2013) A vision based dynamic gesture recognition of Indian sign language on Kinect based depth images. In: 2013 International Conference on Emerging Trends in Communication, Control, Signal Processing and Computing Applications (C2SPCA). IEEE, pp 1–7

  7. Jadooki S, Mohamad D, Saba T, Almazyad AS, Rehman A (2017) Fused features mining for depth-based hand gesture recognition to classify blind human communication. Neural Comput Appl 28(11):3285–3294

  8. Kapuscinski T, Oszust M, Wysocki M (2013) Recognition of signed dynamic expressions observed by ToF camera. In: 2013 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pp 291–296

  9. Kerautret B, Lachaud J (2014) Meaningful scales detection: an unsupervised noise detection algorithm for digital contours. Image Process Line 4:98–115

  10. Kerautret B, Lachaud J, Said M (2012) Meaningful thickness detection on polygonal curve. In: Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods: ICPRAM. vol 1. SciTePress, pp 372–379

  11. Kry PG, Pai DK (2006) Interaction capture and synthesis. In: ACM Transactions on Graphics (TOG). vol 25. ACM, pp 872–880

  12. Lachaud J (2010) Digital shape analysis with maximal segments. In: International Workshop on Applications of Discrete Geometry and Mathematical Morphology. Springer, pp 14–27

  13. Le QK, Pham CH, Le TH (2012) Road traffic control gesture recognition using depth images. IEIE Trans Smart Process Comput 1(1):1–7

  14. Liang H, Yuan J, Thalmann D (2014) Parsing the hand in depth images. IEEE Trans Multimed 16(5):1241–1253

  15. Lv Z (2013) Wearable smartphone: Wearable hybrid framework for hand and foot gesture interaction on smartphone. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp 436–443

  16. Lv Z, Esteve C, Chirivella J, Gagliardo P (2017) Serious game based personalized healthcare system for dysphonia rehabilitation. Pervasive Mob Comput 41:504–519

  17. Lv Z, Halawani A, Feng S, Li H, Réhman SU (2014) Multimodal hand and foot gesture interaction for handheld devices. ACM Trans Multimed Comput Commun Appl (TOMM) 11(1s):1–19

  18. Lv Z, Halawani A, Feng S, Ur Réhman S, Li H (2015) Touch-less interactive augmented reality game on vision-based wearable device. Pers Ubiquit Comput 19(3):551–567

  19. Mitra S, Acharya T (2007) Gesture recognition: A survey. IEEE Trans Syst Man Cybern Part C (Appl Rev) 37(3):311–324

  20. Nasser H, Ngo P, Debled-Rennesson I (2018) Dominant point detection based on discrete curve structure and applications. J Comput Syst Sci 95(1):177–192

  21. Ngo P, Debled-Rennesson I, Kerautret B, Nasser H (2017) Analysis of noisy digital contours with adaptive tangential cover. J Math Imaging Vis 59(1):123–135

  22. Ngo P, Debled-Rennesson I, Kerautret B, Nasser H (2017) Analysis of noisy digital contours with adaptive tangential cover. J Math Imaging Vis 59(1):123–135

  23. Ngo P, Nasser H, Debled-Rennesson I (2015) Efficient dominant point detection based on discrete curve structure. In: International Workshop on Combinatorial Image Analysis (IWCIA), Kolkata, India. Volume 9448 of LNCS, pp 143–156

  24. Ngo P, Nasser H, Debled-Rennesson I, Kerautret B (2016) Adaptive tangential cover for noisy digital contours. In: Discrete Geometry for Computer Imagery - 19th IAPR International Conference, DGCI 2016, Nantes, France. Volume 9647 of LNCS, pp 439–451

  25. Nguyen TP, Debled-Rennesson I (2011) A discrete geometry approach for dominant point detection. Pattern Recogn 44(1):32–44

  26. Paul S, Basu S, Nasipuri M (2015) Microsoft Kinect in gesture recognition: A short review. Int J Control Theory Appl 8(5):2071–2076

  27. Paul S, Bhattacharyya A, Mollah AF, Basu S, Nasipuri M (2019) Hand segmentation from complex background for gesture recognition. In: Emerging Technology in Modelling and Graphics. Springer Singapore, pp 775–782

  28. Paul S, Nasser H, Nasipuri M, Ngo P, Basu S, Debled-Rennesson I (2017) A statistical-topological feature combination for recognition of isolated hand gestures from Kinect based depth images. In: 18th International Workshop on Combinatorial Image Analysis (IWCIA). Springer LNCS, pp 256–267

  29. Plouffe G, Cretu AM (2015) Static and dynamic hand gesture recognition in depth data using dynamic time warping. IEEE Trans Instrum Meas 65(2):305–316

  30. Qin S, Zhu X, Yang Y, Jiang Y (2014) Real-time hand gesture recognition from depth images using convex shape decomposition method. J Signal Process Syst 74(1):47–58

  31. Ren Z, Meng J, Yuan J (2011) Depth camera based hand gesture recognition and its applications in human-computer-interaction. In: 2011 8th International Conference on Information, Communications and Signal Processing (ICICS). IEEE, pp 1–5

  32. Ren Z, Yuan J, Meng J, Zhang Z (2013) Robust part-based hand gesture recognition using Kinect sensor. IEEE Trans Multimed 15(5):1110–1120

  33. Reveillès JP (1991) Géométrie discrète, calculs en nombre entiers et algorithmique. Thèse d’état. Université Louis Pasteur, Strasbourg

  34. She Y, Wang Q, Jia Y, Gu T, He Q, Yang B (2014) A real-time hand gesture recognition approach based on motion features of feature points. In: Proceedings of the 2014 IEEE 17th International Conference on Computational Science and Engineering. IEEE Computer Society, pp 1096–1102

  35. Shotton J, Sharp T, Kipman A, Fitzgibbon A, Finocchio M, Blake A, Cook M, Moore R (2013) Real-time human pose recognition in parts from single depth images. Commun ACM 56(1):116–124

  36. Suarez J, Murphy RR (2012) Hand gesture recognition with depth images: a review. In: 2012 IEEE RO-MAN. IEEE, pp 411–417

  37. Wang C, Liu Z, Chan SC (2015) Superpixel-based hand gesture recognition with Kinect depth camera. IEEE Trans Multimed 17(1):29–39

  38. Wu Y, Lin J, Huang TS (2005) Analyzing and capturing articulated hand motion in image sequences. IEEE Trans Pattern Anal Mach Intell 27(12):1910–1922

  39. Yang MH, Ahuja N, Tabb M (2002) Extraction of 2d motion trajectories and its application to hand gesture recognition. IEEE Trans Pattern Anal Mach Intell 24(8):1061–1074

  40. Zhang C, Yang X, Tian Y (2013) Histogram of 3d facets: A characteristic descriptor for hand gesture recognition. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE, pp 1–8


Acknowledgements

This work is partially supported by three grants from the Govt. of India, namely, grant no. SR/WOS-A/ET-1001/2015, grant no. EMR/2016/007213 of the Department of Science and Technology (DST) within the Ministry of Science and Technology, and grant no. BT/PR16356/BID/7/596/2016 of the Department of Biotechnology and Rashtriya Uchchatar Shiksha Abhiyan (RUSA) from the Department of Higher Education, Govt. of India.


Corresponding author

Correspondence to Soumi Paul.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Theory of Discrete Contours

In this section, we recall a method of contour simplification based on selected dominant points. These points are computed from a discrete structure named the adaptive tangential cover (ATC), reported in [21, 24], which is well suited to analyzing irregular, noisy contours.

A.1 Adaptive Tangential Cover [21, 24]

An adaptive tangential cover (ATC) is composed of a sequence of maximal straight segments, called maximal blurred segments, of the studied contour. The notion of a maximal blurred segment was introduced in [3] as an extension of the arithmetical discrete line of [33], with a width parameter for noisy or disconnected digital contours.

Definition 1

An arithmetical discrete line \({\mathcal D}(a,b,\mu ,\omega )\), with a main vector (b,a), a lower bound μ and an arithmetic thickness ω (with \(a,b,\mu ,\omega \in \mathbb {Z}\) and gcd(a,b) = 1), is the set of integer points (x,y) verifying \(\mu \le ax - by < \mu + \omega \).
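
As an illustration (not code from the paper), the membership test of Definition 1 translates directly into a few lines; the line parameters below are chosen arbitrarily:

```python
# Membership test for the arithmetical discrete line D(a, b, mu, omega) of
# Definition 1: the set of integer points (x, y) with mu <= a*x - b*y < mu + omega.
from math import gcd

def in_discrete_line(x, y, a, b, mu, omega):
    """True iff the integer point (x, y) belongs to D(a, b, mu, omega)."""
    assert gcd(a, b) == 1, "Definition 1 requires gcd(a, b) = 1"
    r = a * x - b * y          # "remainder" of the point w.r.t. the line
    return mu <= r < mu + omega

# Enumerate a small patch of D(1, 2, 0, 2): points with 0 <= x - 2y < 2,
# i.e. a naive discrete line of slope 1/2.
points = [(x, y) for x in range(5) for y in range(3)
          if in_discrete_line(x, y, 1, 2, 0, 2)]
```

Varying ω trades connectivity for thickness: ω = max(|a|,|b|) gives the naive (8-connected) line, larger ω a thicker one.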

Definition 2

A set Sf is a blurred segment of width ν if the discrete line \({\mathcal D}(a,b,\mu ,\omega )\) containing Sf has vertical (or horizontal) distance \(d=\frac {\omega -1}{\max (|a|,|b|)}\) equal to the vertical (or horizontal) thickness of the convex hull of Sf, and d ≤ ν (see Fig. 9a).

Fig. 9

(a) Example of the arithmetical discrete line \({\mathcal D}(2 , -3 , -5 , 5)\) (grey and orange points) and a blurred segment of width ν = 1.4 (orange points) bounded by \({\mathcal D}\). (b) Maximal blurred segment of width ν = 1.4 (orange points) (color figure online)

Let C be a discrete curve and Ci,j a sequence of points of C indexed from i to j. We denote by BS(i,j,ν) the predicate “Ci,j is a blurred segment of width ν”.

Definition 3

Ci,j is called a maximal blurred segment (MBS) of width ν, denoted MBS(i,j,ν), iff BS(i,j,ν), ¬BS(i,j + 1,ν) and ¬BS(i − 1,j,ν) hold (see Fig. 9b).
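
The width test behind the predicate BS(i,j,ν) can be sketched naively as follows. This brute-force version is our illustration, not the efficient incremental convex-hull algorithm of [3]: it tries every direction (b,a) defined by a pair of points and keeps the smallest vertical/horizontal thickness.

```python
# Naive O(n^3) sketch of Definitions 2-3 (illustrative only).
def segment_width(pts):
    """Minimal thickness (max r - min r) / max(|a|, |b|) over candidate
    directions (b, a) defined by pairs of points of pts."""
    if len(pts) < 2:
        return 0.0
    best = float("inf")
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            b, a = pts[j][0] - pts[i][0], pts[j][1] - pts[i][1]
            if a == 0 and b == 0:
                continue
            rs = [a * x - b * y for (x, y) in pts]
            best = min(best, (max(rs) - min(rs)) / max(abs(a), abs(b)))
    return best

def is_blurred_segment(pts, nu):
    return segment_width(pts) <= nu          # predicate BS(i, j, nu)

def maximal_blurred_segment(curve, i, nu):
    """Grow C_{i,j} to the right while BS(i, j, nu) holds (one side of Def. 3)."""
    j = i + 1
    while j + 1 < len(curve) and is_blurred_segment(curve[i:j + 2], nu):
        j += 1
    return i, j
```

Enumerating point pairs is enough for correctness because the optimal bounding direction is parallel to an edge of the convex hull of the points, and each hull edge joins two points of the set.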

An ATC is designed to capture the local noise on the curve by adjusting the thickness of the maximal blurred segments in accordance with the amount of noise present along the contour. To account for local perturbations, the meaningful thickness estimator [9, 10] is integrated into the construction of the ATC as a noise detector at each point of the curve. The estimated meaningful thickness serves as an input parameter to compute the ATC with widths appropriate to the noise. A non-parametric algorithm to compute the ATC of a given discrete curve is developed in [21, 24]. In the ATC, the obtained MBS decomposition with its various widths conveys both the noise levels and the geometrical structure of the given discrete curve (see Fig. 10a, c).

Fig. 10

(a, c) adaptive tangential cover; (b, d) polygonal representation (in red) using the dominant points, and polygonal simplification results (in green) (color figure online)

A.2 Polygonal simplification [23,24,25]

Using the ATC, we detect the points of local maximum curvature, called dominant points (DP), on the digital curve. Such points carry rich information that characterizes and describes the curve. Building on the dominant point detection proposed in [23, 25] and the notion of ATC, an algorithm is developed in [24] to determine the dominant points of a given noisy digital curve C. The main idea is that candidate dominant points are localized in the common zones of successive MBS of the ATC of C.

Then, using a simple angle measure m, we identify the dominant point as the point having the smallest angle. More precisely, m is the angle at the considered point formed with the left and right endpoints of, respectively, the left and right MBS involved in the studied common zone. As the considered point varies, m becomes a function of it, and a dominant point is defined as a local minimum of m. ATCs are illustrated in Fig. 10a, c; dominant points are shown as red points in Fig. 10b, d, and the red lines represent the polygonal representation of the shape.
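
A minimal sketch of this angle measure follows; the endpoint choices and names are our illustrative reading of the description, not the paper's code:

```python
# Angle measure m(p): angle at a candidate point p between the directions
# towards the left MBS endpoint and the right MBS endpoint; the dominant
# point of a common zone minimises m.
import math

def angle_measure(p, left_end, right_end):
    """Angle at p formed with left_end and right_end."""
    ux, uy = left_end[0] - p[0], left_end[1] - p[1]
    vx, vy = right_end[0] - p[0], right_end[1] - p[1]
    cos_m = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.acos(max(-1.0, min(1.0, cos_m)))   # clamp for float safety

def dominant_point(zone, left_end, right_end):
    """Point of the common zone with the smallest angle measure."""
    return min(zone, key=lambda p: angle_measure(p, left_end, right_end))

# On an L-shaped curve, the corner (4, 0) of the zone has the smallest angle.
zone = [(3, 0), (4, 0), (4, 1)]
```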

The first goal of finding the dominant points is to obtain an approximate description of the input curve, called a polygonal simplification. However, due to the nature of the tangential cover, dominant points often lie very close to each other, which is undesirable, in particular for polygonal simplification. We therefore associate with each detected dominant point a weight, i.e., the ratio of the integral sum of squared errors to the angle with its two neighbouring dominant points, indicating its importance with respect to the approximating polygon of the curve. Polygonal simplification is illustrated in Fig. 10b, d with green lines.
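
The weight can be sketched as follows, under our illustrative reading of the text (the exact error term and names are assumptions): for a dominant point p with neighbouring dominant points q and r, the ISE is taken against the chord q–r, i.e. the error committed if p were dropped.

```python
# Hypothetical sketch: weight(p) = ISE / angle, where ISE sums squared
# distances from the curve points between q and r to the chord q-r, and
# angle is the angle at p with its neighbour dominant points q and r.
import math

def point_segment_dist2(p, a, b):
    """Squared distance from point p to segment a-b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    t = 0.0 if dx == dy == 0 else max(0.0, min(1.0,
        ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def dp_weight(curve, iq, ip, ir):
    """Importance of the dominant point curve[ip] w.r.t. neighbours iq, ir."""
    q, p, r = curve[iq], curve[ip], curve[ir]
    ise = sum(point_segment_dist2(curve[k], q, r) for k in range(iq, ir + 1))
    ux, uy = q[0] - p[0], q[1] - p[1]
    vx, vy = r[0] - p[0], r[1] - p[1]
    ang = math.acos(max(-1.0, min(1.0, (ux * vx + uy * vy) /
                                  (math.hypot(ux, uy) * math.hypot(vx, vy)))))
    return ise / ang
```

A dominant point lying on the chord between its neighbours gets ISE ≈ 0, hence weight ≈ 0, and is the first candidate for removal during simplification.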

Appendix B: Database Images

B.1 Image Database JU_V2_DIGIT

We have collected 1000 images of the digits 0 to 9 from 10 people, with 10 orientations per digit per person. To create this dataset, we first built a small data-collection tool. After setting up the Kinect with the system, this tool saves images with a single click of the save button.

Since the hand position is located at runtime, a single click of the save button stores: an RGB image of the whole scene, a depth image of the whole scene, an RGB image with the hand region annotated, a cropped RGB image of only the hand, cropped depth values of only the hand, and the depth values of the hand and wrist.

We use the depth values for our further experimentation. First, we convert the depth values to a depth image. Then we threshold the depth image to extract the region of interest and extract the contours. Our subsequent experimentation involves contour-based feature extraction; details are given in the main paper.
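
The thresholding-plus-contour step can be sketched with plain numpy as follows; the depth bounds (in millimetres) and the 4-neighbour boundary rule are illustrative assumptions, not the exact values or tracer used for the dataset:

```python
# Illustrative sketch: threshold a depth image to keep the near (hand) region,
# then take as contour the foreground pixels touching the background.
import numpy as np

def hand_mask(depth, near=400, far=900):
    """Foreground mask of pixels whose depth falls in [near, far] (assumed mm)."""
    return (depth >= near) & (depth <= far)

def contour_pixels(mask):
    """Mask pixels with at least one 4-neighbour outside the mask."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &   # up / down neighbours
                padded[1:-1, :-2] & padded[1:-1, 2:])    # left / right neighbours
    return mask & ~interior

depth = np.full((6, 6), 1500)      # synthetic scene: far background...
depth[1:5, 1:5] = 600              # ...with a near 4x4 "hand" blob
mask = hand_mask(depth)
contour = contour_pixels(mask)
```

The contour pixels can then be ordered by a boundary-tracing pass to feed the contour-based feature extraction of the main paper.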

In Figs. 11–20, we present some sample images from this database. Image ID Pi_Gj_k means that the image corresponds to the i-th person, the j-th gesture and the k-th orientation.

Fig. 11

Image ID: P1_G0_3, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 12

Image ID: P1_G1_4, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 13

Image ID: P1_G2_5, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 14

Image ID: P1_G3_4, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 15

Image ID: P1_G4_10, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 16

Image ID: P1_G5_1, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 17

Image ID: P1_G6_3, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 18

Image ID: P1_G7_6, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 19

Image ID: P1_G8_6, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 20

Image ID: P1_G9_10, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

B.2 Image Database JU_V2_ALPHA

We have collected another set of alphabet images with the same tool mentioned above. Here we have captured the 24 static letters from a to z (excluding j and z, as they are dynamic) from 10 people, with 10 orientations per letter per person. Altogether, there are 2400 images in the dataset.

In Figs. 21–44, we present some sample images from this database. Image ID Pi_α_k means that the image corresponds to the i-th person, the gesture α and the k-th orientation.

Fig. 21
Image ID: P1_a_1, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 22

Image ID: P1_b_6, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 23

Image ID: P1_c_5, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 24

Image ID: P1_d_3, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 25

Image ID: P1_e_7, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 26

Image ID: P1_f_9, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 27

Image ID: P1_g_7, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 28

Image ID: P1_h_4, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 29

Image ID: P1_i_4, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 30

Image ID: P1_k_8, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 31

Image ID: P1_l_10, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 32
Image ID: P1_m_2, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 33

Image ID: P1_n_3, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 34
Image ID: P1_o_2, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 35

Image ID: P1_p_3, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 36

Image ID: P1_q_4, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 37

Image ID: P1_r_8, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 38

Image ID: P1_s_4, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 39

Image ID: P1_t_3, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 40

Image ID: P1_u_10, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 41

Image ID: P1_v_9, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 42

Image ID: P1_w_10, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 43

Image ID: P1_x_2, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand

Fig. 44

Image ID: P1_y_2, (a) RGB Full Image with Hand Annotation (b) Depth Full Scene Image (c) RGB Crop Image (d) Depth Crop Image (e) Depth Threshold Image (f) Contour Image of Hand


About this article


Cite this article

Paul, S., Nasser, H., Mollah, A.F. et al. Development of benchmark datasets of multioriented hand gestures for speech and hearing disabled. Multimed Tools Appl 81, 7285–7321 (2022). https://doi.org/10.1007/s11042-021-11745-8

