Abstract
Reliable hand gesture recognition is extremely relevant for automatic interpretation of sign languages used by people with hearing and speech disabilities. In this work, we present (i) new benchmark datasets of depth-sensor based, multi-oriented, isolated and static hand gestures of numerals and alphabets following the conventions of American Sign Language (ASL), (ii) an effective strategy for segmentation of hand region from depth data and appropriate preprocessing for feature extraction, and (iii) an effective statistical-geometrical feature set for recognition of multi-oriented hand gestures. Besides setting benchmark performances on the developed datasets, viz. 97.67%, 96.53% and 96.86% on numerals, alphabets and alpha-numerals respectively, the proposed pipeline is also implemented on two related public datasets and is found superior to state-of-the-art methods reported so far.
References
Bai X, Latecki LJ (2008) Path similarity skeleton graph matching. IEEE Trans Pattern Anal Mach Intell 30(7):1282–1292
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522
Debled-Rennesson I, Feschet F, Rouyer-Degli J (2006) Optimal blurred segments decomposition of noisy shapes in linear time. Comput Graph 30(1):30–36
Dewaele G, Devernay F, Horaud R (2004) Hand motion from 3d point trajectories and a smooth surface model. In: European Conference on Computer Vision. Springer, pp 495–507
Dinh DL, Lee S, Kim TS (2016) Hand number gesture recognition using recognized hand parts in depth images. Multimed Tools Appl 75(2):1333–1348
Geetha M, Manjusha C, Unnikrishnan P, Harikrishnan R (2013) A vision based dynamic gesture recognition of Indian sign language on Kinect based depth images. In: 2013 International Conference on Emerging Trends in Communication, Control, Signal Processing and Computing Applications (C2SPCA). IEEE, pp 1–7
Jadooki S, Mohamad D, Saba T, Almazyad AS, Rehman A (2017) Fused features mining for depth-based hand gesture recognition to classify blind human communication. Neural Comput Appl 28(11):3285–3294
Kapuscinski T, Oszust M, Wysocki M (2013) Recognition of signed dynamic expressions observed by tof camera. In: 2013 signal processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pp 291–296
Kerautret B, Lachaud J (2014) Meaningful scales detection: an unsupervised noise detection algorithm for digital contours. Image Process Line 4:98–115
Kerautret B, Lachaud J, Said M (2012) Meaningful thickness detection on polygonal curve. In: Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods: ICPRAM. vol 1. SciTePress, pp 372–379
Kry PG, Pai DK (2006) Interaction capture and synthesis. In: ACM Transactions on Graphics (TOG). vol 25. ACM, pp 872–880
Lachaud J (2010) Digital shape analysis with maximal segments. In: International Workshop on Applications of Discrete Geometry and Mathematical Morphology. Springer, pp 14–27
Le QK, Pham CH, Le TH (2012) Road traffic control gesture recognition using depth images. IEIE Trans Smart Process Comput 1(1):1–7
Liang H, Yuan J, Thalmann D (2014) Parsing the hand in depth images. IEEE Trans Multimed 16(5):1241–1253
Lv Z (2013) Wearable smartphone: Wearable hybrid framework for hand and foot gesture interaction on smartphone. In: Proceedings of the IEEE international conference on computer vision workshops, pp 436–443
Lv Z, Esteve C, Chirivella J, Gagliardo P (2017) Serious game based personalized healthcare system for dysphonia rehabilitation. Pervasive Mob Comput 41:504–519
Lv Z, Halawani A, Feng S, Li H, Réhman SU (2014) Multimodal hand and foot gesture interaction for handheld devices. ACM Trans Multimed Comput Commun Appl (TOMM) 11(1s):1–19
Lv Z, Halawani A, Feng S, Ur Réhman S, Li H (2015) Touch-less interactive augmented reality game on vision-based wearable device. Pers Ubiquit Comput 19(3):551–567
Mitra S, Acharya T (2007) Gesture recognition: A survey. IEEE Trans Syst Man Cybern Part C (Appl Rev) 37(3):311–324
Nasser H, Ngo P, Debled-Rennesson I (2018) Dominant point detection based on discrete curve structure and applications. J Comput Syst Sci 95(1):177–192
Ngo P, Debled-Rennesson I, Kerautret B, Nasser H (2017) Analysis of noisy digital contours with adaptive tangential cover. J Math Imaging Vis 59(1):123–135
Ngo P, Nasser H, Debled-Rennesson I (2015) Efficient dominant point detection based on discrete curve structure. In: International Workshop on Combinatorial Image Analysis (IWCIA), Kolkata, India. Volume 9448 of LNCS, pp 143–156
Ngo P, Nasser H, Debled-Rennesson I, Kerautret B (2016) Adaptive tangential cover for noisy digital contours. In: Discrete Geometry for Computer Imagery - 19th IAPR International Conference, DGCI 2016, Nantes, France. Volume 9647 of LNCS, pp 439–451
Nguyen TP, Debled-Rennesson I (2011) A discrete geometry approach for dominant point detection. Pattern Recogn 44(1):32–44
Paul S, Basu S, Nasipuri M (2015) Microsoft Kinect in gesture recognition: A short review. Int J Control Theory Appl 8(5):2071–2076
Paul S, Bhattacharyya A, Mollah AF, Basu S, Nasipuri M (2019) Hand segmentation from complex background for gesture recognition. In: Emerging Technology in Modelling and Graphics. Springer Singapore, pp 775–782
Paul S, Nasser H, Nasipuri M, Ngo P, Basu S, Debled-Rennesson I (2017) A statistical-topological feature combination for recognition of isolated hand gestures from kinect based depth images. In: 18th international workshop on combinatorial image analysis (IWCIA). Springer LNCS, pp 256–267
Plouffe G, Cretu AM (2015) Static and dynamic hand gesture recognition in depth data using dynamic time warping. IEEE Trans Instrum Meas 65(2):305–316
Qin S, Zhu X, Yang Y, Jiang Y (2014) Real-time hand gesture recognition from depth images using convex shape decomposition method. J Signal Process Syst 74(1):47–58
Ren Z, Meng J, Yuan J (2011) Depth camera based hand gesture recognition and its applications in human-computer-interaction. In: 2011 8th International Conference on Information, Communications and Signal Processing (ICICS). IEEE, pp 1–5
Ren Z, Yuan J, Meng J, Zhang Z (2013) Robust part-based hand gesture recognition using Kinect sensor. IEEE Trans Multimed 15(5):1110–1120
Reveillès JP (1991) Géométrie discrète, calculs en nombre entiers et algorithmique. Thèse d’état. Université Louis Pasteur, Strasbourg
She Y, Wang Q, Jia Y, Gu T, He Q, Yang B (2014) A real-time hand gesture recognition approach based on motion features of feature points. In: Proceedings of the 2014 IEEE 17th International Conference on Computational Science and Engineering. IEEE Computer Society, pp 1096–1102
Shotton J, Sharp T, Kipman A, Fitzgibbon A, Finocchio M, Blake A, Cook M, Moore R (2013) Real-time human pose recognition in parts from single depth images. Commun ACM 56(1):116–124
Suarez J, Murphy RR (2012) Hand gesture recognition with depth images: a review. In: 2012 IEEE RO-MAN. IEEE, pp 411–417
Wang C, Liu Z, Chan SC (2015) Superpixel-based hand gesture recognition with Kinect depth camera. IEEE Trans Multimed 17(1):29–39
Wu Y, Lin J, Huang TS (2005) Analyzing and capturing articulated hand motion in image sequences. IEEE Trans Pattern Anal Mach Intell 27(12):1910–1922
Yang MH, Ahuja N, Tabb M (2002) Extraction of 2d motion trajectories and its application to hand gesture recognition. IEEE Trans Pattern Anal Mach Intell 24(8):1061–1074
Zhang C, Yang X, Tian Y (2013) Histogram of 3d facets: A characteristic descriptor for hand gesture recognition. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE, pp 1–8
Acknowledgements
This work is partially supported by three grants from the Govt. of India, namely, grant no. SR/WOS-A/ET-1001/2015, grant no. EMR/2016/007213 of the Department of Science and Technology (DST) within the Ministry of Science and Technology, and grant no. BT/PR16356/BID/7/596/2016 of the Department of Biotechnology and Rashtriya Uchchatar Shiksha Abhiyan (RUSA) from the Department of Higher Education, Govt. of India.
Appendices
Appendix A: Theory of Discrete Contours
In this section, we recall a method of contour simplification based on selected dominant points. These are computed from a discrete structure, named the adaptive tangential cover (ATC), reported in [21, 24], which is well adapted to the analysis of irregular noisy contours.
A.1 Adaptive Tangential Cover [21, 24]
An adaptive tangential cover (ATC) is composed of a sequence of maximal straight segments, called maximal blurred segments, of the studied contour. The notion of maximal blurred segment was introduced in [3] as an extension of the arithmetical discrete line presented in [33], with a width parameter to handle noisy or disconnected digital contours.
Definition 1
An arithmetical discrete line \({\mathcal D}(a,b,\mu ,\omega )\), with a main vector (b,a), a lower bound μ and an arithmetic thickness ω (with \(a,b,\mu ,\omega \in \mathbb {Z}\) and gcd(a,b) = 1) is the set of integer points (x,y) verifying μ ≤ ax − by < μ + ω.
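Definition 1 can be checked directly from the inequality. The sketch below is illustrative only; the parameter values are examples, not taken from the paper.

```python
# Membership test for an arithmetical discrete line D(a, b, mu, omega),
# per Definition 1: (x, y) belongs to D iff mu <= a*x - b*y < mu + omega.
from math import gcd

def in_discrete_line(x, y, a, b, mu, omega):
    """Return True if the integer point (x, y) lies on D(a, b, mu, omega)."""
    assert gcd(a, b) == 1, "a and b must be coprime"
    r = a * x - b * y
    return mu <= r < mu + omega

# Example: points of an 8-connected line of slope a/b = 1/2
# (omega = |a| + |b| = 3) inside a small window.
points = [(x, y) for x in range(6) for y in range(4)
          if in_discrete_line(x, y, 1, 2, 0, 3)]
```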
Definition 2
A set Sf is a blurred segment of width ν if the discrete line \({\mathcal D}(a,b,\mu ,\omega )\) containing Sf has the vertical (or horizontal) distance \(d=\frac{\omega-1}{\max(|a|,|b|)}\) equal to the vertical (or horizontal) thickness of the convex hull of Sf, and d ≤ ν (see Fig. 9a).
Let C be a discrete curve and Ci,j a sequence of points of C indexed from i to j. Let us denote by BS(i,j,ν) the predicate “Ci,j is a blurred segment of width ν”.
Definition 3
Ci,j is called a maximal blurred segment (MBS) of width ν and denoted MBS(i,j,ν) iff BS(i,j,ν), ¬BS(i,j + 1,ν) and ¬BS(i − 1,j,ν) (see Fig. 9b).
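Definitions 2–3 can be sketched naively as follows, restricted to the vertical-distance case. This brute-force version replaces the incremental convex-hull algorithm used in the cited works and is cubic-time, for illustration only; it relies on the fact that the minimal vertical strip covering a point set is bounded by a line through two of the points.

```python
# Naive sketch (vertical case only): a point set is a blurred segment of
# width nu if the minimal vertical strip covering it has width <= nu; an MBS
# is a run of curve points that cannot be extended while staying one.
def vertical_thickness(pts):
    """Minimal vertical width of a strip covering pts (brute force O(n^3))."""
    best = float("inf")
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            (x1, y1), (x2, y2) = pts[i], pts[j]
            if x1 == x2:
                continue  # vertical support lines give infinite vertical width
            m = (y2 - y1) / (x2 - x1)
            devs = [y - (y1 + m * (x - x1)) for x, y in pts]
            best = min(best, max(devs) - min(devs))
    return 0.0 if best == float("inf") else best  # degenerate sets -> 0

def is_blurred_segment(pts, nu):
    return vertical_thickness(pts) <= nu

def maximal_right_extension(curve, i, nu):
    """Largest j such that curve[i..j] is still a blurred segment of width nu."""
    j = i
    while j + 1 < len(curve) and is_blurred_segment(curve[i:j + 2], nu):
        j += 1
    return j
```

For example, the run (0,0), (1,0), (2,0) stays within width 0.5, but appending the outlier (3,5) breaks the bound, so the maximal extension stops at index 2.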
An ATC is designed to capture the local noise on a curve by adjusting the thickness of the maximal blurred segments in accordance with the amount of noise present along the contour. To account for local perturbations, the meaningful thickness estimator [9, 10] is integrated into the construction of the ATC as a noise detector at each point of the curve. The meaningful thickness is used as an input parameter to compute the ATC with widths appropriate to the noise. A non-parametric algorithm is developed in [21, 24] to compute the ATC of a given discrete curve. In the resulting ATC, the MBS decomposition with its various widths reflects both the noise levels and the geometrical structure of the given discrete curve (see Fig. 10a, c).
A.2 Polygonal simplification [23,24,25]
Using the ATC, we detect the points of locally maximal curvature, called dominant points (DP), on the digital curve. Such points carry rich information that characterizes and describes the curve. Building on the dominant point detection proposed in [23, 25] and the notion of ATC, an algorithm is developed in [24] to determine the dominant points of a given noisy digital curve C. The main idea is that the candidate dominant points are localized in the common zones of successive MBS of the ATC of C.
Then, using a simple angle measure m, we identify the dominant point as the point having the smallest angle. More precisely, m is the angle at the considered point formed with the left endpoint of the left MBS and the right endpoint of the right MBS involved in the studied common zone. As the considered point varies, m becomes a function of it, and a dominant point is defined as a local minimum of m. ATCs are illustrated in Fig. 10a, c; dominant points are shown as red points in Fig. 10b, d, where red lines represent the polygonal representation of the shape.
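The angle measure m described above can be sketched as a small helper; the function and argument names here are illustrative, not from the cited works.

```python
# Sketch of the angle measure m: for a candidate point c in a common zone,
# m is the angle at c formed with the left endpoint of the left MBS and the
# right endpoint of the right MBS.
import math

def angle_measure(left_end, c, right_end):
    """Angle (radians) at c between left_end and right_end."""
    v1 = (left_end[0] - c[0], left_end[1] - c[1])
    v2 = (right_end[0] - c[0], right_end[1] - c[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # clamp guards against floating-point drift outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# The dominant point of a zone is the candidate with the smallest m, e.g.:
# dp = min(candidates, key=lambda c: angle_measure(l_end, c, r_end))
```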
The first goal of finding the dominant points is to obtain an approximate description of the input curve, called a polygonal simplification. However, due to the nature of the tangential cover, dominant points often lie very close to each other, which is undesirable, in particular for polygonal simplification. We therefore associate to each detected dominant point a weight, i.e., the ratio of the integral sum of squared errors to the angle with the two neighbouring dominant points, indicating its importance with respect to the approximating polygon of the curve. The polygonal simplification is illustrated with green lines in Fig. 10b, d.
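The weight described above can be sketched as follows, under the assumption that the ISSE is the sum of squared distances from the curve points between the two neighbouring dominant points to the two chords through the candidate; the exact formulation in [24] may differ in detail.

```python
# Hypothetical sketch: weight of a dominant point = ISSE / angle, where ISSE
# sums squared distances from curve points to the two chords through it.
import math

def _dist2_to_segment(p, a, b):
    """Squared Euclidean distance from point p to segment ab."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    if denom == 0:
        return (px - ax) ** 2 + (py - ay) ** 2
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / denom))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2

def dp_weight(curve, il, i, ir):
    """Weight of candidate dominant point curve[i] between neighbours il, ir."""
    l, c, r = curve[il], curve[i], curve[ir]
    isse = sum(_dist2_to_segment(curve[k], l, c) for k in range(il, i + 1))
    isse += sum(_dist2_to_segment(curve[k], c, r) for k in range(i, ir + 1))
    v1 = (l[0] - c[0], l[1] - c[1])
    v2 = (r[0] - c[0], r[1] - c[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    angle = math.acos(max(-1.0, min(1.0, cos)))
    return isse / angle
```

A candidate whose neighbouring chords already fit the curve exactly gets weight 0 and is a natural removal candidate during simplification.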
B Database Images
B.1 Image Database JU_V2_DIGIT
We have collected 1000 images of the digits 0 to 9 from 10 people, with 10 orientations per digit. To create this dataset, we first built a small dataset-collection tool. Once the Kinect is set up with the system, this tool saves the images with a single click of the save button.
Since the hand position is located at runtime, a click of the save button stores an RGB image of the whole scene, a depth image of the whole scene, an RGB image with the hand region annotated, a cropped RGB image containing only the hand, cropped depth values of only the hand, and the hand and wrist depth values.
We use the depth values for our further experimentation. First, we convert the depth values to a depth image. Then we threshold the depth image to extract the region of interest and extract the contours. Our further experimentation involves contour-based feature extraction; details are given in the main paper.
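The threshold-and-extract step can be sketched with NumPy alone. The depth band values below are illustrative, not the paper's calibration; in practice the contour would typically be traced with a routine such as OpenCV's `cv2.findContours`, while this sketch just marks boundary pixels.

```python
# Sketch: segment the hand as the pixels inside a depth band, then mark the
# 4-connected boundary (foreground pixels touching background) as the contour.
import numpy as np

def segment_hand(depth, near, far):
    """Binary mask of pixels whose depth (e.g. in mm) lies in [near, far]."""
    return (depth >= near) & (depth <= far)

def boundary_pixels(mask):
    """Foreground pixels with at least one background 4-neighbour."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

# Toy depth map: the hand sits roughly 0.5 m from the sensor.
depth = np.array([[900, 500, 500],
                  [900, 500, 900],
                  [900, 900, 900]])
mask = segment_hand(depth, 400, 600)
contour = boundary_pixels(mask)
```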
In Figs. 11–20, we present some sample images from this database. Image ID Pi_Gj_k means that it corresponds to the i-th person, the j-th gesture and the k-th orientation.
B.2 Image Database JU_V2_ALPHA
We have collected another set of alphabet images with the same tool mentioned above. Here we have captured static gestures of 24 letters, from a to z (excluding j and z, as they are dynamic), from 10 people, each in 10 orientations, giving 2400 images in the dataset altogether.
In Figs. 21–44, we present some sample images from this database. Image ID Pi_α_k means that it corresponds to the i-th person, the gesture α and the k-th orientation.
Cite this article
Paul, S., Nasser, H., Mollah, A.F. et al. Development of benchmark datasets of multioriented hand gestures for speech and hearing disabled. Multimed Tools Appl 81, 7285–7321 (2022). https://doi.org/10.1007/s11042-021-11745-8