1 Introduction

Physical movements that non-verbally express emotions, attitudes, and intentions are known as gestures, and hand gestures are one such form of expression. Hand gesture recognition (HGR) is the process of detecting and interpreting the hand movements and gestures made by a user. This technology is used in various fields, such as computer vision, human–computer interaction, and gaming. Recognition is typically done through image processing and machine learning algorithms, which analyze the camera's visual information to identify and classify the gestures. The process of hand gesture recognition involves several steps, including image acquisition, pre-processing, feature extraction, and classification. In [1], the authors present a gesture recognition system based on extracting frames from the 3D positions and velocities of the fingers. These data feed a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN); the authors collect data for six gestures.

The authors of [2] present an architecture they call KTSL, based on the combination of hand position, direction, and shape to represent the meaning of sign language. A Kinect device captures the data, which feeds Hidden Markov Model and Support Vector Machine classifiers. The authors of [3] present a static and dynamic gesture recognition system based on Dynamic Time Warping; to define the motion trajectory, they also present a composition model called strokelets. In [4], the authors propose automatic feature extraction along with four manual extraction techniques: angular point number, angular point series number percentage, subarea percentage, and aspect ratio. These features feed a Hidden Markov Model.

In this context, HGR is not a trivial problem because it is viewed as a pattern recognition problem [5, 6]. The HGR problem is also challenging to solve with mathematical or statistical models: mathematical models require complete knowledge of the problem's behavior [7], while statistical models require knowledge of all variables and their behavior. For this reason, it is feasible to try to solve the HGR problem using deep learning methods or machine learning algorithms.

In this context, deep learning methods can address the HGR problem because they extract features automatically. However, they are difficult to deploy because HGR systems need portability, and deep learning methods require a large computational load and the programming of complex models. The computational load problem has been addressed with GPUs, and the portability problem with portable GPUs; however, portable GPUs have high power consumption, which works against portability.

In this sense, we return to traditional machine learning algorithms such as the artificial neural network (ANN), support vector machine (SVM), K-nearest neighbors (KNN), and decision tree (DT), among others. Machine learning algorithms typically operate in multidimensional environments, where data is represented as high-dimensional feature vectors. These feature vectors capture various data characteristics and represent the data in a format that the machine learning algorithm can analyze. By working in a multidimensional environment, machine learning algorithms can capture complex relationships and patterns within the data that can be used for tasks such as classification, regression, clustering, and dimensionality reduction. Machine learning models use feature selection and feature extraction as dimensionality reduction techniques: feature extraction transforms the original dataset into a reduced number of features, whereas feature selection finds the subset of existing features that gives the most information about the problem. According to [9], the performance of a machine learning model is directly related to the number of input variables, and according to [10], an excess of variables can reduce model performance because there may be high correlation between variables. For the best description of the problem, it is therefore necessary to identify the variables with the greatest descriptive power, which is not a trivial task. This is why many feature selection methods have been developed, using selection criteria such as redundancy or data relevance, among others.

Feature selection methods are grouped into filter, wrapper, and embedded methods. Filter methods produce a score based on correlation, mutual information, or the classifier's performance on a single variable; they are independent of the learning phase. Wrapper methods score the relevance of a subset of features by how well a machine learning model predicts with it. Finally, embedded methods incorporate feature selection into the training phase of algorithms such as DT [11].

In [13,14,15], the authors select the features retrieved by the Leap Motion Controller and present this as feature selection. These features are the positions and orientations of the fingers and hands. In addition, the authors use functions such as the mean, standard deviation, correlation, Shannon entropy, kurtosis, and skewness as feature extraction. On the other hand, [16] presents a feature extraction of fingertip angles, fingertip distances, fingertip heights, and fingertip positions, and applies the f-value, sequential feature selection, and random forest feature importance over these features. Building on that study, [17] mentions that reducing the number of features to a reasonable number is necessary, and sorts the most significant features with the f-score algorithm, which calculates the ratio of the variance between classes to the variance within classes. Also, [18] uses the Gaussian Mixture Model to select features, while mentioning structured sparsity-inducing methods and genetic algorithms as feature selection algorithms for images. Feature selection has been widely used in the medical field, especially in genetics [19, 20]. Also, [21] uses several Weka feature selection algorithms (PCA, SVM, consistency, J48, filtered subset evaluation, information gain, gain ratio, chi-square) to define the best features for describing Parkinson's disease.

In this context, within the framework of filter methods for feature selection, we analyze the maximum relevance minimum redundancy (MRMR) and sequential methods. For wrapper methods, we analyze neighborhood component analysis with lambda parameters (NCAp) and neighborhood component analysis without parameters (NCAsp). Finally, for embedded methods, we analyze Relief-F and Decision Tree (DT) for hand gesture classification and recognition. These methods are applied to a data set consisting of M observations and a set of 17 features, obtained by applying 17 feature extraction functions to a raw data set retrieved from the Leap Motion Controller. We also use machine learning algorithms such as ANN, SVM, KNN, and DT to validate the performance of the feature selection methods. For each algorithm, we report the classification and recognition accuracy, the processing time of the classification tests, and the accuracy difference between training and testing.

This analysis is proposed because, in the field of hand gesture recognition using Leap Motion Controller signals and machine learning algorithms, no guidance could be found in the scientific literature for choosing a feature selection method a priori to achieve high accuracy. In addition, many proposed methods are just a starting point for other studies.

In addition, the importance of the present work lies in the attempt to achieve high recognition accuracy because, in practical terms, the output of recognition systems is used as input for other systems, such as home automation, human–computer interaction, and systems in the medical field. It can also be used as a classification system in rehabilitation, labeling a hand as healthy or impaired. In the same context, these systems could run on embedded hardware with low computational power; reducing dimensionality may sacrifice some accuracy, but it reduces the computational load.

2 Objectives

  • Define the best set of feature extraction functions to achieve maximum accuracy in hand gesture classification and recognition, using feature selection methods such as MRMR, Sequential, NCAp, NCAsp, Relief-F, and DT.

  • Analyze the behavior of the feature selection methods in terms of classification accuracy, recognition accuracy, and processing time with different machine learning algorithms.

The rest of the paper is organized as follows: Sect. 3 presents an overview of the work, the dataset construction, the feature extraction and feature selection functions, and the machine learning algorithms. The experimentation and results section presents the combinations of feature extraction functions according to the ranking of each feature selection method for each machine learning algorithm and shows comparative tables of the results. Finally, the paper presents the conclusions and discussion.

3 Methodology

In this section, we describe the general overview of the work, the dataset construction, the feature extraction functions, the feature selection functions, and the machine learning algorithms.

3.1 General Work Overview

This work uses the spatial positions and directions retrieved by the Leap Motion Controller (LMC). The LMC represents the position of the fingertips at time t using the matrix:

$$\mathbf{P}_{t}=\left[p_{(1,t)}^{(x)},\; p_{(1,t)}^{(y)},\; p_{(1,t)}^{(z)};\;\ldots;\; p_{(5,t)}^{(x)},\; p_{(5,t)}^{(y)},\; p_{(5,t)}^{(z)}\right]_{t}^{(\mathrm{leap})},$$

where \([{p}_{(i,t)}^{(x)}, {p}_{(i,t)}^{(y)},{p}_{(i,t)}^{(z)}]\) is the vector of spatial positions of the ith finger with respect to the sensor coordinate axes.

The directions of the fingertips at time t are represented using the matrix:

$$\mathbf{D}_{t}=\left[d_{(1,t)}^{(x)},\; d_{(1,t)}^{(y)},\; d_{(1,t)}^{(z)};\;\ldots;\; d_{(5,t)}^{(x)},\; d_{(5,t)}^{(y)},\; d_{(5,t)}^{(z)}\right]_{t}^{(\mathrm{leap})},$$

where \([{d}_{(i,t)}^{(x)}, {d}_{(i,t)}^{(y)},{d}_{(i,t)}^{(z)}]\) is the vector of directions of the ith finger with respect to the sensor coordinate axes.
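For concreteness, a minimal sketch of how \(\mathbf{P}_{t}\) and \(\mathbf{D}_{t}\) could be assembled from one LMC frame is shown below; the attribute names (`fingers`, `tip_position`, `direction`) are hypothetical placeholders, since the exact accessors depend on the Leap Motion SDK version.

```python
import numpy as np

# Hedged sketch: build the 5x3 position and direction matrices for one frame.
# `frame.fingers`, `tip_position`, and `direction` are hypothetical accessors.
def build_position_and_direction(frame):
    P_t = np.array([[f.tip_position.x, f.tip_position.y, f.tip_position.z]
                    for f in frame.fingers])  # row i = [x, y, z] of finger i
    D_t = np.array([[f.direction.x, f.direction.y, f.direction.z]
                    for f in frame.fingers])  # row i = unit direction of finger i
    return P_t, D_t
```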

In this paper, we use six feature selection methods: MRMR, Sequential, NCAp, NCAsp, Relief-F, and DT. The movement of the hand is represented by the sum of the values of the spatial positions. In this sense, the data are structured as an M×N matrix, where M is the total number of observations and N the features that describe the movement.

Next, we define 17 feature extraction functions, initially in random order. These statistical functions describe central tendency and scatter measures, among others. Each feature extraction function is applied to the data matrix, producing an M×17 matrix in which each value obtained by a feature extraction function represents a predictor.

Next, the feature selection methods MRMR, Sequential, NCAp, NCAsp, Relief-F, and DT are applied to the M×17 matrix, where each column represents a feature extraction function. Each method returns, for every function, an index and a score. We sort the functions by score and feed the machine learning algorithms one feature extraction function at a time: in the first step, we use the function with the highest score; then we add the function with the second-best score, and so on, until the algorithms are fed with the combination of all 17 feature extraction functions (a sketch of this loop is shown below). The machine learning algorithms used are ANN, SVM, KNN, and DT. The whole process is shown in Fig. 1.
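The incremental evaluation just described can be sketched as follows; `X` is the M×17 feature matrix, `y` holds the gesture labels, `ranking` is the column order returned by a selection method, and the KNN base classifier is only an illustrative choice.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Sketch: add one ranked feature extraction function at a time and re-evaluate.
def incremental_evaluation(X, y, ranking, model=None):
    model = model or KNeighborsClassifier(n_neighbors=3)
    accuracies = []
    for k in range(1, len(ranking) + 1):
        subset = X[:, ranking[:k]]               # top-k ranked functions
        accuracies.append(cross_val_score(model, subset, y, cv=5).mean())
    return accuracies                            # one mean accuracy per k
```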

Fig. 1
figure 1

General evaluation scheme for feature selection and feature extraction functions

3.2 Data Set Building

The dataset was built with 56 volunteers from Universidad Técnica de Ambato, students and teachers, women and men, aged 18–46 years. None of the volunteers had injuries to the right upper extremity. For each participant, information such as ethnicity, occupation, and e-mail was requested, and the number of lumens in the environment where the data was collected was also recorded. For each proposed gesture, the position, direction, and velocity in X, Y, and Z of the palm and each finger are recorded. The device used to acquire the data was the LMC. We use this device because it is small, inexpensive, and specialized in hand tracking. The LMC has three infrared LEDs and two infrared cameras, and it retrieves the spatial positions, directions, and velocities of the hands and fingers with respect to a coordinate system whose origin is the center of the device [23]. Figure 2 shows the data acquisition process using the Leap Motion Controller.

Fig. 2
figure 2

Data acquisition using Leap Motion Controller

In addition, each user performs five gestures: Open Hand, Fist, Wave In, Wave Out, and Pinch, as shown in Fig. 3. The user can perform the hand movement representing the gesture at any point within the 5 s acquisition window [24].

Fig. 3
figure 3

Types of hand gestures

Each user repeats each gesture 30 times. The dataset therefore comprises 1680 observations per gesture, 8400 observations in total. The LMC has a sampling frequency of 200 Hz, but since our dataset also saves images, the effective sampling frequency drops to 70 Hz. The dataset is thus of size 8400 × 70, and each instance contains data for five fingers and three channels, X, Y, and Z [23].

3.3 Feature Extraction Functions

The accuracy of a model depends directly on the number of features. In the proposed approach, we define 17 feature extraction functions based on statistical measures of central tendency, scatter, amplitude, wavelength, and others. The feature extraction functions used are described below.

  • Mean Absolute Value (MAV): the average of the absolute values of the signal.

  • Enhanced Mean Absolute Value (EMAV): an extension of MAV that defines a p value used to select a region of the signal.

  • Modified Mean Absolute Value (MMAV): an extension of MAV that assigns a weighting window function.

  • Modified Mean Absolute Value 2 (MMAV2): another extension of MAV that assigns a continuous weighting window function.

  • Variance (VAR): measures the signal amplitude and power.

  • Root Mean Square (RMS): a meaningful way of averaging values over a period of time.

  • Standard Deviation (SD) and Mean Value (MV): classical scatter and central tendency measures.

  • Difference Absolute Standard Deviation Value (DASDV): the square root of the average of the squared differences between contiguous values.

  • Average Amplitude Change (AAC): measures the average change of the signal amplitude.

  • Wavelength (WL): the cumulative length of the waveform, obtained by summation.

  • Enhanced Wavelength (EWL): an extension of WL that defines a p value used to select a region of the signal.

  • Slope Sign Change (SSC): counts the number of times the slope of the waveform changes sign.

  • Log Detector (LD): good at estimating the exerted force.

  • Myopulse Percentage Rate (MYOP): adapted here to the Leap Motion Controller signal.

  • Willison Amplitude (WA): acts as an indicator of the firing of motor unit potentials.

  • Simple Square Integral (SSI): the summation of the squared values of the signal amplitude [25].
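As an illustration, the snippet below sketches a few of these functions in Python under their common formulations; the p-value-based variants (EMAV, EWL) and the threshold choices are omitted or simplified, so this is not a definitive implementation of [25].

```python
import numpy as np

# Hedged sketches of several feature extraction functions; x is a 1-D window.
def mav(x):   return np.mean(np.abs(x))                 # Mean Absolute Value
def rms(x):   return np.sqrt(np.mean(x ** 2))           # Root Mean Square
def var(x):   return np.var(x, ddof=1)                  # Variance
def wl(x):    return np.sum(np.abs(np.diff(x)))         # Wavelength
def aac(x):   return np.mean(np.abs(np.diff(x)))        # Average Amplitude Change
def dasdv(x): return np.sqrt(np.mean(np.diff(x) ** 2))  # DASDV
def ssi(x):   return np.sum(x ** 2)                     # Simple Square Integral
def ssc(x, thr=0.0):                                    # Slope Sign Change
    return np.sum((x[1:-1] - x[:-2]) * (x[1:-1] - x[2:]) > thr)
```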

3.4 Feature Selection Functions

The feature selection methods used in the development of this paper are described in this section. Feature selection methods are necessary when working with machine learning, as described in the previous sections. In general, machine learning models present large feature vectors, and the large size of the feature vectors is associated with a large dimensionality of the problem. In this context, feature selection is used as a dimensionality reduction technique that ranks the features according to their relevance.

3.4.1 Filter Methods

The minimal-redundancy maximum-relevance algorithm belongs to the family of filter methods. The algorithm finds an optimal set of mutually dissimilar (minimally redundant) features that can effectively represent the response variable. The algorithm quantifies redundancy and relevance by observing the dependence between pairs of features and the dependence of each variable on the response variable [19]. To maximize relevance, the algorithm looks at the dependence of a variable on the response variable, as presented in Eq. (1) [26].

$${V}_{s}=\frac{1}{|S|}{\sum }_{x\in S}I\left(x,y\right)$$
(1)

where \({V}_{s}\) represents the maximization value, \(S\) is the set of features or predictor variables and \(|S|\) its cardinality, and \(x\) and \(y\) represent a predictor variable and the response variable, respectively.

While for the minimization, the dependence between the pair of predictor variables is observed, as presented in Eq. (2) [26].

$${W}_{s}=\frac{1}{{|S|}^{2}}{\sum }_{x,z\in S}I\left(x,z\right)$$
(2)

where \({W}_{s}\) represents the minimization value of the dependence between the predictor variables, and \(x\) and \(z\) represent a pair of predictor variables.

The algorithm returns an index and an associated score that defines the importance of the predictor variable; a heuristic procedure determines the score.
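A greedy re-implementation of this criterion, with relevance and redundancy both estimated by mutual information, might look as follows; this is an illustrative sketch of Eqs. (1) and (2), not the exact library routine used in the experiments.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

# Sketch: greedily pick the feature maximizing relevance minus redundancy.
def mrmr_rank(X, y):
    relevance = mutual_info_classif(X, y)        # I(x, y) for each feature
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = []
        for j in remaining:
            red = (np.mean([mutual_info_regression(X[:, [j]], X[:, s])[0]
                            for s in selected]) if selected else 0.0)
            scores.append(relevance[j] - red)    # V_s - W_s style criterion
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected                              # feature indices, best first
```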

In this context, the MRMR algorithm is fed with the data extracted by the functions in the following order: MAV, EMAV, MMAV, MMAV2, VAR, RMS, DASDV, SD, MV, AAC, WL, EWL, LD, SSC, MYOP, WA, SSI. Once processed, the algorithm returns the score of the most significant predictor variables, as shown in Fig. 4.

Fig. 4
figure 4

Order of feature extraction functions after running feature selection functions with the algorithm MRMR

The order of the variables corresponds to VAR, SSC, EWL, SD, WA, WL, LD, DASDV, EMAV, MYOP, MAV, AAC, MMAV, SSI, MMAV2, MV, RMS.

Sequential selects the subset of features from the data matrix X that best predicts the classes in y. This method sequentially adds features until there is no improvement in prediction. The selection is based on a criterion function that evaluates each candidate subset of features. The method executes two processes: first, it splits the dataset into training and testing sets; second, it executes a cross-validation process. The criterion function sums the values returned over the folds and divides the sum by the total number of test observations; this mean value is used to evaluate each candidate feature subset. For classification, the criterion is the number of misclassified observations.

After computing the mean criterion values for each candidate feature subset, the method chooses the candidate subset that minimizes the mean criterion value. This process continues until adding more features no longer decreases the criterion.
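With a recent version of scikit-learn, this forward procedure can be sketched as follows; the base classifier and the stopping tolerance are illustrative assumptions.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Sketch: forward sequential selection with 5-fold cross-validation.
# X is the M x 17 feature matrix, y the gesture labels.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),   # illustrative base classifier
    direction="forward",
    n_features_to_select="auto", tol=1e-4, # stop when accuracy stops improving
    cv=5)
sfs.fit(X, y)
print(sfs.get_support(indices=True))       # indices of the selected functions
```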

In this paper, the method is fed with the M×17 data matrix, whose columns contain the values of the randomly ordered feature extraction functions. The first column of Table 1 presents the initial order of the functions, while the second column presents the functions ordered by the score returned by the method.

Table 1 Score of feature extraction functions using Sequential method

3.4.2 Wrapper Methods

The neighborhood component analysis (NCA) algorithm is a non-parametric method used to perform feature selection and belongs to the wrapper family. It learns feature weights from the original input data \(\mathbf{X}=\{({\mathbf{x}}_{1},{y}_{1}),...,({\mathbf{x}}_{n},{y}_{n})\}\) using a diagonal adaptation of NCA, and it uses a distance metric to find a linear transformation of the features that maximizes the average classification accuracy.

The process of NCAp is similar to KNN with k equal to one. KNN is based on a distance metric between near neighbors; likewise, NCA randomly selects a point \({\mathbf{x}}_{i}\) as a reference and measures its distance \({d}_{ij}=d({\mathbf{x}}_{i},{\mathbf{x}}_{j})\) to the other feature vectors \({\mathbf{x}}_{j}\). Moreover, NCA uses several parameters to adjust the classification accuracy: the regularization value (lambda), the optimization function, the fit method, and the standardization process [27].

In this work, to find the best lambda value, we generate an array of 50 equally spaced values from 0 to 3; each value is multiplied by the standard deviation and divided by the total number of observations. The NCA algorithm is then run for each value of the array, and to avoid bias we use cross-validation with a k-fold equal to five (a hedged sketch of this search is given below). The optimization function is stochastic gradient descent. Finally, the order of the most significant feature extraction functions is reported in Table 2.
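A skeleton of this lambda search is sketched below; `fit_nca_weights` and `nca_loss` are hypothetical stand-ins for a diagonal-NCA fit and its validation loss (e.g., MATLAB's fscnca), since scikit-learn does not expose this exact routine, and the grid formula follows our reading of the procedure above.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hedged sketch of the cross-validated lambda grid search for NCAp.
lambdas = np.linspace(0, 3, 50) * np.std(y) / len(y)   # assumed grid formula
kf, cv_losses = KFold(n_splits=5, shuffle=True, random_state=0), []
for lam in lambdas:
    fold_losses = []
    for tr, te in kf.split(X):
        w = fit_nca_weights(X[tr], y[tr], lam=lam)     # hypothetical helper
        fold_losses.append(nca_loss(X[te], y[te], w))  # hypothetical helper
    cv_losses.append(np.mean(fold_losses))
best_lambda = lambdas[int(np.argmin(cv_losses))]       # lambda with lowest loss
```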

Table 2 Score of feature extraction functions using NCAp with parameters

A second model is generated using NCAsp with only the observations and classes as parameters. This model takes as input the matrix of observations (dataSetFeatures) and their respective labels y, and it produces a score for the most significant predictor functions, shown in Table 3.

Table 3 Score of feature extraction functions using NCAsp without parameters

3.4.3 Embedded Methods

Another algorithm used in this work for feature selection is Relief-F. This algorithm ranks the features by finding the weights of the most significant predictors. Relief-F works similarly to the KNN algorithm and therefore depends on the k value; we work with two values of k, three and five. It is worth mentioning that when k tends to one, the weight values are not reliable [28].

Relief-F starts with all weights at zero. It then randomly selects an observation and, with respect to it, finds the nearest observations according to the k value and their classes. This process is iterative.

Equation (3) [26] updates the weights when the class of the selected observation and the class of its nearest neighbor are the same.

$$\mathbf{w}_{j}^{i}=\mathbf{w}_{j}^{i-1}-\frac{\Delta_{j}\left(\mathbf{x}_{r},\mathbf{x}_{q}\right)}{m}\cdot d_{rq}$$
(3)

where \({\mathbf{x}}_{r}\) and \({\mathbf{x}}_{q}\) are feature vectors belonging to the same class.

Similarly, the weights are updated when the feature vectors \({\mathbf{x}}_{r}\) and \({\mathbf{x}}_{q}\) belong to classes with different labels, according to Eq. (4) [26].

$$\mathbf{w}_{j}^{i}=\mathbf{w}_{j}^{i-1}+\frac{p_{y_{q}}}{1-p_{y_{r}}}\cdot \frac{\Delta_{j}\left(\mathbf{x}_{r},\mathbf{x}_{q}\right)}{m}\cdot d_{rq}$$
(4)

where \({\mathbf{w}}_{j}^{i}\) represents the weight of the jth predictor at the ith iteration, \({p}_{{y}_{r}}\) is the a priori probability of the class to which \({\mathbf{x}}_{r}\) belongs, \({p}_{{y}_{q}}\) is the a priori probability of the class to which \({\mathbf{x}}_{q}\) belongs, \(m\) is the number of iterations used to update the weights, and \({\Delta }_{j}\left({\mathbf{x}}_{r},{\mathbf{x}}_{q}\right)\) is the difference in the value of the jth predictor between the observations \({\mathbf{x}}_{r}\) and \({\mathbf{x}}_{q}\).
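A simplified numpy sketch of these updates is given below; for brevity it uses a single nearest neighbor per class rather than k and folds the \(d_{rq}\) weighting into the normalized difference, so it illustrates Eqs. (3) and (4) rather than reproducing the exact Relief-F implementation.

```python
import numpy as np

# Hedged sketch of the Relief-F weight updates: hits (same class) pull a
# predictor's weight down, misses (other classes, scaled by priors) push it up.
def relieff(X, y, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    span = X.max(axis=0) - X.min(axis=0) + 1e-12   # normalize differences
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / len(y)))
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = rng.integers(len(y))                   # random reference observation
        d = np.abs(X - X[r]) / span                # delta_j for every observation
        dist = d.sum(axis=1)
        dist[r] = np.inf                           # exclude the point itself
        for c in classes:
            idx = np.where(y == c)[0]
            q = idx[np.argmin(dist[idx])]          # nearest neighbor in class c
            if c == y[r]:
                w -= d[q] / n_iter                 # hit, Eq. (3)
            else:                                  # miss, Eq. (4)
                w += priors[c] / (1 - priors[y[r]]) * d[q] / n_iter
    return w                                       # higher weight = more relevant
```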

This study works with k equal to five. The algorithm is fed with the feature extraction functions in no specific order. After applying the algorithm, we obtain the following ranking: LD, SSC, EWL, MYOP, DASDV, MMAV, AAC, WL, EMAV, SD, MAV, MV, RMS, SSI, VAR, MMAV2, WA, as shown in Fig. 5.

Fig. 5
figure 5

Ranking of most important feature using Relief-F with k = 5

Regarding DT, the definition of induction tells us that this task consists of extracting implicit general knowledge from particular observations and experiences.

In DT learning, the hypothesis space is the set of all predictor attributes or features. The DT induction task consists of finding the tree that best fits the available, already classified, example data. For each class, there is a branch in the tree that satisfies the conjunction of attribute values represented by that branch.

When generating a DT, a crucial element is the attribute selection method, which determines the criteria used to generate the different branches of the tree and, consequently, the classification into the different classes.

Attribute selection is based on the calculation of the entropy value and the information gain. Table 4 shows the order of the feature extraction functions based on the selection of the most relevant attributes using DT; a short sketch of this ranking follows.
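This embedded ranking can be sketched with an entropy-based tree, since the impurity-based importances accumulate the information gain of the splits that use each feature extraction function; `X` and `y` are the M×17 matrix and the labels, as before.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Sketch: rank the 17 feature extraction functions by DT information gain.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
ranking = np.argsort(tree.feature_importances_)[::-1]  # best function first
```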

Table 4 Order of feature extraction functions using decision tree

3.5 Machine Learning Algorithms

To validate the scores of the feature extraction functions reported by the studied feature selection methods, this study uses four machine learning algorithms: two parametric algorithms, ANN and SVM, and two non-parametric algorithms, KNN and DT.

ANN is a mathematical model used for prediction, developed according to the functionality of the human brain; its architecture resembles layers of biological neurons. It is a non-deterministic algorithm due to the random initialization of the weights. ANN is widely used because of its high parallelism, its fault and noise tolerance, and its learning and generalization capabilities [29].

The ANN architecture consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the data retrieved from the environment. The hidden layers are composed of neurons or nodes; each node applies an activation function to the weighted sum of its inputs. The output layer reports class 0 or class 1 if the ANN is binary; if the ANN is multiclass, it reports an array of probabilities, where each value corresponds to one category.

SVM is another machine learning algorithm, used for classification or regression, that belongs to the supervised learning family. SVM is used because many hyperplanes could split the classes, so it introduces the concept of a maximum margin classifier. SVM is a deterministic algorithm and avoids local minima. It is inherently binary; to handle multiclass problems, techniques such as one vs. one or one vs. all must be applied. SVM handles two cases: linearly separable and non-linearly separable classes [30].

SVM uses linear functions of the form \(\mathbf{w}{\mathbf{x}}_{i}+b\ge 1\), where \(\mathbf{w}\) and \(\mathbf{x}\) are vectors. The vectors closest to the hyperplane are called support vectors, and the lines touching the support vectors are the decision boundaries. The distance between the two decision boundaries of a hyperplane is called the margin.

KNN is called a lazy algorithm. It is a non-parametric method because it does not involve parameter adjustment or estimation, and it is deterministic. KNN dynamically identifies the k observations of a dataset most similar to a new observation. To identify similarity, KNN uses a distance metric such as the Euclidean, Minkowski, or Mahalanobis distance, among others; the Dynamic Time Warping (DTW) method can also be used [31, 32].

In this sense, KNN defines the number of neighbors according to the k value. A small distance between samples represents a high similarity, so such samples are highly likely to share the same label [33].

DT is an algorithm used in machine learning for classification. The dataset is divided into training and testing instances, and the algorithm selects attributes from the set of training instances based on the calculation of entropy and information gain. The DT is then built using the training instances and the chosen attributes: each internal node tests an attribute \({x}_{i}\), each branch corresponds to an attribute value, and each leaf assigns a class. The accuracy of the DT is measured with the testing instances. To classify a new input, the tree is traversed from root to leaf, and the leaf's label is assigned [34].

4 Experiments

The experimentation was carried out on an Alienware computer with a sixth-generation Intel Core i7 at 3.4 GHz (six cores and twelve logical processors), 32 GB of RAM, and Windows 10. The design of the experiments is based on the dataset described in the previous section. In all experiments, we measure the training and testing accuracy of classification and the testing accuracy of recognition of the hand gestures described in the last section, as well as the processing time.

In this sense, the acquired data are organized into {1,5} cells, where each cell contains each finger's X, Y, and Z channels, and each channel contains 70 time instants. The 70 time instants are enforced: if fewer than 70 frames are obtained during the sampling time, the extracted data are interpolated up to 70 frames; if more than 70 frames are obtained, redundant data are removed by an even division, verifying that the shape of the original signal does not change. The data of each channel are then normalized between 0 and 1. In addition, the signal is filtered using a Butterworth filter with a sampling frequency of 70 Hz and a cutoff frequency of 18 Hz, which smooths the signal without losing its original shape.
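A sketch of this per-channel pre-processing is shown below; the Butterworth filter order (4) is an assumption, since the text does not state it.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Hedged sketch: resample to 70 frames, normalize to [0, 1], low-pass filter.
def preprocess(channel, n_frames=70, fs=70.0, cutoff=18.0):
    t_old = np.linspace(0.0, 1.0, len(channel))
    t_new = np.linspace(0.0, 1.0, n_frames)
    x = np.interp(t_new, t_old, channel)             # stretch/shrink to 70 frames
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)  # min-max normalization
    b, a = butter(4, cutoff, fs=fs)                  # order 4 is an assumption
    return filtfilt(b, a, x)                         # zero-phase filtering
```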

For experimentation, we use the signal composed of the spatial positions and directions of the fingers. The classification and recognition algorithms only use data from three fingers, the thumb, middle, and pinky, each with three channels X, Y, and Z. We use cross-validation with a k-fold equal to five. The technique used to extract the data and feed it to the classifiers is window splitting: the signal is divided into windows of 20 samples with a stride of 15, each window is delivered to the classifier, which returns a label, and the final label is chosen by majority vote over the resulting label vector (a sketch is given below). Additionally, before feeding the algorithms, the data are shuffled randomly. The algorithms used are ANN, SVM, KNN, and DT.
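The window splitting and majority vote can be sketched as follows; `extract` stands for the current combination of feature extraction functions and `clf` for any trained classifier.

```python
import numpy as np
from collections import Counter

# Sketch: classify windows of 20 samples (stride 15) and vote on the label.
def classify_by_windows(signal, extract, clf, win=20, stride=15):
    labels = []
    for start in range(0, len(signal) - win + 1, stride):
        feats = extract(signal[start:start + win])   # features of one window
        labels.append(clf.predict([feats])[0])       # label for this window
    return Counter(labels).most_common(1)[0][0]      # majority vote
```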

In this work, we use a feedforward ANN with two hidden layers. The first hidden layer has 25 neurons with ReLU as the activation function; the second hidden layer has 15 neurons with logsig as the activation function. The input size of the ANN is the number of features in the current combination of feature extraction functions. Cross-entropy is used as the loss function and gradient descent as the weight-adjustment technique. Additionally, the ANN uses 2000 iterations and a regularization factor of 1.0e1.
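A PyTorch sketch of this architecture is given below; `n_features` and `n_classes` are placeholders, the learning rate is an assumption, and mapping the stated regularization factor to `weight_decay` is our interpretation.

```python
import torch.nn as nn
import torch.optim as optim

# Hedged sketch of the described ANN: 25 ReLU units, then 15 log-sigmoid units.
# n_features and n_classes are placeholders (n_classes = 5 gestures here).
model = nn.Sequential(
    nn.Linear(n_features, 25), nn.ReLU(),    # first hidden layer
    nn.Linear(25, 15), nn.Sigmoid(),         # logsig ~ logistic sigmoid
    nn.Linear(15, n_classes))                # logits for the five gestures
loss_fn = nn.CrossEntropyLoss()              # cross-entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.01,
                      weight_decay=10.0)     # regularization factor 1.0e1
```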

The SVM classifier also needs its hyperparameters, the kernel and the scale, to be set: in this work, the kernel is Gaussian and the scale is of order ten. KNN was set up with k equal to three and the Euclidean distance as the metric. DT uses the information gain technique to build the tree and a maximum depth of 100 levels during training.
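The remaining classifiers can be instantiated as follows; the `gamma` value is only a stand-in for the Gaussian kernel scale of order ten, since the two parameterizations are not identical.

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Sketch of the three classical classifiers with the stated hyperparameters.
svm = SVC(kernel="rbf", gamma=0.1)                      # Gaussian kernel
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
dt  = DecisionTreeClassifier(criterion="entropy", max_depth=100)
```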

In this context, we evaluate the combinations of feature extraction functions according to the scores given by MRMR, Sequential, and NCAp with adjusted parameters (cross-validation, regularization, and SGD as the optimization function). This work also evaluates the NCAsp algorithm without parameters, the Relief-F feature selection algorithm with k equal to five, and DT using entropy and information gain.

The experimentation consisted of ordering the feature extraction functions according to the score provided by each feature selection method. The dataset is then divided into windows of 20 samples with jumps of 15. The function with the highest reported score is applied to each window, and at the end a vector of labels is obtained, from which the final label is selected by majority vote. The same process is then executed with the two highest-scoring functions, and so on, until all feature extraction functions are combined. This process is repeated 10 times, producing a 10 × 17 matrix in which each cell corresponds to an accuracy. The average over the 10 runs is computed, the maximum accuracy value is reported, and the index of that maximum corresponds to the number of combined feature extraction functions, as presented in the following pseudocode.

figure a
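Since the pseudocode figure is not reproduced here, the loop it describes can be reconstructed as the following hedged sketch; `evaluate` is a hypothetical helper that trains and tests one model on the top-k ranked feature extraction functions and returns its accuracy.

```python
import numpy as np

# Hedged reconstruction: 10 repetitions x 17 feature combinations.
acc = np.zeros((10, 17))
for rep in range(10):
    for k in range(1, 18):
        acc[rep, k - 1] = evaluate(X, y, ranking[:k])  # hypothetical helper
mean_acc = acc.mean(axis=0)                # average over the 10 repetitions
idx = int(np.argmax(mean_acc)) + 1         # number of combined functions
print(idx, mean_acc[idx - 1])              # best size and its mean accuracy
```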

Table 5 shows the summary of the experiments: the number of combinations, represented by the variable idx; the maximum accuracy and standard deviation of each model trained, tested, and used for recognition with each algorithm; and the time taken to complete.

Table 5 Summary of the maximum training, testing, and recognition accuracy of feature selection methods evaluated in classification algorithms

Figure 6 shows the testing accuracy grouped by each method and evaluated for each proposed classification algorithm. In addition, the number of combinations of feature extractor functions with which the methods reach their maximum value is presented.

Fig. 6
figure 6

Summary of the maximum training, testing, and recognition accuracy of feature selection methods evaluated in classification algorithms. Standard deviation, number of feature combinations, and processing time

As can be seen, the differences in the reported accuracies are very small, and the standard deviation values overlap, which indicates no significant difference between the methods in the evaluated algorithms. This leads us to perform a statistical hypothesis test to determine significant differences and to define the method, the combination of feature extraction functions, and the algorithm that perform best for the hand gesture recognition problem with Leap Motion Controller signals.

Figure 7 shows that all the methods evaluated with ANN reach the highest accuracy values. In addition, ANNs present variability due to the random initialization of the weights.

Fig. 7
figure 7

The boxplot shows the variability of the data and the algorithms with the highest accuracy values

For the following analysis, we keep only the methods evaluated with ANN. The filtering of the methods evaluated in each algorithm is shown in Fig. 8.

Fig. 8
figure 8

The figure presents the evaluation of the accuracy of the feature selection methods, with the number of combinations of features grouped by the classification algorithm

The differences in the accuracies shown are small; therefore, the number of combinations of feature extraction functions used to achieve these accuracies becomes the analysis factor for demonstrating a difference (Fig. 8).

The statistical analysis shows, in fact, no significant difference between the Sequential and DT methods, which are the most accurate. However, Sequential uses a combination of 14 feature extraction functions, while DT uses six. This leads to analyzing the response time of the algorithms with each of the methods (Fig. 9).

Fig. 9
figure 9

The figure presents the evaluation of the accuracy of the feature selection methods, with the number of combinations of features grouped by the classification algorithm

In this sense, the processing time is a decision factor when choosing a feature selection method in the context of hand gesture recognition using the infrared signals emitted by the Leap Motion Controller (Table 6).

Table 6 Difference in processing time due to the number of combined feature extraction functions

5 Conclusion

This paper analyzes the feature selection methods MRMR, Sequential, NCAp, NCAsp, Relief-F, and DT, which report a score representing the significance of the features. In addition, we propose a hand gesture recognition model to validate that significance. The model consists of five modules: data acquisition, pre-processing, feature extraction, classification, and post-processing. The proposed methods are applied in the feature extraction module, while ANN, SVM, KNN, and DT are used in the classification module. The data set used for this study consists of five gestures: open hand, fist, wave in, wave out, and pinch. The dataset contains 1680 observations for each gesture, 8400 in total, and the input data is an 8400 × 17 matrix. The 17 predictors are formed by computing the functions MAV, EMAV, MMAV, MMAV2, VAR, RMS, DASDV, SD, MV, AAC, WL, EWL, LD, SSC, MYOP, WA, SSI. The evaluation of the feature selection methods shows that all methods perform best with ANN, whose classification and recognition accuracy is the highest with respect to the other machine learning methods SVM, KNN, and DT. However, the differences in the accuracy of the feature selection methods evaluated on an ANN are insignificant because the standard deviations overlap. In this context, a test statistic is generated to determine whether the feature selection methods differ statistically. The study shows that the Sequential feature selection method, with an accuracy of 93.079%, and DT, with an accuracy of 93.055%, do not differ significantly. The response time is therefore evaluated, because Sequential combines 14 feature extraction functions to achieve its maximum accuracy, while DT combines six. The reported time is measured in milliseconds, from the moment the user finishes executing the gesture until the algorithm returns a response or label. The execution time of the combination of feature extraction functions reported by Sequential, evaluated on an ANN, is 67.5646 ms, while under the same conditions the time for the combination reported by DT is 44.7923 ms. The limitation of the present work lies in the accuracy obtained; future work will be directed at increasing the accuracy, probably by working on post-processing techniques.