Introduction

Hand gesture recognition uses a wearable sensor device to observe the movements of the hand. The movements indicate the action the user intends to perform and provide a response to the human [1]. It is widely used in medical applications through the Internet of Medical Things (IoMT). The gestures observed by the sensor are matched within the IoMT, which returns the corresponding results. Using IoMT, diagnosis becomes easier and more accurate [2, 3], and the information is handled securely [4, 5]. Therefore, medical data can be quickly sent to and received from the healthcare IoMT system. The medical devices are connected to the network for machine-to-machine communication [6]. IoMT is mainly used to monitor remote users suffering from prolonged diseases by tracking the patient's health record [7]. Elderly patients cannot regularly visit the hospital; under such constraints, the IoMT can serve remote patients. It acts as the interface between the patient and the healthcare provider, recognizing the hand gesture and providing the corresponding service [8].

In the healthcare environment, the hand gesture is monitored by placing a sensor on the wrist and tracking its movements. A gesture can also be observed by detecting single-finger flexion to capture the actions of the finger [7, 9]. Pre-defined knowledge is necessary for identifying hand gestures, because every movement has a different meaning [10]. Gestures with similar finger movements must be correctly identified and disambiguated. Hand gesture recognition covers both the action and rest states of the hand and matches the results with the IoMT [11]. In this recognition process, the position, action, and movements of the fingers and hands are required for analysis. The non-verbal communication is obtained in the form of images or video, and recognition is performed on this format [12]. In healthcare, hand gestures are particularly useful for deaf and mute people; even a paralyzed person can communicate in this non-verbal form. Hand gesture recognition is essential for Human–Computer Interaction (HCI) in a smart healthcare system. Using HCI, communication is performed between the patients and the healthcare environment [8, 11, 13]. Artificial intelligence further supports such healthcare systems [14].

The hand gesture is detected by applying specific algorithms that segment the hand and classify its features. Similar movements are identified and disambiguated using databases [15]. In some algorithms, detection is performed by a prediction process based on a pre-defined class set. The different methodologies use different types of detection over the healthcare gesture database [16]. The user can interact with the device through gestures without any physical touch, which is achieved by developing suitable mathematical algorithms [17]. Gesture recognition allows computers to understand the user by providing a bridge between the machine and the human. Deep learning is widely used to improve classification in healthcare [18, 19]. Human–robot interaction is used to analyze the patient's gesture using a sensor device [8, 13]. Wearable sensors have recently been used in various medical applications [20, 21]. Two types of gestures are detected: online gestures, which involve direct handling of objects, and offline gestures, which take effect after the interaction between the user and the machine [8, 15].

The proposed system aims to detect the hand gestures of both nearby and faraway patients using IoMT. In this paper, deep learning is used to achieve a better correlation of hand gestures for remote patients.

Related works

Bargellesi et al. [22] proposed hand gesture recognition using a random forest with a wearable motion capture sensor. The gesture features were extracted at different time intervals to analyze the movements of the hands. The experimental evaluation was performed on a dataset of collected gestures.

Cho et al. [23] introduced a personalized classifier to improve contactless gesture recognition in the operating room. A computer-aided surgical procedure was presented for classifying the user's gestures. Support vector machine (SVM) and Naïve Bayes classifiers were used to predict the result.

Zhao et al. [24] developed hand gesture recognition for healthcare. A convolutional neural network (CNN) was proposed to classify hand gestures without noise. MobiGesture was introduced for segmentation to improve precision and recall based on the user's gesture.

Tavakoli et al. [25] developed a double-channel surface EMG on a wearable sensor. The gestures were classified using a support vector machine (SVM). Four types of gestures were classified using two EMG channels on the flexors and extensors. The work aimed to improve the fault tolerance of gesture recognition.

Zhao et al. [26] proposed continuous gesture control for healthcare called Ultigesture. It combined diverse application interfaces, provided a comfortable wristband for daily wear, and was affordable at large scale. The Ultigesture wristband integrated both hardware and software to control remote access.

Zhang et al. [27] designed an EMG armband to identify human gestures based on physiological characteristics. Wearing-independent hand gesture recognition was implemented. The gestures were recognized at different scales from the EMG features, and recognition was improved using the unified signal. A random forest was used to decide the preceding and pursuing gestures.

Li et al. [28] modeled spatial–temporal graph convolutional networks to analyze dynamic hand gestures. A skeleton-based model was proposed by including three types of edges in the graph linked to the action of the hand joints. For obtaining accurate results, a deep neural network was used to select semantic features.

Alonso et al. [29] presented string matching to analyze hand gestures in real-time scenarios. Approximate string matching (ASM) was implemented to encode the characteristics of hand joints by introducing the k-means algorithm. The number of clusters was specified to enhance the accuracy for various types of gestures.

Zhang et al. [30] developed a convolutional pose machine (CPM) and fuzzy Gaussian mixture models (FGMM) to recognize hand gestures. The initial stage acquired the hand key points using the CPM. Then, the FGMM was used to eliminate non-gestures and classify the remainder according to the key points.

Tam et al. [31] designed a real-time fine gesture identification system using a convolutional neural network. The system was fabricated to detect muscle contraction in the forearm through a frequency–time–space cross-domain pre-processing method.

Li et al. [32] presented spatial fuzzy matching (SFM) with Leap Motion to improve hand gesture recognition. Gestures were paired by comparing them against a fused gesture dataset in which the gesture frames were classified. The SFM was then used to speed up the analysis.

Kopuklu et al. [33] presented online dynamic hand gesture recognition to enhance the efficiency of gesture analysis. The method addresses three requirements: there is no explicit indication of when a gesture starts and ends; each gesture should be recognized only once; and the analysis should stay within memory and computational budgets. A two-level hierarchical structure (TLHS) of CNNs supports gesture detection and classification.

Tai et al. [34] implemented continuous hand gesture detection by introducing long short-term memory (LSTM). The analysis was applied to accelerometer and/or gyroscope signals to sense the input gesture. The output was obtained from the LSTM, which efficiently classified the resulting gesture.

Ryu et al. [35] proposed a temporal feature analysis of hand gestures. The features were acquired by a frequency-modulated continuous wave (FMCW) radar system. A quantum-inspired evolutionary algorithm (QEA) was introduced to correlate the gestures based on minimum redundancy maximum relevance (mRMR).

Dependable gesture recognition (DGR)

A wearable sensor is placed on the wrist of the patient to analyze the movements and assist the patient accordingly. Hand gesture recognition is an essential step to detect the patient's requirements and support them. The hand gesture analysis is divided into two parts: the first detection is based on the wrist gesture, and the second on single-finger flexion. In this paper, the objective is to detect gestures over both the local and the remote (IoMT) infrastructure.

As shown in Fig. 1, the proposed DGR system model consists of a healthcare center that monitors hand gestures using a hand camera and wearable bands. This healthcare environment provides remote monitoring to the end-users through the Internet of Things (IoT). The end-user mobile application is responsible for viewing and detecting the gesture signals and converting them into information. The detection of the rest or movement state of a hand gesture can be represented as follows:

$$ \partial = \sum_{\beta}^{\beta_{0}} \begin{cases} \sqrt{\dfrac{g_{0}}{h'}} \times \left( \alpha \to m_{0}' \right) = r' \\[4pt] \sqrt{\dfrac{g_{0}}{h'}} \times \left( \alpha \to m_{0}' \right) \ne r' \end{cases}, $$
(1)

where \(\beta\) and \(\beta_{0}\) are the starting and allocated time of the gesture, \(\partial\) represents the detection of movements, \(m_{0}'\) is the observation from the sensor, and \(\alpha\) is the action applied by the hand \(h'\) in gesture \(g_{0}\). The moving or resting state \(r'\) of the patient's hand is analyzed by evaluating both cases. The first case, \(\sqrt{g_{0}/h'} \times (\alpha \to m_{0}') = r'\), indicates that the gesture is not moving and is therefore in the rest state. In the other case, \(\sqrt{g_{0}/h'} \times (\alpha \to m_{0}') \ne r'\), the hand is not at rest and is therefore in the moving state. In this manner, the action and rest states are detected by the two analyses of the hand as follows:

$$ m_{0}' = \frac{\partial(g_{0})}{\sqrt{c_{n}/o'}} \begin{cases} \left[ \prod_{\beta}^{\beta_{0}} \dfrac{\sum_{c_{n}}^{h'} (r' - o')}{g_{\mathrm{f}}} \right] = w_{0} \\[6pt] \left[ \prod_{\beta}^{\beta_{0}} \dfrac{\sum_{c_{n}}^{h'} (r' - o')}{g_{\mathrm{f}}} \right] \ne w_{0} \\[6pt] \left( \dfrac{\sum_{c_{n}}^{o'} h'(g_{0})}{\sum_{\alpha} r'} \right) \times (\beta - \beta_{0}) = i_{0} \\[6pt] \left( \dfrac{\sum_{c_{n}}^{o'} h'(g_{0})}{\sum_{\alpha} r'} \right) \times (\beta - \beta_{0}) \ne i_{0} \end{cases}. $$
(2)
Fig. 1 IoMT system model for hand gesture recognition

As shown in Eq. (2), the analysis detects whether the hand gesture arises from wrist or finger movements. It is associated with two stages. The first stage is \(\left[ \prod_{\beta}^{\beta_{0}} \frac{\sum_{c_{n}}^{h'} (r' - o')}{g_{\mathrm{f}}} \right] = w_{0}\), in which the time of motion is identified and noted. The detection of the hand gesture is analyzed as \(\partial(g_{0})\). The continuous and fixed movements are denoted as \(c_{n}\) and \(g_{\mathrm{f}}\), respectively. If the first stage equals the wrist term \(w_{0}\), wrist-based detection is performed. In the other case, it is not equal to the wrist term; the detection is not immediately observed, and the stage is not satisfied.

The second stage of detection is associated with finger-based analysis. Its first case is \(\left( \frac{\sum_{c_{n}}^{o'} h'(g_{0})}{\sum_{\alpha} r'} \right) \times (\beta - \beta_{0}) = i_{0}\). In this case, the fixed position and oscillation \(o'\) of the finger are observed and applied on a time basis, with the starting and allocated times represented as \(\beta - \beta_{0}\). When the result equals the finger term \(i_{0}\), the first case is satisfied. In the second case, the result does not correspond to finger movements; such gestures are monitored over time while observing the patient's requirements.
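The two-stage split described above can be illustrated with a small classification routine. The following is a minimal sketch, assuming a hypothetical SensorSample with one wrist channel and one finger channel; the field names and variance threshold are illustrative and are not specified in the paper.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List

@dataclass
class SensorSample:
    wrist: float    # wrist-band motion magnitude (hypothetical unit)
    finger: float   # finger-flexion reading (hypothetical unit)

def classify_gesture(window: List[SensorSample],
                     rest_threshold: float = 0.05) -> str:
    """Classify a window of samples as 'rest', 'wrist', or 'finger'.

    The rest/motion split mirrors the two cases of Eq. (1); the
    wrist/finger split mirrors the two stages of Eq. (2). The threshold
    is an illustrative assumption, not a value from the paper.
    """
    wrist_var = pvariance([s.wrist for s in window])
    finger_var = pvariance([s.finger for s in window])

    # Case 1 of Eq. (1): negligible movement on both channels -> rest state.
    if wrist_var < rest_threshold and finger_var < rest_threshold:
        return "rest"

    # Stages of Eq. (2): attribute the motion to the dominant channel.
    return "wrist" if wrist_var >= finger_var else "finger"

# Example: a window dominated by finger flexion is labelled 'finger'.
window = [SensorSample(wrist=0.01, finger=0.9 * (i % 2)) for i in range(10)]
print(classify_gesture(window))  # -> 'finger'
```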

By applying this classification, the state of the patient can be analyzed, and the gesture is attributed to either the wrist or the finger. Based on this analysis, only the essential features are extracted and matched against the database. Figure 2 illustrates the gesture detection process.

Fig. 2 Gesture detection process

In this paper, the patient's gestures are detected whether the attendant is nearby or far away. Nearby patients are analyzed on the local infrastructure, while remote patients are handled through the IoMT. The nearby patient's detection is evaluated as follows:

$$ \gamma(g_{0}) = \sum_{o'}^{m_{0}'} \left[ \frac{\prod_{g_{0}}^{\partial} (c_{n} + o')}{g_{\mathrm{f}}(\alpha)/r'} \times \sum_{w_{0}}^{i_{0}} m_{0}'(h') \right]. $$
(3)

From Eq. (2), both the wrist and finger are detected, and this detection is used in Eq. (3). In Eq. (3), \(\gamma\) represents the nearby patient's gesture, whether in the motion or the rest state. The wrist and finger movements are analyzed and stored in the database for future detection. The other condition of gesture analysis is performed for faraway patients connected through the IoMT and is observed as follows:

$$ \rho = \prod_{m_{0}'}^{c_{n}} \left( \int_{h'}^{r'} \partial + \left[ \frac{\sum_{\alpha}^{\Delta} \frac{w_{0}}{i_{0}}}{\beta - \beta_{0}} \right] \right). $$
(4)

As shown in Eq. (3), the nearby patient's state is analyzed. Equation (4) evaluates the faraway patient \(\rho\) and gives an appropriate measurement of the hand gesture. The integration of both detections is examined, and the result is the observation of the patient's gesture. The derivation of Eqs. (3) and (4) can be rewritten as follows:

$$ \gamma(\rho)^{g_{0}} = \begin{cases} \dfrac{\prod_{g_{0}}^{\partial} (c_{n} + o')}{g_{\mathrm{f}}(\alpha)/r'} + (\beta - \beta_{0}) \\[6pt] \left[ \dfrac{\sum_{\alpha}^{\Delta} \frac{w_{0}}{i_{0}}}{\beta - \beta_{0}} \right] \end{cases}, $$
(5)

where \(\gamma\) represents the nearby patient's gesture and \(\Delta\) is the gesture-related output value. Evaluating Eqs. (3)–(5) detects the nearby patient on the local infrastructure; the other case is associated with the IoMT when the patient is far away. The proposed method covers both cases and is addressed using deep learning algorithms. Based on the two classifications, the patients' movement gestures are stored according to the normal or emergency state. This makes it possible to retrieve a gesture from the database and match it against other gestures over the IoMT and local infrastructure. The storing of the gesture classification is derived as follows:

$$ \delta = \begin{cases} 1, & \text{if}\;\; \dfrac{\left[ \dfrac{\sum_{g_{\mathrm{f}}}^{c_{n}} (r' + m_{0}')}{\sum_{\beta}^{\beta_{0}} \partial} \right] + \left( \Delta(g_{0})/o' \right)}{\gamma + \rho} \\[12pt] 0, & \text{otherwise} \end{cases}. $$
(6)

From Eq. (5) and based on Eq. (6), detection is obtained for remote patients using the local infrastructure and the IoMT. In Eq. (6), \(\delta\) represents the classification of the gesture; the condition counts how many gestures are sensed within a fixed time. A gesture corresponding to an emergency movement is analyzed by a matching process applied to the database.

If an emergency gesture is performed by the patient, immediate attention is given; otherwise, the standard gesture is captured. The sensor placed on the wrist detects the action at every specific time interval. Remote patients are likewise analyzed by matching their gesture against the other gestures. Based on the time interval, the gesture is sensed and analyzed over the IoMT to improve the smart healthcare system and reduce the mess in detection.
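One simple way to realize this routing is sketched below. It is an illustrative simplification, assuming a hypothetical Gesture record and an assumed set of emergency labels, rather than the exact procedure of Eqs. (3)–(6).

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    patient_id: str
    label: str          # e.g. 'wave', 'fist'
    remote: bool        # True if the patient is reached over the IoMT

# Hypothetical set of gestures that demand immediate attention.
EMERGENCY_GESTURES = {"fist", "crossed_fingers"}

def route_gesture(gesture: Gesture) -> str:
    """Route a classified gesture to the local or IoMT path and flag
    emergencies for storage, loosely following Eqs. (3)-(6)."""
    path = "iomt" if gesture.remote else "local"
    if gesture.label in EMERGENCY_GESTURES:
        # delta = 1 in Eq. (6): store and escalate immediately.
        return f"{path}:emergency"
    # delta = 0: store as a standard gesture for later matching.
    return f"{path}:standard"

print(route_gesture(Gesture("p-07", "fist", remote=True)))   # iomt:emergency
print(route_gesture(Gesture("p-02", "wave", remote=False)))  # local:standard
```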

Since every movement has a specific meaning, mess detection is applied to address wrong gesture analyses. Fast detection alone is not sufficient, because the patient's movement varies for every action and must be examined against pre-defined knowledge. The gesture matching for the preceding and pursuing gestures is derived as follows:

$$ g_{0} = \begin{cases} \prod_{\partial}^{m_{0}'} \left( \dfrac{\gamma + \rho}{i_{0} + w_{0}} \right) + \Delta = x_{0} \\[6pt] \prod_{\partial}^{m_{0}'} \left( \dfrac{\gamma + \rho}{i_{0} + w_{0}} \right) + \Delta \ne x_{0} \\[6pt] \sum_{\beta}^{\beta_{0}} \left[ \delta + \dfrac{\alpha \to \Delta}{o' - r'} \right] = y_{0} \\[6pt] \sum_{\beta}^{\beta_{0}} \left[ \delta + \dfrac{\alpha \to \Delta}{o' - r'} \right] \ne y_{0} \end{cases}. $$
(7)

Using Eq. (6), the classification of the gesture is derived. Equation (7) is then examined at two levels. The first level consists of two sets; the first set is \(\prod_{\partial}^{m_{0}'} \left( \frac{\gamma + \rho}{i_{0} + w_{0}} \right) + \Delta = x_{0}\), in which the movements and detection are analyzed. Here, \(x_{0}\) represents the preceding gesture and \(y_{0}\) represents the pursuing gesture. Figure 3 illustrates the gesture matching process.

Fig. 3 Gesture matching process

The pursuing and preceding gestures are evaluated based on database matching. The second set does not match the preceding gesture; therefore, it is eliminated from the process.

The second level is represented as \(\sum_{\beta}^{\beta_{0}} \left[ \delta + \frac{\alpha \to \Delta}{o' - r'} \right] = y_{0}\), where each captured gesture is detected in a timely manner. In this derivation, the classification of the gesture is used along with the sensed movements, and the result equals the pursuing gesture \(y_{0}\); its second set corresponds to gestures that are not equal to the pursuing gesture. Based on these two levels, the detection of the coordinates is rewritten as follows:

$$ g_{0}(x_{0}, y_{0}) = \frac{\prod_{\partial}^{m_{0}'} \left( \frac{\gamma + \rho}{i_{0} + w_{0}} \right) + (g_{\mathrm{f}} - c_{n})}{\sum_{\beta}^{\beta_{0}} \left[ \delta + \frac{\alpha \to \Delta}{o' - r'} \right] \times \gamma(\rho)^{g_{0}}}. $$
(8)

Equation (8) is evaluated and results in the preceding and pursuing gesture analyses. Detection over the local infrastructure and the IoMT uses the fixed and continuous movements, and the gestures are derived and analyzed in a timely manner. Equation (8) is thus used to observe the preceding and pursuing gestures of the patients, and mess detection is addressed using deep learning.
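The preceding/pursuing matching of Eqs. (7) and (8) can be approximated by a nearest-neighbor lookup over stored gesture feature vectors, as in the sketch below. The feature representation, database contents, and distance threshold are assumptions for illustration only.

```python
import math
from typing import Dict, List, Optional

# Hypothetical gesture database: label -> stored feature vector.
GESTURE_DB: Dict[str, List[float]] = {
    "wave":  [0.9, 0.1, 0.0],
    "thumb": [0.2, 0.8, 0.1],
    "fist":  [0.1, 0.1, 0.9],
}

def match_gesture(features: List[float],
                  max_distance: float = 0.5) -> Optional[str]:
    """Return the closest stored gesture, or None if nothing is close
    enough (treated as a mess/unknown detection)."""
    best_label, best_dist = None, math.inf
    for label, stored in GESTURE_DB.items():
        dist = math.dist(features, stored)   # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None

print(match_gesture([0.85, 0.15, 0.05]))  # -> 'wave'
print(match_gesture([0.5, 0.5, 0.5]))     # -> None (no confident match)
```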

Hand gesture detection

Deep learning is used for recognizing the hand gesture. In this paper, a multi-layer perceptron (MLP) is used to decrease mess detection. The MLP uses the input layer to receive the patient's gesture, and the output is based on the decision or prediction of the preceding movements. It contains hidden layers to train on the gestures, improving the detection and addressing the mess. It also uses an activation function to map the weights between input and output. The activation function can be calculated as follows:

$$ v = \theta \left( \prod_{\Delta}^{\beta} \tanh \left( 1 + e^{-1} \right)^{-1} \right). $$
(9)

Using Eq. (8), the preceding and pursuing gestures are matched with the database. Equation (9) is used as the activation function for mapping input to output gestures, where \(\tanh\) is the activation function, \(v\) represents the detected output, and \(\theta\) denotes the analysis of input and output. The nodes are identified by \(e\), and they use the weight and bias of the gesture. The gesture weight and bias are analyzed and evaluated as follows:

$$ \theta = \prod_{\gamma}^{\rho} \frac{\left[ \frac{\partial}{w_{0} - i_{0}} \right] \times \delta + (a' + b')}{\sum_{x_{0}}^{y_{0}} (\beta_{0} - \beta)}. $$
(10)

Using Eq. (9), the activation function provides a better mapping between input and output. In Eq. (10), the input and output are analyzed, and the weight and bias are evaluated as \(a'\) and \(b'\). All captured gestures are assigned a weight and bias that decide the detection. Based on this weight and bias, the gesture is detected, and the error is reduced by applying gradient descent to the weights.

$$ \Delta_{g_{0}} = \frac{1}{2}\tau + \sum_{\beta}^{\beta_{0}} \left( \theta \times [\gamma + \rho] \right). $$
(11)

From Eq. (10), the weight and bias are estimated, and the update is calculated using Eq. (11). The sensed gesture is captured in a timely manner, and then the input and output are derived. In Eq. (11), \(\tau\) represents the learning rate of the deep learning model used to improve the output. In this paper, two hidden layers are derived, as represented in Eqs. (12) and (13).

$$ \mathcal{H}_{1} = \begin{cases} \omega_{1} = (\Delta + g_{0}) \times m_{0}' \\ \omega_{2} = \omega_{1} + (r' - m_{0}') + \partial \\ \qquad \vdots \\ \omega_{n} = \omega_{n-1} + \beta + (w_{0} + i_{0}) - \partial \end{cases}. $$
(12)

Equation (12) evaluates the first hidden layer \(\mathcal{H}_{1}\), where \(\omega\) denotes the weight value used to train on the gesture-based movements. Figure 4 illustrates the process of hidden layer 1.

Fig. 4 Hidden layer 1 process

Initially, the input is obtained and processed, and the error is passed as the input to the second layer. This continues through the n layers to obtain a better output in deep learning. The second hidden layer is represented as follows:

$$ \mathcal{H}_{2} = \begin{cases} \omega_{1} = (w_{0} + i_{0}) \\ \omega_{2} = \omega_{1} + (\gamma + \rho) \times \delta \\ \qquad \vdots \\ \omega_{n} = \omega_{n-1} + (x_{0} + y_{0}) \times \delta \end{cases}. $$
(13)

In Eq. (13), the second hidden layer is estimated. The learning rate is used along with the preceding and pursuing gestures. Figure 5 presents the processes in hidden layer 2.

Fig. 5 Hidden layer 2 processes

The layers are processed until the enhanced output is achieved. Using MLP deep learning, the gesture is decided by validating the training sets, which yields efficient matching. This decision is based on the gesture weight and bias computed while processing the gesture in the hidden layers. The mess detection is evaluated as follows:

$$ g_{0}(\partial) = \begin{cases} \prod_{m_{0}'}^{r'} (\gamma + \rho) \times \delta + \theta < \mu \\[4pt] \prod_{m_{0}'}^{r'} (\gamma + \rho) \times \delta + \theta > \mu \end{cases}. $$
(14)

Using Eq. (14), the matching process is obtained from the gesture weight and bias. In this equation, two conditions are used to reduce the mess value. The first condition, \(\prod_{m_{0}'}^{r'} (\gamma + \rho) \times \delta + \theta < \mu\), indicates that the movement and rest states of the gesture are classified. In this case, the matching detection \(\mu\) is high, and the mess is detected accurately. Figure 6 presents the gesture recognition output using the two hidden layers.

Fig. 6 Gesture recognition using hidden layers

The second condition, \(\prod_{m_{0}'}^{r'} (\gamma + \rho) \times \delta + \theta > \mu\), indicates that the matching is not achieved accurately and the mess detection criterion is not satisfied. Overall, the aim of this paper is met, and mess detection and gesture recognition are performed efficiently. The MLP is trained on the input and output, and a correlation between them is achieved. In this way, remote patients' hand gestures in the IoMT are addressed and improved using the effective activation and learning functions.
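For concreteness, an MLP with two hidden layers, tanh activation, and gradient-descent weight updates, as described in this section, can be sketched as follows. This is a minimal NumPy illustration under assumed layer sizes and learning rate; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(n_in: int, n_h1: int, n_h2: int, n_out: int):
    """Initialize weights and biases (a', b' in Eq. (10)) for an MLP
    with two hidden layers (H1 and H2, as in Eqs. (12)-(13))."""
    sizes = [(n_in, n_h1), (n_h1, n_h2), (n_h2, n_out)]
    return [(rng.normal(0.0, 0.1, s), np.zeros(s[1])) for s in sizes]

def forward(params, x):
    """Forward pass: tanh activations (Eq. (9)) and a softmax output."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        if i < len(params) - 1:
            acts.append(np.tanh(z))                          # hidden layers
        else:
            e = np.exp(z - z.max(axis=-1, keepdims=True))
            acts.append(e / e.sum(axis=-1, keepdims=True))   # output layer
    return acts

def train_step(params, x, y_onehot, lr=0.05):
    """One gradient-descent update of the weights; lr plays the role of
    the learning rate tau in Eq. (11)."""
    acts = forward(params, x)
    grad = acts[-1] - y_onehot                      # softmax cross-entropy error
    for i in reversed(range(len(params))):
        w, b = params[i]
        dw = acts[i].T @ grad / len(x)
        db = grad.mean(axis=0)
        if i > 0:                                   # propagate error backwards
            grad = (grad @ w.T) * (1.0 - acts[i] ** 2)  # tanh derivative
        params[i] = (w - lr * dw, b - lr * db)
    return params

# Toy usage: 3 gesture features, 2 classes (e.g. 'wrist' vs 'finger').
x = rng.normal(size=(8, 3))
y = np.eye(2)[rng.integers(0, 2, size=8)]
params = mlp_init(3, 16, 16, 2)
for _ in range(200):
    params = train_step(params, x, y)
print(forward(params, x)[-1].round(2))              # class probabilities
```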

Results

The performance of the DGR system is analyzed. The gesture is observed from a wearable band, and a hand camera is used for correlating with the outputs in the dataset [36]. The dataset consists of five gesture motions observed from 6 subjects. The text file accompanying the image dataset contains the type and name of each gesture. Eleven gestures are considered: wave, thumb, point (forefinger), crossed fingers (fore and middle fingers), fist (close action), activation, left click, right click, cursor, half-bend, and forward movement. The gestures are captured from five positions (p1–p5): standing, sitting, lying, walking, and running. The band inputs are sampled at a frequency of 50 Hz, and the video input is obtained at 128 frames/s. Using this experimental setup, 80 gestures of both hands in the 5 different positions are considered as inputs. The performance of this method is validated using the metrics precision, mess factor, accuracy, and recognition time [37,38,39]. The proposed DGR is compared with Ultigesture [26], MobiGesture [24], and TLHS-CNN [33] for comparative analysis.
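The reported metrics can be computed from per-gesture predictions in the usual way. The snippet below is a small sketch assuming hypothetical lists of true and predicted labels and per-gesture processing times; it is not the evaluation code behind the reported tables, and the mess factor is interpreted here simply as the fraction of wrong detections.

```python
from typing import Sequence

def precision_for(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """Precision for one gesture class: TP / (TP + FP)."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mess_factor(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of wrongly analyzed gestures (an assumed reading of the
    'mess factor' used in this paper)."""
    return 1.0 - accuracy(y_true, y_pred)

# Hypothetical predictions for a handful of test gestures.
y_true = ["wave", "fist", "wave", "cursor", "fist"]
y_pred = ["wave", "fist", "fist", "cursor", "fist"]
times_s = [4.1, 5.0, 4.8, 4.5, 4.7]                 # recognition time per gesture

print(precision_for(y_true, y_pred, "fist"))        # ~0.67
print(accuracy(y_true, y_pred))                     # 0.8
print(mess_factor(y_true, y_pred))                  # ~0.2
print(sum(times_s) / len(times_s))                  # mean recognition time (s)
```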

Precision analysis

The precision metric is compared for varying gesture time and varying gestures in Figs. 7 and 8, respectively. The hidden layer processes \(\mathcal{H}_{1}\) and \(\mathcal{H}_{2}\) are applied recurrently to identify the series \(\{\theta_{1}, \theta_{2}, \ldots, \theta_{\rho}\}\), from which the output \(\Delta\) is obtained. The output is examined as \(\gamma(\rho)^{g_{0}}\) using Eq. (5) for detecting the gestures in all iterations, in particular \(\omega\) based on \(\Delta_{g_{0}}\). Also, \(\tau\) is used with the training set over all outputs from \(g_{0}(\partial)_{2}\) to \(g_{0}(\partial)_{\rho-1}\). In all analyses, the coordinate mapping is verified for independent and joint processing of \(x_{0}\) and \(g_{0}\). The \(\partial\) and \(\Delta_{g_{0}}\) in the \(g_{0}\) process are observed with a low \(g_{0}(\partial)\). Both hidden layer processes are recurrent; hence, \(\partial\) is identified for the different states. In the correlation analysis, the error is reduced using gradient descent, and \(v\) refines the matching between the observed and stored gestures. The precision for any number of iterations \(\omega\) of the learning process is handled using the \(\gamma(\rho)^{g_{0}}\) and \(\Delta_{g_{0}}\) estimations over all \(\Delta\) and from \(g_{0}(\partial)\) to \(g_{0}(\partial)_{\rho-1}\) for the successive outputs. Therefore, the estimation factor for correlation accuracy over multiple iterations is high in the proposed system.

Fig. 7 Precision for varying gesture time

Fig. 8 Precision for varying gestures

Mess factor analysis

The mess factor is identified in two instances, namely \(\rho\) and \(\partial\), in the proposed system. This process is common to both the varying time intervals and the varying gestures in \(\mathcal{H}_{1}\) and \(\mathcal{H}_{2}\). The hidden layer outputs \(\Delta\) and \(\Delta_{g_{0}}\) are independently analyzed for \(\theta_{\rho}\) and \(g_{0}(\partial)_{\rho-1}\), respectively. From the first layer \(\mathcal{H}_{1}\), \(\Delta\) is the output, ensuring all possible comparisons without a false \(\gamma(\rho)^{g_{0}}\). The rest-state errors are reduced by classifying the states of \(\partial_{1}\) such that \(g_{0}(x_{0}, y_{0})\) matches all instances. In the second learning layer \(\mathcal{H}_{2}\), when the precision factor increases, \(g_{0}(\partial)\) decreases from both \(\gamma(\rho)^{g_{0}}\) and \(\Delta_{g_{0}}\). The activation function \(v\) computes the occurrence of \(\Delta_{g_{0}}\) using Eq. (11) and \(\rho\) using Eq. (4) for both states of the gesture observation. The varying \(g_{0}\) in all the outputs of \(\mathcal{H}_{1}\) and the \((v, \theta)\) analyses using \(\mathcal{H}_{2}\) increase the observation and classification of \(\partial\) using \(\delta\), as in Eqs. (5) and (6). The \(\Delta_{g_{0}}\) in the hidden layers is extracted from \(\partial\) and \(g_{0}(\partial)\) in different instances. As the time interval increases, the need for \(\gamma(\rho)^{g_{0}}\) increases, from which the appropriate mess is detected. The comparative analysis of the mess factor for varying time intervals and gestures is presented in Figs. 9 and 10, respectively.

Fig. 9 Mess factor for varying time interval

Fig. 10 Mess factor for varying gestures

Accuracy analysis

DGR achieves better accuracy than the other methods for the varying mess factor and positions, as shown in Figs. 11 and 12. The recurrent precision analysis improves the accuracy of the proposed system at any instance. This is achieved by identifying \(\rho\) and \(g_{0}\) in all the \(\gamma(\rho)^{g_{0}}\) for all the inputs, which are analyzed for \(\partial\). The mess factor in the proposed method is decreased, as discussed previously. A growing mess factor reduces the accuracy; however, the proposed method identifies the mess in two different stages during \(g_{0}\) and \(g_{0}(\partial)\). The first learning instance achieves an optimal accuracy by detecting \(g_{0}\) in the successive iterations. In the second learning instance, the hidden layer approximates the output using \(g_{0}(\partial)\) mitigation and \(\Delta_{g_{0}}\) identification to improve the accuracy at any instance. The accuracy analysis and the improvement of \(\omega\) are augmented in this learning process from \(g_{0}(\partial)_{1}\) to \(g_{0}(\partial)_{\rho-1}\) and from \(g_{0}(\partial)_{2}\) to \(g_{0}(\partial)_{\rho}\) for identifying \(\tau\) and \(\Delta_{g_{0}}\) in the successive instances. Therefore, high accuracy under the varying mess factor is achieved by reducing \(\Delta_{g_{0}}\) and the variations in \(\rho\). This is approximated by verifying \(\theta\) in all the identifying instances to improve the accuracy for all \(\partial\) and \(\rho\) classified under different states.

Fig. 11 Accuracy for varying mess factor

Fig. 12 Accuracy for varying positions

Recognition time analysis

Dependable gesture recognition (DGR) achieves a lower recognition time than the other methods. The system identifies the difference in \(\rho\) using \(\gamma(\rho)^{g_{0}}\); hence, the \(\mathcal{H}_{2}\) solutions rely on the \(\theta_{\rho}\) factor. This does not require additional time, as the number of validating instances is initiated from \(\rho\) to \(\tau\). Instead, if \(\Delta_{g_{0}}\) is the deciding factor, then \(v\) is used for refining \(g_{0}(\partial)\) from the matching of \(\rho\) and \(\partial\). This is carried out from \(g_{0}(\partial)_{2}\) to \(g_{0}(\partial)_{\rho}\), reducing the analysis time across iterations. Therefore, the time required for detecting the precise gesture relies on either the \(g_{0}(\partial)_{\rho}\) or the \(g_{0}(\partial)_{\rho-1}\) instance, reducing the cumulative recognition time. The early classification using \(m_{0}'\) and \(\gamma(g_{0})\) is used to decide whether a recurrent analysis of the input gestures is needed. Besides, \(\mathcal{H}_{1}\) and \(\mathcal{H}_{2}\) are analyzed for the different cases of \(\Delta\) and \(\theta\) independently, without increasing the time factor. Also, the recurrent process of the hidden layer under the different constraints of \(\theta_{\rho}\) and \(\Delta\) is handled using \(v\) to derive \(\Delta\) and \(\Delta_{g_{0}}\), respectively. The training process is preferred for \(\theta_{\rho} \in \Delta\) and \(g_{0}(x_{0}, y_{0})\); therefore, the time is required for only one learning instance. For any range of mess factor and number of gestures, the recognition time is retained at the lowest value in the DGR system, as shown in Figs. 13 and 14. Tables 1, 2 and 3 present the comparative analysis results.

Fig. 13 Recognition time for varying mess factor

Fig. 14 Recognition time for varying gestures

Table 1 Comparative analysis for the varying gesture time
Table 2 Comparative analysis for the varying gestures
Table 3 Comparative analysis for the varying mess factor

As shown in Table 1, the proposed DGR achieves a high precision of 95.94% for the varying gesture time compared with the other methods.

The results in Table 2 show that the proposed DGR method achieves a high precision of 94.92%, a lower mess factor of 0.0371, and a shorter recognition time of 4.97 s for the varying gestures.

As presented in Table 3, the proposed DGR method achieves a high accuracy of 89.85% and a shorter recognition time of 4.93 s for the varying mess factor.

In Table 4a–c, the positions and their corresponding similarity metrics are presented for three gestures (wave, crossed fingers, and fist). Here, \(p1\)–\(p5\) represent the positions at which the three gestures are observed, corresponding, in order, to standing, sitting, lying, walking, and running. Accuracy is denoted by the symbol 'A', mess by 'M', precision by 'P', and nil detection by '0'. The similarity in detecting the gestures at the different positions is given in the tables below.

Table 4 Similarity metrics for gestures: (a) wave, (b) crossed fingers, (c) fist

For the wave, crossed fingers, and fist gestures, the number of classification instances, the mean accuracy, and the mess are presented in Table 5.

Table 5 Analysis of the outputs in Table 4a–c

Table 6 presents the precision and mess at the different positions. This tabulation covers the different gestures described in this section.

Table 6 Precision and mess for different gestures in different positions

As shown in Table 6, the proposed method retains a high precision irrespective of the position for the different gestures. Changes in position are handled using the state and action classification of the gesture input.

Conclusion

This paper introduces dependable gesture recognition (DGR) for improving the performance of remote-monitoring healthcare systems. The proposed system first classifies the states and actions of the gesture input using finger and wrist movements. The identified coordinates of the observed gesture are then analyzed using multi-layer perceptron learning over the classified states and actions. In this analysis, the hidden layer processes are distinguished for gesture analysis, matching, and mess detection. Training the learning process with the classification constraints and inputs adjusts the weights for unbiased matching and recognition of gestures. The joint use of the activation and gradient descent functions differentiates the mess recurrently, and this differentiation is performed uniformly for the different states and actions of the gesture input. Therefore, functional filtering of the analyses and activation-dependent unbiasing of gradient descent reduce the mess in gesture recognition. Moreover, the proposed recognition system attains better precision and accuracy with a lower recognition time.