1 Introduction

Recognizing human activities enables an assessment of human performance and thus of its efficiency in daily life [65]. From this perspective, artificial intelligence (AI) plays an effective role in evaluating and recognizing human activities [38]. In recent years, human activity recognition (HAR) has become a popular research topic because of its importance in several fields, such as sports, healthcare, and fitness [18, 24, 34, 35, 52, 57], human-computer interaction, interactive gaming, smart manufacturing [10], and remote monitoring systems.

In the healthcare field, for example, wearable accelerometers are utilized to evaluate human activity for remote communication between hospitals and patients [50]. However, the low precision of these accelerometers is a challenging issue that has yet to be fully overcome [11, 44]. Many traditional machine learning algorithms have been developed to identify human activity [33, 45, 48, 58]. However, the accuracy of these algorithms remains an issue [23, 27].

The gated recurrent unit (GRU), a deep learning algorithm, offers an effective solution to this low-accuracy issue. The GRU has already proven useful in applications such as digital image processing [51], speech classification [62], and language modeling [31], and it is also applicable to identifying human activities. The motivation for using the GRU is that it tackles the vanishing gradient problem and is well suited to processing time sequences.

In this paper, a GRU deep learning algorithm is proposed to classify human activities, with the objective of increasing the classification accuracy of the GRU algorithm by presenting a hyper-parameter tuning method. In this context, the primary aim of implementing the proposed GRU algorithm is to achieve high accuracy in identifying human activities. In addition, a k-fold cross-validation technique is used to achieve high classification accuracy. The GRU algorithm is trained and tested on the Wireless Sensor Data Mining (WISDM) dataset. Figure 1 shows the framework of the proposed work, i.e., a gated recurrent unit (GRU) that recognizes six human activities: walking, sitting, downstairs, jogging, standing, and upstairs.

Fig. 1

Framework of the proposed work

The main contributions of the paper are as follows:

  • Implementing the GRU algorithm to classify human activities;

  • Achieving maximum testing accuracy of the GRU algorithm with a hyper-parameter tuning method on a central processing unit (CPU);

  • Evaluating the performance of the proposed algorithm using different evaluation metrics with the WISDM dataset;

  • Applying the k-fold cross-validation technique to enhance the performance of the proposed GRU algorithm.

The remainder of the paper is organized as follows: Section 2 reviews the related work. Section 3 presents the theoretical background of the GRU. Section 4 introduces the methodology of the proposed work. Section 5 explains the evaluation metrics used to assess the performance of the proposed algorithm. Section 6 demonstrates the experimental results. Section 7 discusses the results. Conclusions from the proposed work are covered in Section 8.

2 Related work

In the literature, several deep learning algorithms have been presented for the recognition of human activities. In [21], Hammarela et al. proposed a bi-directional long short-term memory (LSTM) algorithm to identify a large number of human activities. The authors used inertial sensors to pick up humans’ hand signals. This algorithm was trained on the Opportunity dataset [12] and achieved an F1-score of 92.7%. In [42], Pienaar and Malekian developed an LSTM algorithm for HAR that used a regularization method to streamline the computations for the large WISDM dataset [32] and reached a maximum accuracy of 94%. However, the performance of this work was assessed via only two evaluation metrics, namely the learning curve and the confusion matrix. Cruciani et al. [17] implemented a convolutional neural network (CNN) algorithm for HAR that was applied to the UCI-HAR dataset available in [9] with an overall accuracy of 91.98%. This algorithm was assessed using a variety of evaluation metrics. Ordonez and Roggen [40] proposed a ConvLSTM algorithm to classify human activities. The authors utilized several inertial measurement units (IMUs) and accelerometers. The algorithm achieved an F1-score of 95.8% and classified five activities using the Skoda dataset [49]. In [59], Xia et al. introduced an LSTM-CNN algorithm for daily life activities. This algorithm was also trained on the WISDM dataset [32] and reached an accuracy of 95.85%. However, the computational time consumed in the training process was considerable.

Alani et al. [2] presented CNN, LSTM, and CNN-LSTM algorithms to recognize imbalanced data for HAR. These algorithms were trained on the SPHERE dataset [54], which contains twenty different human activities, and reached accuracies of 92.98%, 93.55%, and 93.67%, respectively. The algorithms had limited performance and were evaluated using a single metric. In [8], Alzantot et al. proposed an LSTM algorithm to identify human activities that was utilized to differentiate between real and synthesized data; however, the accuracy achieved by this algorithm was quite low, and its many training layers led to architectural complexity.

Researchers [7, 37, 47, 63] have introduced CNN and LSTM deep learning algorithms to identify human activities of daily living. Shakya et al. [47] presented recurrent neural network (RNN) and CNN algorithms, utilizing the Shoaib SA and Actitracker datasets, which were partitioned randomly, and reached accuracies of 81.74% and 92.22%, respectively. Mekruksavanich et al. [37] also proposed an LSTM algorithm, achieving an overall accuracy of 96.2% and an F1-score of 96.3%. In [7], Alsheikh et al. achieved an overall accuracy of 86.6%, but did not specify which dataset was utilized for testing. An LSTM architecture was introduced in [63], which achieved a maximum accuracy of 92.1% on an unknown test dataset split.

Agarwal et al. [1] applied an RNN-LSTM algorithm to classify human activity, utilizing the WISDM dataset. The authors used only two metrics for the performance assessment of the RNN-LSTM algorithm and achieved a total accuracy of 95.78%. Zhao et al. [66] presented a bi-directional LSTM algorithm to recognize human activities, for which a number of sensors were utilized to gather the datasets. The primary disadvantage of this work was the long time consumed in the training process, which made it unsuitable for real-time applications. Cipolla et al. [16] implemented an LSTM algorithm to recognize human activities using the SPHERE dataset. The proposed algorithm demonstrated a strong ability to deal with unbalanced data; it achieved a classification accuracy of 83.2% and was applied to five activities.

In recent years, CNNs have gained considerable attention and are often utilized in fields such as text analysis [19], image classification [53], and natural language processing [64]. In [26], Ignatov trained a CNN algorithm for HAR, which achieved accuracies of 90.42% and 93.32% on the testing and training datasets, respectively. Xu et al. [61] applied a CNN algorithm to a randomly selected 70% of a dataset and then utilized the algorithm to assess the remaining 30%, achieving a maximum accuracy of 91.97%. Huang et al. [25] proposed an architecture of two sequential CNNs and utilized a cross-validation technique to achieve an F1-score of 84.6%. Although a number of researchers have implemented accurate algorithms to recognize human activities, there is still a wide margin for improvement. In particular, previous studies did not apply a hyper-parameter tuning method to achieve high accuracy in recognizing human activities.

3 Theoretical background of the gated recurrent unit (GRU)

One of the popular deep learning approaches is the gated recurrent unit (GRU) [14], which is a special type of recurrent neural network (RNN). RNNs suffer from the vanishing gradient problem [20, 46], which the GRU was created to tackle, and the GRU is well suited to processing time sequences. Unlike LSTM layers, GRU layers do not have memory blocks recurrently linked in a memory cell. The architecture of a GRU cell is demonstrated in Fig. 2; it comprises two gates, a reset gate and an update gate [15], which reject or accept information passing across the cell. In this figure, the reset gate decides how much of the past information to forget. This decision is executed via a sigmoid activation function (σ), whose output is rt. If the sigmoid output is 1, the data is passed through the GRU cell; if the sigmoid output is 0, the data cannot pass. The inputs of the reset gate are the present input xt and the previous hidden state ht-1. The update gate decides what information will be updated and passed along to the future state, so it depends on a very simple part of the previous state. The update gate likewise includes a sigmoid activation function for updating the state; this function has a range from 0 to 1, and its output is zt. This output is multiplied with the output of a tanh function, which yields the candidate state ĥt; the tanh has values from −1 to 1. The result of this multiplication, zt × ĥt, is added to (1 − zt) × ht-1 to produce ht, the present hidden state output of the cell. The reset gate and update gate are calculated from Eqs. (1) and (2), respectively, where Wr and Wz represent the weights of the reset gate and update gate; these weights are learned in the training phase. Further, the ht of the GRU cell is determined using Eq. (3) [13].

Fig. 2

The architecture of the gated recurrent unit (GRU) cell

$${r}_t=\sigma \left({W}_r\left[{h}_{t-1},{x}_t\right]\right)$$
(1)
$${z}_t=\sigma \left({W}_z\left[{h}_{t-1},{x}_t\right]\right)$$
(2)
$${h}_t=\left(1-{z}_t\right)\times {h}_{t-1}+{z}_t\times {\hat{h}}_t$$
(3)
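
For completeness, the candidate state ĥt referenced above can be written out as well. In the standard GRU formulation [14], it is produced from the reset-gated previous state; W here denotes the candidate-state weight matrix, which the text does not name explicitly:

$${\hat{h}}_t=\tanh \left(W\left[{r}_t\times {h}_{t-1},{x}_t\right]\right)$$

The following minimal NumPy sketch illustrates how Eqs. (1)–(3) and the candidate state combine in a single GRU time step. It is an illustrative re-implementation under these standard assumptions, not the TensorFlow code used in this work, and the dimensions are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU time step following Eqs. (1)-(3).

    x_t    : input at time t, shape (input_dim,)
    h_prev : previous hidden state h_{t-1}, shape (hidden_dim,)
    W_r, W_z, W_h : weights, each of shape (hidden_dim, hidden_dim + input_dim)
    (biases are omitted for brevity, matching the paper's equations)
    """
    concat = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ concat)                      # Eq. (1): reset gate
    z_t = sigmoid(W_z @ concat)                      # Eq. (2): update gate
    # Candidate state from the reset-gated previous state (standard GRU form).
    h_hat = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    return (1.0 - z_t) * h_prev + z_t * h_hat        # Eq. (3): new hidden state

# Toy usage with hypothetical sizes: 3 input features, 32 hidden units.
rng = np.random.default_rng(0)
W_r, W_z, W_h = (rng.normal(scale=0.1, size=(32, 35)) for _ in range(3))
h_t = gru_step(rng.normal(size=3), np.zeros(32), W_r, W_z, W_h)
print(h_t.shape)  # (32,)
```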

4 Methodology

The GRU algorithm is developed to recognize human activities. The proposed GRU algorithm is chosen due to its high performance; the GRU is well suited to processing time-sequence data [29]. One advantage of the GRU is that it needs little computational time and thus trains quickly [22]. The GRU is a memory extension of the recurrent neural network (RNN); accordingly, another advantage of the GRU is that it avoids the vanishing gradient issue [36].

4.1 Gated recurrent unit (GRU) algorithm

Figure 3 illustrates the architecture of the proposed GRU algorithm, which is implemented via the TensorFlow framework. It comprises an input layer, two GRU layers, and an output layer. The input layer includes three features and 90 time steps with a number of samples. The features are ax, ay, and az, where ax represents the acceleration in the x-axis, while ay and az are the accelerations in the y-axis and the z-axis, respectively. The two GRU layers are used to capture the temporal features of the data sequence; they are stacked to add depth to the algorithm and thereby improve its stability and accuracy. Each GRU layer has 32 hidden units [55] and utilizes a rectified linear unit (ReLU) activation function to enhance the robustness of the GRU algorithm. The output layer has six neurons with a softmax activation function to determine the six classes. In this algorithm, the training epochs are set to 50 with a batch size of 64, and the learning rate is initialized to 0.0025; this rate controls the training speed of the algorithm. Additionally, the utilized optimizer is Adam, which computes the proper weights for the GRU algorithm, avoids errors, and increases the training accuracy [30]. Further, a regularization technique based on a cross-entropy loss function is implemented to prevent the GRU algorithm from over-fitting [60].
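
A minimal TensorFlow/Keras sketch matching this description is given below. It is reconstructed from the parameters stated in the text (90 time steps, three features, two stacked 32-unit GRU layers with ReLU activations, a six-way softmax output, the Adam optimizer with a learning rate of 0.0025, and a cross-entropy loss); any detail beyond those, such as the sparse (integer) label encoding, is an assumption.

```python
import tensorflow as tf

NUM_TIMESTEPS, NUM_FEATURES, NUM_CLASSES = 90, 3, 6

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_TIMESTEPS, NUM_FEATURES)),  # (90, 3) windows
    # First GRU layer returns the full sequence so the second can stack on it.
    tf.keras.layers.GRU(32, activation="relu", return_sequences=True),
    # Second GRU layer returns only its final hidden state.
    tf.keras.layers.GRU(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0025),
    # Cross-entropy loss; the "sparse" variant assumes integer class labels.
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```

With this configuration, training with the stated settings would be invoked as `model.fit(X_train, y_train, epochs=50, batch_size=64)`.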

Fig. 3

The architecture of the proposed GRU algorithm

4.2 Dataset description

The WISDM dataset utilized to classify human activities is available in [28]. It contains 1,098,207 samples of different human activities, including walking, sitting, downstairs, jogging, standing, and upstairs; the sample percentages for these activities are 38.6%, 5.5%, 9.1%, 31.2%, 4.4%, and 11.2%, respectively. The WISDM dataset was gathered from 36 individuals using a mobile phone with an internal accelerometer sensor positioned in a front trouser pocket. The readings of the WISDM dataset were recorded at a 20 Hz sampling frequency. The WISDM dataset is based on six attributes with information referring to the human activities: user, activity, timestamp, and the x-, y-, and z-accelerations. The WISDM dataset is split into a training set (80%) and a testing set (20%); the testing set is utilized to assess the proposed GRU algorithm.

The WISDM dataset is selected because it includes a variety of daily life activities, and its overall size of 1,098,207 samples provides sufficient data to train the proposed GRU algorithm. Using the Scikit-learn framework, the WISDM dataset is divided into 20% for testing the GRU algorithm and 80% for training it. A probability sampling approach is used to randomly distribute the dataset, which provides a strong training phase for the GRU algorithm and thus improves its performance [41].
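
A sketch of this preparation step is given below, assuming the raw accelerometer stream has already been segmented into fixed windows of 90 readings (the window length named in Section 4.1). The segmentation helper, the 50% window overlap, and the variable names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_windows(signal, labels, window=90, step=45):
    """Segment an (N, 3) accelerometer stream into (window, 3) examples.

    Each window is labeled with the majority activity it contains.
    The 50% overlap (step=45) is an assumption, not stated in the paper.
    """
    X, y = [], []
    for start in range(0, len(signal) - window + 1, step):
        X.append(signal[start:start + window])
        values, counts = np.unique(labels[start:start + window],
                                   return_counts=True)
        y.append(values[np.argmax(counts)])
    return np.asarray(X), np.asarray(y)

# signal: (1_098_207, 3) array of [ax, ay, az]; activity: matching labels.
# X, y = make_windows(signal, activity)
# Random 80/20 split via scikit-learn, as described above:
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, random_state=42, shuffle=True)
```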

Figure 4 illustrates a sample of the acceleration data gathered from one person over a period of time. The figure shows three acceleration signals: ax, ay, and az, each expressed as a function of the gravitational acceleration (g). The blue signal represents the acceleration in the x-direction, ax; the green and orange signals indicate ay and az, the accelerations in the y- and z-directions, respectively. The acceleration signals are recorded for 22,000 seconds, and the three signals have amplitudes ranging from −g to g.

Fig. 4

A sample of the acceleration detected for one person

5 Evaluation metrics

Several evaluation metrics are used to assess the performance of the proposed GRU algorithm [56], for instance accuracy, sensitivity, precision, and F1-score. These metrics depend on the statistical values obtained from a confusion matrix: False Negative (FN), True Negative (TN), False Positive (FP), and True Positive (TP). A TP is an output in which the algorithm correctly predicts the positive class, and a TN is an output in which the algorithm correctly predicts the negative class. An FP occurs when the algorithm wrongly predicts the positive class, and an FN occurs when the algorithm wrongly predicts the negative class. The confusion matrix is also utilized to gauge the ability of an algorithm to classify multiple classes correctly; for the GRU algorithm it is computed on the testing dataset by comparing each true label with the predicted label.

The Accuracy metric is the ratio of true predictions to the overall number of predictions on the testing dataset. The accuracy is determined via Eq. (4).

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(4)

The Sensitivity metric represents the ratio of correctly classified positives of a particular class to the overall number of true instances of that class in the testing dataset. Sensitivity is also called Recall and can be determined from Eq. (5).

$$Sensitivity=\frac{TP}{TP+FN}$$
(5)

The Precision metric is the ratio of true predictions of a certain class activity to the overall number of predictions of the same class in the testing dataset. It is calculated via Eq. (6).

$$Precision=\frac{TP}{TP+FP}$$
(6)

The F1-score metric is the harmonic mean of the sensitivity and precision, as calculated from Eq. (7). This metric is also called the balanced F1-Measure. Because the F1-score takes both false negatives (FN) and false positives (FP) into account, it is a more informative metric for assessment than accuracy alone. The best possible value for all of the evaluation metrics mentioned above is 1, and the worst value is 0.

$$F1\text{-}score=2\times \frac{Precision\times Recall}{Precision+Recall}$$
(7)

Also, the area under the ROC curve (AUC) is utilized as an assessment metric for the GRU algorithm. The AUC is calculated by integrating the true positive rate (TPR) with respect to the false positive rate (FPR); see Eq. (8). The possible AUC values are always between 0 and 1. High AUC values imply that an algorithm is capable of differentiating among classes of human activity; thus, an algorithm with a large area under the ROC curve is a high-performance algorithm for classifying human activities.

$$AUC={\int}_0^1 TPR\ d(FPR)$$
(8)

The FPR and TPR are determined from Eqs. (9) and (10). The FPR is the ratio of the number of false positives (FP) to the sum of true negatives and false positives (TN + FP); it is also called 1 − Specificity. The TPR is the ratio of the number of true positives (TP) to the sum of false negatives and true positives (FN + TP).

$$FPR=\frac{FP}{TN+FP}$$
(9)
$$TPR=\frac{TP}{FN+TP}$$
(10)

The evaluation metrics used (accuracy, sensitivity, precision, F1-score, and AUC) are selected because of their effectiveness in assessing and analyzing the performance of the proposed GRU algorithm [3,4,5,6, 39, 43].
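
As an illustration of how these metrics are obtained in practice, the sketch below computes them with scikit-learn from a trained model's predictions. The variable names (`model`, `X_test`, `y_test`) are carried over from the hypothetical preprocessing and architecture sketches above.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)

# Predicted class probabilities, shape (n_samples, 6).
y_prob = model.predict(X_test)
y_pred = np.argmax(y_prob, axis=1)

print(accuracy_score(y_test, y_pred))                  # Eq. (4)
# Per-class precision, sensitivity (recall), and F1-score, Eqs. (5)-(7).
print(classification_report(y_test, y_pred, digits=4))
print(confusion_matrix(y_test, y_pred))                # TP/TN/FP/FN counts
# One-vs-rest AUC per Eq. (8), averaged over the six classes.
print(roc_auc_score(y_test, y_prob, multi_class="ovr"))
```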

6 Experimental results

The proposed GRU algorithm is implemented experimentally via the Python programming language in a Spyder environment. The algorithm is executed on a personal computer (PC) with a Windows 7 operating system, an Intel Core2Duo processor with two central processing units running at 3.3 GHz, and 4 GB of RAM. Figure 5 demonstrates the testing and training accuracy curves of the GRU algorithm; the training curve is also known as the learning curve. The brown curve shows the testing accuracy, which starts from 88.40% and reaches 97.08% after 50 epochs. The red curve represents the training accuracy, which changes continuously and reaches a maximum value of 97.56% after 50 epochs.
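
A sketch of the training call that would produce such curves is shown below, continuing the hypothetical variable names from Section 4; Keras records the per-epoch accuracy and loss in the returned `history` object, from which plots like Figs. 5 and 6 can be drawn.

```python
import matplotlib.pyplot as plt

# Train for 50 epochs with batch size 64, tracking the held-out test set.
history = model.fit(X_train, y_train,
                    epochs=50, batch_size=64,
                    validation_data=(X_test, y_test))

plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="testing accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```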

Fig. 5

Accuracy curves of the training and testing for the GRU algorithm

Figure 6 demonstrates the loss-rate curves of the GRU algorithm for the testing and training sets. The loss rate continuously declines as the number of training epochs increases. For the training set, the loss rate reaches 0.204 after 50 epochs. For the testing set, the loss rate starts at 0.645 and decreases to 0.221.

Fig. 6

Loss curves of the training and testing for the GRU algorithm

The confusion matrix obtained from the GRU algorithm is depicted in Fig. 7, demonstrating that 10,662 instances in the testing dataset are correctly classified. The matrix compares the predicted labels with the true labels of the testing dataset. The diagonal values of the matrix indicate correct classifications, while the values below and above the diagonal represent the errors that occurred. The matrix records 916, 3444, 586, 479, 1078, and 4159 true positives for the six human activities: downstairs, jogging, sitting, standing, upstairs, and walking, respectively.

Fig. 7

The confusion matrix for the GRU algorithm

The normalized confusion matrix for the GRU algorithm is presented in Fig. 8. It is clear that the GRU algorithm performs very well, with few errors occurring above and below the diagonal of the normalized confusion matrix. It achieves a classification accuracy of 0.90 for the downstairs class, and accuracies of 0.99, 0.98, 0.99, 0.92, and 0.98 for the jogging, sitting, standing, upstairs, and walking classes, respectively.

Fig. 8

The normalized confusion matrix for the GRU algorithm

Table 1 introduces a classification report of the GRU algorithm in terms of the Precision, Sensitivity, and F1-score, to analyze its performance with the WISDM dataset. The average Precision, Sensitivity, and F1-score are 97.11%, 97.09%, and 97.10%, respectively.

Table 1 Classification report of the GRU using the WISDM dataset

ROC curves can be considered probability curves utilized to assess the GRU algorithm; they plot the true positive rate (TPR) on the ordinate against the false positive rate (FPR) on the abscissa for threshold values from 0.0 to 1.0. The ROC curves for the GRU are illustrated in Fig. 9, and the AUC values for all six activities are 1.00. This result indicates that the GRU algorithm achieves high performance.

Fig. 9

ROC curves for the GRU algorithm

Figure 10 shows the precision-recall (PR) curves of the GRU algorithm for each activity. Downstairs ("class 0") achieves an area value of 0.955, while jogging ("class 1") has an area of 0.999. Sitting ("class 2"), standing ("class 3"), upstairs ("class 4"), and walking ("class 5") achieve area values of 0.996, 0.992, 0.970, and 0.998, respectively. Overall, the area under the micro-average PR curve is 0.995. The average value is computed by summing the areas of all six classes and dividing by the number of classes (six in this work).

Fig. 10

PR curves for the GRU algorithm

PR and ROC curves are robust evaluation metrics that incorporate all of the statistical values (TN, TP, FN, and FP), and both are utilized in the assessment of the proposed GRU algorithm. Additionally, both curves enable an instant and easy visual diagnosis of the algorithm’s behavior. These curves also depend on the area factor: a larger area indicates a more useful test, and the areas under the PR and ROC curves are utilized to compare the usefulness of tests.
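
Both curve families can be generated per class with scikit-learn, as in the hedged sketch below; a one-vs-rest binarization of the six labels is assumed, and the variable names (`y_test`, `y_prob`) follow the earlier sketches.

```python
from sklearn.metrics import auc, precision_recall_curve, roc_curve
from sklearn.preprocessing import label_binarize

classes = list(range(6))
y_bin = label_binarize(y_test, classes=classes)  # one-vs-rest labels

for c in classes:
    # ROC curve from the per-threshold FPR and TPR, Eqs. (9)-(10),
    # then the area under it per Eq. (8).
    fpr, tpr, _ = roc_curve(y_bin[:, c], y_prob[:, c])
    print(f"class {c}: ROC AUC = {auc(fpr, tpr):.3f}")
    # PR curve and the area under it, as reported for Fig. 10.
    prec, rec, _ = precision_recall_curve(y_bin[:, c], y_prob[:, c])
    print(f"class {c}: PR AUC  = {auc(rec, prec):.3f}")
```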

The performance of the proposed GRU algorithm is also assessed via the k-fold cross-validation technique. In this technique, the WISDM dataset is partitioned into k equal parts, where k = 5 in this work (one part is used for validation and four parts for training). The proposed GRU algorithm is thus trained five times on different partitions of the WISDM dataset. The mean accuracy of the GRU is 97.97% with a standard deviation of ±0.47%, while the average sensitivity is 97.92% with a standard deviation of ±0.58%. Similarly, the mean precision is 97.96% with a ±0.52% standard deviation, and the average F1-score is 97.91% with a ±0.42% standard deviation. Thereby, the proposed GRU algorithm has high performance and a low standard deviation. Furthermore, the k-fold cross-validation technique enhances the performance assessment and avoids biased performance results through a proper division of the training and testing data.
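
A sketch of this 5-fold procedure with scikit-learn's `KFold` is shown below; `build_model()` stands in for the Keras construction from Section 4.1 and is a hypothetical helper, as are the full arrays `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import KFold

scores = []
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in kfold.split(X):
    model = build_model()  # rebuild the two-layer GRU for each fold
    model.fit(X[train_idx], y[train_idx],
              epochs=50, batch_size=64, verbose=0)
    _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    scores.append(acc)

# Report the mean accuracy and standard deviation across the five folds.
print(f"{np.mean(scores):.4f} +/- {np.std(scores):.4f}")
```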

7 Discussion

The results illustrate that the proposed GRU algorithm achieves high accuracy and a low loss rate for both testing and training. Results from the normalized confusion matrix and PR curves show that accurate prediction of human activities can be achieved with the proposed GRU algorithm. Figure 5 illustrates the performance of the GRU algorithm in terms of testing and training accuracy on the WISDM dataset. The distributions of error percentages for the classes are shown in Figs. 7 and 8.

The micro-average PR curve has an area of 99.5% for the GRU algorithm. Also, for all six activities, the AUC is 100%. Further, the technique of k-fold cross-validation is used to assess the performance results for the GRU algorithm in terms of the average sensitivity, precision, accuracy, and F1-score at k = 5.

Finally, Table 2 and Fig. 11 present a comparison of the accuracy between this work and previously published works. The accuracy achieved by the GRU algorithm, 97.08%, is better than that of the algorithms in [17, 21, 40, 42, 59].

Table 2 Comparison of the accuracy between this work and previous works
Fig. 11

Comparison of the accuracy between this work and previous works

The better performance of the GRU algorithm stems from the accurate tuning of its hyper-parameters, which include the optimizer type, the activation and loss functions, the dropout rate, the batch size, the learning rate, the number of epochs, and the number of neurons in each layer of the proposed GRU algorithm.

In particular, when the learning rate, batch size, and number of epochs were set to 0.0025, 32, and 10, respectively, with a softmax activation function, the GRU algorithm reached a testing accuracy of 93.87%. When the batch size was adjusted to 128, the learning rate changed to 0.001, and the epochs reconfigured to 5 with the same activation function, the GRU algorithm's testing accuracy was 92.32%.

Thus, the proper settings of these hyper-parameters significantly improve the results. The best performance of the GRU algorithm is achieved when the hyper-parameters are set to a learning rate of 0.0025, 50 epochs, and a batch size of 64, with the Adam optimizer and a regularization method based on a cross-entropy loss function.

The hyper-parameters of the GRU algorithm are tuned via the GridSearchCV technique, which automatically searches for the hyper-parameter values that achieve the best performance of the proposed GRU algorithm.
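
A hedged sketch of such a search is given below, using scikeras's `KerasClassifier` wrapper as one common way to expose a Keras model to scikit-learn's `GridSearchCV`; the paper does not state which wrapper was used, and the searched grid here is only illustrative (drawn from the settings mentioned above).

```python
import tensorflow as tf
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model(learning_rate=0.0025):
    # Reuse the two-layer GRU architecture from Section 4.1.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(90, 3)),
        tf.keras.layers.GRU(32, activation="relu", return_sequences=True),
        tf.keras.layers.GRU(32, activation="relu"),
        tf.keras.layers.Dense(6, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

clf = KerasClassifier(model=build_model, verbose=0)
param_grid = {
    "model__learning_rate": [0.001, 0.0025],
    "batch_size": [32, 64, 128],
    "epochs": [5, 10, 50],
}
search = GridSearchCV(clf, param_grid, cv=5, scoring="accuracy")
# search.fit(X, y); print(search.best_params_, search.best_score_)
```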

8 Conclusion

This paper proposes an architecture of the GRU algorithm for the classification of daily human activities. The main aim of this paper is to achieve maximum testing accuracy of the GRU algorithm with a hyper-parameter tuning method on a central processing unit (CPU). The performance of the proposed algorithm is evaluated using different evaluation metrics with the WISDM dataset. Experiments carried out to maximize the testing accuracy of the proposed algorithm yielded a testing accuracy of 97.08%, a testing loss rate of 0.221, a training accuracy of 97.56%, and a training loss rate of 0.204.

The normalized confusion matrix, precision, sensitivity, F1-score, and ROC curves are computed to assess the algorithm's performance. The GRU algorithm achieved a sensitivity of 97.09%, a precision of 97.11%, and an F1-score of 97.10%. The micro-average area under the PR curve is 99.5%. Finally, the GRU algorithm achieved an AUC of 100% for all classes.

The hyper-parameters of the GRU algorithm, e.g., the batch size, the optimizer type, the dropout rate, the number of training epochs, the learning rate, the activation and loss functions, and the number of neurons in the layers of the proposed GRU algorithm, are found to significantly affect its accuracy, and the high accuracy achieved is strongly correlated with the optimal settings applied.

In addition, the performance of the GRU algorithm is assessed in terms of accuracy, sensitivity, precision, and F1-score via the k-fold cross-validation technique.

In the future, the proposed GRU algorithm can be trained on different datasets, and its performance can be compared against execution in a graphics processing unit (GPU) environment. Other deep learning algorithms, such as long short-term memory (LSTM), could also be implemented.