Introduction

An Intelligent Decision Support System (IDSS) is a promising approach for solving many challenges that the world currently faces. With the popularity and wide spread of Machine Learning (ML) algorithms, building an IDSS has become easier and faster, and this, combined with easy access to big datasets covering all aspects of our lives, helped in fighting the COVID-19 virus [1]. An IDSS helps physicians detect the virus at an early stage, which increases the patients' probability of survival. Moreover, recognizing patients' hand gestures is a popular application of IDSS in the field of smart healthcare systems: it alerts the staff to patients' requests in time, without delays, in remote monitoring environments [2]. The importance of IDSS in the medical field is especially appreciated in poor countries, where the healthcare service is very weak and, in some places, does not exist at all. An IDSS can fill the gaps in these services by providing timely and cheap service without the need for expensive equipment and trained personnel.

Nowadays, there is tremendous growth in IoT-enabled devices for empowering decision-making processes in complex systems. The fast development and miniaturization of sensors and their reduced power requirements have led to a revolution in the field of Human Activity Recognition (HAR). Detecting early signs of critical diseases like diabetes [3] and heart disease [4], and even early signs of COVID-19 from smart watches' sensor data [5], has become a reality.

One trend that has gained importance recently is the move from one-size-fits-all medicine to Personalized Health Care (PHC) and personalized medicine [6, 7], driven by the growth of the aging population and the rising costs of chronic diseases. A new solution to this problem should therefore include new ways to monitor and measure the vital signs of every patient, so that the medication plan can be tailored and customized to specific needs. This can be achieved by combining ML with the Internet of Things (IoT): suitable sensors around the patient send data continuously to the doctors and hospitals, who use it to make informed decisions. Such information is also used to help the beneficiaries control their daily-life activities [8].

The idea behind this paper is to develop an IDSS for automatically collecting and classifying daily-life activities by integrating the power of IoT with ML algorithms. This provides the things in the system with the intelligence to sense, understand, and act according to the information collected through the sensors installed on personal smartphones. The rest of this paper is organized as follows. "Related work" reviews related works. "The proposed model" discusses the proposed work. "Evaluation and results" presents an evaluation of the proposed model and discusses the results. "Conclusion and future work" provides conclusions and suggests future work.

Related work

Dorgham et al. [9] proposed a modern hybrid evolutionary approach that combines a Genetic Algorithm (GA) with efficient evolutionary techniques. A Decision Support System (DSS) was implemented to assist hospital personnel in the assignment operation. Through a thorough experimental analysis, the authors demonstrated the efficacy of the proposed approach on many benchmark instances from the literature relevant to smart healthcare systems. In addition, their hybrid algorithm outperformed strong approaches from the literature that held the best-known results.

Zhou et al. [10] proposed a HAR model based on the Long Short-Term Memory (LSTM) Deep Learning (DL) algorithm for empowering the Internet of Healthcare Things (IoHT). It used a deep Q-network with a distance-based reward for automatic labeling of data, to handle the issue of scarce labeled data. Then, a fusion of the user's body-sensor data and environmental data was fed to the model. The results showed that this work outperformed other approaches like SVM, DNN, and Random Forest (RF), with an area under the ROC curve of up to 0.95.

Anguita et al. [11] proposed a system based on Support Vector Machines (SVM). The data were collected using a smartphone (Samsung Galaxy S2) while each participant performed six different activities: lying, walking, sitting, standing, walking upstairs, and walking downstairs. The experiments compared two versions of SVM: the first version, Multi-Class SVM, achieved 89.3% accuracy in predicting the six classes, while the second version, Multi-Class Hybrid Fusion SVM, achieved 89% accuracy.

Murad et al. [12] suggested a deep recurrent neural network (DRNN) model. This model captures long-range relations across the entire input rather than being restricted to the size of a kernel window, and comes in three different architectures: unidirectional, bidirectional, and cascading. On the UCI-HAR dataset, DRNN reached the highest accuracy of 96.7%, compared to 96% for SVM and 95.2% for a convolutional neural network (CNN), thus outperforming the other compared algorithms (SVM, k-nearest neighbor, and CNN).

Another work [13] proposed an approach for HAR using Deep Belief Networks (DBNs), which are built by sequentially stacking multiple Restricted Boltzmann Machines (RBMs). The authors used a deep activity recognition model with three layers of one thousand neurons each. The results showed that their approach is better than the traditional methods. Their results also showed that a hybrid of DL and a Hidden Markov Model (HMM) achieved a recognition accuracy of 99.13%.

Chen and Xue [14] presented a CNN model for HAR that modifies the convolution kernel to adapt to the characteristics of tri-axial acceleration signals. The results showed that their model achieved an accuracy of 93.8% without any feature extraction, based on a dataset of 31,688 samples gathered from nine activities.

Qin et al. [15] proposed a unique architecture for HAR that utilizes data from multiple sensors. The system converts time-series data collected from the sensors into images, which preserve the features and patterns required for the HAR task. To enable the model to be trained and evaluated on data collected from different sensors, the authors used a fusion residual network, merging two networks and training on the pixel-wise correlations of the different data. This model provided state-of-the-art performance, with an accuracy of 93.41% on the HHAR dataset and 98.5% on the MHEALTH dataset.

Xia et al. [16] proposed a deep learning model that fuses LSTM layers with convolution layers to extract the activity attributes without human interference in the feature selection process and to classify them correctly. The model feeds smartphone sensor data into a two-layer LSTM followed by the convolution layers. The evaluation was carried out on three public datasets, achieving accuracies of 95.85%, 95.78%, and 92.63% on the WISDM, UCI-HAR, and OPPORTUNITY datasets, respectively.

Irvine et al. [17] proposed a data-driven HAR classifier as an ensemble of neural networks (NNs) for improving the quality of the public datasets. They used an ensemble of four NNs, which were generated and integrated using support-function fusion, and introduced different approaches for handling disagreements between the models. The final ensemble model achieved the best performance, reaching an accuracy of 80.39%.

Mliki et al. [18] proposed a non-invasive approach to HAR based on UAV-captured video sequences of human movement. The approach consists of two stages. The first is an offline stage that generates two CNN models (a human/non-human model and a human-activity model). The second is the inference stage, which identifies humans and their activities by adapting the CNNs. This system outperformed other methods on the UCF-ARG dataset, with an accuracy of 56% for instance classification and 68% for classification over the entire sequence of frames.

Soleimani et al. [19] proposed a new method called Subject Adaptor Generative Adversarial Network (SA-GAN). This method helps handle the lack of sufficiently large labeled datasets. The proposed model used the GAN framework to perform cross-subject transfer learning for HAR based on data collected from wearable devices. In more than 66% of experiments the model outperformed the other compared approaches, and in another 25% of experiments it came in second. In some cases, this work reached 90% of the accuracy obtained by supervised training on same-domain data.

Mazzia et al. [20] presented a modified version of capsule networks by substituting the dynamic routing with a novel non-iterative, highly parallelizable routing algorithm that can handle a smaller number of capsules with ease. Extensive testing with other capsule implementations has shown the efficacy of their approach and the potential of capsule networks to effectively embed more generalizable visual representations.

Jiang et al. [21] used an artificial neural network (ANN) to approximate the time-dependent distributions of non-Markovian models using solutions of much simpler time-inhomogeneous Markovian models; the approximation does not increase the model's dimensionality while still allowing the kinetic parameters to be inferred. The network is trained using a small number of noisy measurements derived from experimental data or from stochastic simulations of the non-Markovian model. Using a range of models where the delays are caused by transcriptional processes and feedback control, they showed that the Markovian models learned by the NN accurately reflect the stochastic dynamics across parameter space.

Attal et al. [22] applied and compared several ML approaches for HAR: k-Nearest Neighbor (kNN), SVM, Random Forest (RF), k-Means, Gaussian Mixture Models (GMM), and Hidden Markov Models (HMM). The dataset contains some main daily-living human activities, such as walking, lying, and standing, recorded by three inertial wearable accelerometers placed on the human body. Both raw data and extracted/selected features were used as input for the classifiers. The results showed that kNN had the highest performance among all compared supervised approaches, while HMM performed best among the compared unsupervised classifiers.

Shoaib et al. [23] collected data from 13 human activities performed indoors. In these experiments, each participant had one mobile phone in his right pocket and another at his right wrist. Three motion sensors at the wrist and pocket positions were evaluated under different scenarios. The authors extracted different features for these sensors over different window sizes without overlap and used the Scikit-learn toolkit for analyzing the performance. Naive Bayes (NB), kNN, and decision trees were applied for practical simple and complex activity recognition, with ten-fold stratified cross-validation. The results showed that combining data taken from sensors at the pocket and wrist positions yields only a relatively small improvement in recognition. They also showed that increasing the window size improves the recognition of various complex activities, whereas this factor has a limited effect on the simple activities.

Garcia et al. [24] presented an ensemble called EkVN for HAR, which combines kNN, Decision Tree, and NB and is based on heuristic hand-crafted feature extraction. The features were extracted from accelerometer, magnetometer, and gyroscope sensors. The results showed that the accuracy of EkVN is more sensitive to data from different users than to the window size and the overlapping factor. The same authors [25] also presented a multi-classification approach called EAE for HAR using an ensemble of Auto-Encoders (AEs). In EAE, each AE is trained with data of a unique class to reconstruct the sensor measurements, so each AE is associated with one label/activity. EAE can be updated with the user's data when the loss drops below a known value. Experiments on the WISDM, MHealth, and PAMAP2 HAR datasets showed that EAE is efficient and competitive with all compared works. They also showed that the structure of this modular classifier permits more flexible models.

Dua et al. [26] developed a DNN-based model that combines a CNN with a Gated Recurrent Unit as an end-to-end model that performs automatic feature extraction and activity classification. Raw data from wearable sensors is used without pre-processing or customized feature extraction. This work achieved accuracies of 96.20%, 97.21%, and 95.27% on the UCI-HAR, WISDM, and PAMAP2 datasets, respectively. Overall, the results showed that the suggested model outperformed the other compared works.

Rashid et al. [27] proposed a low-power, edge-device-friendly Adaptive CNN for energy-efficient HAR called AHAR. During the inference phase, AHAR employs an adaptive design that chooses which part of the baseline architecture to use. Two datasets, Opportunity and w-HAR, were used to validate the work on categorizing locomotor activities. Compared with fog/cloud computing approaches, this work achieved weighted F1 scores of 91.79% and 91.57%, respectively, on the first dataset, and F1 scores of 97.55% and 97.64%, respectively, on the w-HAR dataset. Compared to other works on both datasets, this work is much more energy-efficient (422.38× less energy) and memory-efficient (14.29× less memory).

Mekruksavanich et al. [28] proposed a hybrid deep learning multichannel architecture called CNN-LSTM to handle the HAR problem. Using the public DHA smart-watch accelerometer dataset, the results proved that this model exceeds the other compared deep learning approaches on different performance measures, achieving 96.87% accuracy.

For the HAR challenge, Athavale et al. [29] presented a pre-trained VGG16 model. This CNN model is used to learn deep features from human-activity signals recorded by the accelerometer sensor of a smartphone. The features were taken from the fifth max-pooling layer of the VGG16 model and fed to an SVM classifier, which replaced the model's fully connected layer. This work achieved 79.55% accuracy and a 71.63% F-score on the UniMiB dataset, which includes samples of everyday human activities.

Shang et al. [30] proposed a WiFi-based HAR system that can determine different activities via the Channel State Information (CSI) of WiFi devices. They presented a special deep learning framework, LSTM-CNN, that automatically extracts features from the temporal and spatial domains. The authors proved the effectiveness of their work in classifying different activities. The experimental results also showed that this work outperforms the compared models on HAR from CSI data, achieving an average accuracy of 94.14% in multi-activity classification.

Poma et al. [31] presented a way to search for the best number of filters for each convolution layer of a CNN. To identify the parameters of the fuzzy-system memberships, they applied the Fuzzy Gravitational Search Algorithm. They used the ORL dataset, which contains 40 different human faces with ten images for each face. The results proved that this work achieves a high recognition percentage.

The proposed model

This paper proposes an intelligent decision support system for recognizing humans' daily activities that feeds the sensed data to the recognition model after handling its class-imbalance issues. Figure 1 shows the overall proposed framework, which has three steps:

  • Data collection Tri-axial accelerometers integrated in the smartphone are used for gathering 3D time-series data that represent the linear acceleration, based on vibration, in the three directions X, Y, and Z. Our model uses the raw Wireless Sensor Data Mining (WISDM) dataset [32].

  • Balancing dataset This is done by applying the Random-SMOTE oversampling technique to handle the class-imbalance issue of the dataset.

  • Activity recognition A modified version of a 1-D capsule neural network is used to recognize the exercised activities and notify the user of the activity class in accordance with the sensor's readings.

Fig. 1 The overall proposed framework

Using over-sampling for balancing the dataset

In the WISDM dataset [32], the samples representing the walking and jogging activity classes outnumber the samples of the other classes by a large margin. Because this imbalanced behavior of the WISDM dataset adversely affects the performance of the classifier, the Random-SMOTE algorithm [33] is used to increase the number of minority-class samples until the balanced ratio of 1:1 is reached. This is done by randomly selecting examples from the minority class, generating new samples from them, and adding these to the training dataset. For a dataset that has N attributes, taking an attribute n as a sample, the new value is randomly generated using the Random-SMOTE algorithm [33].
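To make this balancing step concrete, here is a minimal sketch in Python. Since Random-SMOTE [33] itself is not available in common libraries, the standard SMOTE from imbalanced-learn is used as a stand-in, and the array shapes and class proportions are illustrative assumptions rather than WISDM values.

```python
# Minimal balancing sketch: imbalanced-learn's SMOTE stands in for the
# Random-SMOTE variant used in the paper; the data below is a placeholder.
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 80 * 3))            # flattened sensor windows
y = rng.choice(["walking", "jogging", "sitting"],
               size=1000, p=[0.6, 0.3, 0.1])   # imbalanced labels

print("before:", Counter(y))

# Oversample every minority class up to the majority count (1:1 ratio).
sampler = SMOTE(sampling_strategy="not majority", random_state=42)
X_bal, y_bal = sampler.fit_resample(X, y)

print("after:", Counter(y_bal))
```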

Proposed 1D capsule neural network for HAR

A capsule neural network (CapsNet) is a recently developed machine learning architecture that was introduced in [34] as a development of the CNN. The idea behind its architecture is to add structures known as "capsules" to a CNN. Capsules are structures of neurons that are activated when a set of attributes is related to a class activity. Usually, an artificial neuron produces a single scalar value that is related to the probability of the existence of the class in the feature vector. In CapsNet, this scalar output is replaced with vector-based capsules. The output of a higher capsule (parent) is computed from the outputs of its related lower capsules (children), weighted by coefficients that represent coupling probabilities: the closer a child is to a parent, the higher the coefficient between them. In this paper, we propose the 1D-HARCapsNet model as a modified version of the 1D capsule neural network presented by Suri and Gupta [35]. The proposed model is applied to recognizing human activities based on immediate observations of human actions. Instead of using a single convolutional layer, the 1D-HARCapsNet architecture implements three levels of 1-D convolutional layers (3-Conv1D). The rest of the architecture comprises the primary capsule layer, the activity capsule layer, and the output layer. Figure 2 shows the structure of the proposed 1D-HARCapsNet from the input to the output.

Fig. 2 Structure of proposed 1D-HARCapsNet

The input data consists of 80 3D vectors (80 × 3). The model feeds the data through three consecutive levels of convolution layers (3-Conv1D) with sizes (80 × 3), (51 × 256), and (42 × 512), respectively. Next, it uses the primary capsule convolution layer of size 40 × 1024, whose output is sent to the fully connected activity layer, which produces an output vector. Finally, this vector is passed to the output layer, which generates the most likely target class. Table 1 illustrates the structure of the proposed 1D-HARCapsNet model.

Table 1 Structure of proposed 1D-HARCapsNet

The 3-Conv1D layer

Input data samples of size (80 × 3), i.e., 80 data points wide and three data points high, are fed into a sequence of three Conv1D levels with different activation functions to construct the feature maps. The first level of the 3-Conv1D implements 256 filters with a kernel size of (30 × 30) and uses the tanh activation function, which calculates the hyperbolic tangent of the given input. Its output, 51 data points wide and 256 data points high, is sent to the next level. The second level implements 512 filters with a kernel size of (10 × 10) and uses the ReLU activation function, which outputs the input directly if it is non-negative and zero otherwise. The output of this level, 42 data points wide and 512 data points high, is sent to the last level of the 3-Conv1D layer. The third level implements 1024 filters with a kernel size of (3 × 3) and uses the tanh activation function. In total, the output of this layer is 40 data points wide and 1024 data points high and is sent to the next layer as an array of feature maps for further processing.
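As a concrete illustration, the following Keras sketch reproduces this 3-Conv1D feature extractor. It assumes stride 1, 'valid' padding, and 1-D kernel lengths of 30, 10, and 3 (a Conv1D kernel is one-dimensional, even where the text writes the sizes as 30 × 30, etc.), since these settings reproduce the stated output widths of 51, 42, and 40.

```python
# Keras sketch of the 3-Conv1D feature extractor (stride 1, 'valid' padding).
import tensorflow as tf
from tensorflow.keras import layers

feature_extractor = tf.keras.Sequential([
    tf.keras.Input(shape=(80, 3)),                          # 80 x 3 window
    layers.Conv1D(256, kernel_size=30, activation="tanh"),  # -> (51, 256)
    layers.Conv1D(512, kernel_size=10, activation="relu"),  # -> (42, 512)
    layers.Conv1D(1024, kernel_size=3, activation="tanh"),  # -> (40, 1024)
])

feature_extractor.summary()  # confirms the output widths 51, 42, and 40
```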

The primary capsule layer

The primary capsule layer is a 1-D convolution (Conv1D) layer with a kernel size of (30 × 30). It implements the reshape function to convert the array of feature maps into the corresponding vectors. Finally, the output is passed to the squashing function, which converts each vector's length to a value between 0 and 1.

The activity capsule layer

This layer maps each capsule in the network to its actual class activity by implementing the dynamic routing algorithm. Routing by agreement is based on the ability of a lower capsule (i) in the primary capsule layer to predict the output of a higher capsule (j) in the activity capsule layer.

For each capsule i and capsule j, the prediction of the output of capsule j is denoted by Uj|i and calculated by Eq. 1:

$$ U_{j|i} = W_{ij} u_{i} , $$
(1)

where ui represents the output of capsule i and Wij is the weight matrix. Next, the total input Sj to capsule j in the activity capsule layer is calculated using a weighted sum over all the prediction vectors, as given in Eq. 2.

$$ S_{j} = \sum_{i} c_{ij} U_{j|i} , $$
(2)

where cij are the coupling coefficients between capsule i and all the capsules in the higher layer. They are calculated using a routing softmax function, as given in Eq. 3.

$$ c_{ij} = \frac{\exp \left( b_{ij} \right)}{\sum_{k} \exp \left( b_{ik} \right)}, $$
(3)

where bij indicates the log prior probability that capsule i is coupled to capsule j, and k ranges over all capsules in the higher layer. Finally, the output vector of capsule j is obtained by applying a non-linear squashing function to its total input, according to Eq. 4.

$$ v_{j} = \frac{\left\| S_{j} \right\|^{2}}{1 + \left\| S_{j} \right\|^{2}} \frac{S_{j}}{\left\| S_{j} \right\|}. $$
(4)
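To show how Eqs. 1–4 fit together, the following NumPy sketch implements routing by agreement in the style of the original dynamic routing algorithm [34]. The capsule counts, dimensions, and iteration count are illustrative assumptions; in the actual network, the weight matrices Wij are learned during training.

```python
# NumPy sketch of routing by agreement following Eqs. (1)-(4).
import numpy as np

def squash(s, eps=1e-8):
    """Eq. (4): scale each vector's length into [0, 1), keeping direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * (s / np.sqrt(sq_norm + eps))

def dynamic_routing(u, W, num_iterations=3):
    """u: (n_lower, d_in) child outputs; W: (n_lower, n_upper, d_out, d_in)."""
    u_hat = np.einsum("ijkl,il->ijk", W, u)    # Eq. (1): predictions U_{j|i}
    b = np.zeros(W.shape[:2])                  # routing logits b_ij
    for _ in range(num_iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # Eq. (3)
        s = np.einsum("ij,ijk->jk", c, u_hat)  # Eq. (2): weighted sum S_j
        v = squash(s)                          # Eq. (4): outputs v_j
        b += np.einsum("ijk,jk->ij", u_hat, v) # agreement update
    return v

# Toy example: 5 child capsules (8-D) routed to 6 activity capsules (16-D).
rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(5, 8)),
                    0.1 * rng.normal(size=(5, 6, 16, 8)))
print(v.shape)  # (6, 16)
```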

The output layer

The output layer is a fully connected layer consisting of 240 sigmoid units that predict the most likely target class activity y from the input vector x, as illustrated in Eq. 5.

$$ y = \frac{1}{{1 + e^{ - x} }}. $$
(5)

Evaluation and results

In the evaluation process, the widely used criteria of accuracy, precision, recall, and F-measure are used. All four criteria depend on the confusion matrix [36].

Evaluation criteria

Multiple performance evaluation criteria are used to verify the improvement of the proposed model over other existing models. The confusion matrix [36] is one of the most used evaluation tools in the field of machine learning. A positive sample that is correctly predicted is counted as a True Positive (TP), while a negative sample predicted as negative is a True Negative (TN). A negative sample classified as positive is a False Positive (FP), and a positive sample classified as negative is a False Negative (FN). The confusion matrix values are used to compute other important metrics such as geometric mean, accuracy, error rate, recall, and F1-measure. Accuracy [37] is the correctly predicted samples rate: the ratio of correctly predicted samples to the total number of samples. Owing to its straightforward meaning, it is one of the most used metrics in machine learning evaluation, as illustrated in Eq. 6:

$$ {\text{Acc}} = \frac{{\text{TP}} + {\text{TN}}}{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}. $$
(6)

Precision (positive predictive value, PPV) [37] is the ratio of correctly predicted positive samples to the total number of positive predicted samples in the dataset, as illustrated in Eq. 7:

$$ {\text{PPV}}\;\left( {{\text{Precision}}} \right) = \frac{{{\text{TP}}}}{{{\text{FP}} + {\text{TP}}}}. $$
(7)

Recall, also called hit rate, true positive rate (TPR), or sensitivity [37], is the ratio of correctly predicted positive samples to the total number of positive samples in the dataset, as illustrated in Eq. 8:

$$ {\text{Recall}}\;\left( {{\text{TPR}}} \right) = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}. $$
(8)

F1-measure [37], also called F-measure, is the harmonic mean of precision and recall, as illustrated in Eq. 9:

$$ F1{\text{-measure}} = \frac{{2 \times {\text{Precision}}\;\left( {{\text{PPV}}} \right) \times {\text{Recall}}\;\left( {{\text{TPR}}} \right)}}{{{\text{Precision}}\;\left( {{\text{PPV}}} \right) + {\text{Recall}}\;\left( {{\text{TPR}}} \right)}}. $$
(9)
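For reference, the following sketch computes these four criteria with scikit-learn. The labels and predictions are illustrative placeholders, and weighted averaging is assumed as one common way to aggregate the per-class scores in a multi-class HAR setting.

```python
# Computing Eqs. (6)-(9) with scikit-learn on placeholder predictions.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = ["walking", "jogging", "sitting", "walking", "standing", "walking"]
y_pred = ["walking", "jogging", "walking", "walking", "standing", "jogging"]

print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred,
                                    average="weighted", zero_division=0))
print("recall   :", recall_score(y_true, y_pred,
                                 average="weighted", zero_division=0))
print("f1       :", f1_score(y_true, y_pred,
                             average="weighted", zero_division=0))
```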

Wireless sensor data mining (WISDM) dataset

The WISDM time-series dataset is used for the task of HAR using the tri-axial accelerometer sensor found in most Android smartphones [32]. It consists of 1,098,207 examples, each with six attributes, with the class distribution [walking: 424,400 (38.6%), jogging: 342,177 (31.2%), upstairs: 122,869 (11.2%), downstairs: 100,427 (9.1%), sitting: 59,939 (5.5%), standing: 48,395 (4.4%)], as illustrated in Table 2.

Table 2 Raw examples distribution
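The sketch below illustrates how such a raw accelerometer stream could be segmented into the fixed-length 80 × 3 windows the model consumes. The column names, the per-user/per-activity grouping, and the non-overlapping window step are our assumptions for illustration, not specifications from [32].

```python
# Segmenting a raw tri-axial accelerometer stream into 80 x 3 windows.
import numpy as np
import pandas as pd

def make_windows(df, window=80, step=80):
    """Cut each per-user, per-activity run into fixed-length windows."""
    X, y = [], []
    for (_, activity), run in df.groupby(["user", "activity"]):
        acc = run[["x", "y", "z"]].to_numpy()
        for start in range(0, len(acc) - window + 1, step):
            X.append(acc[start:start + window])
            y.append(activity)
    return np.asarray(X), np.asarray(y)

# Placeholder frame standing in for the parsed raw WISDM file.
rng = np.random.default_rng(1)
df = pd.DataFrame({"user": 1, "activity": "Walking",
                   "x": rng.normal(size=400),
                   "y": rng.normal(size=400),
                   "z": rng.normal(size=400)})

X, y = make_windows(df)
print(X.shape)  # (5, 80, 3)
```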

The hyper parameters of the proposed 1D-HARCapsNet

This paper introduces the 1D-HARCapsNet model with the following hyper parameters: the numbers of epochs are 25 and 50, the learning-rate values are 0.001 and 0.002, the numbers of routing iterations are 5 and 10, and the initial weights are 0.002, 0.003, 0.004, and 0.005, as illustrated in Table 3.

Table 3 The hyper parameters of the proposed 1D-HARCapsNet
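The Cartesian product of these values (2 epoch settings × 2 learning rates × 2 routing settings × 4 initial weights) yields the 32 test cases evaluated in the next subsection, which can be enumerated as follows:

```python
# Enumerate the hyper-parameter grid of Table 3: 2 x 2 x 2 x 4 = 32 cases.
from itertools import product

epochs_opts  = [25, 50]
lr_opts      = [0.001, 0.002]
routing_opts = [5, 10]
weight_opts  = [0.002, 0.003, 0.004, 0.005]

test_cases = list(product(epochs_opts, lr_opts, routing_opts, weight_opts))
print(len(test_cases))  # 32
print(test_cases[0])    # (25, 0.001, 5, 0.002)
```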

Recognition experiments of the proposed 1D-HARCapsNet

We conducted our experiments on the Kaggle cloud platform, with the dataset split into 80% for training and 20% for testing. Table 4 shows the hardware specifications used.

Table 4 Experiments hardware specifications

The performance of the proposed 1D-HARCapsNet model is compared with the conventional one-dimensional deep capsule network architecture [35] using the same hyper parameters indicated in Table 3. A series of experiments was conducted over 32 different test cases generated from the suggested hyper parameters. Table 5 illustrates the variation of the conventional CapsNet recognition results; the best achieved results are 90.11% accuracy, 91.81% precision, 89.94% recall, and 0.903 F-measure. Table 6 presents the results on the WISDM dataset using the modified architecture without applying Random-SMOTE. Table 7 illustrates the variation of the proposed system's recognition results; in this model, the data is balanced using the Random-SMOTE algorithm and then fed to the proposed 1D-HARCapsNet structure with the above-mentioned hyper parameters. Figures 3, 4, 5 and 6 show the evaluation results of the constructed test cases. The accuracy values varied from 73.39 to 98.67%, the precision values from 76.97 to 98.66%, the recall values from 73.77 to 98.67%, and the F-measure values from 0.724 to 0.987. The best recognition results were achieved using the values of 25, 0.002, 10, and 0.002 for the number of epochs, learning rate, routing iterations, and weights, respectively (Tables 6, 7).

Table 5 Recognition results of conventional CapsNet model [35]
Fig. 3 Evaluation results of the suggested test cases (1–8) for 1D-HARCapsNet

Fig. 4 Evaluation results of the suggested test cases (9–16) for 1D-HARCapsNet

Fig. 5 Evaluation results of the suggested test cases (17–24) for 1D-HARCapsNet

Fig. 6 Evaluation results of the suggested test cases (25–32) for 1D-HARCapsNet

Table 6 Results of a modified architecture without applying random-SMOTE algorithm on the WISDM dataset
Table 7 Results of 1D-HARCapsNet based on the hyper parameters

Comparing the proposed model against other models

Table 8 illustrates the obtained accuracy, precision, recall, and F-measure of our proposed model compared with the state-of-the-art models [38,39,40,41,42,43,44,45] on the raw version of the WISDM dataset. The proposed model has the highest accuracy of 98.67%. In second place, Spatio-Temporal Deep Learning [46] has an accuracy of 98.53%; in third place, Deep learning on a low-power device [41] has an accuracy of 98.2%; and in fourth place, CNN + BLSTM [44] has an accuracy of 97.8%. Based on precision, the proposed model achieved the highest value of 98.66%; in second place, the Random Forest Classifier [43] has a precision of 98.1%, while in third place CNN + BLSTM [44] has a precision of 97.8%. Based on recall, the proposed model achieved the highest value of 98.67%; in second place, the Random Forest Classifier [43] has a recall of 98.1%, while in third place CNN + BLSTM [44] has a recall of 97.8%. Based on F-measure, the proposed model achieved the highest value of 0.987; in second place, the Random Forest Classifier [43] has an F-measure of 0.981, while in third place CNN + BLSTM [44] has an F-measure of 0.978. Overall, the proposed model performed best across all four performance evaluation criteria.

Table 8 A comprehensive comparison of multiple methods on WISDM dataset

Conclusion and future work

In this paper, a modified version of the 1-D capsule neural network called 1D-HARCapsNet was proposed to provide an efficient intelligent decision support approach for recognizing human activity. We implemented the Random-SMOTE algorithm to handle the imbalanced behavior of the WISDM dataset. The proposed model comprises four layers: the 3-Conv1D layer, the primary capsule layer, the activity capsule layer, and the output layer. The experimental results were evaluated on a raw version of the WISDM dataset, and the performance was assessed using four criteria: accuracy, precision, recall, and F-measure. Compared to the state-of-the-art algorithms, the proposed model proved its ability to recognize human activity and outperformed the others.

In future studies, we suggest using the Gray Wolf Optimizer (GWO) [50] for feature selection to further improve the performance, surpass the state-of-the-art algorithms, and provide optimal performance. GWO helps reduce the effects of noise and data redundancy on the overall performance of the system, especially its accuracy. In future work, we will also optimize the proposed model for different embedded devices, to embed the classifier within power-constrained microcontrollers and to ensure the security and privacy of users' data.