Abstract
Hyperparameter optimization poses a significant challenge when developing deep neural networks. Building a convolutional neural network (CNN) for implementation can be an arduous and time-intensive task. This work proposes an approach to optimize the hyperparameters of a one-dimensional CNN (1D-CNN) to improve the accuracy of human activity recognition (HAR). The framework includes a parametric depiction of 1D-CNNs along with an optimization process for hyperparameters aimed at maximizing the model's performance. The method, called OPTConvNet, optimizes the hyperparameters of a 1D-CNN using Hierarchical Particle Swarm Optimization (H-PSO). The H-PSO algorithm is designed to optimize the architectural, layer, and training parameters of the 1D-CNN: it optimizes the architecture of the 1D-CNN at the first level, and the layer and training hyperparameters at the next level. The proposed approach employs an exponential-like inertia weight to balance exploration and exploitation of particles and so prevent premature convergence to a local optimum in the PSO algorithm. The H-PSO-CNN is evaluated on publicly available sensor-based human activity recognition (S-HAR) datasets, namely UCI-HAR, Daphnet Gait, Opportunity, and PAMAP2.
Introduction
The aim of Sensor-based Human Activity Recognition (S-HAR) is to identify an individual's activities through the analysis and interpretation of data gathered from multiple sensors. Lately, the accuracy and efficiency of HAR have been greatly improved by applying deep learning techniques, especially on a variety of benchmark datasets. As the use of smartphones and smartwatches has increased, HAR research has shifted its emphasis from depending solely on body-worn sensors to utilizing the built-in sensors (like accelerometers and gyroscopes) in mobile devices to gather signal data. HAR plays a vital role in various critical domains, including healthcare, daily activity monitoring, and elderly care, among others. The collection of data necessary for HAR tasks can be achieved through the utilization of both cameras and sensors. Collecting data from visual sensors/cameras has certain limitations that should be considered. First, a camera must be installed in a designated spot, and the user must constantly stay in the camera's field of view. Second, because of privacy infringement concerns, using cameras in specific areas—like bedrooms or private spaces—is prohibited. This restriction is removed by S-HAR, where sensors are a more practical and affordable alternative to cameras for gathering HAR data.
Nowadays, the usage of wearable and inertial sensors in smart devices has emerged as a promising approach for the acquisition of human activity data. This is due to their user-friendly nature, smaller size, and non-intrusive characteristics. Furthermore, these sensors offer minimal or no installation cost and minimal energy consumption. The widespread usage of smartphones and smartwatches has made them convenient choices for HAR tasks, as they come equipped with a range of built-in sensors, including accelerometers, gyroscopes, magnetometers, compasses, and more [1].
Activity recognition research has traditionally relied on machine learning (ML) algorithms like decision trees, support vector machines (SVM), Naïve Bayes (NB), and Hidden Markov Models (HMM) to achieve favourable recognition rates in controlled experimental settings with limited labelled data. Nonetheless, the accuracy of these approaches depends on the quality and extent of manual feature extraction, and handcrafted feature extraction can capture only superficial features. Because of these constraints, activity recognition using traditional classification methods faces limitations in terms of classification accuracy and model generalization [2]. The limitation of manual feature extraction is overcome by Deep Learning (DL) approaches. A DL technique like CNN has the potential to greatly simplify the feature selection process of traditional methods. It achieves this by autonomously extracting abstract features through multiple layers of hidden units [3].
The popularity of CNN in HAR has been well-documented in the literature [4,5,6], attributed to its ability to capture local dependencies of activity signals and maintain feature scale invariance. However, the performance of CNNs relies on numerous hyperparameters, including the number of layers, neurons, batch size, epochs, dropout rate, strides, and filter shape. Since each hyperparameter has a different impact on the CNN model, the key challenge is determining the best set to use [7]. Manually optimizing the hyperparameters of a CNN is time-consuming and requires expertise, and adjusting one hyperparameter may influence the others due to the trade-offs among them. Hence, it becomes crucial to investigate methods for identifying an optimal set of hyperparameters that deliver efficient performance. This study focuses on the optimization of CNN hyperparameters using the Hierarchical Particle Swarm Optimization (H-PSO) technique.
The contributions of this research paper are as follows:
- This research provides an overview of the latest techniques in the field, specifically tailored for researchers seeking to apply CNN models to their own datasets.
- It proposes the H-PSO, which optimizes the hyperparameters of a CNN at various levels. The H-PSO is formulated such that it optimizes both architecture-level parameters (number of convolutional, pooling, and fully connected layers) and the hyperparameters at each layer.
- The novelty of this work lies in the fact that the proposed H-PSO optimizes architecture-level, layer-level, and training-level hyperparameters simultaneously.
- The paper conducts a performance analysis of the proposed H-PSO against state-of-the-art optimization methods.
The rest of this paper is organized as follows:
Sect. "Sensor-Based Human Activity Recognition (S-HAR)" presents the related work on sensor-based HAR, while various state-of-the-art CNN hyperparameter optimization techniques and their limitations are discussed in Sect. "Hyperparameter Optimization Techniques". Sect. "Proposed Methodology" outlines the methodology adopted, with the details given in its subsections: Sect. "Overview of CNN" provides an overview of CNNs and the hyperparameters to be optimized, while the OPTConvNet subsection discusses the architecture of the proposed H-PSO for hyperparameter optimization. Results and comparative analysis are discussed in Sect. "Results and Analysis", followed by the conclusion, acknowledgment, and references.
Related Work
Sensor-Based Human Activity Recognition (S-HAR)
CNN is one of the significant approaches for recognizing human activities from sensor data, and many researchers have employed CNNs and their variants for HAR. The work in [8] discusses the recognition of human locomotion activities using a customized shallow CNN, concluding that the customized 1D-CNN outperforms traditional ML algorithms, namely RBF-SVM and Random Forest. In [9], researchers presented divide and conquer-based classification of human activities using a 1D-CNN. The study in [10] compared a shallow CNN framework consisting of five layers to existing solutions using both the WISDM dataset and the UCI-HAR dataset. The findings revealed that their CNN model outperformed other CNN-based approaches on the UCI-HAR dataset, achieving an accuracy of 94.35%. In [11], a 1D-CNN was employed to classify human activities based on accelerometer data, achieving 92.71% accuracy and outperforming the random forest classifier. Applying small kernels in CNN convolution operations directly along the temporal dimension of sensor signals enables the detection and capture of localized temporal dependencies [12]. Numerous research studies have investigated the possibility of modifying filters and their kernels in DL. As the filter plays a pivotal role in the development of a CNN, employing a predetermined number of filters with varying kernel sizes enables the capture of diverse data aspects in S-HAR [13]. CNNs demonstrate exceptional proficiency in extracting local features from sensor data, but they do not possess memory and do not consider the temporal dependencies present among data records. On the other hand, problems that involve significant temporal dependencies can be effectively tackled using recurrent neural networks (RNNs). Among RNNs, Long Short-Term Memory (LSTM) models demonstrate superior performance in retaining long-term dependencies due to the unique structure of their repeating module [14].
A deep neural network for HAR, which incorporates a CNN featuring diverse kernel dimensions and bi-directional LSTM (BiLSTM), has been introduced as a solution to address the challenges posed by the aforementioned methods [13, 15, 16]. The ICGNet model, proposed by [17], combines the advantages of CNN and GRU to effectively capture both local features and long-term dependencies in multivariate time series data. It offers an end-to-end solution for HAR by directly processing raw data collected from wearable sensors, eliminating the need for manual feature engineering.
The research works reviewed in this paper examine models that incorporate multiple hyperparameters, typically fine-tuned through empirical adjustments. Various methods are available to automate the process of hyperparameter selection. In this review, we discuss seven of these methods as outlined in the subsequent section.
Hyperparameter Optimization Techniques
Achieving optimal performance in CNNs heavily relies on hyperparameter optimization. Numerous techniques have emerged to effectively fine-tune the hyperparameters of CNNs. This section presents an overview of well-known CNN hyperparameter optimization techniques.
The performance of each DL algorithm depends on its training process, which involves several parameters. Setting the value of one parameter may also impact other hyperparameters, which makes selecting appropriate values a challenging task. Depending on where they apply, the parameters of a CNN can be categorized into architecture-level, layer-level, and training-level parameters, as depicted in Fig. 1. To obtain an efficient CNN model, the parameters at all three levels need to be optimized. A manual trial-and-error approach is one method for optimizing hyperparameters, but it is time-consuming and makes it difficult to arrive at optimized values. Other commonly used approaches are metaheuristic optimization methods, grid search, and random search.
Metaheuristic optimization methods have strong local search capabilities, thus helping to avoid the training network from getting stuck in local optima and enhancing the probability of determining the global optimum. The authors of [18] employed metaheuristic algorithms for automatic optimization of CNN hyperparameters, specifically focusing on "Batch Size, No. of kernels and epochs, size of the kernel, and pooling size." However, a limitation of this work is that the authors did not optimize the network with respect to the activation function (AF) and the number of feature extraction and down sampling layers.
In contrast, the authors of [19] utilized particle swarm optimization for optimal feature selection, while [20] proposed a hybrid optimization method by integrating the PSO and artificial bee colony (ABC) methods. Furthermore, the authors of [21] optimized the hyperparameters of 1D CNN using the harmony search optimization method. Similarly, [22] conducted comprehensive experimentation to study the impact of hyperparameters on the recognition rate of residual BiLSTM networks, adjusting the value of each hyperparameter and analyzing the results to select optimal values. In addition, the authors of [23] used grid search, random parameter search (RPS), and Bayesian optimization (BO) techniques to optimize the ANN's hyperparameters. They arrived at the conclusion that RPS or BO optimization techniques were required in order to obtain the ideal hyperparameters.
A self-supervised model was proposed by the authors of [24] for optimizing the number of convolutional and fully connected (FC) layers of an auto CNN. The experiment was conducted with search spaces {1, 2, 3} and {1, 2} for the number of convolutional and FC layers, respectively. According to their experimental findings and observations, the performance of the CNN was not remarkably influenced by the presence of more than three CNN layers. Additionally, they established the {2, 3} filter size range for the max-pooling layer and the 64–1024 step size for the number of neurons in each FC layer. The researchers also varied the learning rate from 1e-4 to 1e-2 and set the dropout factor range from 0.1 to 0.6. The Tiny ImageNet, CIFAR-10, and CIFAR-100 datasets were used to test the optimized CNN. A tree-structured Parzen estimator Bayesian optimization technique was proposed by the authors of [25] to optimize a number of parameters, such as the number of FC layers, neurons, learning rate, and dropout rate. They tested this approach with SimCLR and SwAV on datasets like CIFAR-10, CIFAR-100, and Tiny ImageNet, and the results looked promising in comparison to other methods. Furthermore, this study was expanded by employing Neural Architecture Search (NAS) to optimize the number of parameters [26]. NAS aids in identifying a suitable neural network tailored to specific tasks, thereby reducing the human effort required to discover an optimal architecture for the given task [27].
Most of the optimization research has been conducted on 2D CNNs for computer vision applications. Table 1 depicts the research conducted on the optimization of hyperparameters of DL approaches on HAR datasets. In Table 1, BS represents batch size, #E represents the number of Epochs, #K represents the number of kernels, #P indicates the number of pooling layers, K_S indicates the kernel size, P_S indicates pooling size, AF represents the activation function, OP represents the optimizer function, and DP indicates dropout.
In Table 1, it is evident that researchers have primarily concentrated on optimizing training and layer-level hyperparameters. Only a limited number of research works have shed light on the optimization of other hyperparameters. Nevertheless, there remains a scope for enhancing the optimization of architecture-level hyperparameters to achieve improvement.
Proposed Methodology
Overview of CNN
The objective of the proposed work is the auto-optimization of CNN hyperparameters. A Convolutional Neural Network (CNN) is a specialized deep learning model designed for processing and analyzing visual data, such as images, videos, and time-series data. The CNN architecture consists mainly of four components: Convolution, Pooling, Activation function, and fully connected (FC) layers.
Convolutional layers are the fundamental blocks of a CNN. These layers use small kernels to scan the input data and extract relevant features. Feature maps are generated as the kernels slide over the input, highlighting patterns within the data. Pooling layers downsample the feature maps obtained from the convolutional layers; max and average pooling are the most commonly used. Activation functions introduce non-linearity into the features extracted by convolution. The Rectified Linear Unit (ReLU), which replaces negative values with zero, is a frequently used activation function in CNNs. Besides ReLU, many activation functions have been proposed by researchers, and selecting the appropriate one is a challenging task. The authors of [35] studied the impact of the activation function on the performance of CNNs and proposed a more non-linear activation function called OP-Tanish. FC layers process the high-level features from the convolutional layers and make the final predictions. They resemble the dense layers found in traditional neural networks.
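To make the roles of these components concrete, the following minimal sketch applies one 1D convolution, a ReLU, and a max-pooling step to a short signal using plain Python. The signal, the kernel values, and the function names are illustrative assumptions and are not taken from the paper; the kernel in a real CNN would be learned.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float], stride: int = 1) -> List[float]:
    """'Valid' 1D convolution (cross-correlation, no padding), as in a CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values: List[float]) -> List[float]:
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values: List[float], size: int = 2, stride: int = 2) -> List[float]:
    """Downsample by keeping the maximum of each window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, stride)]


# Illustrative (made-up) one-channel sensor window and a fixed edge-like kernel.
signal = [0.0, 1.0, 3.0, 2.0, -1.0, -2.0, 0.0, 4.0]
kernel = [1.0, 0.0, -1.0]

feature_map = conv1d(signal, kernel)   # [-3.0, -1.0, 4.0, 4.0, -1.0, -6.0]
activated = relu(feature_map)          # [0.0, 0.0, 4.0, 4.0, 0.0, 0.0]
pooled = max_pool1d(activated)         # [0.0, 4.0, 0.0]
print(feature_map, activated, pooled)
```

The kernel size, stride, and pooling size used here are exactly the kind of layer-level hyperparameters that the optimization targets.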
This work explores the key aspects of CNNs along with their corresponding hyperparameters, which play a significant role in determining the performance of the CNN architecture. Each element of the CNN is governed by specific hyperparameters, necessitating careful consideration to enhance the network's efficiency. Concerning the convolutional layer, several hyperparameters are involved, including the number of convolutional layers, the quantity of kernels used, kernel sizes, kernel stride, and padding strategy. These hyperparameters are crucial in shaping the feature extraction process within the convolutional layer, thereby influencing the network's ability to detect important patterns in the input data. Similarly, the pooling layer is characterized by its own set of influential hyperparameters, such as the number of pooling layers, the number of kernels, kernel sizes, and kernel stride. These parameters directly affect the degree of downsampling applied to the extracted features, determining the level of abstraction and information retained in subsequent layers. Additionally, FC layers have their own hyperparameters that significantly impact pattern classification. Training of the neural network is also influenced by hyperparameters such as optimization function, activation function, dropout, and loss function. The number of FC layers, the quantity of neurons in each FC layer, and the activation functions utilized are critical factors in determining the network's ability to make accurate predictions and classifications based on learned representations. A thorough examination of these hyperparameters and their effects on various components of the CNN architecture is essential for optimizing performance and achieving exceptional results across diverse applications. This paper proposes the use of Hierarchical Particle Swarm Optimization (H-PSO) for optimizing all these parameters, resulting in an optimized CNN referred to as OPTConvNet.
OPTConvNet—Hierarchical Particle Swarm Optimization for Optimization of CNN
PSO, a metaheuristic stochastic population-based evolutionary optimization algorithm, operates by exploring the search space using a swarm-based approach to locate the optimal solution. Within the swarm, individual particles possess unique velocities and positions as they navigate through the solution search space. The standard PSO algorithm was proposed and developed by Kennedy and Eberhart [34]. PSO functions by emulating a particle swarm navigating through a search space with multiple dimensions. Each individual particle represents a potential solution to the optimization problem, with its position in the search space reflecting a unique combination of hyperparameters. Guided by their personal best positions and the best position discovered by the entire swarm, the particles adjust their movements. This iterative process enables PSO to efficiently explore the search space, ultimately converging towards promising regions that enhance network performance.
The primary concept of PSO is to determine the most effective solution for the particle (Pbest) and to determine the optimal solution for the group (Gbest) by interacting with other particles in the particle group. In each iteration, every particle adjusts its searching direction and speed, while its velocity is adjusted based on its own momentum Pbest and Gbest [33, 38, 39].
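The Pbest/Gbest mechanism described above can be sketched in a few lines of Python. This is a generic PSO loop under stated assumptions, not the paper's Algorithm 1: the toy fitness function, the swarm size, the constants `c1` and `c2`, and in particular the exponentially decaying inertia weight formula are placeholders chosen for illustration.

```python
import math
import random


def pso_maximize(fitness, dim, bounds, n_particles=10, max_iter=50,
                 c1=2.0, c2=2.0, seed=0):
    """Generic PSO that maximizes `fitness`; returns (gbest, gbest_value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda j: pbest_val[j])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position of the swarm
    history = [gbest_val]

    for t in range(max_iter):
        # ASSUMPTION: an exponential-like schedule that decays from 0.9
        # towards about 0.41; the paper's exact formula is not reproduced here.
        w = 0.4 + 0.5 * math.exp(-4.0 * t / max_iter)
        for j in range(n_particles):
            for d in range(dim):
                vel[j][d] = (w * vel[j][d]
                             + c1 * rng.random() * (pbest[j][d] - pos[j][d])
                             + c2 * rng.random() * (gbest[d] - pos[j][d]))
                # Move the particle and keep it inside the search bounds.
                pos[j][d] = min(hi, max(lo, pos[j][d] + vel[j][d]))
            val = fitness(pos[j])
            if val > pbest_val[j]:
                pbest[j], pbest_val[j] = pos[j][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[j][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def toy_fitness(p):
    """Hypothetical stand-in for model accuracy; its maximum (0) is at (1, 1)."""
    return -sum((x - 1.0) ** 2 for x in p)


best_pos, best_val, history = pso_maximize(toy_fitness, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

Because Gbest is only replaced by a strictly better value, `history` never decreases; in the hyperparameter setting the toy fitness would be replaced by the validation accuracy of a trained network, which makes each evaluation far more expensive.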
For a d-dimensional searching space, the position Pj and velocity Rj for particle j are given by Equations 1 and 2, respectively.
where \({p}_{j}^{d}\) and \({r}_{j}^{d}\) represent position and velocity of particle j. Equations 3, 4, and 5 represent the update of the particle’s velocity and position.
where, w indicates “inertia weight”, max_iteration represents “maximum iterations”, \({r}_{j}^{d}\left(t+1\right)\) represents “velocity of the particle at time t + 1”, c1 and c2 represents positive constants, rand is a “uniform random variable in the interval [0,1]”, \({x}_{j}^{d}\) represents the “optimal position of particle j at iteration t”, \({x}_{g}^{d}\) is the “optimal solution of current group Gbest”, \({p}_{j}^{d}\) represents the “position of particle j in the d-dimension”. Algorithm 1 describes the steps involved in PSO.
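For orientation, the standard inertia-weight form of the PSO update, written with the symbols defined above, is sketched below. It is offered as the conventional form consistent with those definitions rather than as a verbatim copy of the paper's numbered equations; the specific exponential-like schedule for w (a function of t and max_iteration) is left unspecified here.

```latex
\[
\begin{aligned}
r_j^{d}(t+1) &= w\, r_j^{d}(t)
  + c_1\,\mathrm{rand}\,\bigl(x_j^{d} - p_j^{d}(t)\bigr)
  + c_2\,\mathrm{rand}\,\bigl(x_g^{d} - p_j^{d}(t)\bigr), \\
p_j^{d}(t+1) &= p_j^{d}(t) + r_j^{d}(t+1).
\end{aligned}
\]
```

The first term carries the particle's momentum, the second pulls it towards its own best position (Pbest), and the third pulls it towards the best position found by the group (Gbest).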
The proposed work optimizes the parameters in two hierarchical levels. To optimize the hyperparameters of the CNN, they are organized into architecture, layer, and training levels. The structure of the hyperparameter optimization is shown in Fig. 2. Architectural parameters are considered at PSO level 1, layer-level parameters are considered at PSO level 2, and training hyperparameters are optimized manually at the last level. The structure of the H-PSO is inspired by the PSO structure discussed in [33]. In addition to the optimization discussed in [33], this work optimizes the activation function in the convolutional and FC layers as well as the training parameters. Figure 3 depicts the H-PSO flow diagram. The process starts at level 1, where a swarm [P1, P2, …, Pm] is initialized with random values for the architectural hyperparameters, as shown in Fig. 2. At level 2, multiple swarms are initialized, each comprising m particles: for every particle in the level-1 swarm, a corresponding swarm is initialized at the second level. The dimensions of the particles in a second-level swarm are determined by the number of parameters specified at level 1. Each particle in a second-level swarm is randomly initialized with values corresponding to the layer-level hyperparameters shown in Fig. 2. The CNN's last FC layer is configured by default with the softmax activation function. Initially, the CNN is trained with a fixed batch size, the Adam optimizer, and the categorical cross-entropy loss function. The accuracy of each particle at level 2 is computed from the softmax output of the last FC layer, and the velocity and position of the particle, together with Pbest and Gbest, are updated as per Algorithm 1. In this model, the fitness of each particle is therefore evaluated through the softmax layer of the CNN.
The set of CNN hyperparameters that yields the highest accuracy among the evaluated configurations represents the optimal solution. The fitness evaluation follows [33], and the detailed architecture is depicted in Fig. 4. The numbers of convolutional, pooling, and fully connected layers are the parameters considered at swarm level 1, which has m particles. The second-level swarm consists of seven hyperparameters, as shown in Fig. 4.
The calculation of a particle's fitness occurs at level 2, based on a set of parameters indicated by (Pi, Pij) as outlined in Eq. (6). In this context, Pi denotes a particle at the first swarm level, while Pij represents a particle at the second swarm level.
where \({P}_{i}=(NC, NP, NFC)\) and \({P}_{ij}=(NK\_C, KS\_C, SS\_C, AF\_C, KS\_P, SS\_P, AF\_FC)\).
Here, NC, NP, and NFC denote the numbers of convolutional, pooling, and fully connected layers, respectively, while NK_C, KS_C, SS_C, and AF_C denote the number of kernels, kernel size, stride size, and activation function in the convolutional layer, respectively.
KS_P and SS_P denote the kernel size and stride in the pooling layer, respectively. AF_FC denotes the activation function of the fully connected layers other than the last one.
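One way to picture this two-part encoding is to decode a continuous particle position into discrete hyperparameter choices, as in the sketch below. The ranges for NC, NP, and NFC follow the ranges stated in the text (1 to 4 layers and {1, 2, 3} FC layers); every layer-level value list, the [0, 1] position encoding, and the helper names `pick` and `decode` are assumptions made for illustration and do not reproduce the paper's Table 3.

```python
# Level-1 (architecture) search space, using the ranges given in the text.
ARCH_SPACE = {"NC": [1, 2, 3, 4], "NP": [1, 2, 3, 4], "NFC": [1, 2, 3]}

# Level-2 (layer) search space. ASSUMPTION: these value lists are hypothetical.
LAYER_SPACE = {
    "NK_C": [16, 32, 64, 128],         # number of kernels in a conv layer
    "KS_C": [3, 5, 7],                 # conv kernel size
    "SS_C": [1, 2],                    # conv stride
    "AF_C": ["relu", "tanh", "elu"],   # conv activation function
    "KS_P": [2, 3],                    # pooling kernel size
    "SS_P": [1, 2],                    # pooling stride
    "AF_FC": ["relu", "tanh", "elu"],  # FC activation (all but the last layer)
}


def pick(choices, x):
    """Map a coordinate x, nominally in [0, 1], to one of the discrete choices."""
    idx = min(len(choices) - 1, max(0, int(x * len(choices))))
    return choices[idx]


def decode(space, position):
    """Turn a continuous particle position into a named hyperparameter dict."""
    if len(position) != len(space):
        raise ValueError("position must have one coordinate per hyperparameter")
    return {name: pick(choices, x)
            for (name, choices), x in zip(space.items(), position)}


p_i = decode(ARCH_SPACE, [0.0, 0.5, 0.99])
p_ij = decode(LAYER_SPACE, [0.3, 0.0, 0.9, 0.0, 0.6, 0.2, 0.5])
print(p_i)   # {'NC': 1, 'NP': 3, 'NFC': 3}
print(p_ij)
```

Keeping the particle continuous and decoding only at evaluation time lets the usual velocity update be used unchanged, at the cost of several nearby positions mapping to the same configuration.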
The velocity (Rij) of the jth particle in the ith swarm at level 2 is computed by Eq. 7.
The new position of the particle Pij at level 2 is expressed in Eq. 8.
The velocity of the ith particle of the level-1 swarm is expressed in Eq. 9.
The new position of the ith particle, which updates the number of layers, is given in Eq. 10.
The fitness of a particle depends on both its own composition of layers and the globally best hyperparameters acquired within its corresponding ith swarm, as illustrated in Eq. 11. Ultimately, the solution converges towards the overall best solution gbest, which is determined as the maximum among gbest1, gbest2, …, gbestm.
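The two-level idea can be sketched as a nested search: the fitness of a level-1 (architecture) particle is taken to be the best fitness found by a level-2 swarm run for that architecture, and the overall gbest is the maximum over all of them. This is a simplified illustration under several assumptions, not the paper's algorithm: the inner swarm is re-run from scratch for each evaluation, the inertia weight is a fixed constant, positions live in [0, 1], and `surrogate_accuracy` is a hypothetical stand-in for training the decoded CNN and reading its accuracy.

```python
import random


def pso(fitness, dim, n_particles, n_iter, rng, w=0.7, c1=1.5, c2=1.5):
    """Maximize `fitness` over [0, 1]^dim; returns (best_position, best_value)."""
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda j: pbest_val[j])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for j in range(n_particles):
            for d in range(dim):
                vel[j][d] = (w * vel[j][d]
                             + c1 * rng.random() * (pbest[j][d] - pos[j][d])
                             + c2 * rng.random() * (gbest[d] - pos[j][d]))
                pos[j][d] = min(1.0, max(0.0, pos[j][d] + vel[j][d]))
            val = fitness(pos[j])
            if val > pbest_val[j]:
                pbest[j], pbest_val[j] = pos[j][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[j][:], val
    return gbest, gbest_val


def surrogate_accuracy(arch, layer):
    """HYPOTHETICAL fitness in (0, 1]; a real run would train and score a CNN."""
    target_arch, target_layer = [0.5, 0.5, 0.25], [0.6] * 7
    err = sum((a - t) ** 2 for a, t in zip(arch, target_arch))
    err += sum((x - t) ** 2 for x, t in zip(layer, target_layer))
    return 1.0 / (1.0 + err)


def h_pso(m=4, n_iter=5, seed=0):
    """Level 1 searches (NC, NP, NFC); level 2 searches the 7 layer parameters."""
    rng = random.Random(seed)
    best = {"value": -1.0, "arch": None, "layer": None}

    def level1_fitness(arch):
        # Run a level-2 swarm for this architecture and keep its gbest_i.
        layer, value = pso(lambda p: surrogate_accuracy(arch, p),
                           dim=7, n_particles=m, n_iter=n_iter, rng=rng)
        if value > best["value"]:
            best.update(value=value, arch=arch[:], layer=layer)
        return value

    pso(level1_fitness, dim=3, n_particles=m, n_iter=n_iter, rng=rng)
    return best   # overall gbest = max over all gbest_i seen


result = h_pso()
print(result["value"], result["arch"], result["layer"])
```

With m = 4 particles per swarm, every level-1 evaluation costs a full level-2 search, which is why the number of particles and iterations has to stay small when each fitness call means training a network.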
Results and Analysis
Determining the optimal hyperparameters of a CNN is one of the challenging tasks in CNN applications. The manual selection of hyperparameters requires expertise in the domain, and the hyperparameters of a CNN may vary from one dataset to another. It is essential to automate the process of hyperparameter optimization to overcome the limitations of manual optimization. This work utilized PSO in a hierarchical manner to determine the architecture and layer-level hyperparameters of the CNN.
Benchmark Datasets
This work carried out the optimization of the CNN using H-PSO on the HAR benchmark datasets given in Table 2. Table 2 describes the characteristics and data sampling of the UCI-HAR, Opportunity, PAMAP2, and Daphnet FOG datasets.
Results and Analysis
The proposed work evaluated the H-PSO-optimized 1D-CNN, called OPTConvNet, on S-HAR benchmark datasets. To optimize the 1D-CNN for S-HAR, the ranges of the architecture-, layer-, and training-level parameters, as well as the initial values of the H-PSO parameters, must be initialized manually. Tables 3 and 4 depict the CNN hyperparameter search space and the PSO parameter search space used in this experiment, respectively. The evaluation of the proposed method covers the UCI-HAR, Opportunity, PAMAP2, and Daphnet Gait datasets. During the training phase, 80% of each dataset is used, while the remaining 20% is used for testing.
The ranges of CNN hyperparameters in this study were carefully selected based on observations. The custom CNN architecture for the datasets listed in Table 2 is discussed in [34]. The architecture incorporates a maximum of three convolutional layers, a configuration also utilized by the authors of [40] for limb activity recognition. Experimentation conducted by the authors of [41] revealed that CNNs with three convolutional layers outperform those with six convolutional layers. Taking these findings into account, the range of convolutional and pooling layers was fixed at 1–4. Analysis of popular CNN-based architectures suggests that the number of fully connected (FC) layers can vary between 1 and 3 to design a classification model with better accuracy [42, 43]. Additionally, a study conducted in [44] suggests that the maximum required number of FC layers for a deep architecture is 3. Therefore, in this study, we consider {1, 2, 3} as the search space for the number of FC layers.
Table 4 shows that the number of particles at the architecture level is initialized to 4 and the number of parameters at the architecture level is 3 (NC, NP, and NFC). Hence, the size of the architecture-level swarm is 4 × 3. Each architecture-level particle is extended with layer- and training-level parameters, whose possible values are then explored. The number of swarm particles initialized at the layer and training level is 4, and the number of parameters to be optimized is 8 (number of kernels, kernel size and stride of the convolutional layer, kernel size and stride of the pooling layer, AF of the convolutional layer, AF of the dense layer, and number of neurons in the dense layer). Hence, the size of this swarm is 4 × 8. To obtain the best solution, the PSO algorithm evaluates 256 possible 1D-CNN configurations on the benchmark datasets. By default, the CNN uses the softmax activation function at the output layer and the Adam optimizer. Table 5 shows the optimal parameters for all benchmark datasets.
The proposed OPTConvNet method using H-PSO achieved accuracies of 99.72%, 99.82%, 96.03%, and 98.52% on the UCI-HAR, Opportunity, PAMAP2, and Daphnet Gait datasets, respectively, as shown in Fig. 5.
The class-wise accuracies of the UCI-HAR, PAMAP2, Daphnet Gait, and Opportunity benchmark datasets using OPTConvNet are shown in Figs. 6, 7, 8, 9, respectively.
Figure 6 displays the normalized confusion matrix for the OPTConvNet model, which employs H-PSO for CNN hyperparameter optimization, on the UCI-HAR dataset. The classifier correctly categorized all instances of the classes “LAYING”, “SITTING”, “STANDING”, and “WALKING_DOWNSTAIRS”. The model classifies instances of “WALKING” and “WALKING UPSTAIRS” with an accuracy of 99%: 1% of “WALKING” instances are misclassified as “WALKING UPSTAIRS”, and 1% of “WALKING UPSTAIRS” instances are misclassified as “WALKING”.
Figure 7 illustrates the confusion matrix for the PAMAP2 dataset. OPTConvNet classifies instances of “Ironing” and “Walking” with a high accuracy of 98%. The remaining 2% of “Ironing” instances are misclassified as “Descending stairs”, and the remaining 2% of “Walking” instances are misclassified as “Running”.
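For readers unfamiliar with normalized confusion matrices, the sketch below shows how one can be computed: each row (true class) is divided by its total, so the diagonal holds the class-wise accuracy and the off-diagonal entries hold the misclassification rates. The labels and predictions are made-up toy data, not results from the paper, and the function name is an assumption.

```python
from collections import Counter


def normalized_confusion(y_true, y_pred, labels):
    """Row-normalized confusion matrix: entry [t][p] = P(predicted p | true t)."""
    counts = Counter(zip(y_true, y_pred))
    matrix = []
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        matrix.append([counts[(t, p)] / row_total if row_total else 0.0
                       for p in labels])
    return matrix


# Toy example (illustrative only): 4 windows per class, one WALKING window
# predicted as WALKING_UPSTAIRS.
labels = ["WALKING", "WALKING_UPSTAIRS"]
y_true = ["WALKING"] * 4 + ["WALKING_UPSTAIRS"] * 4
y_pred = ["WALKING", "WALKING", "WALKING", "WALKING_UPSTAIRS"] + ["WALKING_UPSTAIRS"] * 4

cm = normalized_confusion(y_true, y_pred, labels)
class_accuracy = {label: cm[i][i] for i, label in enumerate(labels)}
print(cm)              # [[0.75, 0.25], [0.0, 1.0]]
print(class_accuracy)  # {'WALKING': 0.75, 'WALKING_UPSTAIRS': 1.0}
```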
From Fig. 7, it can be observed that 97% of instances of the class “Ascending stairs” are correctly classified, and the remaining 3% of instances are erroneously classified as “Running”. The accuracy of classifying instances within the categories of "Cycling", “Lying”, "Rope jumping”, “Sitting”, and "Standing" stands at an impressive 96%. However, the model encounters challenges when handling the 4% of instances associated with the "Cycling" class. Within this 4%, 1% of cases are erroneously classified as "Ascending stairs," and the remaining 3% are mistakenly categorized as "Rope jumping." The confusion rate of classifying the instances of “Cycling” and “Rope jumping” is relatively higher, as 3% of “Rope jumping” instances and “Cycling” instances are misclassified as “Cycling” and “Rope jumping,” respectively. The model has more errors while classifying the static activities namely “Lying”, “Sitting”, and “Standing”. 4% of instances of “Lying” and “Sitting” are misclassified as “Sitting” and “Lying,” respectively. 4% of instances of the class “Standing” are erroneously categorized as “Sitting”. “Nordic walking” and “Descending stairs” obtained 95% classification accuracy. Instances of the class “Running” are classified with an accuracy of 94%, and the remaining 6% of instances are misclassified as “Walking” and “Descending Stairs”.
Figure 8 depicts the class-wise accuracy of the Daphnet Gait dataset using OPTConvNet. The classifier correctly classifies the instances of “Freeze” and “No Freeze” with an accuracy of 98% and 99%, respectively. Figure 9 depicts the class-wise accuracy of the Opportunity dataset using OPTConvNet.
Figures 10, 11, 12, and 13 depict the accuracy/convergence plots for the UCI-HAR, PAMAP2, Daphnet Gait, and Opportunity datasets, respectively. OPTConvNet converged at the 108th epoch with a best accuracy of 99.72% on the UCI-HAR dataset. On the PAMAP2 dataset, convergence occurred at the 80th epoch with a best accuracy of 96.03%. On the Daphnet Gait dataset, convergence occurred at the 104th epoch with a best accuracy of 98.52%. On the Opportunity dataset, convergence occurred at the 104th epoch with a best accuracy of 99.82%.
Conclusion
The effectiveness of each deep learning algorithm is closely linked to its training process, which involves the interplay of numerous parameters. Altering the value of one parameter can significantly influence the behavior of other hyperparameters, so the careful selection of optimal parameter values is a formidable challenge in achieving an efficient training process. Hyperparameter optimization is necessary because the optimal parameters may vary from one dataset to another. Optimizing hyperparameters by trial and error is time-consuming and requires considerable human intervention. Many other optimization techniques discussed in the literature optimize either the network parameters or the layer parameters, but not both. The proposed approach optimizes the architecture-level and layer-level hyperparameters of the CNN. It makes the optimization process easier for users who do not have in-depth knowledge of optimization and reduces the need for human intervention. The proposed OPTConvNet using hierarchical particle swarm optimization obtained accuracies of 99.72%, 96.03%, 98.52%, and 99.82% on the UCI-HAR, PAMAP2, Daphnet Gait, and Opportunity benchmark datasets, respectively. This work can be further extended by including more hyperparameters in the optimization and by exploring variants of PSO and other heuristic algorithms.
Availability of Data and Materials
This work utilized publicly available datasets to conduct the simulation.
Funding
Open access funding provided by Manipal Academy of Higher Education, Manipal. Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ankalaki, S., Thippeswamy, M.N. Optimized Convolutional Neural Network Using Hierarchical Particle Swarm Optimization for Sensor Based Human Activity Recognition. SN COMPUT. SCI. 5, 447 (2024). https://doi.org/10.1007/s42979-024-02794-5
DOI: https://doi.org/10.1007/s42979-024-02794-5