Introduction

The aim of Sensor-based Human Activity Recognition (S-HAR) is to identify an individual's activities through the analysis and interpretation of data gathered from multiple sensors. Lately, the accuracy and efficiency of HAR have been greatly improved by applying deep learning techniques, especially on a variety of benchmark datasets. As the use of smartphones and smartwatches has increased, HAR research has shifted its emphasis from depending solely on body-worn sensors to utilizing the built-in sensors (like accelerometers and gyroscopes) in mobile devices to gather signal data. HAR plays a vital role in various critical domains, including healthcare, daily activity monitoring, and elderly care, among others. The collection of data necessary for HAR tasks can be achieved through the utilization of both cameras and sensors. Collecting data from visual sensors/cameras has certain limitations that should be considered. First, a camera must be installed in a designated spot, and the user must constantly stay in the camera's field of view. Second, because of privacy infringement concerns, using cameras in specific areas—like bedrooms or private spaces—is prohibited. This restriction is removed by S-HAR, where sensors are a more practical and affordable alternative to cameras for gathering HAR data.

Nowadays, the use of wearable and inertial sensors in smart devices has emerged as a promising approach for the acquisition of human activity data, owing to their user-friendly nature, small size, and non-intrusive characteristics. Furthermore, these sensors offer the benefits of minimal or no installation cost and minimal energy consumption. The widespread use of smartphones and smartwatches has made them convenient choices for HAR tasks, as they come equipped with a range of built-in sensors, including accelerometers, gyroscopes, magnetometers, compasses, and more [1].

Activity recognition research has traditionally relied on machine learning (ML) algorithms such as decision trees, support vector machines (SVM), Naïve Bayes (NB), and Hidden Markov Models (HMM) to achieve favourable recognition rates in controlled experimental settings with limited labelled data. Nonetheless, the accuracy of these approaches depends on the quality and extent of manual feature extraction, and handcrafted feature extraction can capture only superficial features. Because of these constraints, activity recognition using traditional classification methods faces limitations in classification accuracy and model generalization [2]. Deep Learning (DL) approaches overcome the limitations of manual feature extraction. A DL technique such as CNN can greatly simplify the feature selection process of traditional methods by autonomously extracting abstract features through multiple layers of hidden units [3].

The popularity of CNNs in HAR has been well-documented in the literature [4,5,6], attributed to their ability to capture local dependencies of activity signals and maintain feature scale invariance. However, CNN performance depends on numerous hyperparameters, including the number of layers, neurons, batch size, epochs, dropout rate, strides, and filter shape. Since each hyperparameter has a different impact on the CNN model, the key challenge is determining the best set to use [7]. Manually optimizing these hyperparameters is time-consuming and requires expertise, and adjusting one hyperparameter may influence the others due to the trade-offs among them. Hence, it becomes crucial to investigate methods for identifying an optimal set of hyperparameters that delivers efficient performance. This study focuses on the optimization of CNN hyperparameters using a Hierarchical Particle Swarm Optimization (H-PSO) technique.

The contributions of this research paper are as follows:

  • This research provides an overview of the latest techniques in the field, specifically tailored for researchers seeking to apply CNN models to their own datasets.

  • It proposes H-PSO, which optimizes the hyperparameters of a CNN at multiple levels. H-PSO is formulated such that it optimizes both architecture-level parameters (the number of convolutional, pooling, and fully connected layers) and the hyperparameters of each layer.

  • The novelty of this work lies in the fact that the proposed H-PSO optimizes Architecture level, Layer level, and Training level hyperparameters simultaneously.

  • The paper conducts a performance analysis of the proposed H-PSO with state-of-the-art optimization methods.

The rest of this paper is organized as follows:

Sect. "Sensor-based Human Activity Recognition (S-HAR)" presents the related work on sensor based HAR, while various state-of-the-art CNN hyperparameter optimization techniques and their limitations are discussed in Sect. "Hyperparameter optimization techniques". Sect. "Proposed Methodology" outlines the methodology adopted. The detailed description of the methodology is discussed in subsections of Sect. "Proposed Methodology". Specifically, subSect. "Overview of CNN" provides an overview of CNN and the hyperparameters to be optimized, while subsection 3.3 discusses the architecture of the proposed H-PSO for hyperparameter optimization. Results and comparative analysis are discussed in Sect. "Results and Analysis", followed by the conclusion, acknowledgment, and references.

Related Work

Sensor-Based Human Activity Recognition (S-HAR)

CNN is one of the significant approaches for recognizing human activities from sensor data, and many researchers have employed CNN and its variant models for HAR. The work [8] discusses the recognition of human locomotion activities using a customized shallow CNN, concluding that the customized 1D-CNN outperforms traditional ML algorithms, namely RBF-SVM and Random Forest. In [9], researchers presented a divide-and-conquer-based classification of human activities using a 1D-CNN. A study conducted by [10] compared a shallow five-layer CNN framework to existing solutions on both the WISDM and UCI-HAR datasets; the findings revealed that their CNN model outperformed other CNN-based approaches on the UCI-HAR dataset, achieving an accuracy of 94.35%. A 1D-CNN was employed to classify human activities based on accelerometer data, achieving 92.71% accuracy and outperforming the random forest classifier [11]. The utilization of small kernels in CNN convolution operations directly on the temporal dimension of sensor signals enables the detection and capture of localized temporal dependencies [12]. Numerous studies have investigated modifying filters and their kernels in DL; as the filter plays a pivotal role in the development of a CNN, employing a predetermined number of filters with varying kernel sizes enables the capture of diverse data aspects in S-HAR [13].

CNNs demonstrate exceptional proficiency in extracting local features from sensor data, but they do not possess memory and do not consider the temporal dependencies present among data records. Problems involving significant temporal dependencies can instead be effectively tackled using recurrent neural networks (RNNs). Among RNNs, Long Short-Term Memory (LSTM) models demonstrate superior performance in retaining long-term dependencies due to the unique structure of their repeating module [14]. A deep neural network for HAR that incorporates a CNN with diverse kernel dimensions and a bi-directional LSTM (BiLSTM) has been introduced to address the challenges posed by the aforementioned methods [13, 15, 16]. The ICGNet model, proposed by [17], combines the advantages of CNN and GRU to effectively capture both local features and long-term dependencies in multivariate time series data, offering an end-to-end solution for HAR that directly processes raw data collected from wearable sensors and eliminates the need for manual feature engineering.

The research works reviewed in this paper examine models that incorporate multiple hyperparameters, typically fine-tuned through empirical adjustments. Various methods are available to automate the process of hyperparameter selection. In this review, we discuss seven of these methods as outlined in the subsequent section.

Hyperparameter Optimization Techniques

Achieving optimal performance in CNNs heavily relies on hyperparameter optimization. Numerous techniques have emerged to effectively fine-tune the hyperparameters of CNNs. This section presents an overview of well-known CNN hyperparameter optimization techniques.

The performance of every DL algorithm depends on its training process, which involves several parameters. Setting the value of one parameter may also affect other hyperparameters, making the selection of appropriate values to obtain efficient results a challenging task during training. Depending on where they are used in the CNN, the hyperparameters can be categorized into architecture-level, layer-level, and training-level parameters, as depicted in Fig. 1. To obtain an efficient CNN model, parameters at all three levels must be optimized. Manual trial and error is one approach to hyperparameter optimization, but it is time-consuming and rarely yields well-optimized hyperparameters. Other commonly used approaches are metaheuristic optimization methods, grid search, and random search.

Fig. 1
figure 1

Categorization of Hyperparameters

Metaheuristic optimization methods have strong search capabilities, helping to prevent the training network from getting stuck in local optima and enhancing the probability of finding the global optimum. The authors of [18] employed metaheuristic algorithms for automatic optimization of CNN hyperparameters, specifically focusing on "Batch Size, No. of kernels and epochs, size of the kernel, and pooling size." However, a limitation of this work is that the authors did not optimize the network with respect to the activation function (AF) or the number of feature extraction and downsampling layers.

In contrast, the authors of [19] utilized particle swarm optimization for optimal feature selection, while [20] proposed a hybrid optimization method by integrating the PSO and artificial bee colony (ABC) methods. Furthermore, the authors of [21] optimized the hyperparameters of 1D CNN using the harmony search optimization method. Similarly, [22] conducted comprehensive experimentation to study the impact of hyperparameters on the recognition rate of residual BiLSTM networks, adjusting the value of each hyperparameter and analyzing the results to select optimal values. In addition, the authors of [23] used grid search, random parameter search (RPS), and Bayesian optimization (BO) techniques to optimize the ANN's hyperparameters. They arrived at the conclusion that RPS or BO optimization techniques were required in order to obtain the ideal hyperparameters.
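
As a point of reference for the grid-search and random-search techniques mentioned above, the sketch below contrasts the two on a toy hyperparameter space. The space, the budget of ten random samples, and the evaluate() stub are illustrative assumptions, not the configurations used in the cited works.

```python
import random
from itertools import product

# Toy hyperparameter space; the ranges are illustrative only.
space = {"learning_rate": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128], "dropout": [0.1, 0.3, 0.5]}

def evaluate(config):
    # Placeholder: in practice this would train a model with `config` and return validation accuracy.
    return random.random()

# Grid search: exhaustively evaluates every combination (3 x 3 x 3 = 27 here).
grid_configs = [dict(zip(space, values)) for values in product(*space.values())]
best_grid = max(grid_configs, key=evaluate)

# Random search: samples a fixed budget of configurations (10 here) from the same space.
random_configs = [{k: random.choice(v) for k, v in space.items()} for _ in range(10)]
best_random = max(random_configs, key=evaluate)

print("grid search best:", best_grid)
print("random search best:", best_random)
```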

A self-supervised model was proposed by the authors of [24] for optimizing the number of convolutional and fully connected (FC) layers of an auto-CNN. The experiment was conducted with search spaces {1, 2, 3} and {1, 2} for the number of convolutional and FC layers, respectively. According to their experimental findings and observations, the performance of the CNN was not remarkably influenced by the presence of more than three CNN layers. Additionally, they established the {2, 3} filter-size range for the max-pooling layer and a 64–1024 step size for the number of neurons in each FC layer. The researchers also distributed the learning rate from 1e-4 to 1e-2 and set the dropout factor range from 0.1 to 0.6. The Tiny ImageNet, CIFAR-10, and CIFAR-100 datasets were used to test the optimized CNN. A tree-structured Parzen estimator Bayesian optimization technique was proposed by the authors of [25] to optimize parameters such as the number of FC layers, neurons, learning rate, and dropout rate. They tested this approach with SimCLR and SwAV on datasets such as CIFAR-10, CIFAR-100, and Tiny ImageNet, and the results looked promising in comparison to other methods. Furthermore, this study was expanded by employing Neural Architecture Search (NAS) to optimize the number of parameters [26]. NAS aids in identifying a suitable neural network tailored to specific tasks, thereby reducing the human effort required to discover an optimal architecture for the given task [27].

Most of the optimization research has been conducted on 2D CNNs for computer vision applications. Table 1 depicts the research conducted on the optimization of hyperparameters of DL approaches on HAR datasets. In Table 1, BS represents batch size, #E represents the number of Epochs, #K represents the number of kernels, #P indicates the number of pooling layers, K_S indicates the kernel size, P_S indicates pooling size, AF represents the activation function, OP represents the optimizer function, and DP indicates dropout.

Table 1 Current Advancements in Hyperparameter Optimization Techniques: An Overview of State-of-the-Art Methods and Approaches

From Table 1, it is evident that researchers have primarily concentrated on optimizing training- and layer-level hyperparameters; only a limited number of works have addressed the optimization of other hyperparameters. Nevertheless, there remains scope for improving the optimization of architecture-level hyperparameters.

Proposed Methodology

Overview of CNN

The objective of the proposed work is the auto-optimization of CNN hyperparameters. A Convolutional Neural Network (CNN) is a specialized deep learning model designed for processing and analyzing visual data, such as images, videos, and time-series data. The CNN architecture consists mainly of four components: Convolution, Pooling, Activation function, and fully connected (FC) layers.

Convolutional layers are the fundamental building blocks of a CNN. These layers utilize small kernels that slide over the input data, extracting relevant features and generating feature maps that highlight patterns within the data. The dimensions of the feature maps obtained from convolutional layers are downsampled by pooling layers; max and average pooling are the most commonly used. Activation functions introduce non-linearity into the features extracted by convolution. The Rectified Linear Unit (ReLU) is a frequently used activation function in CNNs, replacing negative values with zero. Besides ReLU, many other activation functions have been proposed, and selecting the appropriate one is a challenging task; the authors of [35] studied the impact of the activation function on CNN performance and proposed a more non-linear activation function called OP-Tanish. FC layers process the high-level features from the convolutional layers and make the final predictions; they resemble the dense layers found in traditional neural networks.
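
To make these components concrete, the following is a minimal, illustrative 1D-CNN in Keras. The window shape (128 time steps × 9 channels), the number of classes, the layer counts, filter sizes, and dense width are assumptions for illustration, not the optimized OPTConvNet configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative window shape: 128 time steps x 9 sensor channels, 6 activity classes (UCI-HAR-like).
NUM_TIMESTEPS, NUM_CHANNELS, NUM_CLASSES = 128, 9, 6

model = models.Sequential([
    tf.keras.Input(shape=(NUM_TIMESTEPS, NUM_CHANNELS)),
    # Convolution: small kernels slide along the temporal axis and produce feature maps.
    layers.Conv1D(filters=64, kernel_size=3, strides=1, padding="same", activation="relu"),
    # Pooling: downsamples the feature maps (max pooling shown here).
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(filters=128, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # Fully connected layers operate on the flattened high-level features.
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```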

This work explores the key aspects of CNNs along with their corresponding hyperparameters, which play a significant role in determining the performance of the CNN architecture. Each element of the CNN is governed by specific hyperparameters, necessitating careful consideration to enhance the network's efficiency. For the convolutional layer, the hyperparameters include the number of convolutional layers, the number of kernels, kernel sizes, kernel stride, and padding strategy; these shape the feature extraction process and thereby influence the network's ability to detect important patterns in the input data. Similarly, the pooling layer is characterized by its own set of influential hyperparameters, such as the number of pooling layers, the number of kernels, kernel sizes, and kernel stride, which directly determine the degree of downsampling applied to the extracted features and the level of abstraction retained in subsequent layers. The FC layers likewise have their own hyperparameters that significantly impact pattern classification: the number of FC layers, the number of neurons in each FC layer, and the activation functions used are critical in determining the network's ability to make accurate predictions from the learned representations. Training of the network is additionally influenced by hyperparameters such as the optimization function, activation function, dropout, and loss function. A thorough examination of these hyperparameters and their effects on the various components of the CNN architecture is essential for optimizing performance across diverse applications. This paper proposes the use of Hierarchical Particle Swarm Optimization (H-PSO) for optimizing all these parameters, resulting in an optimized CNN referred to as OPTConvNet.
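
As a compact illustration of this grouping, the hyperparameters above can be organized by level as in the sketch below. The architecture-level ranges follow the search space described later in this paper, while the layer- and training-level values are placeholders rather than the ranges actually used (see Table 3).

```python
# Hedged sketch of how the hyperparameters discussed above can be grouped by level.
# The concrete ranges used in this work are listed in Table 3; most values here are illustrative only.
search_space = {
    "architecture": {
        "num_conv_layers": [1, 2, 3, 4],       # NC
        "num_pool_layers": [1, 2, 3, 4],       # NP
        "num_fc_layers":   [1, 2, 3],          # NFC
    },
    "layer": {
        "num_kernels":     [32, 64, 128],      # NK_C (placeholder values)
        "kernel_size":     [3, 5, 7],          # KS_C
        "conv_stride":     [1, 2],             # SS_C
        "conv_activation": ["relu", "tanh"],   # AF_C
        "pool_size":       [2, 3],             # KS_P
        "pool_stride":     [1, 2],             # SS_P
        "fc_activation":   ["relu", "tanh"],   # AF_FC
    },
    "training": {
        "optimizer":  ["adam", "sgd"],
        "dropout":    [0.1, 0.3, 0.5],
        "loss":       ["categorical_crossentropy"],
        "batch_size": [32, 64, 128],
    },
}
```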

OPTConvNet—Hierarchical Particle Swarm Optimization for Optimization of CNN

PSO, a metaheuristic stochastic population-based evolutionary optimization algorithm, operates by exploring the search space using a swarm-based approach to locate the optimal solution. Within the swarm, individual particles possess unique velocities and positions as they navigate through the solution search space. The standard PSO algorithm was proposed and developed by Kennedy and Eberhart [34]. PSO functions by emulating a particle swarm navigating through a search space with multiple dimensions. Each individual particle represents a potential solution to the optimization problem, with its position in the search space reflecting a unique combination of hyperparameters. Guided by their personal best positions and the best position discovered by the entire swarm, the particles adjust their movements. This iterative process enables PSO to efficiently explore the search space, ultimately converging towards promising regions that enhance network performance.

The primary concept of PSO is to determine the most effective solution for each particle (Pbest) and the optimal solution for the group (Gbest) through interaction among the particles in the swarm. In each iteration, every particle adjusts its search direction and speed, and its velocity is updated based on its own momentum, Pbest, and Gbest [33, 38, 39].

For a d-dimensional searching space, the position Pj and velocity Rj for particle j are given by Equations 1 and 2, respectively.

$${P}_{j}=\left({p}_{j}^{1},{p}_{j}^{2},\dots ,{p}_{j}^{d}\right)\quad for \quad j=1,2,\dots ,N$$
(1)
$${R}_{j}=\left({r}_{j}^{1},{r}_{j}^{2},\dots ,{r}_{j}^{d}\right)\quad for \quad j=1,2,\dots ,N$$
(2)

where \({p}_{j}^{d}\) and \({r}_{j}^{d}\) represent the position and velocity of particle j in dimension d. Equation 3 updates the inertia weight, while Equations 4 and 5 update the particle's velocity and position.

$$w = w + 0.5 \cdot e^{-\left|p^{i-1} - p_{best}^{i-1}\right|}$$
(3)
$${r}_{j}^{d}\left(t+1\right)=w\,{r}_{j}^{d}\left(t\right)+{c}_{1}\times rand\times \left({x}_{j}^{d}-{p}_{j}^{d}(t)\right)+{c}_{2}\times rand\times \left({x}_{g}^{d}-{p}_{j}^{d}(t)\right)$$
(4)
$${p}_{j}^{d}\left(t+1\right)={p}_{j}^{d}\left(t\right)+{r}_{j}^{d}\left(t+1\right)$$
(5)

where w indicates the "inertia weight", max_iteration represents the "maximum iterations", \({r}_{j}^{d}\left(t+1\right)\) represents the "velocity of the particle at time t + 1", c1 and c2 represent positive constants, rand is a "uniform random variable in the interval [0,1]", \({x}_{j}^{d}\) represents the "optimal position of particle j at iteration t", \({x}_{g}^{d}\) is the "optimal solution of the current group Gbest", and \({p}_{j}^{d}\) represents the "position of particle j in dimension d". Algorithm 1 describes the steps involved in PSO.

figure a

Algorithm 1 PSO optimization algorithm
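
Since the Algorithm 1 listing is not reproduced here, the sketch below shows one straightforward way the PSO loop of Eqs. (1)–(5) can be implemented. The maximization objective, box bounds, constant inertia weight, and default parameter values are assumptions; the adaptive inertia update of Eq. (3) could be substituted for the constant w.

```python
import numpy as np

def pso(fitness, bounds, num_particles=10, max_iteration=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO sketch following Eqs. (1)-(5); maximizes `fitness` over a box-bounded space.

    bounds: array of shape (d, 2) with [low, high] per dimension.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    d = bounds.shape[0]

    # Eqs. (1)/(2): random initial positions P_j and zero initial velocities R_j.
    P = rng.uniform(bounds[:, 0], bounds[:, 1], size=(num_particles, d))
    R = np.zeros((num_particles, d))

    pbest = P.copy()
    pbest_val = np.array([fitness(p) for p in P])
    gbest = pbest[np.argmax(pbest_val)].copy()

    for _ in range(max_iteration):
        for j in range(num_particles):
            r1, r2 = rng.random(d), rng.random(d)
            # Eq. (4): velocity update driven by Pbest and Gbest.
            R[j] = w * R[j] + c1 * r1 * (pbest[j] - P[j]) + c2 * r2 * (gbest - P[j])
            # Eq. (5): position update, clipped to the search bounds.
            P[j] = np.clip(P[j] + R[j], bounds[:, 0], bounds[:, 1])

            val = fitness(P[j])
            if val > pbest_val[j]:                       # update personal best
                pbest[j], pbest_val[j] = P[j].copy(), val
        gbest = pbest[np.argmax(pbest_val)].copy()       # update global best

    return gbest, pbest_val.max()
```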

The proposed work optimizes the parameters at two hierarchical levels. To optimize the hyperparameters of the CNN, they are organized into architecture, layer, and training levels; the structure of the hyperparameter optimization is shown in Fig. 2. Architectural parameters are considered at PSO level 1, layer-level parameters are considered at PSO level 2, and training hyperparameters are optimized manually at the last level. The structure of the H-PSO is inspired by the PSO structure discussed in [33]. In addition to the optimization discussed in [33], this work optimizes the activation function in the convolutional and FC layers as well as the training parameters. Figure 3 depicts the H-PSO flow diagram.

The process starts at level 1, where a swarm [P1, P2, …, Pm] is initialized with random values for the architectural hyperparameters, as shown in Fig. 2. At hierarchy level 2, multiple swarms are initialized, each comprising m particles: for every particle in the level-1 swarm, a corresponding swarm is initialized at the second level, and the dimensions of its particles are determined by the number of parameters specified at level 1. Each particle in a second-level swarm is randomly initialized with values corresponding to the layer-level hyperparameters shown in Fig. 2. The CNN's last FC layer is configured by default with the SoftMax activation function, and initially the CNN is trained with a fixed batch size, the Adam optimizer, and the categorical cross-entropy loss function. The accuracy of each particle at hierarchy level 2 is computed using the SoftMax output of the last FC layer, and the velocity, position, Pbest, and Gbest of each particle are updated as per Algorithm 1. In this model, the fitness of each particle is evaluated by the softmax layer of the CNN: the set of hyperparameters that leads to higher accuracy represents a better solution than a particle whose parameters result in lower accuracy. The fitness evaluation follows the research work [33], and the detailed architecture is depicted in Fig. 4. The numbers of convolutional, pooling, and fully connected layers are the parameters considered at swarm level 1 with m particles, while the second-level swarm consists of seven hyperparameters, as shown in Fig. 4. A structural sketch of this two-level search is given below.
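
The sketch assumes small illustrative swarm sizes, normalized layer-level parameters, and a placeholder cnn_fitness() that stands in for training the CNN defined by a particle pair. In the actual H-PSO, the level-1 swarm is also updated with velocity and position equations (Eqs. 9 and 10 below); that outer update is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

M_LEVEL1, M_LEVEL2, ITERS = 4, 4, 5                                   # illustrative sizes only
ARCH_LOW, ARCH_HIGH = np.array([1, 1, 1]), np.array([4, 4, 3])        # (NC, NP, NFC) ranges
LAYER_LOW, LAYER_HIGH = np.zeros(7), np.ones(7)                        # 7 normalized layer-level params

def cnn_fitness(arch_particle, layer_particle):
    """Placeholder for Eq. (6): decode the particles, build/train the 1D-CNN, and return
    validation accuracy. Here a random value stands in for actual training."""
    return rng.random()

def level2_pso(arch_particle):
    """Optimize layer-level hyperparameters for one fixed architecture (one level-1 particle)."""
    P = rng.uniform(LAYER_LOW, LAYER_HIGH, size=(M_LEVEL2, 7))
    R = np.zeros_like(P)
    pbest, pbest_val = P.copy(), np.array([cnn_fitness(arch_particle, p) for p in P])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(ITERS):
        for j in range(M_LEVEL2):
            r1, r2 = rng.random(7), rng.random(7)
            # Level-2 velocity and position updates (cf. Eqs. 7 and 8 below).
            R[j] = 0.7 * R[j] + 1.5 * r1 * (pbest[j] - P[j]) + 1.5 * r2 * (gbest - P[j])
            P[j] = np.clip(P[j] + R[j], LAYER_LOW, LAYER_HIGH)
            val = cnn_fitness(arch_particle, P[j])
            if val > pbest_val[j]:
                pbest[j], pbest_val[j] = P[j].copy(), val
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest, pbest_val.max()

# Level 1: each architecture particle is scored by the best layer-level configuration it yields.
arch_swarm = rng.uniform(ARCH_LOW, ARCH_HIGH, size=(M_LEVEL1, 3)).round()
results = [level2_pso(p) for p in arch_swarm]
best_idx = int(np.argmax([acc for _, acc in results]))
print("best architecture:", arch_swarm[best_idx], "best layer-level gbest:", results[best_idx][0])
```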

Fig. 2
figure 2

The structure of hyperparameter optimization used in the proposed work

Fig. 3
figure 3

Workflow of Hierarchical Particle Swarm Optimization (H-PSO) for optimization of CNN hyperparameters

Fig. 4
figure 4

Architecture of H-PSO Framework for Hyperparameter Optimization of CNN

The calculation of a particle's fitness occurs at level 2, based on a set of parameters indicated by (Pi, Pij) as outlined in Eq. (6). In this context, Pi denotes a particle at the first swarm level, while Pij represents a particle at the second swarm level.

$$Fitness\left({P}_{ij}\right)=CNN\left({P}_{i}, {P}_{ij}\right)$$
(6)

where \({P}_{i}=(NC, NP, NFC)\) and \({P}_{ij}\) = (NK_C, KS_C, SS_C, AF_C, KS_P, SS_P, AF_FC).

Here, NC, NP, and NFC denote the number of convolutional, pooling, and fully connected layers, respectively; NK_C, KS_C, SS_C, and AF_C denote the number of kernels, kernel size, stride size, and activation function of the convolutional layers, respectively; KS_P and SS_P denote the kernel size and stride of the pooling layers, respectively; and AF_FC denotes the activation function of the fully connected layers except the last one.
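
A hedged sketch of how Eq. (6) can be realized is given below: the particle pair (P_i, P_ij) is decoded into a Keras 1D-CNN, trained briefly, and scored by validation accuracy. The conv–pooling pairing, the fixed dense width of 100 neurons, the "same" padding strategy, and the short training budget are simplifying assumptions, not the exact decoding used by the authors.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(p_i, p_ij, input_shape, num_classes):
    """Decode P_i = (NC, NP, NFC) and P_ij = (NK_C, KS_C, SS_C, AF_C, KS_P, SS_P, AF_FC)
    into a 1D-CNN. Assumes integer/categorical values have already been decoded."""
    nc, np_, nfc = p_i
    nk_c, ks_c, ss_c, af_c, ks_p, ss_p, af_fc = p_ij

    model = models.Sequential([tf.keras.Input(shape=input_shape)])
    for i in range(nc):
        model.add(layers.Conv1D(nk_c, ks_c, strides=ss_c, padding="same", activation=af_c))
        if i < np_:                                   # attach a pooling layer to the first NP conv blocks
            model.add(layers.MaxPooling1D(pool_size=ks_p, strides=ss_p, padding="same"))
    model.add(layers.Flatten())
    for _ in range(nfc - 1):                          # hidden FC layers use the optimized activation AF_FC
        model.add(layers.Dense(100, activation=af_fc))
    model.add(layers.Dense(num_classes, activation="softmax"))  # last FC layer fixed to softmax
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

def fitness(p_i, p_ij, x_train, y_train, x_val, y_val, input_shape, num_classes):
    """Fitness(P_ij) = CNN(P_i, P_ij): train briefly and return validation accuracy."""
    model = build_cnn(p_i, p_ij, input_shape, num_classes)
    model.fit(x_train, y_train, epochs=5, batch_size=64, verbose=0)   # short budget during the search
    return model.evaluate(x_val, y_val, verbose=0)[1]
```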

The velocity \({R}_{ij}\) of the jth particle in the ith swarm at level 2 is computed by Eq. 7.

$${R}_{ij}=\omega {R}_{ij}+{c}_{1}{rand}_{1}\left({pbest}_{ij} -{P}_{ij}\right)+{c}_{2}{rand}_{2}\left({gbest}_{i}-{P}_{ij}\right)$$
(7)

The new position of the particle Pij at level-2 is expressed in Eq. 8.

$${P}_{ij}={P}_{ij}+{R}_{ij}$$
(8)

The velocity of the ith particle of the level-1 swarm is expressed in Eq. 9.

$${R}_{i}=\omega {R}_{i}+{c}_{1}{rand}_{1}\left({pbest}_{i} -{P}_{i}\right)+{c}_{2}{rand}_{2}\left(gbest-{P}_{i}\right)$$
(9)

The new position of the ith particle which updates the number of layers is given in Eq. 10.

$${P}_{i}={P}_{i}+{R}_{i}$$
(10)

The fitness of particle \({P}_{i}\) is contingent upon both its internal composition of layers and the globally best hyperparameters \({gbest}_{i}\) obtained from its corresponding second-level swarm, as illustrated in Eq. 11. Ultimately, the solution converges towards the overall best solution gbest, determined as the maximum among gbest1, gbest2, …, gbestm.

$$Fitness\left({P}_{i}\right)=CNN\left({P}_{i}, {gbest}_{i}\right)$$
(11)

Results and Analysis

Determining the optimal hyperparameters of a CNN is one of the challenging tasks in CNN applications. The manual selection of hyperparameters requires expertise in the domain, and the hyperparameters of a CNN may vary from one dataset to another. It is essential to automate the process of hyperparameter optimization to overcome the limitations of manual optimization. This work utilized PSO in a hierarchical manner to determine the architecture and layer-level hyperparameters of the CNN.

Benchmark Datasets

This work carried out the optimization of the CNN using H-PSO on the HAR benchmark datasets given in Table 2. Table 2 describes the dataset characteristics and data sampling of the UCI-HAR, Opportunity, PAMAP2, and Daphnet FOG datasets.

Table 2 Description of S-HAR benchmark datasets utilized in the proposed work

Results and Analysis

The proposed work evaluated the H-PSO-optimized 1D-CNN, called OPTConvNet, on S-HAR benchmark datasets. To optimize the 1D-CNN for S-HAR, the ranges of the architecture-, layer-, and training-level parameters, as well as the initial values of the H-PSO parameters, must be initialized manually. Tables 3 and 4 depict the CNN hyperparameter search space and the PSO parameter search space used in this experiment, respectively. The evaluation of the proposed method encompasses the UCI-HAR, Opportunity, PAMAP2, and Daphnet Gait datasets. During the training phase, 80% of each dataset is utilized, while the remaining 20% is used for testing.
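
As an illustration of the data preparation implied here, the sketch below segments continuous sensor streams into fixed-length windows and performs an 80/20 split. The window length of 128 samples, 50% overlap, majority-vote labelling, and stratified splitting are assumptions, since the paper does not specify these details; Table 2 lists the actual sampling characteristics of each dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_windows(signals, labels, window=128, step=64):
    """Segment continuous sensor signals (T x channels) into fixed-length windows.

    window/step are assumptions (128 samples with 50% overlap)."""
    X, y = [], []
    for start in range(0, len(signals) - window + 1, step):
        X.append(signals[start:start + window])
        # Label each window by the majority activity label within it.
        values, counts = np.unique(labels[start:start + window], return_counts=True)
        y.append(values[np.argmax(counts)])
    return np.asarray(X), np.asarray(y)

# 80/20 train/test split, stratified by activity class as a reasonable default:
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
```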

Table 3 CNN hyperparameter search space used in experiments
Table 4 PSO parameter search space used for the experiment

The ranges of CNN hyperparameters in this study were selected based on observations from prior studies. The custom CNN architecture for the datasets listed in Table 2 is discussed in [34]. That architecture incorporates a maximum of three convolutional layers, a configuration also utilized by the authors of [40] for limb activity recognition. Experimentation conducted by the authors of [41] revealed that CNNs with three convolutional layers outperform those with six. Taking these findings into account, the range of convolutional and pooling layers was fixed at 1–4. Analysis of popular CNN-based architectures suggests that the number of fully connected (FC) layers can vary between 1 and 3 to design a classification model with good accuracy [42, 43]. Additionally, a study conducted in [44] suggests that the maximum required number of FC layers for a deep architecture is 3. Therefore, this study considers {1, 2, 3} as the search space for the number of FC layers.

Table 4 shows that the number of particles for the architecture-level parameters is initialized to 4 and that the number of parameters at the architecture level is 3 (NC, NP, and NFC); hence, the level-1 swarm has size 4 × 3. Each architecture-level particle is extended to the layer- and training-level parameters and explores the possible values. The number of swarm particles initialized at the layer and training level is 4, and the number of parameters to be optimized is 8 (the number of kernels, kernel size and stride of the convolutional layers, kernel size and stride of the pooling layers, activation function of the convolutional layers, activation function of the dense layers, and the number of neurons in the dense layers); hence, the level-2 swarm has size 4 × 8. To obtain the best solution, the PSO algorithm evaluates 256 possible 1D-CNN configurations, which have been evaluated on the benchmark datasets. By default, the CNN uses the softmax activation function at the output layer and the Adam optimization function. The optimal parameters obtained for all benchmark datasets are shown in Table 5.

Table 5 Experimental results of Optimal parameters obtained for OPTConvNet using hierarchical PSO

The proposed OPTConvNet using H-PSO obtained outstanding results on the UCI-HAR, Opportunity, PAMAP2, and Daphnet Gait datasets, with accuracies of 99.72%, 99.82%, 96.03%, and 98.52%, respectively, as shown in Fig. 5.

Fig. 5
figure 5

Performance of OPTConvNet model over four benchmark datasets

The class-wise accuracies of the UCI-HAR, PAMAP2, Daphnet Gait, and Opportunity benchmark datasets using OPTConvNet are shown in Figs. 6, 7, 8, 9, respectively.

Fig. 6
figure 6

Normalized confusion matrix of UCI-HAR dataset for six human activity classes

Fig. 7
figure 7

Normalized confusion matrix of PAMAP2 dataset for 11 human activity classes

Fig. 8
figure 8

Normalized confusion matrix of Daphnet Gait dataset for binary classes

Fig. 9
figure 9

Normalized confusion matrix of Opportunity dataset for eighteen classes

Figure 6 displays the normalized confusion matrix for the OPTConvNet model, which employs H-PSO for CNN hyperparameter optimization, on the UCI-HAR dataset. The classifier correctly categorized all instances of the classes "LAYING", "SITTING", "STANDING", and "WALKING_DOWNSTAIRS". The model classifies the instances of "WALKING" and "WALKING_UPSTAIRS" with an accuracy of 99%, misclassifying 1% of "WALKING" instances as "WALKING_UPSTAIRS" and 1% of "WALKING_UPSTAIRS" instances as "WALKING".

Figure 7 illustrates the confusion matrix for the PAMAP2 dataset. OPTConvNet classifies instances of "Ironing" and "Walking" with a high accuracy of 98%. However, the model encounters confusion for 2% of instances within these categories: 2% of "Ironing" and "Walking" instances are misclassified as "Descending stairs" and "Running", respectively.

From Fig. 7, it can be observed that 97% of instances of the class “Ascending stairs” are correctly classified, and the remaining 3% of instances are erroneously classified as “Running”. The accuracy of classifying instances within the categories of "Cycling", “Lying”, "Rope jumping”, “Sitting”, and "Standing" stands at an impressive 96%. However, the model encounters challenges when handling the 4% of instances associated with the "Cycling" class. Within this 4%, 1% of cases are erroneously classified as "Ascending stairs," and the remaining 3% are mistakenly categorized as "Rope jumping." The confusion rate of classifying the instances of “Cycling” and “Rope jumping” is relatively higher, as 3% of “Rope jumping” instances and “Cycling” instances are misclassified as “Cycling” and “Rope jumping,” respectively. The model has more errors while classifying the static activities namely “Lying”, “Sitting”, and “Standing”. 4% of instances of “Lying” and “Sitting” are misclassified as “Sitting” and “Lying,” respectively. 4% of instances of the class “Standing” are erroneously categorized as “Sitting”. “Nordic walking” and “Descending stairs” obtained 95% classification accuracy. Instances of the class “Running” are classified with an accuracy of 94%, and the remaining 6% of instances are misclassified as “Walking” and “Descending Stairs”.

Figure 8 depicts the class-wise accuracy of the Daphnet Gait dataset using OPTConvNet. The classifier correctly classifies the instances of “Freeze” and “No Freeze” with an accuracy of 98% and 99%, respectively. Figure 9 depicts the class-wise accuracy of the Opportunity dataset using OPTConvNet.
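
For completeness, a row-normalized confusion matrix like those in Figs. 6–9 can be computed as sketched below. The y_true/y_pred arrays are assumed to come from the trained OPTConvNet on the test split (e.g. y_pred = model.predict(x_test).argmax(axis=1)), and scikit-learn is an assumed dependency.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_confusion_matrix(y_true, y_pred, num_classes):
    """Row-normalized confusion matrix: each row sums to 1, so the diagonal gives per-class accuracy."""
    cm = confusion_matrix(y_true, y_pred, labels=range(num_classes)).astype(float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Guard against empty classes (rows with zero instances) to avoid division by zero.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
```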

Figures 10, 11, 12, and 13 depict the accuracy/convergence plots of the UCI-HAR, PAMAP2, Daphnet Gait, and Opportunity datasets, respectively. OPTConvNet converged at the 108th epoch with a best accuracy of 99.72% on the UCI-HAR dataset. On the PAMAP2 dataset, convergence occurred at the 80th epoch with a best accuracy of 96.03%. On the Daphnet Gait dataset, convergence occurred at the 104th epoch with a best accuracy of 98.52%, and on the Opportunity dataset, convergence occurred at the 104th epoch with a best accuracy of 99.82%.
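
A convergence plot of this kind can be produced with a few lines of matplotlib, as sketched below; the history list is assumed to hold the best (Gbest) accuracy recorded at each epoch of the search.

```python
import matplotlib.pyplot as plt

def plot_convergence(history, dataset_name):
    """Plot the best accuracy (Gbest) recorded at each epoch; `history` is assumed to be a list of floats."""
    plt.plot(range(1, len(history) + 1), history, label="Gbest accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.title(f"Convergence of OPTConvNet on {dataset_name}")
    plt.legend()
    plt.show()
```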

Fig. 10
figure 10

Convergence plot of OPTConvNet on the UCI-HAR dataset, showcasing the convergence behavior of H-PSO. The plot highlights the Gbest at both level 1 and level 2 of the optimization process

Fig. 11
figure 11

Convergence plot of OPTConvNet on the PAMAP2 dataset, showcasing the convergence behavior of H-PSO. The plot highlights the Gbest at both level 1 and level 2 of the optimization process

Fig. 12
figure 12

Convergence plot of OPTConvNet on the Daphnet Gait dataset, showcasing the convergence behavior of H-PSO. The plot highlights the Gbest at both level 1 and level 2 of the optimization process

Fig. 13
figure 13

Convergence plot of OPTConvNet on the Opportunity dataset, showcasing the convergence behavior of H-PSO. The plot highlights Gbest at both level 1 and level 2 of the optimization process

Conclusion

The effectiveness of each deep learning algorithm is intricately linked to its training process, a complex interplay of numerous parameters. Altering the value of one parameter can significantly influence the behaviour of other hyperparameters; therefore, the careful selection of optimal parameter values is a formidable challenge in achieving an efficient training process. Hyperparameter optimization is necessary, and the optimal parameters may vary from one dataset to another. Optimizing hyperparameters by trial and error is time-consuming and requires considerable human intervention. Many optimization techniques discussed in the literature optimize either network-level or layer-level parameters, but not both. The proposed approach optimizes both the architecture-level and layer-level hyperparameters of the CNN, making the optimization process easier for users who do not have in-depth knowledge of optimization and requiring less human intervention. The proposed OPTConvNet using hierarchical particle swarm optimization obtained accuracies of 99.72%, 96.03%, 98.52%, and 99.82% on the UCI-HAR, PAMAP2, Daphnet Gait, and Opportunity benchmark datasets, respectively. This work can be further extended by including more hyperparameters in the optimization and by exploring variants of PSO and other heuristic algorithms.