1 Introduction

An artificial neural network (ANN) is an intelligent mathematical model inspired by the neurons of the biological brain, where connections between neurons exchange signals to communicate information [1]. In machine learning, the main applications of ANNs are feature extraction, classification, prediction, and regression problems [2,3,4]. Since their establishment in 1943 [5], different types of ANNs have been developed, including the radial basis function (RBF) network [6], feedforward neural network [7], convolutional neural network [8], recurrent neural network [9], and spiking neural networks [10]. The main difference between these types is the learning process. Normally, the learning process is either supervised, where the ANN takes feedback from an external source, or unsupervised, where the model discovers hidden patterns in the data by itself [11].

The multilayer perceptron (MLP) neural network is one of the most popular feedforward versions of the ANN and has been applied successfully to several classification problems [12,13,14]. This is because of its success in the learning process during the training stage. The MLP normally uses a supervised method based on the backpropagation principle, which adjusts the weights and biases of the MLP across at least three layers (i.e., input, hidden, and output) to accomplish accurate training. The backpropagation algorithm is a well-known gradient descent technique [15]. Generally speaking, gradient descent techniques suffer from chronic problems related to slow convergence and the local optima trap [16, 17]. In order to overcome such shortcomings, stochastic methods such as metaheuristic-based techniques came to the fore [18].

Metaheuristic-based techniques such as evolutionary algorithms and swarm-based algorithms can greatly support the MLP by accelerating convergence as well as avoiding entrapment in local optima. There is a wide range of metaheuristic-based approaches used in the training process of MLPs. The earliest methods are the genetic algorithm (GA) [19], particle swarm optimization (PSO) [20], and differential evolution (DE) [21, 22]. Nowadays, a plethora of recent metaheuristic-based algorithms are being used for MLPs with very successful outcomes, such as the gray wolf optimizer [18, 23], salp swarm algorithm [24], glowworm swarm optimization [25], grasshopper optimization algorithm [26, 27], artificial bee colony [28], butterfly optimization algorithm [29], monarch butterfly optimization [30], social spider optimization algorithm [31], dragonfly algorithm [28], fish swarm algorithm [32], ant colony optimization [33], bat algorithm [34], biogeography-based optimization [35], gravitational search algorithm [36], krill herd algorithm [37], ant lion optimizer [38], cuckoo search algorithm [39], organisms search algorithm [40], and lightning search algorithm [41].

According to the no free lunch (NFL) theorem for optimization [42], there is no superior algorithm that can perform well and excel over all others for every optimization problem, or even for different instances of the same optimization problem. Therefore, there is still a window for improving MLP performance by investigating other state-of-the-art metaheuristic-based methods to serve as MLP trainers. Quite recently, a new human-based metaheuristic algorithm called the coronavirus herd immunity optimizer (CHIO) has been proposed for global optimization problems [43]. The main idea of CHIO is inspired by herd immunity as a strategy to confront the pandemic. CHIO can be considered an evolutionary algorithm that is initiated with a population of random individuals. The population has three types of individual cases: susceptible, infected, and recovered. During the improvement loop, susceptible cases can become infected based on their inherited attributes. Also, infected cases can either recover or die based on their improvement over a specific period (i.e., a specific number of iterations, referred to as age). The recovered cases, which are the highly immuned cases, are stronger than the other cases and stand as a shield to stop the pandemic. The CHIO algorithm stops when the whole population is immune, following the herd immunity strategy. The main advantage of using CHIO to tackle an optimization problem is its ability to be adapted to the problem without prior knowledge or derivative information in the initial search. It is very simple and easy to use as a black box. Recently, CHIO has been successfully applied to several problems such as traveling salesman problems [44] and wheel motor design [45].

Archive methods have been introduced by many researchers to promote population diversity during evolution and to store potential optima. For example, Lacroix et al. [46] introduced an archive method that stores the best-known solutions of the evolving population in one collection. The collection is used as an index over search-space regions to identify weak regions for exploration; these regions are then stored in another collection. During the evolution, the two collections are continuously updated. An archive method for sub-populations was introduced by Zhang et al. [47] and Kundu et al. [48], where archived sub-populations undergo regeneration and eventually form an initial population of solutions for the evolving population. In [49], an archive method is implemented to store stagnant solutions. In this method, a detected stagnant solution is reinitialized, along with its neighbors whose fitness is lower than that of the stagnant individual. Additionally, Sheng et al. [50] and Turky and Abdullah [51] utilize an archive to solve dynamic optimization problems. The archive aims to improve the population evolution and maintain the best potential solutions for subsequent cycles. The findings demonstrate that the method achieves better performance than other methods in terms of reliably locating several optimal solutions in the problem search space. The external archive is also used by other researchers to tackle multi-objective optimization problems [52, 53]. Such modifications can increase the methods' exploration capabilities by speeding up the convergence toward optimal/near-optimal solutions.

To improve the exploration capabilities of CHIO, an archive that saves the best solutions is implemented. The use of an archive [46, 49, 54] can help in enhancing the ability to search more promising regions. The modified CHIO is called archive-CHIO (ACHIO). The proposed algorithm is used to improve the performance of MLP training for a single-hidden-layer neural network. ACHIO is used to find good sets of weights and biases to train the MLP efficiently. In order to measure the performance of the proposed ACHIO-MLP system, the mean square error (MSE) is used as an accuracy measurement [24, 55,56,57]. The proposed ACHIO-MLP system is evaluated using 15 classification datasets of various complexity. The proposed algorithm is evaluated against the original CHIO and six well-known swarm intelligence algorithms: Artificial Bee Colony (ABC), Bat Algorithm (BA), Flower Pollination Algorithm (FPA), Particle Swarm Optimization (PSO), Sine Cosine Algorithm (SCA), and Harmony Search (HS).

The remaining parts of the paper are arranged as follows: the feedforward neural network (FFNN) is discussed in Sect. 2. The proposed ACHIO-MLP is thoroughly described in Sect. 3. The experimental results and their discussion are provided in Sect. 4. Finally, the paper is concluded and possible future developments are given in Sect. 5.

2 Feedforward neural networks

A feedforward neural network (FFNN) is a computational learning model inspired by the processing units of the human brain. These processing units are called neurons; they are interconnected and grouped into three types of layers. The first layer, the input layer, is composed of as many neurons as there are features in the input vector; the middle layers are called hidden layers; and the last layer is the output layer, which consists of output neurons corresponding to the predicted class labels [58]. The multilayer perceptron (MLP) is an FFNN model whose architecture is formed by neurons interconnected in layers. Through these connections, the information flows one way. Figure 1 shows the network structure of an MLP with only one hidden layer. The mathematical model of an MLP is based on three factors: input data, weights, and biases. These factors are applied in three steps in order to calculate the output of the MLP as follows:

  1. Initially, the weighted sum of the input is calculated using Eq. (1).

    $$\begin{aligned} S_{j} = \sum _{i=1}^{n} (w_{ij}.X_{i}) - \beta _{j} , j= 1,2, ... , h \end{aligned}$$
    (1)

    where n represents the number of the input nodes in the network, \(w_{ij}\) represents the weight on the connections between the input node i and hidden node j, \(X_{i}\) is the ith input, and \(\beta _{j}\) is the bias of the jth hidden node.

  2. In this step, an activation function (e.g., Sigmoid, which is commonly used in MLPs) is adopted to transfer the weighted output in the hidden layer to the next layer as follows:

    $$\begin{aligned} S_{j} = \textrm{Sigmoid} (S_{j}) = \frac{ 1 }{ 1+ \textrm{exp}(-S_{j})}, j=1,2, ... h \end{aligned}$$
    (2)
  3. Finally, the last output of the network is computed as follows:

    $$\begin{aligned} \hat{y}_{k}= \sum _{j=1}^{h} w_{jk}S_{j} + b_{k} \end{aligned}$$
    (3)
Fig. 1 Network structure of an MLP with a single hidden layer

where \(w_{jk}\) is the weighted edge connecting hidden node j to the output node k, and the bias of the output node k is \(b_{k}\).

As observed from Eqs. (1) and (3), the weights and biases are the primary factors for computing the final output of an MLP. Robust training of MLPs entails seeking the proper values for both weights and biases [23]. In the next sections, the CHIO algorithm is adapted as a trainer for MLPs.
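To make Eqs. (1)-(3) and the MSE of Eq. (9) concrete, the following minimal NumPy sketch (not the authors' MATLAB code; array shapes and names such as W1, beta, W2, and b are illustrative assumptions) computes the forward pass of a single-hidden-layer MLP:

```python
import numpy as np

def sigmoid(s):
    # Eq. (2): logistic activation applied to the weighted sums
    return 1.0 / (1.0 + np.exp(-s))

def mlp_forward(X, W1, beta, W2, b):
    """X: (k, n) inputs; W1: (n, h) input-to-hidden weights; beta: (h,) hidden biases;
    W2: (h, o) hidden-to-output weights; b: (o,) output biases."""
    S = sigmoid(X @ W1 - beta)   # Eq. (1) followed by Eq. (2): hidden-layer outputs
    return S @ W2 + b            # Eq. (3): one value per output node

def mse(y_true, y_pred):
    # Eq. (9): average squared difference over the k training samples
    return np.mean(np.sum((y_true - y_pred) ** 2, axis=1))
```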

3 Archive-based coronavirus herd immunity optimizer for MLP training

Coronavirus herd immunity optimizer (CHIO) is a recent nature-inspired, human-based optimization algorithm proposed by Al-betar et al. [43]. CHIO imitates how herd immunity can be utilized to confront the COVID-19 pandemic. The algorithm has proven its effectiveness when compared with other methods for tackling optimization problems [59, 60].

In CHIO, the total population is divided into three sub-populations. The susceptible sub-population contains the solutions not yet infected by the virus, which can therefore still become infected. The infected sub-population contains the solutions that changed from susceptible to infected after inheriting values from an infected case. The immuned sub-population contains the solutions that survived after being infected; they are the strongest portion of the population and are not affected by the infected cases.

If a susceptible individual takes attributes from an infected case and is not immuned, it also becomes infected (see Eq. (14) and lines 44-45 of Algorithm 1). The individual stays in this status until MaxAge is reached, at which point the case either becomes immuned (recovered) or dies (see lines 65-72 of Algorithm 1). Contagion is possible only for susceptible individuals: an infected case becomes immuned (recovered) or dies after reaching MaxAge, while an immuned case cannot become infected. Note that if a susceptible individual takes attributes from an immuned individual, its status does not change (i.e., it remains susceptible), as shown in lines 57-60 of Algorithm 1.

The external archive is used to improve CHIO performance by saving the best solutions to be reused in the following runs. This enhanced version of CHIO is called ACHIO. This section describes how ACHIO can be used to train an MLP. As mentioned before, the weights and biases are the decision variables of the MLP neural network. The proposed technique uses ACHIO to optimize these weights and biases by selecting the MLPs that obtain the highest classification accuracy. In ACHIO-MLP, the MLP is used in each iteration to evaluate the current solutions, where the weights and biases are the input vectors.

The archive rate (\(A_r\)) indicates the percentage of the best solutions from the population that will be reused (i.e., the best weights and biases of the fittest MLPs). These best solutions are stored in an external archive to be used as part of the initial population in the following run. This allows ACHIO-MLP to make use of the best solutions obtained so far during the algorithm's evolution. This process is repeated in each run. Since the initial run has no feedback from previous runs, the population is randomly constructed in the first run, and the archive is utilized only after the first run.

The procedural steps of ACHIO-MLP to train a NN are presented next. Figure 2 presents the steps of the ACHIO-MLP algorithm. Furthermore, the pseudo-code of ACHIO-MLP is given in Algorithm 1.

Fig. 2 The steps of the proposed ACHIO-MLP

The proposed ACHIO-MLP algorithm has eight main steps as follows:


Step 1: Define the external archive. The external archive is a matrix (ARCH) of size \(K \times N\) (see Eq. (4)), where N is the total number of weights and biases in the solution vector and K is the number of best MLPs selected after each training session for copying to the archive. K is set in Eq. (5) by the ratio \(A_r\), a parameter of the ACHIO algorithm that is chosen in a preliminary experiment (see Sect. 4.3). Note that in the first run the ARCH is empty; it is updated after the first run.

$${\mathbf{ARCH}} = {\text{ }}\left[ {\begin{array}{*{20}c} {w_{1}^{1} } & \cdots & {w_{n}^{1} } & {b_{1}^{1} } & \cdots & {b_{m}^{1} } \\ {w_{1}^{2} } & \cdots & {w_{n}^{2} } & {b_{1}^{2} } & \cdots & {b_{m}^{2} } \\ \vdots & \vdots & \cdots & \vdots & {} & {} \\ {w_{1}^{K} } & \cdots & {w_{n}^{K} } & {b_{1}^{K} } & \cdots & {b_{m}^{K} } \\ \end{array} } \right].{\text{ }}$$
(4)
$$\begin{aligned} K = {\rm HIS} \times A_r \end{aligned}$$
(5)

where \(A_r\) is the archive rate, which determines the rate of extracting the best solutions from the previous run. In other words, \(A_r\) is the ratio of MLPs selected from the trained set of MLPs for the next version of the archive. The population size (i.e., HIS) determines the number of solutions, where each solution consists of a vector of weights and biases for one MLP. Note that the archive size depends on the population size through the archive rate (\(A_r\)), and that ARCH is constructed only once at the beginning of the execution and keeps being updated after each run.


Step 2: Setting ACHIO and MLP parameters. In the MLP neural network, the optimization problem can be represented using a one-dimensional vector containing the set of weights and biases that need to be adjusted to increase the fitness of the MLP to the data. The number of weights and biases in the vector can be calculated using Eqs. (6)-(8):

$$\begin{aligned} h&= 2 \times F + 1 \end{aligned}$$
(6)
$$\begin{aligned} n1&= F \times h + h \times o \end{aligned}$$
(7)
$$\begin{aligned} b&= h + o \end{aligned}$$
(8)

where h is the number of neurons in the hidden layer, F is the number of features in the dataset (i.e., the number of input nodes), n1 is the number of weights, o is the number of outputs of the MLP neural network, and b is the number of biases.
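As a small illustration (function and variable names are assumptions, not the authors' code), the sizing of the solution vector for a dataset with F features and o classes can be computed as follows; for the Monk dataset (6 features, 2 classes) this yields the 6-13-2 structure reported in Table 1 and a vector of 119 values.

```python
def mlp_dimensions(F, o):
    h = 2 * F + 1                 # Eq. (6): hidden-layer neurons
    n_weights = F * h + h * o     # Eq. (7): input-to-hidden plus hidden-to-output weights
    n_biases = h + o              # Eq. (8): hidden plus output biases
    return n_weights + n_biases   # N, the length of each solution vector

print(mlp_dimensions(6, 2))       # 119 for the Monk dataset (6-13-2 structure)
```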

The resulting MLP is applied to all instances in the dataset, and its quality is measured using the mean square error (MSE). MSE is a common metric that measures the average squared difference between the actual and the predicted values. The MSE formula is given in Eq. (9), where y represents the actual value, \(\hat{y}\) represents the predicted value, and k represents the number of training samples.

$$\begin{aligned} \textrm{MSE} = \frac{1}{k} \sum _{i=1}^{k}{(y_i- \hat{y}_i)^2} \end{aligned}$$
(9)

The \(\hat{y}\) is predicted by feeding the current weights and biases to the MLP and identifying the class label for each data input. The output of the MLP is compared against the actual data output to identify the quality of the MLP (i.e., the MSE against the data). In the training phase, the MSE value is therefore the difference between the actual outcomes and the outcomes predicted by the MLPs generated by the proposed ACHIO-MLP algorithm.

Note that the size of the output prediction vector produced by the neural network depends on the number of classes. For example, with 3 classes the neural network produces an output vector of size 3, and the predicted class is the one with the largest of the 3 values. Assume that the correct class is the first one but the neural network predicts the second class. Then, when calculating the MSE, the predicted values (i.e., \(\hat{y}\)) used for the first and the third classes are zero, while the actual values (i.e., y) for the second and the third classes are zero.
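The following tiny worked example (illustrative only) encodes the scenario just described, where the true class is the first and the predicted class is the second:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 0.0])             # actual: only the first class is one
y_pred = np.array([0.0, 1.0, 0.0])             # predicted: only the second class is one
sample_error = np.sum((y_true - y_pred) ** 2)  # contributes 2.0 to the sum in Eq. (9)
```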

In general, the classification problem can be modeled as follows:

$$\begin{aligned} \min _x f({{\varvec{x}}}) \quad {{\varvec{x}}} \in [{{\varvec{lb}}},{{\varvec{ub}}}] \end{aligned}$$
(10)

where \(f({{\varvec{x}}})\) is the MSE value (i.e., the objective function to be minimized), which is evaluated for the case \({{\varvec{x}}}=(x_1,x_2,\ldots ,x_N)\). Note that for the MLP, the vector \({{\varvec{x}}}\) has two parts, the weights (n) and the biases (m), such that \({{\varvec{x}}}=(w_1,w_2,\ldots ,w_n,b_1,b_2, \ldots ,b_m)\). Here, \(x_i\) is the decision variable indexed by i, and N is the total number of decision variables in each individual, \(N=n+m\). The weight and bias values of the MLP lie within the interval \([lb_i,ub_i]\), where \(lb_i\) and \(ub_i\) are the lowest and highest limits of the variable \(x_i\) (i.e., the acceptable range for MLP weights and biases).

There are five algorithmic parameters for ACHIO-MLP which are as follows:

  • \(C_0\) represents the number of initially infected cases, normally set to one.

  • \({\rm Max}_{\rm Itr}\) represents the maximum number of iterations.

  • HIS is the population size (i.e., the weights and biases vectors of MLPs).

  • N represents the number of variables in the solution (i.e., the number of weights and biases for each MLP).

  • \(A_r\) represents the archive rate which is the percentage of re-using the best solutions from the previous run.

In addition, ACHIO-MLP has two control parameters; both are summarized, together with the algorithmic parameters, in the sketch after this list:

  • Basic reproduction rate (BR): identifies the speed of virus spreading among individuals and is assigned a value in the range [0, 1]. In other words, BR is the average proportion of an MLP's weights and biases that are changed toward the corresponding weight or bias of another MLP at each time step (see Eq. (12)).

  • Maximum infected cases age (\({\rm Max}_{\rm Age}\)): when the age of an infected (S = 1) but not immune case reaches \({\rm Max}_{\rm Age}\), the case either dies (i.e., is removed from the population of MLPs being trained) or becomes immune (S = 2), depending on its MSE value compared to the average MSE value of all the MLPs in the population being trained (see Eq. (19)).
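Purely as an illustration, the parameters above could be gathered in a configuration such as the following; the values of HIS, \({\rm Max}_{\rm Itr}\), and \(A_r\) follow Sects. 4.2-4.3, while the remaining entries are placeholders rather than values prescribed by the paper.

```python
# Hypothetical ACHIO-MLP parameter configuration (illustrative, not prescriptive)
params = {
    "C0": 1,         # initial number of infected cases
    "Max_Itr": 250,  # maximum iterations (Sect. 4.2)
    "HIS": 70,       # population size (Sect. 4.2)
    "N": None,       # solution length, set per dataset from Eqs. (6)-(8)
    "Ar": 0.2,       # archive rate (best value found in Sect. 4.3)
    "BR": 0.05,      # basic reproduction rate in [0, 1] (placeholder value)
    "Max_Age": 100,  # maximum infected-case age (placeholder value)
}
```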


Step 3: Produce the population for MLP configuration. In the first run, ACHIO-MLP generates HIS random solutions and stores them in the CHIO memory (CHIOM) as shown in Eq. (11), whereas in subsequent runs ACHIO-MLP generates \({\rm HIS}-K\) random solutions and K solutions are taken from ARCH. Each solution represents possible weights and biases given as input to the MLP. Each solution is a vector \({{\varvec{x}}}=(w_1,w_2,\ldots ,w_n,b_1,b_2,\ldots ,b_m)\).

$${\mathbf{CHIOM}} = \left[ {\begin{array}{*{20}c} {w_{1}^{1} } & \cdots & {w_{n}^{1} } & {b_{1}^{1} } & \cdots & {b_{m}^{1} } \\ {w_{1}^{2} } & \cdots & {w_{n}^{2} } & {b_{1}^{2} } & \cdots & {b_{m}^{2} } \\ \vdots & \vdots & \cdots & \vdots & {} & {} \\ {w_{1}^{{{\text{HIS}}}} } & \cdots & {w_{n}^{{{\text{HIS}}}} } & {b_{1}^{{{\text{HIS}}}} } & \cdots & {b_{m}^{{{\text{HIS}}}} } \\ \end{array} } \right].{\text{ }}$$
(11)

where each row represents a solution \({{\varvec{x}}}^j\), which is a set of weights and biases. Each solution is generated as follows: \(x_i^{j} =lb_i + (ub_i - lb_i) \times U(0,1)\), \(\forall i=1,2, \ldots , N\). The cost function is computed using the MSE presented in Eq. (9). It is worth mentioning that after the first run, K solutions are copied directly from the archive ARCH and the remaining solutions are constructed randomly to fill up CHIOM.

For simplicity, the term \(x_i^{j}\) is used next to refer to the variable i (weight or bias) of a solution vector j.
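Before moving to the next step, a minimal sketch of this Step 3 initialization (helper names and the use of NumPy are assumptions for illustration) is:

```python
import numpy as np

def init_population(HIS, N, lb, ub, archive=None, rng=None):
    rng = rng or np.random.default_rng()
    # x_i^j = lb_i + (ub_i - lb_i) * U(0, 1) for every variable of every solution
    pop = lb + (ub - lb) * rng.random((HIS, N))
    if archive is not None and len(archive) > 0:
        K = min(len(archive), HIS)
        pop[:K] = archive[:K]   # seed the K archived solutions from the previous run
    return pop
```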


Step 4: The progress of herd immunity. The ACHIO-MLP algorithm is used to improve the weights and biases of all MLPs in the current population. Note that the current population being improved includes the archived solutions carried over from the previous run. The weight or bias \(x_i^j\) (i.e., \(x_i^j= w_{i}^{j}\) or \(x_i^j= b_{i}^{j}\)) of the individual \({{\varvec{x}}}^j\) stored in CHIOM may or may not be changed by applying the following three social distancing rules based on the BR ratio:

$$\begin{aligned} x_{i}^{j}(t+1) \leftarrow {\left\{ \begin{array}{ll} C(x_{i}^{j}(t)) &{} r \in \left[ 0,\frac{1}{3}BR\right) \quad \text {//infected case} \\ N(x_{i}^{j}(t)) &{} r \in \left[ \frac{1}{3}BR,\frac{2}{3}BR\right) \quad \text {//susceptible case} \\ R(x_{i}^{j}(t)) &{} r \in \left[ \frac{2}{3}BR,BR\right) \quad \text {//immuned case} \\ x_{i}^{j}(t) &{} r \ge BR \end{array}\right. } \end{aligned}$$
(12)

Note that r is a random number within the range [0,1]. The following is how the weights and biases of an MLP change depending on other MLPs:

Infected case : in case \(r\in [0, \frac{1}{3} BR)\), the new weight or bias value \(x_i^j (t+1)\) would be based on a previous value of an infected case \({{\varvec{x}}}^c\) computed as follows:

$$\begin{aligned} x_i^j (t+1)=C( x_i^j(t)) \end{aligned}$$
(13)

where

$$\begin{aligned} C( x_i^j(t))= x_i^j(t)+ r\times (x_i^j(t)-x_i^c(t)) \end{aligned}$$
(14)

where \(x_i^c(t)\) is from a randomly chosen infected case \({{\varvec{x}}}^c\).

Susceptible case: in case \(r\in [\frac{1}{3} BR,\frac{2}{3} BR)\), the new weight or bias value \(x_i^j (t+1)\) would be based on a previous susceptible case \({{\varvec{x}}}^m\) as follows:

$$\begin{aligned} x_i^j (t+1)=N( x_i^j(t)) \end{aligned}$$
(15)

where

$$\begin{aligned} N( x_i^j(t))= x_i^j(t)+ r\times (x_i^j(t)-x_i^m(t)) \end{aligned}$$
(16)

where \(x_i^m(t)\) is from a randomly chosen susceptible case \({{\varvec{x}}}^m\).

Immuned case: in case \(r\in [\frac{2}{3} BR, BR)\), the new weight or bias value \(x_i^j (t+1)\) would be based on a previous immuned case \({{\varvec{x}}}^v\) as follows:

$$\begin{aligned} x_i^j (t+1)=R( x_i^j(t)) \end{aligned}$$
(17)

where

$$\begin{aligned} R( x_i^j(t))= x_i^j(t)+ r\times (x_i^j(t)-x_i^v(t)) \end{aligned}$$
(18)

Note that \(x_i^v(t)\) is taken from the best immuned case \({{\varvec{x}}}^v\), which is selected such that

$$\begin{aligned} f({{\varvec{x}}}^v)= \min _{j \in \{k \mid {\mathcal {S}}_k=2\}} f({{\varvec{x}}}^j). \end{aligned}$$

The weights and biases of \(x_i^j (t+1)\) are used as input parameters for the MLP, and the resulting output is evaluated using the MSE. The objective here is to find the set of weights and biases that minimizes the MSE over the training instances of the selected dataset.

It is worth mentioning that in each CHIO operator, the next value of any variable is calculated from the original value plus a small step between the current value and the value of a randomly chosen variable from a solution of the same type.
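A compact sketch of this update (Eqs. (12)-(18)) for a single variable is given below; the index lists `infected`, `susceptible`, and `immuned`, and the explicit passing of the fitness array, are assumed bookkeeping rather than part of the original formulation.

```python
import numpy as np

def update_variable(pop, fitness, j, i, BR, infected, susceptible, immuned, rng):
    r = rng.random()
    x = pop[j, i]
    if r < BR / 3 and infected:            # Eqs. (13)-(14): move toward a random infected case
        c = rng.choice(infected)
        return x + r * (x - pop[c, i])
    elif r < 2 * BR / 3 and susceptible:   # Eqs. (15)-(16): move toward a random susceptible case
        m = rng.choice(susceptible)
        return x + r * (x - pop[m, i])
    elif r < BR and immuned:               # Eqs. (17)-(18): move toward the best immuned case
        v = min(immuned, key=lambda k: fitness[k])
        return x + r * (x - pop[v, i])
    return x                               # r >= BR (or no case of the needed type): keep the value
```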


Step 5: Refreshing the population. The cost function \(f({{\varvec{x}}}^j(t+1))\) of the newly generated weights and biases vector, \({{\varvec{x}}}^j(t+1)\), is computed; the new vector replaces the current case \({{\varvec{x}}}^j(t)\) if it is better, i.e., if \(f({{\varvec{x}}}^j(t+1))<f({{\varvec{x}}}^j(t))\). Otherwise, if the case is infected (\({\mathcal {S}}_j=1\)), its age \({\mathcal {A}}_j\) is incremented by one step.

For each case \({{\varvec{x}}}^j\), the status value (\({\mathcal {S}}_j\)) is modified according to the herd immune threshold represented in Eq. (19).

$$\begin{aligned} {\mathcal {S}}_{j}\leftarrow {\left\{ \begin{array}{ll} 1 &{} f({{\varvec{x}}}^j(t+1)) < \frac{f({{\varvec{x}}}^j(t+1))}{\triangle f({{\varvec{x}}})} \wedge {\mathcal {S}}_j=0 \wedge is\_\textrm{Corona}\,({{\varvec{x}}}^j(t+1)) \\ \\ 2 &{} f({{\varvec{x}}}^j(t+1)) > \frac{f({{\varvec{x}}}^j(t+1))}{\triangle f({{\varvec{x}}})} \wedge {\mathcal {S}}_j=1 \end{array}\right. } \end{aligned}$$
(19)

Note that \(is\_\textrm{corona}\, ({{\varvec{x}}}^j(t+1))\) symbolizes a binary value that is equal to one if the newly generated case \({{\varvec{x}}}^j(t+1)\) was based on any infected case. Additionally, \(\triangle {f({{\varvec{x}}})}\) is the mean value of the population immune rates which is defined as \(\frac{\sum _{i=1}^{\rm HIS}f(x_i)}{\rm HIS}\). Note that each MLP’s status value indicates its current state; for \(MLP_j\) its status is \({S}_j\) \(\in \{0, 1, 2\}\) where \({S}_j=0\) indicates a susceptible case, \({S}_j=1\) indicates an infected case, and \({S}_j=2\) indicates an immuned case. The status of an MLP can change at any iteration of the training, see Step 6 below.


Step 6: Fatality cases. An MLP dies if it is an infected case (\({\mathcal {S}}_j = 1\)) and its immunity rate \(f({\textbf {x}}^j (t+1))\) did not improve over a predefined number of trials, i.e., its age reaches the maximum age (\({\mathcal {A}}_j \ge {\rm Max}_{\rm Age}\)). In such a situation, the case is reconstructed as a new solution by applying \(x_i^{j} (t+1) = lb_i + (ub_i - lb_i) \times U(0,1)\),    \(\forall i=1,2, \ldots , N\). The algorithm performs this reinitialization to diversify its population (i.e., the weights and biases).
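The status update of Eq. (19) and the fatality rule can be sketched together as follows. Comparing against the population mean MSE, and resetting a regenerated case to susceptible with age zero, are simplifying assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def update_status_and_fatality(pop, fitness, status, age, j, came_from_infected,
                               Max_Age, lb, ub, rng):
    mean_fit = np.mean(fitness)                  # delta f(x): mean MSE of the population
    if status[j] == 0 and came_from_infected and fitness[j] < mean_fit:
        status[j] = 1                            # susceptible -> infected (Eq. (19), first case)
    elif status[j] == 1 and fitness[j] > mean_fit:
        status[j] = 2                            # infected -> immuned (Eq. (19), second case)
    if status[j] == 1 and age[j] >= Max_Age:     # fatality: regenerate the stagnant case
        pop[j] = lb + (ub - lb) * rng.random(pop.shape[1])
        age[j] = 0                               # assumed: restart as a fresh susceptible case
        status[j] = 0
```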


Step 7: Stop and test. Steps 4 to 6 are repeated until the maximum number of iterations is reached. After that, the MLP with the lowest MSE value is tested on the test dataset. In this study, all datasets are split into 30% for testing and 70% for training.


Step 8: Update the external archive. At the end of each run, the solutions (i.e., the vectors of weights and biases of the MLPs) in the population are sorted in ascending order according to their MSE values. The archive is cleared, and the best K solutions are copied to a new ARCH. Although the archive (ARCH) is rebuilt at the end of every run, its K MLPs are included in the new population of MLPs to be trained, and the best K MLPs of the newly trained population are in turn copied to the next version of the archive; thus, the archive can be considered a store of accumulated knowledge.
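A short sketch of this archive update (again illustrative; the rounding of K is an assumption) is:

```python
import numpy as np

def update_archive(pop, fitness, Ar):
    K = int(round(len(pop) * Ar))   # Eq. (5): K = HIS x Ar
    order = np.argsort(fitness)     # ascending MSE, so the best solutions come first
    return pop[order[:K]].copy()    # the new ARCH passed to the next run (Step 3)
```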

Algorithm 1 The pseudo-code of the proposed ACHIO-MLP

4 Experiments and results

In this section, the effectiveness and robustness of the proposed ACHIO-MLP algorithm are studied using 15 benchmark datasets with different levels of complexity. The characteristics of these datasets are presented in Sect. 4.1. The experimental settings are described in Sect. 4.2. The influence of the archive rate parameter on the performance of ACHIO-MLP is studied in Sect. 4.3. Finally, the performance of the proposed ACHIO-MLP against the classical CHIO-MLP and six other metaheuristic algorithms, in terms of classification accuracy and algorithm convergence, is discussed and analyzed in Sect. 4.4.

4.1 Test datasets

The effectiveness of the proposed ACHIO-MLP is investigated through a set of experiments utilizing 15 benchmark classification problems selected from the UCI Machine Learning Repository. The number of classes, features, and instances (or samples) of these datasets are presented in Table 1. The selected benchmark datasets have different numbers of classes, i.e., 2, 3, 4, 6, or 10 classes.

The datasets are normalized by applying min-max normalization to improve the performance and training stability of the model. The following is the mathematical formula used to reduce the scale of the features:

$$x_{i}^{\prime } = \frac{x_{i} - \min _{F}}{\max _{F} - \min _{F}}$$
(20)

where \(x_i'\) is the normalized value of \(x_i\), and \(\min _{F}\) and \(\max _{F}\) are the minimum and maximum values of the feature F, so that each feature is scaled to the range [0, 1].
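A short NumPy sketch of this per-feature scaling (assuming a data matrix X of shape samples × features) is:

```python
import numpy as np

def min_max_normalize(X):
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)   # Eq. (20): each feature mapped to [0, 1]
```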

Table 1 The classification datasets and MLP structure for each dataset

In the last two columns of Table 1, the number of nodes in the hidden layer and the MLP structure are presented. The number of nodes in the hidden layer can be determined using different techniques. In this paper, we followed the method presented in [61, 62], in which the number of neurons in the hidden layer is identified using the formula given in Eq. (21).

$$\begin{aligned} h=2 \times L +1 \end{aligned}$$
(21)

where L represents the number of features in the dataset. The whole MLP structure for each dataset is therefore presented in the form input-hidden-output. For example, in the Monk dataset, the MLP structure is 6-13-2: the number of input features is 6, the number of nodes in the hidden layer is 13, and the number of output class labels is 2.

All datasets are split into 30% for testing and 70% for training. We used stratified sampling to split each dataset [63]. This technique computes the ratio of each class and then applies the train/test split percentage within each class based on the computed ratio. Using this strategy helps to maintain the proportion of each class in the divided data and to preserve the presence of minority classes, so that the train and test portions have a balanced class distribution.
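As a sketch of this split (using scikit-learn's train_test_split as an assumed, equivalent tool; the paper's experiments were run in MATLAB):

```python
from sklearn.model_selection import train_test_split

def stratified_split(X, y):
    # 70/30 split that preserves the class ratios in both portions
    return train_test_split(X, y, test_size=0.30, stratify=y, random_state=0)
```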

4.2 Experimental settings

The proposed algorithm is compared against six swarm intelligence algorithms using the same datasets. All experiments are conducted on a Microsoft Azure server running MATLAB version 9.7.0 under the Windows operating system, with an Intel Xeon Silver 1.8 GHz CPU and 6 GB of RAM. The algorithms are executed for each dataset over 30 independent runs, and the number of iterations is 250. The size of the population of MLPs to be trained is set to 70 for all comparative algorithms. The parameter settings of all comparative methods are given in Table 2 and are based on the recommendations given in their original papers. Note that the proposed ACHIO-MLP is compared with the other algorithms in terms of classification accuracy, which represents the proportion of correct predictions out of all predictions made.

Table 2 Parameter settings of the comparative algorithms

4.3 The influence of the archive rate

The influence of various settings of the archive rate (\(A_r\)) parameter on the performance of the proposed ACHIO-MLP is studied in this section. Note that in each preliminary experiment the data are divided into 70% for training and 30% for testing. Three different values are considered, \(A_r \in \{0.1, 0.2, 0.5\}\). It should be noted that a higher value of \(A_r\) leads to a higher rate of exploration.

It is worth mentioning that for choosing \(A_r\), the data are divided into 70% for training and 30% for testing in ACHIO, whereas the meta-parameters of the other swarm intelligence algorithms are set to the default values suggested in the literature.

Table 3 shows the classification accuracy of the three variants of the proposed ACHIO-MLP compared to the original CHIO-MLP in terms of the mean, the standard deviation, and the best results. The higher accuracy results mean better performance, while the lower STD values reflect the algorithm’s robustness. The best results are highlighted using bold fonts.

The mean accuracy results show that the proposed ACHIO-MLP with \(A_r=0.2\) ranked first by achieving the best accuracy in 9 out of 15 datasets. The proposed ACHIO-MLP with \(A_r=0.5\) ranked second with the best accuracy in 4 datasets, while the remaining two algorithms (i.e., CHIO-MLP and ACHIO-MLP with \(A_r=0.1\)) ranked last, each obtaining the best accuracy in 2 datasets.

Table 3 The accuracy results of the proposed ACHIO-MLP using various settings of the archive rate parameter

According to the best accuracy results, it is clear that the proposed ACHIO-MLP with \(A_r=0.2\) ranked first by obtaining the highest accuracy in 8 datasets. The CHIO-MLP was placed second by obtaining the best results in 6 datasets, while the proposed ACHIO-MLP with \(A_r=0.5\) ranked third with the best results in 4 datasets. The ACHIO-MLP with \(A_r=0.1\) was placed last by obtaining the best results in 3 datasets.

Reading the standard deviation results in Table 3, it can be concluded that the performance of the three variants of the proposed ACHIO-MLP is more robust than that of CHIO-MLP, as they achieve the minimum standard deviation in the largest number of datasets.

Table 3 shows that the proposed ACHIO-MLP with \(A_r=0.2\) ranked first by having the minimum average ranking using Friedman’s test, while the two remaining variants of the proposed ACHIO-MLP (i.e., ACHIO-MLP with \(A_r=0.5\) and ACHIO-MLP with \(A_r=0.1\)) are ranked second and third. The CHIO-MLP is ranked last by having the highest Friedman score. This proves the effectiveness of the proposed changes to the CHIO framework when it is used for optimizing the weights of neural networks.

Note that ACHIO-MLP with \(A_r=0.2\) will be used in the next comparison section as it obtains the best results.

4.4 Comparison with other swarm-based optimization algorithms

In this section, the performance of the proposed ACHIO-MLP is evaluated against seven swarm-based metaheuristics: the original CHIO [43], artificial bee colony (ABC) [28], bat algorithm (BA) [34], flower pollination algorithm (FPA) [68], particle swarm optimization (PSO) [20], sine cosine algorithm (SCA) [70], and harmony search (HS) [71]. In order to ensure a fair comparison, all comparative algorithms are coded by the authors and run on the same datasets, with the unified parameter settings mentioned in Sect. 4.2.

The experimental results obtained by ACHIO-MLP and all comparative methods are presented in Table 4 in terms of the mean, standard deviation, and best classification accuracy. The bold values indicate the best value for each dataset. Note that the fittest MLP, which is the one with the lowest MSE on the training dataset, is evaluated on the test dataset.

The best classification accuracy results are presented in Table 4. They show that ACHIO-MLP obtains the best classification accuracy on 6 datasets, including Monk (2), Balloon (2), Iris (3), Seeds (3), Glass (6), and Yeast (10), where the number of classes is shown in parentheses. Notably, ACHIO-MLP outperforms the other comparative methods on two large datasets with six and ten classes. This shows the proposed algorithm's strength in navigating the search space in different ways and achieving promising results, which is due to the good balance between the exploration and exploitation of ACHIO-MLP. A pairwise comparison shows that the ACHIO-MLP algorithm outperforms the CHIO, ABC, BA, FPA, PSO, SCA, and HS algorithms on seven, ten, nine, ten, five, twelve, and twelve datasets, respectively. On the other hand, PSO and CHIO achieve the best results on five and three datasets, respectively. Indeed, PSO and ACHIO share common characteristics: they behave efficiently when navigating the search space of the weights and biases, can widely explore several regions of the search space, and can deeply exploit each region of the MLP search space to find local optima. Furthermore, since the MLP search space is non-convex and multimodal, PSO and ACHIO prove very efficient in dealing with the nature of this search space. Finally, ACHIO behaves as a strong exploiter through the proposed archive-based concept, which allows it to make use of accumulated knowledge and remember the best points in the MLP search space. Note that all of the algorithms produce the same optimal results for the Balloon dataset; since this dataset is small with only two classes, the algorithms did not require much effort to achieve the best results.

Similarly, the proposed algorithm is compared against the other methods in terms of the mean classification accuracy. Table 4 shows that ACHIO-MLP performs better than the other comparative algorithms on ten datasets (i.e., Monk (2), Balloon (2), Ionosphere (2), German (2), Parkinson (2), Iris (3), Seeds (3), Vehicle (4), Glass (6), and Yeast (10)). Furthermore, the performance of ACHIO-MLP is similar to that of CHIO and PSO in obtaining the best results for the Balloon dataset. ACHIO-MLP achieves the second-best results on three datasets (i.e., Cancer (2), Heart (2), and Titanic (2)) and the third-best results on the Blood (2) dataset. The strength of ACHIO-MLP is due to the behavior of its efficient operators, where the infected and susceptible case operators can follow any random solution in the population while the recovered case operator exploits the attributes of the best solution. Furthermore, archiving the best results to be reused in the next run improves the algorithm's search. A lower standard deviation reflects the robustness of the algorithm; from Table 4, it can be observed that ACHIO-MLP has better robustness than the other comparative algorithms on most datasets.

Table 4 The accuracy results of the proposed ACHIO-MLP in comparison with other swarm-based algorithms

4.4.1 Convergence analysis

The performance of the comparative methods can also be investigated through their convergence behavior toward the optimal solution. Accordingly, the convergence behaviors of ACHIO-MLP and all comparative methods on all datasets are plotted in Fig. 3, where the iteration number is on the x-axis and the fitness value is on the y-axis. Notably, ACHIO-MLP converges rapidly toward its optimal solution without stagnating in local optima, which improves its results. In addition, ACHIO-MLP obtains the best convergence rate on ten datasets (i.e., Monk (2), Iris (3), Cancer (2), Heart (2), Vertebral (2), Seeds (3), Glass (6), Vehicle (4), Parkinson (2), and Yeast (10)), as the best MSE is reached within the defined number of iterations, and it achieves the second-best convergence on almost all other datasets. The ACHIO-MLP operators allow the algorithm to efficiently explore the search space niches and exploit each niche deeply. Using this strategy, ACHIO-MLP is able to maneuver through the search space and escape the local optima trap during the search.

The boxplots for the various classification datasets are shown in Fig. 4. The plotted values are the MSEs obtained on the test dataset by the fittest MLP, i.e., the one with the lowest MSE on the training dataset, for the 15 classification datasets. In a boxplot, a smaller distance between the best, median, and worst MSE demonstrates the stability of the algorithm. It is worth mentioning that the whiskers represent the farthest MSE values, the box represents the interquartile range, the outliers are represented by small circles, and the median value is represented by the bar in the box. The boxplots demonstrate the good performance of ACHIO-MLP for training MLPs. ACHIO-MLP shows the smallest MSE spread on twelve datasets (i.e., Balloon (2), Iris (3), Cancer (2), Heart (2), Blood (2), Seeds (3), Glass (6), German (2), Titanic (2), Vehicle (4), Parkinson (2), and Yeast (10)). In addition, the proposed ACHIO-MLP presents the second-smallest MSE spread on most of the remaining datasets.

Fig. 3 Convergence results for the different algorithms

Fig. 4 Boxplot charts of MSE results for the different algorithms

4.4.2 Friedman’s statistical test

Figure 5 shows the ranking of the comparative algorithms using Friedman's test; the experimental results provided in Table 4 are used to calculate the rankings. The null hypothesis (\(H_0\)) is that there is no significant difference between the performance of the proposed ACHIO-MLP and the alternative methods judged over all the datasets, while the alternative hypothesis (\(H_1\)) is that there is a significant difference. Figure 5 demonstrates the high performance of the proposed method, where ACHIO-MLP achieves the best ranking among all compared algorithms. The \(\rho\)-value calculated by Friedman's test is equal to 8.134649E-11, which is below the significance level (\(\alpha = 0.05\)). As a result, there is a significant difference between the comparative algorithms, and thus the hypothesis \(H_0\) is rejected.

Fig. 5 Average rankings of the comparative algorithms using Friedman's statistical test

The Holm and Hochberg procedures are used as post-hoc techniques to calculate the adjusted \(\rho\)-values in order to show whether there is a significant difference between the controlled algorithm and the other algorithms. Note that the proposed ACHIO-MLP is the controlled algorithm because it obtains the first ranking, as shown in Fig. 5. The null hypothesis \(H_0\) is rejected using Holm's procedure when the \(\rho\)-value \(\le 0.01667\), and using Hochberg's procedure when the \(\rho\)-value \(\le 0.0125\). As presented in Table 5, there is a significant difference between ACHIO-MLP and five of the comparative algorithms (HS, SCA, BA, FPA, and ABC). However, there is no significant difference between the behavior of ACHIO-MLP and the two algorithms CHIO and PSO. This indicates that the proposed ACHIO-MLP algorithm is a good new alternative that is able to solve such problems successfully.

Table 5 Holm/Hochberg outcomes with ACHIO-MLP as the controlled algorithm against the other algorithms

5 Conclusion and future work

CHIO is a powerful algorithm recently proposed to imitate the herd immunity strategy used to tackle the coronavirus pandemic. The CHIO algorithm is selected because of its capabilities in finding the right trade-off between the exploration of the different search space niches and the exploitation of each niche. In this paper, to escape local optima and to preserve a good level of population diversity, an archive of the best solutions is implemented. The newly proposed algorithm (called ACHIO-MLP) selects and trains MLPs. The MLP training problem is mathematically modeled as minimizing the MSE, where the decision variables are the weights and biases of the MLP, for which ACHIO-MLP searches to find elite values.

In order to evaluate the performance of ACHIO-MLP, a collection of 15 classification datasets with different degrees of difficulty is utilized. Each dataset is normalized before it is used, and all datasets are split into 30% for testing and 70% for training. Stratified sampling is used to split each dataset, maintaining the proportion of each class in the divided data so that the train/test portions have a balanced class distribution. As each dataset has a different number of features (and class labels), each MLP uses a different number of input, hidden, and output nodes.

The results of the proposed method are compared against the original CHIO and six swarm optimization algorithms: HS, PSO, BA, ABC, FPA, and SCA. Interestingly, ACHIO-MLP produces very accurate results that exceed those of the other comparative methods on ten out of fifteen classification datasets, and very competitive results on the other datasets. In addition, the results demonstrate the better convergence of the proposed algorithm. In a nutshell, the proposed ACHIO-MLP avoids local optima because of its different diversification techniques, and the results show how fast its convergence is compared to the other comparable methods. Finally, ACHIO-MLP can train MLPs to obtain a promising set of weights and biases that produces better results.

In the future, the proposed algorithm will be applied to tackle real-world applications. Also, the proposed algorithm can be hybridized with other local search algorithms to improve its exploitation abilities.