
1 Introduction

The feature selection problem, also known as the feature subset selection problem, refers to selecting N features from the existing M features so as to optimize a specific objective of the system, thereby reducing the data dimensionality and improving the performance of learning algorithms. In recent years, with the development of big data, the industrial internet, and financial data analysis, more and more high-dimensional datasets are used in various fields of information systems, such as financial analysis, business management, and medical research. The curse of dimensionality brought about by high-dimensional datasets makes feature selection an urgent and important task.

Feature selection methods can be divided into filter, wrapper, embedded, and ensemble methods [1]. In filter methods, the feature selection algorithm is independent of the learning algorithm: all features are ranked by specific statistical or mathematical criteria, such as Laplacian scores, constraint scores, Fisher scores, or Pearson correlation coefficients, and a subset of features is then selected according to the ranking. Wrapper methods treat the selected learner as a black box, evaluate each candidate feature subset by the learner's predictive accuracy on it, and use a search strategy to obtain an approximately optimal subset. Embedded methods build feature selection into the learning algorithm itself: when the training of the classification algorithm finishes, a subset of features has already been obtained, as in ID3, C4.5, and CART, where the features used in training are the result of feature selection. Ensemble methods draw on the idea of ensemble learning: they train multiple feature selection methods and combine their results to achieve better performance than any single feature selection method. By introducing bagging, many feature selection algorithms can be extended into ensembles.

Swarm intelligence optimization algorithms are often used to solve the feature selection problem and have achieved good results, for example the genetic algorithm (GA) [2], the ant colony optimization algorithm (ACO) [3], and the particle swarm optimization algorithm (PSO) [4]. The chicken swarm optimization algorithm (CSO) [5], proposed in 2014, is a swarm intelligence optimization algorithm inspired by the foraging behavior of chicken flocks; it achieves a good optimization effect by grouping the population and updating each group separately, and it has been applied in several fields. Hafez et al. [6] proposed a new feature selection method that uses the CSO algorithm as part of the evaluation function. Ahmed et al. [7] applied logistic and tent chaotic maps to help CSO explore the search space better. Liang et al. [8] proposed a hybrid heuristic swarm intelligence optimization algorithm, cuckoo search-chicken swarm optimization (CSCSO), to optimize the excitation amplitudes and element spacings of the linear antenna array (LAA) and the circular antenna array (CAA); CSCSO achieves better solution accuracy and convergence speed in the optimization of LAA and CAA radiation patterns.

In this paper, an improved chicken swarm optimization algorithm (ICSO) is proposed, which introduces the Levy flight strategy into the hen location update strategy and a nonlinearly decreasing inertia weight into the chick location update strategy, to enhance the global search ability and decrease the probability of the algorithm falling into a local minimum. Eighteen UCI datasets are used to compare the effectiveness of the proposed algorithm with three other algorithms, and the results show that the proposed algorithm has clear advantages.

2 Chicken Swarm Optimization Algorithm (CSO)

The chicken swarm optimization algorithm simulates the hierarchy of a chicken swarm and its competitive behavior in foraging. In the algorithm, the chicken swarm is split into several subgroups, each consisting of a rooster, several hens, and some chicks. The subgroups are subject to specific hierarchical constraints, and they compete with each other during foraging. The positions of the chickens are updated according to their respective motion rules. The behavior of the chickens in the chicken swarm optimization algorithm is idealized by the following four rules:

  1. The chicken swarm is divided into several subgroups; each subgroup contains three types of chickens: a rooster, several hens, and some chicks.

  2. The type of each chicken is determined by its fitness value: the chickens with the best fitness values are the roosters, those with the worst fitness values are the chicks, and the remaining chickens are the hens. Each hen can freely choose the subgroup to which it belongs, and the mother-child relationships between hens and chicks are established randomly.

  3. The hierarchical order, dominance relationships, and mother-child relationships in a subgroup are re-established every several generations (a period), but within a period all the relationships remain unchanged.

  4. All the chickens follow the rooster of their subgroup to find food and prevent chickens from other subgroups from competing for it. The chicks follow their mother hens to forage, and it is assumed that the chicks can eat any food found by the other chickens. Chickens with better fitness values have an advantage in the competition for food.

Assume that the search space is D-dimensional and that the total number of chickens in the swarm is N, with \(N_R\) roosters, \(N_H\) hens, \(N_C\) chicks, and \(N_M\) mother hens. Let \(x_{i,j}^t\) denote the position of the \(i^{th}\) chicken in the \(j^{th}\) dimension of the search space at the \(t^{th}\) iteration, where \(i \in \left( {1,2, \ldots ,N} \right)\), \(j \in \left( {1,2, \ldots ,D} \right)\), \(t \in \left( {1,2, \ldots ,T} \right)\), and T is the maximum number of iterations.

(a) Rooster location update strategy. The roosters are the chickens with the best fitness values in the swarm. Roosters with better fitness have an advantage over roosters with worse fitness: they can find food more quickly and can search for food over a larger range around their positions, realizing the global search. Meanwhile, each rooster's location update is influenced by the position of another randomly selected rooster. The position update formulas of the rooster are as follows:

$$ x_{i,j}^{t + 1} = x_{i,j}^t \ast \left( {1 + Randn\left( {0,\;\sigma^2 } \right)} \right) $$
(1)
$$ \sigma^2 = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {if\;f_i \le f_k ,} \hfill \\ {\exp \left( {\frac{f_k - f_i }{{\left| {f_i } \right| + \varepsilon }}} \right),} \hfill & {otherwise,} \hfill \\ \end{array} k \in \left[ {1,N} \right],k \ne i} \right. $$
(2)

where \( Randn\left( {0,\;\sigma^2 } \right)\) is a Gaussian random number with mean 0 and variance \(\sigma^2\). \(k\) is the index of a rooster randomly selected from the rooster group. \(f_i\) is the fitness value of the corresponding chicken \(x_i\), and \(\varepsilon\) is a small constant used to avoid division by zero.
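For illustration, the rooster update of Eqs. (1) and (2) can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the array layout, the function and variable names, and the assumption that smaller fitness values are better (the convention of the original CSO paper) are all illustrative.

```python
import numpy as np

def rooster_update(x, fitness, rooster_idx, eps=1e-10, rng=None):
    """Sketch of the CSO rooster update, Eqs. (1)-(2).

    x           : (N, D) array, positions of all N chickens
    fitness     : (N,) array of fitness values (smaller = better, assumed)
    rooster_idx : list of indices of the roosters (at least two)
    """
    rng = rng or np.random.default_rng()
    x_new = x.copy()
    for i in rooster_idx:
        # Eq. (2): compare with another randomly selected rooster k != i
        k = rng.choice([r for r in rooster_idx if r != i])
        if fitness[i] <= fitness[k]:
            sigma2 = 1.0
        else:
            sigma2 = np.exp((fitness[k] - fitness[i]) / (abs(fitness[i]) + eps))
        # Eq. (1): multiplicative Gaussian perturbation with variance sigma2
        x_new[i] = x[i] * (1.0 + rng.normal(0.0, np.sqrt(sigma2), size=x.shape[1]))
    return x_new
```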

(b) Hen location update strategy. The search ability of the hens is slightly worse than that of the roosters. The hens search for food following the rooster of their subgroup, so their location updates are affected by its position. At the same time, because the hens steal food from and compete with one another, the positions of other roosters and hens also affect the update. The location update formulas of the hen are as follows:

$$ x_{i,j}^{t + 1} = x_{i,j}^t + S1 \ast Rand \ast \left( {x_{r1,j}^t - x_{i,j}^t } \right) + S2 \ast Rand \ast \left( {x_{r2,j}^t - x_{i,j}^t } \right) $$
(3)
$$ S1 = \exp \left( {\frac{{f_i - f_{r1} }}{{abs\left( {f_i } \right) + \varepsilon }}} \right) $$
(4)
$$ S2 = \exp \left( {f_{r2} - f_i } \right) $$
(5)

where \({\text{Rand}}\) is a uniform random number between 0 and 1, and \(abs\left( \cdot \right)\) is the absolute value operation. \(r_1\) is the index of the rooster that the \(i^{th}\) hen follows when searching for food, and \(r_2\) is the index of a rooster or hen randomly chosen from the whole swarm, with \(r_1 \ne r_2\).
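A corresponding sketch of the hen update of Eqs. (3)-(5), under the same illustrative conventions as above; `mate_rooster[i]`, mapping hen \(i\) to the rooster of her subgroup, is an assumed data structure, not part of the paper.

```python
import numpy as np

def hen_update(x, fitness, hen_idx, rooster_idx, mate_rooster, eps=1e-10, rng=None):
    """Sketch of the CSO hen update, Eqs. (3)-(5).

    mate_rooster[i] gives the index r1 of the rooster that hen i follows.
    """
    rng = rng or np.random.default_rng()
    x_new = x.copy()
    for i in hen_idx:
        r1 = mate_rooster[i]
        # r2: a rooster or hen from the whole swarm, with r2 != r1 (and != i)
        candidates = [c for c in list(rooster_idx) + list(hen_idx) if c not in (i, r1)]
        r2 = rng.choice(candidates)
        s1 = np.exp((fitness[i] - fitness[r1]) / (abs(fitness[i]) + eps))  # Eq. (4)
        s2 = np.exp(fitness[r2] - fitness[i])                              # Eq. (5)
        # Eq. (3): two independent uniform random numbers in [0, 1]
        x_new[i] = (x[i]
                    + s1 * rng.random() * (x[r1] - x[i])
                    + s2 * rng.random() * (x[r2] - x[i]))
    return x_new
```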

(c) Chick location update strategy. The chicks have the worst search ability. They follow their mother hens, and their search range is the smallest; the chicks realize the exploitation of the local optimal solution. The search range of a chick is determined by the position of its mother hen, and its position update formula is as follows:

$$ x_{i,j}^{t + 1} = x_{i,j}^t + FL \ast \left( {x_{m,j}^t - x_{i,j}^t } \right) $$
(6)

where \(m\) is the index of the mother hen that the \(i^{th}\) chick follows to search for food. \(FL\) is a random value selected in the range [0, 2], whose main role is to keep the chick searching for food around its mother.
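The chick update of Eq. (6) is the simplest of the three rules; a minimal sketch, with `mother[i]` an assumed mapping from chick \(i\) to its mother hen:

```python
import numpy as np

def chick_update(x, chick_idx, mother, rng=None):
    """Sketch of the CSO chick update, Eq. (6)."""
    rng = rng or np.random.default_rng()
    x_new = x.copy()
    for i in chick_idx:
        fl = rng.uniform(0.0, 2.0)  # FL drawn from [0, 2]
        x_new[i] = x[i] + fl * (x[mother[i]] - x[i])
    return x_new
```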

3 Improved Chicken Swarm Optimization Algorithm (ICSO)

Although the CSO algorithm improves population utilization through its hierarchical mechanism, the effectiveness of its location update method is low, which reduces the overall search ability of the algorithm. Given this, this paper proposes an improved chicken swarm optimization algorithm (ICSO) based on the grouping idea of the CSO algorithm. The ICSO algorithm improves the position update methods of the hens and the chicks respectively, to enhance the algorithm's global search ability and decrease the probability of falling into a local minimum.

3.1 Hen Location Update Strategy of ICSO

Levy flight is a strategy from the random walk model, in which short-distance exploratory local search alternates with occasional long-distance jumps. Therefore, some solutions are searched near the current optimum, which speeds up the local search, while other solutions are searched in a space far enough from the current optimum to ensure that the algorithm does not fall into a local optimum [9, 10]. In the CSO algorithm, the hens are the most numerous of the three types, so they play an important role in the whole population [11]. Inspired by this, the Levy flight search strategy is introduced into the hen location update formula, which helps prevent the algorithm from falling into a local minimum while increasing its global search ability. The improved location update formula of the hen is as follows:

$$ x_{i,j}^{t + 1} = x_{i,j}^t + S1 \ast Rand \ast \left( {x_{r1,j}^t - x_{i,j}^t } \right) + S2 \ast Rand \ast Levy\left( \lambda \right) \otimes \left( {x_{r2,j}^t - x_{i,j}^t } \right) $$
(7)

where \(\otimes\) denotes entry-wise multiplication, and \(Levy\left( \lambda \right)\) is a random search step drawn from a Levy distribution.
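The paper does not specify how \(Levy\left( \lambda \right)\) is generated; a common choice in the Levy flight literature is Mantegna's algorithm, which the sketch below uses with the customary exponent \(\beta = 1.5\) (both the generator and the exponent are assumptions). The trailing comment shows how the step plugs into Eq. (7).

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, beta=1.5, rng=None):
    """Levy-distributed step via Mantegna's algorithm (an assumed generator;
    the paper only writes Levy(lambda))."""
    rng = rng or np.random.default_rng()
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2)
               / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size=dim)
    v = rng.normal(0.0, 1.0, size=dim)
    return u / np.abs(v) ** (1 / beta)

# Eq. (7) changes only the last term of the hen update of Eq. (3): the
# difference vector (x_r2 - x_i) is scaled entry-wise by a Levy step, e.g.
#   x_new[i] = (x[i]
#               + s1 * rng.random() * (x[r1] - x[i])
#               + s2 * rng.random() * levy_step(x.shape[1]) * (x[r2] - x[i]))
```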

3.2 Chick Location Update Strategy of ICSO

In the CSO algorithm, the chicks are affected only by their mother hens, not by the rooster of their subgroup. Therefore, the location update information of a chick comes only from its mother hen, and the location information of the rooster is not used. In this case, once the mother hen of a chick falls into a local optimum, the chick following her easily falls into the same local optimum. Using a nonlinearly decreasing inertia weight to update the position of the chick allows the chick to learn from itself while also being influenced by the rooster of its subgroup, which helps prevent the algorithm from falling into a locally optimal solution. The improved location update formulas of the chick are as follows:

$$ x_{i,j}^{t + 1} = w \ast x_{i,j}^t + FL \ast \left( {x_{m,j}^t - x_{i,j}^t } \right) + C \ast \left( {x_{r,j}^t - x_{i,j}^t } \right) $$
(8)
$$ w = wmin \ast \left( {\frac{wmax}{{wmin}}} \right)^{\left( {\frac{1}{{1 + 10\; \ast \frac{t}{T}}}} \right)} $$
(9)

where \(w\) is the self-learning coefficient of the chick, which is analogous to the inertia weight in the particle swarm optimization algorithm. \( wmin\) is the minimum inertia weight, \(wmax\) is the maximum inertia weight, \(t\) is the current iteration number, and \(T\) is the maximum number of iterations. \(C\) is a learning factor expressing how strongly the chick is affected by the rooster of its subgroup, and \(r\) is the index of that rooster (the chick's father).
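A sketch of the improved chick update of Eqs. (8) and (9). The values of \(wmin\), \(wmax\), and \(C\) below are placeholders (the paper's actual settings are in Table 1), and `father[i]` is an assumed mapping from chick \(i\) to the rooster of its subgroup.

```python
import numpy as np

def icso_chick_update(x, chick_idx, mother, father, t, T,
                      w_min=0.4, w_max=0.9, c=0.4, rng=None):
    """Sketch of the ICSO chick update, Eqs. (8)-(9)."""
    rng = rng or np.random.default_rng()
    # Eq. (9): inertia weight decreases nonlinearly from w_max toward w_min
    w = w_min * (w_max / w_min) ** (1.0 / (1.0 + 10.0 * t / T))
    x_new = x.copy()
    for i in chick_idx:
        fl = rng.uniform(0.0, 2.0)  # FL drawn from [0, 2], as in Eq. (6)
        x_new[i] = (w * x[i]
                    + fl * (x[mother[i]] - x[i])
                    + c * (x[father[i]] - x[i]))
    return x_new
```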

3.3 Experimental Results and Analysis

To verify the effectiveness of the ICSO algorithm, comparison experiments are set up. The algorithms compared are the chicken swarm optimization algorithm (CSO), the genetic algorithm (GA), and the particle swarm optimization algorithm (PSO).

3.4 Fitness Function

Each particle in the chicken swarm corresponds to one solution of the feature selection problem. The particles are encoded with real numbers, as shown in Eq. (10). Each solution \(X\) contains \(n\) real numbers, where \(n\) is the total number of features of the corresponding dataset and each dimension \(x_i\) indicates whether the corresponding feature is selected. To form a feature subset, a decoding step is required: the position of the particle is converted into a feature subset as follows:

$$ X = \left[ {x_1 ,\;x_2 ,\; \ldots ,\;x_n } \right] $$
(10)
$$ A_d = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {x_d > 0.5,} \hfill \\ {0,} \hfill & {otherwise,} \hfill \\ \end{array} } \right. $$
(11)

where \(A_d\) represents the decoding of the \(d^{th}\) dimension of a solution. \(A_d\) takes the value 0 or 1 according to the value \(x_d\) of the \(d^{th}\) dimension of the particle: \(A_d = 1\) means that the \(d^{th}\) feature is selected; otherwise it is not selected.
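The decoding of Eqs. (10) and (11) amounts to thresholding each real-valued dimension at 0.5; a one-function sketch:

```python
import numpy as np

def decode(position, threshold=0.5):
    """Eq. (11): feature d is selected (A_d = 1) iff x_d > 0.5."""
    return (np.asarray(position) > threshold).astype(int)

# e.g. decode([0.9, 0.2, 0.7]) -> array([1, 0, 1]):
# the first and third features are selected, the second is not
```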

The purpose of feature selection is to find a combination with the highest classification accuracy and the smallest number of selected features. Of the two objectives, classification accuracy is the first consideration. The fitness function, shown in Eq. (12), maximizes the classification accuracy on the test set given the training data, while at the same time keeping the number of selected features small.

$$ Fitness\left( i \right) = \alpha \ast ACC\left( i \right) + \left( {1 - \alpha } \right) \ast \left( {\frac{FeatureSum\left( i \right)}{{FeatureAll}}} \right) $$
(12)

where \(\alpha\) is a constant between 0 and 1 that controls the importance of the classification accuracy relative to the number of selected features: the larger \(\alpha\), the more important the classification accuracy. \(ACC\left( i \right)\) is the classification accuracy of particle \(i\), \(FeatureSum\left( i \right)\) is the number of features selected by particle \(i\), and \(FeatureAll\) is the total number of features in the dataset.
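A sketch of the fitness evaluation of Eq. (12), using scikit-learn's 5-nearest-neighbor classifier as in the experiments below; the holdout train/test protocol and the guard against an empty subset are assumptions, since the paper does not detail its evaluation procedure.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def particle_fitness(position, X_train, y_train, X_test, y_test, alpha=0.9999):
    """Sketch of Eq. (12) for one particle. X_train/X_test are numpy arrays."""
    mask = np.asarray(position) > 0.5          # Eq. (11) decoding
    if not mask.any():                         # assumed guard: empty subset
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)  # KNN with K = 5, as in the paper
    clf.fit(X_train[:, mask], y_train)
    acc = accuracy_score(y_test, clf.predict(X_test[:, mask]))
    # Eq. (12) as printed; with alpha = 0.9999 the feature-ratio term is
    # negligible and accuracy dominates the fitness.
    return alpha * acc + (1 - alpha) * (mask.sum() / mask.size)
```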

3.5 Parameters Setting

All comparative experiments in this paper are run on a PC with 8 GB of memory, and the programming environment is Python 3.8.5. The population size is set to 50, the \(\alpha\) in the fitness function is set to 0.9999, 20 independent runs are performed on each dataset, and the maximum number of iterations is set to 500. A KNN (K = 5) classifier is used to test the classification accuracy of the selection scheme corresponding to each particle. The hyperparameter settings of each algorithm are shown in Table 1, and the eighteen UCI datasets are described in Table 2. Most datasets are two-class, although some are multi-class. The smallest number of features in the datasets is 9 and the largest is 309.

Table 1. Hyperparameter settings
Table 2. Datasets description

3.6 Results and Analysis

Table 3 shows the experimental results of the ICSO algorithm and the other three comparison algorithms on the eighteen datasets, where bold values denote the largest mean classification accuracy among all algorithms. It can be seen intuitively from Fig. 1 that the ICSO algorithm obtains the best results on the eighteen test datasets: the mean accuracy of ICSO exceeds that of CSO, the mean accuracy of CSO exceeds that of PSO, and the mean accuracy of PSO exceeds that of GA, which performs worst in feature selection. On datasets with poor mean accuracy on the full feature set, such as Wine, LSVT, and Arrhythmia, feature selection with the ICSO algorithm increases the mean accuracy by 20% to 50%. On datasets with better mean accuracy on the full feature set, such as Breast Cancer, WDBC, and Zoo, feature selection with the ICSO algorithm improves the mean accuracy by less than 10%. The experimental results fully verify the superiority of the ICSO algorithm.

Table 3. Mean accuracy for the different algorithms
Fig. 1. Mean accuracy line chart

Table 4 lists the mean number of selected features and the standard deviation of the dimensionality for the four algorithms on each dataset. It can be seen intuitively that, compared with the GA and PSO algorithms, the CSO and ICSO algorithms have an obvious dimensionality-reduction effect, and their dimensional standard deviations are low, indicating relatively high stability. The experimental results directly verify that the ICSO algorithm has a strong ability to eliminate redundant features, achieving better classification accuracy on the datasets while greatly reducing the number of redundant features.

Table 4. Mean and Std dimension after different algorithm feature selection

4 Conclusions

Swarm intelligence optimization has achieved good results on the feature selection problem. The chicken swarm optimization algorithm, however, still falls easily into local minima. To overcome this weakness, this paper proposes an improved chicken swarm optimization algorithm. On the basis of the population grouping update mechanism of the CSO algorithm, the ICSO algorithm introduces the Levy flight strategy into the hen location update strategy and a nonlinearly decreasing inertia weight into the chick location update strategy, to enhance the algorithm's global search ability and decrease the probability of falling into a local minimum. The experimental results show that, compared with the other three related algorithms, the ICSO algorithm can greatly reduce the number of redundant features while ensuring classification accuracy in feature selection.