1 Introduction

The classification of data is an indispensable subject in machine learning and data mining [1]. It serves as a pervasive technique for discerning and categorizing different system states and therefore finds extensive applications in many fields. As a pre-processing technique, feature selection (FS) is widely used to eliminate redundant and irrelevant features from the original dataset and thereby improve the efficiency and accuracy of the classification model [2]. However, the number of measurements used to describe the state of a system has grown dramatically with the development of information technology. High-dimensional datasets tend to contain more redundant and irrelevant features [3], which places even greater pressure on FS to filter out the best feature subset. For FS on high-dimensional datasets it is therefore important not only to ensure classification accuracy but also to minimize the number of selected features so as to enhance model efficiency. This creates an urgent need for an efficient method for solving multi-objective FS (MOFS) on high-dimensional datasets.

FS can be regarded as a combinatorial optimization problem [4]. The exhaustive method can enumerate all possible combinations of features and determine the subset that optimizes the evaluation criteria. Nevertheless, the solution space for a set of \(n\) features contains \({2}^{n}-1\) candidate subsets, and the computational time required by the exhaustive method grows exponentially with the number of feature dimensions [5]. Thus, despite its simplicity, the exhaustive method is difficult to apply in practice [6]. To address this problem, many heuristic algorithms have been proposed for FS. Among them, sequential forward selection (SFS) and sequential backward selection (SBS) [7] are two prominent approaches. They start from an empty set or the complete set, respectively, and then iteratively add or remove features to improve the feature subset. Although they improve search efficiency relative to exhaustive search, a limitation of both SFS and SBS is that once a feature has been included or excluded, the decision cannot be reconsidered, whereas some features are functionally interconnected and exhibit a correlated coupling effect. To address this drawback, the plus-l minus-r selection method [8] was introduced. It alleviates the aforementioned problem of SFS and SBS, but the numbers of added and removed features are fixed. Building on this, sequential floating forward selection [9] and sequential floating backward selection [10] were designed, in which the numbers of added and removed features can change at each step. Although heuristic methods significantly reduce the time complexity compared with exhaustive search, their fixed search strategies make them susceptible to local optima. Thus, intelligent algorithms (IAs) have received increasing attention in recent years for their ability to achieve near-optimal or optimal solutions within a reasonable time [11, 12].

However, IAs are easily trapped in suboptimal solutions when handling FS on high-dimensional datasets [13, 14]. Recently, many search strategies have been designed to improve the search efficiency of meta-heuristic algorithms. Based on the Pearson correlation coefficient, Kabir et al. [15] developed a local search operation that relied on the distinct and informative nature of input features, as computed through correlation information. Moradi and Gholampour [16] introduced a local search strategy that selected distinct features by considering their correlation information. However, the Pearson correlation coefficient can only measure linear relationships between variables, not non-linear relationships, and it is sensitive to outliers. To overcome this shortcoming, mutual information has been used to measure the correlation between variables [17]. Vinh et al. [18] introduced a correlation-redundancy metric (CRM) to characterize each feature, which combines the correlation between a feature and the labels with the redundancy between pairs of features. Furthermore, Wang et al. [19, 20] designed a "relevance-redundancy index" based on the CRM for evaluating feature importance in an ant colony optimization approach to classification. Song et al. [21] proposed supplementary and deletion operators to enhance the local search ability. However, the CRM only scores a feature highly when it has a high correlation with the class labels and a low correlation with the other features. When both the correlation and the redundancy of a feature are high, its score approaches 0, so highly correlated features may be discarded. Thus, a novel evaluation function is designed here to measure the correlation-redundancy (CR) of each feature. Based on this, a novel local search strategy is proposed that considers both the search depth and the CR of the features simultaneously.

Additionally, population initialization significantly impacts the performance of IAs. Most existing studies use traditional random initialization (TRI) [1, 4, 16, 22], which may lower the quality of the initial population and slow down the overall convergence rate [23]. To handle this issue, Xu et al. [24] devised a split initialization approach to enhance the diversity of the initial population and thereby broaden the exploration process. Xue et al. [25] introduced an interval initialization technique that constrained the number of selected features. Xue et al. [26] devised three warm initialization strategies, namely small, large, and mixed initialization. The above papers focus on controlling the number of selected features to enhance diversity, disregarding the importance of the selected features themselves, such as the correlation and redundancy of each feature. To overcome this problem, Deniz et al. [27] employed an information gain technique to evaluate the significance of each feature and initialized the population based on the obtained scores. Song et al. [21] developed a swarm initialization strategy guided by label correlation. Mohsen et al. [28] employed cosine similarity in their pheromone initialization approach. Li and Ren [29] designed a swarm initialization method based on the maximal information coefficient between features and labels. Although these studies considered the correlation between features and labels, the redundancy between features is ignored. Thus, this paper introduces a novel initialization strategy that uses a new evaluation function to assess the CR of each feature and determines the probability of selecting a feature from it. This provides an efficient way to obtain a high-quality initial population and reduce the solution space, thereby expediting the convergence speed.

To sum up, this study proposes a correlation-redundancy guided evolutionary algorithm, named CRGEA, to solve high-dimensional MOFS. A new correlation-redundancy assessment method (CRAM) is designed for measuring the CR of each feature, which guides the entire evolutionary process. In CRGEA, a novel initialization strategy combined with a multiple threshold selection mechanism (ISMTS) is developed to produce a high-quality initial population. A local acceleration evolution strategy (LAES) based on a parallel simulated annealing algorithm and a pruning method is developed to improve the local search ability of the CRGEA. The contributions of this paper can be summarized as follows:

(1) A new correlation-redundancy assessment method is designed for measuring the CR of each feature, paying more attention to features with high relevance and low redundancy in high-dimensional datasets to speed up the entire evolutionary process;

(2) A correlation-redundancy guided evolutionary algorithm is designed for solving MOFS of high-dimensional datasets. In CRGEA, CRAM first evaluates the CR of each feature and produces a high-quality initial population. LAES then performs deep searches combining the annealing stage around the best solutions to improve the local search ability of the CRGEA, which helps individuals escape local optima and obtain the best feature subset;

(3) Numerical experiments are conducted based on 16 high-dimensional datasets, where nine datasets contain more than 1000 dimensions, three datasets have over 6000 dimensions, and two datasets have over 10,000 dimensions. The experiment results show that CRGEA can efficiently reduce redundant features while ensuring high accuracy compared to other state-of-the-art intelligent algorithms.

The remainder of this paper is structured as follows: Sect. 2 provides related work on MOFS, mutual information, the K-Nearest Neighbor algorithm, etc. Section 3 presents a comprehensive explanation of the designed CRGEA. Section 4 describes the results obtained on 16 publicly available high-dimensional datasets. In Sect. 5, a summary and discussion are provided based on the outcomes of the three numerical experiments. Finally, Sect. 6 concludes with an overview of the findings and future research directions.

2 Related Work

2.1 Multi-objective Feature Selection for High-Dimensional Datasets

In most existing studies, FS has been regarded as a single-objective problem aimed at improving classification accuracy [21, 30,31,32,33,34]. However, the resulting feature subset often contains redundant features, which reduces algorithm efficiency and classification accuracy. Therefore, some researchers also include the number of features in FS to remove redundant and irrelevant features [35,36,37,38,39,40,41,42]. In many of these studies, the average classification error and the number of features are combined into a single goal by setting a weight. Choosing the weight for the two targets requires expert experience and may differ across datasets, which makes this approach hard to put into practice. In addition, early FS research focused on small and medium-sized datasets. With the advent of the information era, more and more research has begun to focus on FS for high-dimensional datasets. However, most of the literature on high-dimensional datasets focuses on improvements to search strategies [13, 22, 23, 25, 43], single-objective formulations [44,45,46], and grouping features by the relevance between them [47, 48], ignoring the characteristics of the dataset itself, such as the correlation and redundancy of each feature. This can degrade the algorithm's search effectiveness. Thus, this study investigates MOFS for high-dimensional datasets considering the correlation-redundancy of each feature, minimizing the classification error and the number of features simultaneously. The MOFS problem can be described as follows:

Let the input be the normalized dataset \(X\in {R}^{n\times D}\), where \(n\) denotes the number of samples in \(X\) and \(D\) denotes its dimension. The feature set \(S=\left\{{f}_{1},{f}_{2},\dots ,{f}_{D}\right\}\) contains the \(D\) features of \(X\). The aim of MOFS is to optimize a function vector \(f\left(x\right)=\left\{{f}_{1}\left(x\right),{f}_{2}\left(x\right),\dots ,{f}_{m}\left(x\right)\right\}\), where \(m\) represents the number of objectives to be optimized.

2.2 Mutual Information

Mutual information (MI) can evaluate the relationship between two discrete variables [49] and is widely used to evaluate the relevance between features and labels as well as the redundancy between features [50,51,52]. However, in the above research the number of features has to be specified in advance, so the optimal number of features and the classification accuracy cannot be obtained automatically. Thus, in this paper MI is used within an intelligent algorithm to optimize classification accuracy and the number of features simultaneously, which offers end-to-end convenience. MI is defined in Eq. (1):

$$I\left(x,y\right)=\sum_{y}\sum_{x}p\left(x,y\right)lg\frac{p\left(x,y\right)}{p\left(x\right)p(y)}$$
(1)

where x and y are two random variables. The marginal probabilities of x and y are denoted as \(p(x)\) and \(p(y)\) respectively, while \(p(x,y)\) represents their joint probability. Building upon this foundation, Vinh et al. [18] introduced a CR metric to characterize the correlation and redundancy of each feature. CR combines the correlation between a feature and the sample labels with the redundancy between two features. For the \(i\)-th feature within a feature subset, its CR value is defined by Eq. (2).

$${\varphi }\left({x}_{i}\right)=I\left({x}_{i},c\right)-\frac{1}{|D|}\sum_{{x}_{l}\in L}I\left({x}_{i},{x}_{l}\right)$$
(2)

where \({\varphi }\left({x}_{i}\right)\) represents the CR of feature \({x}_{i}\), \(c\) denotes the class label vector, \(L\) is the current feature subset, and \(|D|\) is its size.
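For concreteness, a minimal Python sketch of how Eq. (1) and the CR metric of Eq. (2) can be evaluated for discrete feature vectors is given below. It is only an illustration (the experiments in Sect. 4 are implemented in MATLAB); the function names and the choice of log base 2 are ours.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information I(x, y) of Eq. (1) for two discrete vectors
    (log base 2 is an implementation choice; the paper writes lg)."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))   # joint probability
            p_x, p_y = np.mean(x == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def crm(X, c, i):
    """CR value of feature i in the subset X (Eq. (2)): relevance to the
    label vector c minus the averaged redundancy with the other features."""
    D = X.shape[1]
    relevance = mutual_information(X[:, i], c)
    redundancy = sum(mutual_information(X[:, i], X[:, l])
                     for l in range(D) if l != i) / D
    return relevance - redundancy
```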

2.3 Non-dominated Sorting Genetic Algorithm II

The non-dominated sorting genetic algorithm II (NSGA-II) [53] employs the concept of Pareto optimality, which allows multiple objectives to be optimized simultaneously and balanced against one another [54]. It has been successfully used in many research fields [55,56,57,58]. NSGA-II combines non-dominated ranking with crowding distance and provides a set of non-dominated solutions (NDSs). Solutions are compared based on their non-dominance rank and crowding distance: if two solutions belong to different fronts, the one with the lower non-dominated rank is preferred; otherwise, the one with the larger crowding distance is preferred.

2.4 Simulated Annealing

The simulated annealing algorithm (SA) [59] accepts an inferior solution with a certain probability and finds better solutions through multiple cooling cycles, which gives it a strong local search capability and helps individuals escape local optima [60]. Equation (3) is widely applied in single-objective problems to determine the probability of accepting a new solution, but it is not suitable for multi-objective problems. Thus, the probability in Eq. (4) is adopted in this paper [61].

$${P}_{SA}=\left\{\begin{array}{l}1, {f}^{new}<{f}^{cur} \\ {{\exp}}\left(-\frac{{f}^{new}-{f}^{cur}}{T}\right), {f}^{new}\ge {f}^{cur}\end{array}\right.$$
(3)
$${P}_{SA}=\left\{\begin{array}{l}1, \exists m, {f}_{m}^{new}<{f}_{m}^{cur} \\ {\prod }_{m=1}^{M}{{\exp}}(-\frac{{f}_{m}^{new}-{f}_{m}^{cur}}{T}), \exists m, {f}_{m}^{new}\ge {f}_{m}^{cur}\end{array}\right.$$
(4)

where \(m\in \{1, 2, \dots, M\}\), and \({f}_{m}^{new}\) and \({f}_{m}^{cur}\) are the \(m\)-th objective function values of the new solution \({X}_{new}\) and the current solution \({X}_{cur}\), respectively.
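As an illustration of the acceptance rule in Eq. (4) (not the authors' implementation; objectives are assumed to be minimized and the function names are ours):

```python
import math
import random

def accept_probability(f_new, f_cur, T):
    """Acceptance probability of Eq. (4) for minimised objective vectors."""
    if any(n < c for n, c in zip(f_new, f_cur)):
        return 1.0  # the new solution improves at least one objective
    # otherwise: product of per-objective Boltzmann factors
    return math.prod(math.exp(-(n - c) / T) for n, c in zip(f_new, f_cur))

def accept(f_new, f_cur, T):
    """Accept the new solution with the probability above."""
    return random.random() < accept_probability(f_new, f_cur, T)
```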

2.5 K-Nearest Neighbor Algorithm

The K-Nearest Neighbor algorithm (KNN) is a simple classifier that can produce competitive and easily interpretable results [33]. It is widely recognized and extensively employed in the field of pattern recognition for its effectiveness in both classification and regression tasks [42]. In the classification phase, the distance between a new test sample and all training samples is computed according to Eq. (5), and the majority label of the \(k\) training samples closest to the unlabeled sample is then assigned to it.

$${\text{d}}\left(X,Z\right)={\left(\sum_{i=1}^{D}{\left({x}_{i}-{z}_{i}\right)}^{2}\right)}^{1/2}$$
(5)

where \(X\) and \(Z\) are D-dimensional vectors in the feature space, and \({x}_{i}\) and \({z}_{i}\) are their \(i\)-th components. In addition, cross-validation [62] is used to assess the classification accuracy of KNN, which improves the generalizability of the evaluation (Fig. 1).

Fig. 1 The flowchart of the designed CRGEA

3 Proposed CRGEA for Feature Selection

This section presents the details of CRGEA for solving the MOFS. The overall flowchart of the CRGEA is shown in Fig. 1, and Algorithm 1 describes its main procedures.

Fig. 2 The flowchart of the ISMTS

As shown in Fig. 1, a novel initialization strategy combined with a multiple threshold selection mechanism is first developed to produce a high-quality initial population. Then, the multi-point crossover operator and the hybrid mutation strategy are applied to search in multiple directions and improve the search ability, and the potential NDSs are obtained from these genetic operations. Next, a local acceleration evolution strategy is applied to enhance the quality of the NDSs and of the overall population for the next generation, which further improves the local search ability of the CRGEA. Superior individuals are then selected to participate in the next generation of evolution. The genetic operations and the local search strategy are performed iteratively until the stopping criterion is met, at which point the algorithm stops and the best solutions are reported. Additional details of the CRGEA are described in the following subsections.

3.1 Binary Encoding Strategy and Decoding

The aim of FS is to choose an appropriate subset from the pool of available features so that the optimization objectives are optimal. Each feature can be in one of two states: selected or unselected. We therefore use a binary (0-1) encoding strategy: 1 indicates that the corresponding feature is selected, whereas 0 denotes its exclusion. For example, if an individual is encoded as the vector 01001000, the total number of features is 8 and the second and fifth features are selected. The objectives are then calculated using a KNN classifier with K-fold cross-validation. The first objective, the average classification error, is given by Eq. (6).

$${{\min}}\,{f}_{1}\left(x\right)=\frac{1}{K}\sum_{k=1}^{K}\frac{{Nerr}_{k}}{{Nval}_{k}}$$
(6)

where \(k\) indexes the fold in the \(K\)-fold cross-validation and \(x\) is a feasible solution. \({Nerr}_{k}\) and \({Nval}_{k}\) are the number of misclassified samples and the total number of validation samples in the \(k\)-th fold, respectively. The second objective, the number of selected features, is given by Eq. (7).

$${{\min}}{f}_{2}\left(x\right) = \sum_{i=1}^{D}{x}_{i}$$
(7)

where \({x}_{i}\) indicates the value at the \(i\)-th position of the solution \(x\), and \(D\) represents the total number of features.
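The following Python sketch illustrates how the two objectives can be evaluated for a 0-1 encoded individual; it is only an illustration of Eqs. (6) and (7), here using scikit-learn's KNN classifier and 5-fold cross-validation rather than the authors' MATLAB code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate(individual, X, y, n_neighbors=5, n_folds=5):
    """Return (f1, f2) of Eqs. (6)-(7) for a 0/1-encoded individual.
    X: (n_samples, D) feature matrix, y: labels."""
    mask = np.asarray(individual, dtype=bool)
    f2 = int(mask.sum())
    if f2 == 0:                       # no feature selected: worst possible error
        return 1.0, 0
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    acc = cross_val_score(knn, X[:, mask], y, cv=n_folds, scoring='accuracy')
    return 1.0 - acc.mean(), f2       # average error over the K folds, subset size

# e.g. f1, f2 = evaluate([0, 1, 0, 0, 1, 0, 0, 0], X, y)
```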

3.2 Initialization Strategy

The quality of the initial population strongly influences the convergence speed of a meta-heuristic algorithm. A novel initialization strategy combined with a multiple threshold selection mechanism (ISMTS) is therefore developed to produce a high-quality initial population. Firstly, CRAM is designed to assess the CR of each feature. This strategy overcomes the shortcoming of evaluating the CR of each feature with Eq. (2): when both the correlation and the redundancy of a feature are high, its score approaches 0 and such features are discarded. Moreover, a multiple-threshold selection strategy is proposed to restrict the number of selected features and improve the distribution of the population in the objective space. This ensures that the initial solutions maintain a high level of quality and diversity. The threshold vector is set to \(T=[0.50, 0.65, 0.80]\), and \({T}_{m}\) denotes the \(m\)-th value in \(T\). The following definitions describe the details of the designed ISMTS:

Definition 1

(\({C}_{iy}\)) Let \(y\) be the label vector associated with feature \({x}_{i}\) \((i=1, 2, \dots, D)\), where \(D\) is the number of features. The correlation between \({x}_{i}\) and \(y\), denoted \({C}_{iy}\), is calculated by Eq. (8) as follows:

$${C}_{iy}=I\left({x}_{i},y\right) = \sum_{y}\sum_{{x}_{i}}p\left({x}_{i},y\right)lg\frac{p\left({x}_{i},y\right)}{p\left({x}_{i}\right)p(y)}$$
(8)

where \({x}_{i}\) and \(y\) are two random variables. The marginal probabilities of \({x}_{i}\) and \(y\) are denoted as \(p({x}_{i})\) and \(p(y)\) respectively, while \(p({x}_{i},y)\) represents their joint probability. The importance of the feature to the class labels decreases as the value of \({C}_{iy}\) decreases. In addition to retaining the most relevant features, redundant features should be removed where possible. In this paper, \({R}_{i}\) is used to assess the redundancy of feature \(i\) with respect to the other features.

Definition 2

(\({R}_{i}\)) Let \({x}_{i}\) and \({x}_{j}\) \((i, j=1, 2, \dots, D)\) be the sample vectors of features \(i\) and \(j\). The redundancy of feature \(i\), denoted \({R}_{i}\), is defined as follows:

$${R}_{i}=\frac{1}{|D|}\sum_{j=1,\, j\ne i}^{D}I\left({x}_{i},{x}_{j}\right),\quad i=1, 2, \dots, D$$
(9)

Definition 3

(\({CR}_{i}\)) A new correlation-redundancy assessment method is designed to describe the CR of feature \(i\), denoted \({CR}_{i}\), which is calculated by:

$${CR}_{i}={C}_{iy}-\alpha \frac{{\min}\left({C}_{iy}, {R}_{i}\right)}{{\max}\left({C}_{iy}, {R}_{i}\right)},\quad i=1, 2, \dots, D$$
(10)

where \(\alpha\) (\(\alpha =0.3\) in this paper) is a moderator used to adjust the possibility of a feature being selected when both its correlation and its redundancy are high. As shown in Eq. (10), the measure prefers features with high correlation and low redundancy, and it reduces, but does not eliminate, the score of a feature whose correlation and redundancy are both high. Based on this idea, we normalize \({CR}_{i}\) and define \({P}_{i}\) as the probability that a feature is selected.

Definition 4

(\({P}_{i}\)) The probability that feature \(i\) is selected, denoted \({P}_{i}\), is calculated by:

$${P}_{i}=\frac{{CR}_{i}-\underset{j=1,\dots ,D}{{\min}}\left({CR}_{j}\right)}{\underset{j=1,\dots ,D}{{\max}}\left({CR}_{j}\right)-\underset{j=1,\dots ,D}{{\min}}\left({CR}_{j}\right)},\quad i=1, 2, \dots, D$$
(11)
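A minimal Python sketch of the CRAM scores in Eqs. (8)-(11) is given below for illustration; `mi` is assumed to be a discrete mutual-information function such as the one sketched in Sect. 2.2, and `alpha` corresponds to \(\alpha =0.3\). The function and parameter names are ours.

```python
import numpy as np

def cram_probabilities(X, y, mi, alpha=0.3):
    """Selection probability P_i of each feature (Eqs. (8)-(11), illustrative).
    X: (n_samples, D) discrete feature matrix; y: label vector;
    mi: a mutual-information function such as the one sketched in Sect. 2.2."""
    D = X.shape[1]
    C = np.array([mi(X[:, i], y) for i in range(D)])                         # Eq. (8)
    R = np.array([sum(mi(X[:, i], X[:, j]) for j in range(D) if j != i) / D
                  for i in range(D)])                                        # Eq. (9)
    CR = C - alpha * np.minimum(C, R) / np.maximum(np.maximum(C, R), 1e-12)  # Eq. (10)
    span = CR.max() - CR.min()
    return (CR - CR.min()) / span if span > 0 else np.ones(D)                # Eq. (11)
```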

Whether feature \(i\) is selected is decided by comparing \({P}_{i}\) with the threshold \({T}_{m}\) in \(T\). The proposed ISMTS proceeds as follows, and the corresponding flowchart is shown in Fig. 2.

Step 1: Parameter initialization: initial population \(IP=\varnothing\), population size \(N\), number of features \(D\), and the selection probability \({P}_{i}\) of each feature \(i\);

Step 2: Set \(n=1\), the index of individual \({x}_{n}\), and \(m=1\), the index into the threshold vector \(T\);

Step 3: Create individual \({x}_{n}=\left({x}_{n1},\dots ,{x}_{nj},\dots ,{x}_{nD}\right)\) and set the feature index \(j=1\);

Step 4: If \({P}_{j}>{T}_{m}\), set \({x}_{nj}=1\); otherwise set \({x}_{nj}=0\);

Step 5: If \(j\le D\), set \(j=j+1\) and go to Step 4; else if \(n\le \left(\left(0.75N\right)/3\right)m\), add \({x}_{n}\) to \(IP\), set \(n=n+1\) and \(j=1\), and go to Step 4; else if \(m\le 3\), set \(m=m+1\) and go to Step 4; otherwise set \(n=n+1\), \(j=1\), and go to Step 6.

Step 6: If \({P}_{j}>rand\), set \({x}_{nj}=1\); otherwise set \({x}_{nj}=0\);

Step 7: If \(j\le D\), set \(j=j+1\) and go to Step 6; else if \(n\le N\), add \({x}_{n}\) to \(IP\), set \(n=n+1\) and \(j=1\), and go to Step 6; otherwise output the obtained initial population \(IP\).
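The following Python sketch transcribes Steps 1-7 under our reading of the procedure (roughly 75% of the individuals are generated with the three thresholds and the rest by sampling against \({P}_{i}\)); it is illustrative only and the names are ours.

```python
import numpy as np

def ismts_init(P, N, thresholds=(0.50, 0.65, 0.80), rng=None):
    """Initial population built from the selection probabilities P (Eq. (11)).
    About 75% of the N individuals come from thresholding P against T
    (Steps 3-5); the rest are sampled feature-wise with probability P (Steps 6-7)."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.asarray(P)
    D, pop = len(P), []
    per_threshold = int(0.75 * N / len(thresholds))
    for t in thresholds:                         # threshold-guided individuals
        for _ in range(per_threshold):
            pop.append((P > t).astype(int))
    while len(pop) < N:                          # probability-guided individuals
        pop.append((rng.random(D) < P).astype(int))
    return np.array(pop)
```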

3.3 Multi-points Crossover Operator

The crossover operator performs a global search by exchanging the parents' good genes, which helps to improve the global search ability. The crossed individuals are compared with their parents using Pareto dominance, and the two best of these individuals are selected to enter the next evolution. Furthermore, an external file (EF) is designed to store solutions that are not dominated by the current solutions during the search; it is used in the selection operator to improve the overall population. Procedure 1 shows the detailed crossover process.

Procedure 1 The procedures of the proposed multi-points crossover operator

3.4 Hybrid Parallel Mutation Operator

The mutation operator improves the diversity of the population by changing the structure of existing chromosomes. Considering that the aim of FS is to obtain subsets with strong correlation and low redundancy, the proposed mutation operator uses two strategies that change the number of selected features, i.e., an addition strategy and a deletion strategy. This guides individuals to search in different directions and improves the local search ability. The novel evaluation function in Eq. (10) is adopted to decide which gene to add or delete in a selected individual. The mutated individual \(x\) is selected according to the mutation probability \({P}_{m}\). The set of selected features of \(x\) is \({S}_{s}\), and the set of unselected features is \({S}_{us}\). According to Eq. (11), \({P}_{i}\) is calculated for all unselected features in \({S}_{us}\). The addition strategy and deletion strategy are presented below:

Addition strategy: the unselected feature with the maximum \({P}_{i}\) is added to \({S}_{s}\) and its corresponding gene is set to 1, yielding the mutated individual \({x}_{1}\);

Deletion strategy: the selected feature with the smallest CR value is removed from \({S}_{s}\) and its corresponding gene is set to 0, yielding the mutated individual \({x}_{2}\).

As shown in Fig. 3, the designed mutation strategies change the number of selected features in a directed manner, which makes the solutions search in different directions and improves the local search ability. The details of the hybrid parallel mutation operator are stated in Procedure 2.
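For illustration, a possible Python sketch of the addition and deletion strategies is given below (not the authors' implementation; `P` and `CR` are the per-feature values of Eqs. (11) and (10), and the function name is ours):

```python
import numpy as np

def hybrid_mutation(x, P, CR):
    """Return the two offspring of the addition and deletion strategies.
    x: 0/1 individual; P, CR: per-feature values of Eqs. (11) and (10)."""
    x = np.asarray(x)
    selected, unselected = np.flatnonzero(x == 1), np.flatnonzero(x == 0)

    x1 = x.copy()                                     # addition strategy
    if unselected.size:
        x1[unselected[np.argmax(P[unselected])]] = 1  # add the most promising feature

    x2 = x.copy()                                     # deletion strategy
    if selected.size:
        x2[selected[np.argmin(CR[selected])]] = 0     # drop the least valuable feature
    return x1, x2
```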

Procedure 2 The procedures of the proposed hybrid parallel mutation operator

Fig. 3 The graphical representation of the two search strategies in the mutation operator

3.5 Local Acceleration Evolution Strategy

In this paper, a local acceleration evolution strategy (LAES) based on a parallel simulated annealing algorithm (PSA) with two search strategies, i.e., the addition strategy and the deletion strategy, is employed to enhance the quality of the NDSs derived from the genetic operations. This allows LAES to perform a deep parallel search combining the annealing stage around the NDSs, which helps individuals escape local optima. Unlike traditional SA, the parallel SA exploits the different characteristics of the addition and deletion strategies, so it can search in different directions and improve search efficiency. In addition, a pruning strategy is introduced into LAES to avoid invalid searches and further improve search efficiency: if the new solution obtained by a search strategy is dominated by its parent, the strategy is disabled in the next search phase; if the new solution dominates its parent, the strategy is retained in the next search phase, which indicates that this search direction reduces the classification error. The LAES is described in Procedure 3.
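The pruning idea can be sketched as follows (an illustrative reading, not the authors' implementation; `evaluate` is assumed to return the objective vector of a solution, each `move` is an addition or deletion strategy, and `accept` is a multi-objective SA acceptance test such as the one sketched in Sect. 2.4):

```python
def dominates(a, b):
    """Pareto dominance for minimisation: a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def laes_step(parent, f_parent, strategies, evaluate, accept, T):
    """One annealing step around a non-dominated solution.
    strategies: dict name -> (move_function, enabled_flag); a strategy whose
    offspring is dominated by its parent is disabled for the next phase."""
    best, f_best = parent, f_parent
    for name, (move, enabled) in strategies.items():
        if not enabled:
            continue
        child = move(parent)
        f_child = evaluate(child)
        if dominates(f_parent, f_child):          # pruning: this direction is invalid
            strategies[name] = (move, False)
        elif dominates(f_child, f_parent) or accept(f_child, f_parent, T):
            best, f_best = child, f_child         # keep the improved/accepted child
    return best, f_best, strategies
```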

Procedure 3 The procedures of the proposed LAES

3.6 Selection Operator

The selection operator aims to identify individuals with relatively superior performance for inclusion in the next-generation population, and the NDSs are considered such individuals. The fast non-dominated sorting technique and the crowding distance algorithm [53] are employed to identify the NDSs after the genetic operations. The selection operator is executed as follows: first, the offspring population (\(OP\)) after LAES and the \(EF\) are combined into \(NP\); then, fast non-dominated sorting is executed on \(NP\). Assuming that individuals need to be selected from the first \(n\) fronts, the individuals from the first \(n-1\) fronts, whose number is \(m\), are added to the next-generation population \(GP\), and the remaining \(N-m\) individuals are selected from the \(n\)-th front according to the crowding distance.
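A compact Python sketch of this environmental selection (our illustration of the standard NSGA-II scheme, not the authors' code) is given below; `objs` holds the objective vectors of the combined population \(NP\) and the function returns the indices of the \(N\) retained individuals.

```python
import numpy as np

def environmental_selection(objs, N):
    """Return the indices of the N individuals kept for the next generation.
    Whole non-dominated fronts are copied until one no longer fits; that
    front is then filled according to the crowding distance."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    remaining, chosen = list(range(len(objs))), []
    while len(chosen) < N and remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        if len(chosen) + len(front) <= N:
            chosen += front                       # the whole front fits
        else:                                     # partial front: crowding distance
            F = np.array([objs[i] for i in front], dtype=float)
            dist = np.zeros(len(front))
            for m in range(F.shape[1]):
                order = np.argsort(F[:, m])
                span = F[order[-1], m] - F[order[0], m] or 1.0
                dist[order[0]] = dist[order[-1]] = np.inf
                dist[order[1:-1]] += (F[order[2:], m] - F[order[:-2], m]) / span
            chosen += [front[i] for i in np.argsort(-dist)[:N - len(chosen)]]
        remaining = [i for i in remaining if i not in front]
    return chosen
```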

4 Experimental Study

This section conducts three experiments to validate the effectiveness of the designed CRGEA and its search strategies for solving MOFS. The first experiment compares the CRGEA with using all features to show the necessity of performing FS in classification. The second experiment verifies its effectiveness and superiority against six other well-known algorithms. In the last experiment, an ablation experiment and a statistical analysis are used to show that the designed strategies are effective. KNN is used as the classifier to evaluate the selected feature subsets, and K-fold (\(K=5\)) cross-validation is introduced to assess its classification accuracy. All experiments in this study are coded in MATLAB R2018a and run on a PC with an Intel (R) Core (TM) i5-12400F CPU and 16 GB of RAM. The remaining parameters of the CRGEA were determined experimentally and set as follows: the population size is 60, the maximum number of iterations is 200, the crossover probability is 0.9, the mutation probability is 0.2, the initial temperature is 10, the final temperature is 1, the cooling coefficient is 0.7, and the number of cycles at each temperature level is 3.

All experiments are based on 16 well-regarded high-dimensional datasets from different domains, taken from public research repositories. The first five datasets are sourced from research [21], datasets 6-8 are sourced from research [1], and the last nine datasets are sourced from research [63]. Note that the data are preprocessed first, i.e., missing data and features with a constant value are removed to reduce the search space. Table 1 presents the details of the datasets, including the numbers of features, samples, and classes. Unlike many existing studies, the cases in this paper are high-dimensional datasets.

Table 1 The information of the high-dimensional datasets

4.1 Verification of FS Model

This section compares the CRGEA with the method using all features (without FS) to show the necessity of FS in classification. The results are shown in Table 2. The CRGEA obtains a set of non-dominated solutions; the table reports the solution with the best \({f}_{1}\). \({RR}_{1}\) and \({RR}_{2}\) denote the reduction rates of \({f}_{1}\) and \({f}_{2}\), respectively. As described in Table 2, significant improvements are achieved for both \({f}_{1}\) and \({f}_{2}\). The CRGEA achieves the greatest improvement in \({f}_{1}\) on WarpAR10P and WarpPIE10P, where the error is reduced by 100%. The CRGEA achieves the largest dimensionality reduction on WarpPIE10P, whose subset size is reduced by 99.6694%. These results verify the necessity of performing FS in classification, which reduces the classification error while reducing the size of the datasets.

Table 2 Experimental results of CRGEA and using all features

4.2 Performance Analysis of the Designed CRGEA

To analyze the performance of the designed CRGEA, the widely applied NSGA-II [53], MOEA/D [64], SPEA-II [65], MOBGA-AOS [4], MOEA-ISa [25] and ISAGA [60] are used as the comparison algorithms. To ensure fairness, all algorithms run for the same number of iterations, i.e., 200 generations, and use the same population size. The parameters of the comparison algorithms were determined experimentally as follows. The \({P}_{c}\) and \({P}_{m}\) of NSGA-II equal 0.9 and 0.2, respectively. The T of MOEA/D (the number of weight vectors in the neighborhood of each weight vector) equals 30. The \({P}_{c}\) and \({P}_{m}\) of MOBGA-AOS equal 0.9 and 1/D, respectively, as in [4]. For MOEA-ISa, the crossover probability equals 1. For ISAGA, the initial temperature, the number of cycles, the cooling rate, and the number of iterations in the simulated annealing operation are set to 1, 10, 0.9 and 3, respectively.

The best and average values of \({f}_{1}\) and \({f}_{2}\) obtained by the seven algorithms over 30 runs are shown in Table 3. The results indicate that CRGEA is competitive against all comparison algorithms across all datasets. CRGEA obtains the best values of both \({f}_{1}\) and \({f}_{2}\) on all datasets, which means that the designed algorithm has a stronger search capability and its solutions are closer to the ideal Pareto front. MOEA-ISa achieves the best \({f}_{1}\) among the comparison algorithms by employing interval initialization, which constrains the number of selected features; however, its solutions are not as good as those obtained by CRGEA. In addition, the remaining five algorithms, which use random initialization, tend to fall into local optima on high-dimensional problems such as Isolet1, WarpPIE10P, Orl and Qsar, and only find sub-optimal solutions. On the WarpAR10P dataset, CRGEA obtains the minimum best value of \({f}_{1}\), but its average performance is worse than that of the other algorithms. This is because the solutions of CRGEA are more widely distributed over \({f}_{1}\), whereas the search of the comparison algorithms in the \({f}_{1}\) direction falls into a local optimum; Fig. 4 supports this conclusion. The above results verify the superiority of the designed CRGEA.

Table 3 The best and average values of \({f}_{1}\) and \({f}_{2}\) obtained by seven algorithms
Fig. 4 Normalized NDSs obtained by seven comparison algorithms on 16 datasets

Figure 4 shows the NDSs obtained by the seven comparison algorithms over 30 runs to illustrate the distribution of the obtained solutions. Each algorithm yields multiple solutions, and for clearer visualization only the normalized NDSs are presented, represented by seven distinct colors. The NDSs obtained by CRGEA clearly dominate a significant portion of the non-dominated solutions obtained by the other comparison algorithms. The solutions obtained by CRGEA also have better convergence and diversity, as they closely approximate the true Pareto front.

The distribution and convergence of the solutions of each algorithm are evaluated using two widely recognized performance metrics, i.e., the hypervolume (HV) [66] and the inverted generational distance (IGD) [67]. Because the true Pareto front of each dataset is not known, we aggregate the solutions obtained by all algorithms and select the first non-dominated front as the ideal non-dominated front. Figure 5 describes the IGD and HV results over 30 runs. The results show that the IGD and HV obtained by the proposed CRGEA are superior to those of the other algorithms when dealing with high-dimensional datasets.

Fig. 5 The best and average values of IGD and HV on 16 datasets

Additionally, the performance differences between the other algorithms and CRGEA are examined using the Wilcoxon rank sum test with a significance level of 0.05. This statistical analysis helps determine whether the comparison algorithms differ significantly from CRGEA. The HV is used as the comprehensive evaluation metric in the Wilcoxon test. The results in Table 4 reveal that the advantages of CRGEA over the other comparison algorithms are statistically significant on all the high-dimensional datasets.

Table 4 p-values of the Wilcoxon test between CRGEA and five comparison algorithms

To illustrate the relationship between performance improvement and computational cost, Table 5 summarizes the average CPU time of the algorithms. The comparison algorithms show similar time consumption on all datasets. A classifier is usually necessary in FS to evaluate candidate subsets, and it consumes most of the algorithm's computation time [21]; moreover, the number of samples and features is the main determinant of the classifier's time cost. Because the local search operator requires additional fitness evaluations, the designed CRGEA consumes more runtime; however, it achieves the best classification performance. Overall, although the proposed CRGEA adds several new operators, its running time remains acceptable and similar to that of the other comparison algorithms.

Table 5 Running time consumed by the seven algorithms on the seventeen datasets (unit: minute)

4.3 Effectiveness Validation of the Proposed Strategies

To validate the effectiveness of the designed search strategies in CRGEA, an ablation experiment is conducted, with HV employed to evaluate the performance of the algorithms. The original algorithm, NSGA-II, is denoted MOEA; MOEA combined with the proposed initialization strategy is denoted MOEA-ISMTS; and MOEA combined with the proposed local acceleration evolution strategy is denoted MOEA-LAES. The average HV values obtained by CRGEA and the other variants over 30 runs are shown in Table 6.

Table 6 The results of the designed CRGEA and other variants

As described in Table 6, both MOEA-ISMTS and MOEA-LAES exhibit superior performance compared to MOEA on all datasets. This strongly suggests that the proposed ISMTS and LAES are indeed effective. Moreover, MOEA-ISMTS performs better than MOEA-LAES when the number of features exceeds 1000, suggesting that the designed ISMTS is more effective for solving high-dimensional MOFS, whereas MOEA-LAES performs better than MOEA-ISMTS when the number of features is below 1000, confirming that the local search is more effective on smaller-scale problems. The results obtained by CRGEA are superior to those of MOEA-ISMTS, which indicates that the local acceleration strategy further improves the search ability and helps individuals escape local optima. In addition, the Wilcoxon test is applied to analyze the statistical significance of the differences between MOEA and the other three variants. The results of the Wilcoxon test, shown in Table 7, confirm that the results in Table 6 are statistically significant on all datasets.

Table 7 p-values of the Wilcoxon test between MOEA and three comparison algorithms

Additionally, to verify the superiority of the designed ISMTS, a comparison experiment is conducted against the traditional random initialization (TRI) method. Taking three datasets of different scales (WarpAR10P, Multiple Features and DrivFace) as examples, the distributions of the individuals generated by ISMTS and TRI are shown in Fig. 6. The results show that the proposed ISMTS exhibits superior convergence and diversity across these datasets in comparison to TRI, which further confirms the effectiveness of the proposed ISMTS.

Fig. 6 The initial populations' distribution generated by two different initialization methods

To verify the effectiveness of the proposed novel evaluation function, CRGEA is run with two different methods for evaluating the CR of each feature on the 16 high-dimensional datasets, i.e., the method using the designed Eq. (10) and the existing method using Eq. (2), denoted CRGEA and CRGEA-I respectively. The fourth column in Table 8 gives the p-values of the Wilcoxon test between CRGEA and CRGEA-I. It is evident that the proposed CRGEA achieves the best results on all datasets, and the Wilcoxon test supports this conclusion. This demonstrates the effectiveness of the designed novel evaluation function.

Table 8 The results of the CRGEA and CRGEA-I on 16 high-dimensional datasets

5 Summary and Discussions

In Sect. 4.1, the CRGEA is compared with the approach without FS, which demonstrates the necessity of performing MOFS on high-dimensional datasets. In Sect. 4.2, we analyze the \({f}_{1}\), \({f}_{2}\), IGD and HV metrics of each comparison algorithm and perform a statistical analysis employing the Wilcoxon test. The performance superiority of CRGEA in tackling high-dimensional MOFS problems is clearly demonstrated. Furthermore, we present the final Pareto fronts of all algorithms, which show that the Pareto front obtained by CRGEA exhibits greater diversity and better convergence than those of the comparison algorithms. In Sect. 4.3, an ablation experiment is performed to prove the effectiveness of the designed ISMTS and LAES and to reveal that both affect the performance of CRGEA. Among them, the designed initialization method significantly improves the performance of CRGEA in solving high-dimensional MOFS problems, while the local acceleration strategy further improves the search ability and helps individuals jump out of local optima. Then, we compare the distribution in the solution space of the designed ISMTS with that of TRI; the findings indicate that the proposed ISMTS exhibits superior convergence and diversity. Finally, we compare the performance of CRGEA using two different methods for evaluating the CR of each feature on 16 high-dimensional datasets, and the results demonstrate the effectiveness of the proposed correlation-redundancy assessment method.

6 Conclusions and Future Work

Most of the literature on FS for high-dimensional datasets focuses on improvements to search strategies, ignoring the characteristics of the dataset itself such as the correlation and redundancy of each feature, which can degrade the algorithm's search effectiveness. Thus, a correlation-redundancy guided evolutionary algorithm is designed to address high-dimensional FS with the objective of optimizing classification accuracy and the number of features simultaneously. A new correlation-redundancy assessment method is proposed to select features with high relevance and low redundancy and to speed up the entire evolutionary process. A novel initialization strategy combined with a multiple threshold selection mechanism is developed to produce a high-quality initial population. A local acceleration evolution strategy based on a parallel simulated annealing algorithm and a pruning method is developed to improve the local search ability of the CRGEA. Numerical experiments comparing CRGEA with state-of-the-art algorithms on 16 high-dimensional datasets demonstrate its superiority in solving MOFS of high-dimensional datasets. The results show that CRGEA outperforms the other comparison algorithms in the \({f}_{1}\), \({f}_{2}\), IGD and HV metrics on all datasets, and this advantage becomes increasingly evident as the dimensionality of the data increases. The statistical analysis supports these conclusions. Additionally, an ablation experiment was performed to evaluate the impact of the proposed ISMTS and LAES on the algorithm's performance; the results demonstrate a significant enhancement in both the convergence speed and the search capability of CRGEA when these techniques are incorporated. We also compared the performance of the algorithm using two different methods for evaluating the CR of each feature on the 16 high-dimensional datasets, and the results indicate that the proposed evaluation function better captures the correlation and redundancy of each feature.

Although the series of experiments above highlights the excellent performance of the proposed CRGEA for solving MOFS of high-dimensional datasets, this work mainly focuses on the effect of the correlation and redundancy of features on classification. The interactions between features, which are common in biology [68,69,70], are disregarded in this paper. In the future, we will improve the proposed algorithm to handle interactions between features on biological datasets.