Cognitively Enhanced Versions of Capuchin Search Algorithm for Feature Selection in Medical Diagnosis: a COVID-19 Case Study

Braik, Malik; Awadallah, Mohammed A.; Al-Betar, Mohammed Azmi; Hammouri, Abdelaziz I.; Alzubi, Omar A.

doi:10.1007/s12559-023-10149-0

Cognitively Enhanced Versions of Capuchin Search Algorithm for Feature Selection in Medical Diagnosis: a COVID-19 Case Study

Published: 05 June 2023

Volume 15, pages 1884–1921, (2023)
Cite this article

Download PDF

Cognitive Computation Aims and scope Submit manuscript

Cognitively Enhanced Versions of Capuchin Search Algorithm for Feature Selection in Medical Diagnosis: a COVID-19 Case Study

Download PDF

Malik Braik ORCID: orcid.org/0000-0003-4180-3734¹,
Mohammed A. Awadallah^2,3,
Mohammed Azmi Al-Betar^3,4,
Abdelaziz I. Hammouri¹ &
…
Omar A. Alzubi¹

1191 Accesses
8 Citations
Explore all metrics

Abstract

Feature selection (FS) is a crucial area of cognitive computation that demands further studies. It has recently received a lot of attention from researchers working in machine learning and data mining. It is broadly employed in many different applications. Many enhanced strategies have been created for FS methods in cognitive computation to boost the performance of the methods. The goal of this paper is to present three adaptive versions of the capuchin search algorithm (CSA) that each features a better search ability than the parent CSA. These versions are used to select optimal feature subset based on a binary version of each adapted one and the k-Nearest Neighbor (k-NN) classifier. These versions were matured by applying several strategies, including automated control of inertia weight, acceleration coefficients, and other computational factors, to ameliorate search potency and convergence speed of CSA. In the velocity model of CSA, some growth computational functions, known as exponential, power, and S-shaped functions, were adopted to evolve three versions of CSA, referred to as exponential CSA (ECSA), power CSA (PCSA), and S-shaped CSA (SCSA), respectively. The results of the proposed FS methods on 24 benchmark datasets with different dimensions from various repositories were compared with other k-NN based FS methods from the literature. The results revealed that the proposed methods significantly outperformed the performance of CSA and other well-established FS methods in several relevant criteria. In particular, among the 24 datasets considered, the proposed binary ECSA, which yielded the best overall results among all other proposed versions, is able to excel the others in 18 datasets in terms of classification accuracy, 13 datasets in terms of specificity, 10 datasets in terms of sensitivity, and 14 datasets in terms of fitness values. Simply put, the results on 15, 9, and 5 datasets out of the 24 datasets studied showed that the performance levels of the binary ECSA, PCSA, and SCSA are over 90% in respect of specificity, sensitivity, and accuracy measures, respectively. The thorough results via different comparisons divulge the efficiency of the proposed methods in widening the classification accuracy compared to other methods, ensuring the ability of the proposed methods in exploring the feature space and selecting the most useful features for classification studies.

Improved versions of snake optimizer for feature selection in medical diagnosis: a real case COVID-19

Article 16 August 2023

Intelligent Health Care System Using Modified Feature Selection Algorithm

Feature Selection with a Binary Flamingo Search Algorithm and a Genetic Algorithm

Article 15 May 2023

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

The Internet is currently being adopted by people, governments, and institutions all over the world in almost all aspects of life. Thereby, a large amount of data is generated every day in a wide variety of forms, with these datasets typically serving as an extension of knowledge [1]. In this regard, society may create data from a broad range of sectors, including health, agriculture, and industry, among many others [2, 3]. These datasets may be categorized and then used for information, forecasts, and insights. Due to the favors of rapidly advancing data collection and storage technologies, organizations and governments often collect and use these vast volumes of data on a regular basis. Raw datasets are frequently fruitless and possibly unusable unless a proper automated method is used to extract useful information from them. In any case, getting usable information has proved to be a very difficult task [4]. Traditional data analysis techniques often fail when attempting to analyze huge amounts of datasets. Even when the dataset is very small, the atypical form of the data might make it challenging to employ traditional methods to effectively process and analyze such datasets. In many cases, the problems that need to be addressed cannot be solved with the existing data analysis methods, and so new methods have to be taken [5]. Based on these discussions, the key principles of data mining (DM) can be used to identify the observable patterns in unprocessed datasets [6]. DM approaches especially attempt to learn about various features within the data by removing and simulating the content of the data. In this, DM is a generic process that involves a number of transformation operations, starting from pre-processing the data and ending with post-processing the output produced by a pre-built DM technique [7]. Due to some problems inherent in the raw datasets, which may often include redundant, irrelevant, or unimportant data [8], these datasets cannot then be utilized directly for post-processing steps. In this, pre-processing steps must be used to the gathered data to clean it up and get it ready for further phases of machine learning (ML) methods [7]. Data pre-processing is perhaps the most prolonged and difficult phase of the knowledge detection process due to the variety of ways that may be used to collect data. From these angles, this work specifically focuses on feature selection (FS) to confront with the above obstacles.

Feature selection is an important area of study in ML tasks, where ML is one of the key areas of cognitive computation. In many ML tasks, high dimensionality, on the one side, augments the information of data, on the other side, leads to curse of dimensionality problem [6]. In point of fact, for a lot of real-world applications, a small subset of informative and discriminate features can act better than employing all of the features. Data dimensionality reduction [9, 10] may be thought of as a cognitive method for analyzing the inherent properties of data. Feature selection [1, 7], favorably, can tackle the problems of high dimensions by lowering the dimensionality in ML tasks. It poses one of the most important pre-steps, which aims to remove duplicate or irrelevant data from the dataset that will be analyzed [11]. On this account, the benefits of FS stretch from data dimensionality and reducing over-fitting to eliminating noisy data, ameliorating classification accuracy, speeding up the model’s learning cycle, and reducing complexities within the dataset, among many other merits of FS methods [12]. Due to the above blessings, FS techniques have become active research areas and have been properly used in various fields such as facial recognition [13], image classification [14], and micro-array analysis [15].

Recently, many COVID-19 cases have been collected, and the dataset has been established [16]. Regardless of the location, cases volume, pandemic wave, or time of the collected COVID-19 data, the gathered dataset consists of fifteen features, and FS methods attempt to locate the subset of the most informative features. This subset will be used in machine learning or deep learning methods for classification purposes. Moreover, some methods have used COVID-19 data to discover chronic lung diseases. This is to estimate the severity and mortality rate among COVID-19 patients [17,18,19]. As of now, there more than 900 million people in China who have been infected with COVID-19, and that number is rising by the millions every day. Thus, China immediately ceased providing daily COVID figures and abandoned its zero COVID policies. An increase in COVID cases is also expected in rural China this year. It is expected that the COVID wave in China would peak in 2 to 3 months. COVID-19 and chronic obstructive pulmonary disease (COPD) have several potential adverse interrelations that may influence infection course and clinical results. There are some mechanisms that may be considered for increased COVID-19 infection susceptibility in COPD such as ineffective immunity and decreased antiviral defense [20]. COPD is linked with worse clinical results from COVID-19 [20], where COVID-19 has a large impact on the habitual care of COPD patients. There is evidence that COPD patients have worse outcomes from COVID-19 [18, 19]. A result of COVID-19 has been isolation and increased anxiety in COPD patients, with conceivably deleterious long-term repercussions [20].

Returning to the essence of this study, in general, during the implementation of the FS process, a subset of the candidate features is selected from the original feature set, and its relevance is then measured by an evaluation criterion. The process of selecting and evaluating a subset of features is repeated until a preset stopping condition is met. Then, the best obtained subset is validated in the test dataset [21]. Two opposing goals are used in feature selection methods to optimize the search in the search space, namely, to reduce the redundancy of the selected features and to increase the relevance of the class label [22]. Various search techniques could be used to deal with these goals. These techniques can be classified as single-objective and multi-objective methods [23]. In the single-objective FS methods, the solution is enhanced by evaluating a particular objective function. Therefore, the used objective function will affect the quality of the solution obtained from the optimization algorithm. Moreover, there is no specific objective that can be suitable for all optimization problems. Thus, defining a fitness function for optimizing a single objective can lessen the performance of the optimization method. Consideration of different conflicting goals in a fitness function can overcome the above obstacles and may return a non-dominated set of solutions, which can be several subsets of feature that meet different objectives. The chief handicap of multi-objective optimization methods is the increased complexity of the search space [24]. Furthermore, the intricacy of the majority of real-world problems are further challenges of FS methods. This, in turn, requires a large amount of solution space due to the dependency and non-linear needs between the dataset characteristics [6]. Examining each subset produced during the generation of various subsets is necessary in order to determine which subsets best suit assessment techniques like maximizing classification rate or reducing error [25]. This method is computationally costly and cumbersome, especially for datasets with high-dimensions. In such a case, creating all possible subsets of high dimensional datasets becomes impractical and computationally costly. Hence, dealing with such complex problems is difficult using conventional FS methods. These and other difficulties have led researchers to investigate many different strategies to get excellent performance levels in classification purposes [26]. Thus, the application of meta-heuristics has been targeted in the search for improved solutions to FS problems with an optimal rate of performance [5]. These techniques have generally proven to be effective in tackling optimization real-world problems in reasonable amounts of time with minimal computing effort in a wide range of engineering and science fields.

From the standpoint of cognitive computation, the challenges posed by high dimensionality problems in ML tasks may be overcome by adopting highly reliable FS approaches as well as robust classification models that learn from data. In this study, a newly developed meta-heuristic, named capuchin search algorithm (CSA) [27], was adopted to solve broadly available feature selection problems in the field of medical diagnosis. Although CSA has the ability to get the optimal solution in solving diverse problems in the optimization field [28, 29], it is customarily confined to the local optima especially when it encounters complex problems with many local optimums. This may be ascribed to its narrow search ability and modest convergence property. Hence, this study made some improvements to the basic CSA to efficiently solve such convoluted FS problems. This is developed in keeping with the ideology of continuous improvements to come up with highly powerful solutions to real-world problems. This is the first and main motive for this work. The basic CSA has two essential parameters in the velocity updating model, referred to as cognitive and social parameters, which help the capuchins to reach the optimal solution. However, these parameters are constants during the iterative process of CSA, which may impair exploration and exploitation features when addressing hard optimization problems. As well-adapted cognitive and social parameters as well as an efficient inertia weight mechanism are expected to influence the performance of CSA, it is worth noting and investigating to enhance CSA from the point of view of using adaptive strategies for these parameters. Based on the underlying CSA, a promising velocity updating model with proper control parameters can be implemented to guide the local and global search stages of the capuchins in the surrounding environment. In this respect, three improved versions of CSA were developed to deal with the early convergence and low search ability of CSA. Each version has added a reasonable improvement to the parent CSA by adopting different growth functions to update the values of the cognitive and social models during the path iterations of CSA. These versions are called exponential CSA (ECSA), power CSA (PCSA), and S-shaped CSA (PCSA). Subsequently, a new inertia weight was proposed for all of these versions to further control the velocity model. This improvement is intended to empower these versions to have more exploration and exploitation abilities. These functions not only provide effective guidance for the capuchins in the search areas, but they are also useful in alleviating the stagnancy of CSA. As the optimization frameworks of the proposed and basic variants of CSA are continuous, they can only handle continuous search spaces, but they have trouble in tackling problems with binary search spaces. In light of this, binary versions of these variants were created by adjusting the key operators and parameters of these variants to align with the nature of the search space of FS problems. This is the second motive of the current work. In this work, we address the shortcomings of CSA based on cognitive models to provide efficient and reliable optimization algorithms. To strengthen the efficacy of the proposed methods on FS problems, we expand the work utilizing growth computation models. For many of the datasets considered in this study, we found that the proposed FS methods when in combination with adaptive inertia weight during the iteration process of these methods can consistently provide higher performance levels than other FS methods. To sum up, the theoretical contributions of the proposed work can be recapped as follows:

Three enhanced binary versions of CSA were proposed and applied together with the basic binary CSA to solve a variety of 24 datasets collected from the UCI machine learning repository.
Three new cognitive and three new social models were embedded into the proposed methods to improve the diversity of solutions and increase the balance between exploration and exploitation features to promote the best-found solutions.
The performance of the evolved FS algorithms were compared with other highly effective FS methods in terms of several relevant criteria.

The reset of this work is arranged as follows: a literature review of several feature selection methods is presented in the “Related Works” section. The “Basic Capuchin Search Algorithm” section provides a brief description of the parent CSA. The following “Proposed Algorithms of CSA” section presents in detail the proposed algorithms. Next, the “Proposed Algorithms for Feature Selection” section describes the binary versions of the proposed feature selection methods. In the “Experimental Results and Discussions” section, the experimental results are presented and discussed. Finally, conclusions and several future directions are provided in the “Conclusion and Future Works” section.

Related Works

More recently, meta-heuristic algorithms have been widely used by many researchers to address different kinds of FS problems of varied levels of complexity [4, 5]. Astonishingly, meta-heuristics have reported notable advantages and delivered impressive accuracy when used as wrapper-based methods for solving FS problems. Well-known classes of meta-heuristics including swarm intelligence, evolutionary algorithms, and physics-based algorithms, as well as several hybridization algorithms that combine two algorithms of the same class or different classes, have been used to solve FS problems [6]. The following is a review of some selected FS methods classified according to the class of algorithms used, which have reported promising performance in the literature.

Swarm Intelligence-Based FS

There are many prominent examples of applications of swarm intelligence (SI) algorithms as search methods for wrapper-based approaches to solve FS problems in different domains [2, 4, 5, 30]. An efficient study for solving FS problems was presented by Arora et al. [2]. In that study, Arora et al. evolved two diversified binary variants of butterfly optimization algorithm (BBOA). The S- and V-shape transfer functions were applied to generate the two binary versions of BOA. These versions were assessed on 21 datasets collected from the UCI repository. It was found in [2] that BBOA with S-shape is better than BBOA with V-shape as well as many other similar FS algorithms in all evaluation methods and all studied datasets. Xian-Fang et al. [30] evolved a three-stage FS method as follows: (1) In the first stage, irrelevant features were eliminated using C-relevance; (2) in the second stage, the kth feature cluster was used to collect analogous features in the same cluster; and (3) an improved version of particle swarm optimization (PSO) was used to determine the optimal feature set. This algorithm, referred to as HFS-C-P, was assessed on 18 datasets collected from public repositories. Xian-Fang et al. stated that the HFS-C-P algorithm achieved promising results, in respect of fitness score, number of selected features, and computational time, in all considered datasets better than those achieved by other algorithms. In a more recent and effective work on FS problems, an improved binary variant of the rat swarm optimizer (RSO) combined with the local search paradigm of PSO was proposed [4]. In this method, three crossover mechanisms, controlled by a switch probability, were embedded with RSO to improve the diversity of its solutions. This method was examined on 24 datasets collected from various repositories and assessed using several evaluation methods. While this method revealed promising levels of performance, its convergence rate is a little modest. Last but not least, another good work for solving FS problems is presented in [5] using a binary version of Horse herd Optimization Algorithm, referred to as BHOA. In this algorithm, three transfer functions, namely, S-shape, V-shape, and U-shape, were used to obtain the binary domain of the HOA, where these variants were integrated with three types of crossover mechanisms to produce fifteen different variants of the BHOA. The performance of these versions was examined on 24 real-world datasets and evaluated using a set of six metric measures. The best-formed version of the proposed versions is a BHOA with an S-shape and a one-point crossover. A comparative evaluation was conducted against 21 FS methods. The BHOA method was able to find very competitive results against these comparative methods, but the implementation of the 15 versions of BHOA demands a large computational burden.

Evolutionary Algorithms-Based FS

Evolutionary algorithms (EAs) represent another broad class of meta-heuristics inspired by the natural processes of evolution. These algorithms have been broadly adapted to mature appropriate approaches for solving FS problems with a promising degree of accuracy using low computational burden and a small number of selected features [31, 32]. Appropriately, many variants of EAs, such as genetic algorithm (GA) [33], genetic programming (GP) [31], and binary differential evolution (DE) [32], have been used in the literature to tackle several types of FS problems. Recently, Awadallah et al. [34] developed a FS method based on a binary version of the JAYA algorithm, denoted as BJAM, with the help of an adaptive mutation operator. This operator was used to diversify the population during the iterative process, where this operator was managed using a pre-defined mutation rate. The JAYA algorithm was converted to binary utilizing an S-shape transfer function. The BJAM algorithm was assessed on 22 datasets selected from the UCI repository, where a good performance degree was realized compared to other FS methods.

Physics-Based Algorithms-Based FS

Simulated annealing (SA) [35] is one of the popular and broadly physics-based (PB) algorithms used to tackle FS problems. In [36], SA was used as a FS method for detecting various denial of service attacks. Several hybridization methods of PB algorithms with SI algorithms or EAs have been presented in the literature to solve FS problems. For example, a hybridization method of the whale optimization algorithm (WOA) with SA was used to solve FS problems for 18 datasets taken from the UCI repository [37]. In this method, SA was utilized to improve neighborhood search ability of WOA, where WOA was used to ameliorate the exploitation ability of the native algorithms. The performance of this hybrid model is promising and superior to the parent and other algorithms. Other examples of hybrid-based FS methods include a hybridization of a binary coral reefs algorithm with SA [38] and a binary spotted hyena algorithm with SA [39]. There are many other hybrid FS methods evolved in the literature such as a hybrid approach of genetic algorithms and artificial bee colony (ABC) [40] and a hybrid approach of Harris Hawks optimization algorithm with SA [41]. These hybrid models-based FS problems have achieved acceptable performance, but were subject to significant computations and complexity. There are many other researchers who have combined wrapper-based with filter-based methods to solve FS problems as discussed below.

Hybrid Filter-Wrapper Model-Based FS

Hybridization of filter and wrapper-based methods has been extensively used in the literature to strengthen the performance of FS tasks [42]. These hybrid models mainly comprise of two stages: the first stage is carried out based on a filter method to select the most important features, while the second stage is implemented based on a wrapper method in order to select a subset of features based on these selected features. For example, hybridization of mutual information (MI) as a filter method and binary cuckoo search as a wrapper method was presented in [43] to solve FS problems. Lai et al. [44] evolved a hybrid FS model using information gain (IG) as a filter method and an improved simplified swarm optimization (ISSO) as a wrapper method. In this method, IG was applied to select the most important features representing genes, while ISSO was applied to search for the optimal subset of genes. Another hybrid filter-wrapper method gene selection from microarray data for gene expression is reported in [45], where improved swarm-optimization was implemented as a wrapper-based method. The above hybrid filter-wrapper models for FS problems revealed reasonable performance. However, the filter methods may prevent some important features despite their sensible performance. Also, the wrapper methods that used some meta-heuristics may suffer from poor stability [46], where random search affected the stability of the selected features. A promising FS technique using a hybridization of binary biogeography optimization (BBO) with support vector machine recursive feature elimination (SVM-RFE), known as BBO-SVM-RFE, was developed in [6]. The SVM-RFE is embedded into the BBO to improve the quality of the obtained solutions in the mutation operator in order to reinforce the exploitation capability as well as to strike an adequate balance between exploitation and exploration of the original BBO. The BBO-SVM-RFE was assessed on 18 benchmark datasets. Comparative results showed that BBO-SVM-RFE revealed a wise degree of performance in terms of accuracy and number of selected features against other existing FS methods. However, the structure of BBO-SVM-RFE is complex and has slow convergence behavior, where the optimal solutions require high computational efforts. While the aforesaid feature selection methods realized promising levels of performance in a fair-minded time in addressing the FS problems deemed in the aimed applications and datasets, they cannot ensure that in all experimental runs, the optimal solutions will be identified. This insinuates that locating the optimal feature subset is not ensured. Moreover, they demonstrated that they can address FS problems that were taken into account in their studies by identifying a minimal number of attributes from a selected subset. However, each of these FS methods behaved properly in the problems considered and might fell short in other real-world FS problems, especially those with high-dimensional datasets. Besides, some FS methods such as the one reported in [4] suffered from large computational time and local optimums. Also, the method presented in [5] provided sensible results in some datasets but not for all considered datasets in that study. The deficiencies of the above FS systems might be attributed to the possibility that meta-heuristic algorithms may fall into local optimum solutions, especially when tackling complex FS problems with varied degrees of complexity and high dimensionality. Hence, there is still a need for further amelioration on FS methods, particularly for datasets with a large number of features and a high level of complexity. This motivated this study to strengthen the performance of the basic CSA to deal with FS problems of high complexity. This is carried out by developing three improved variants of this basic algorithm and using binary versions of the core and proposed algorithms of CSA to explore their capacities and efficiencies in handling familiar FS problems with different numbers of features, samples, and dimensions.

Basic Capuchin Search Algorithm

CSA is a new swarm intelligence algorithm developed to imitate the foraging activity and locomotion practices of capuchins while roaming in forests. The population of CSA is divided into two bunches: leaders (i.e., alpha capuchins) and followers (i.e., the remaining capuchins). Leaders lead the followers, where they go after each other and the leaders directly or indirectly. Leaders are accountable for locating food sources for themselves and the other capuchins in the group. The other capuchins (i.e., followers) update their position by pursuing the leaders. As reported in [27], leaders employ the following strategies of movements while foraging, which can be presented as shown below:

The leaders’ positions when jumping on trees are determined as follows:
$$\begin{aligned} x_j^i= & {} F_j + \frac{P_{bf}(v_{j}^{i})^2 sin(2\theta )}{g} \\{} & {} i< n/2; \;\;\; 0.1 < rand \le 0.25 \nonumber \end{aligned}$$
(1)
where $x_j^i$ denotes the current position of the leaders at dimension j, $F_j$ is the position of the food of the capuchins found so far at dimension j, $P_{bf}$ indicates the equilibrium probability provided by the capuchins’ tails which is equal to 0.75, $v_{j}^{i}$ is the current velocity of the ith capuchin at dimension j which is defined in Eq. 3, $\theta$ is the capuchins’ leaping angle defined in Eq. 2, g is the force of gravity which is equal to 9.81, n is the number of capuchins, and rand is a random value generated in the interval [0, 1].
$$\begin{aligned} \theta = \frac{3}{2} r \end{aligned}$$
(2)
where r is a random value produced in the range [0, 1].
$$\begin{aligned} v_j^i = \rho v_j^i + a_{1} \left( x^{i}_{best_j}-x_j^i\right) r_1 + a_{2} \left( F_j-x_j^i \right) r_2 \end{aligned}$$
(3)
where $x^{i}_{best_j}$ stands for the best position of capuchin i at dimension j, $a_{1}$ and $a_{2}$ are positive values that are equal to 1.5 and 1.5, respectively, $r_1$ and $r_2$ are random values in the interval [0, 1], and $\rho$ stands for the inertia weight of the velocity defined as given by Eq. 4.
$$\begin{aligned} \rho = w_{max} - \left( w_{max} - w_{min} \right) \frac{k}{K} \end{aligned}$$
(4)
where k is the iteration index representing the current number of iterations, K is a predefined maximum number of iterations, and $w_{min}$ and $w_{max}$ are the minimum and maximum weight values that were set to 0.2 and 0.9, respectively.
The leaders’ positions while foraging on the banks of rivers using the jumping strategy can be determined as follows:
$$\begin{aligned} \begin{aligned} x_j^i =&F_j + \frac{P_{ef} P_{bf}(v_{j}^{i})^2 sin(2 \theta )}{g} \\&i< n/2; \;\;\; 0.25 < rand \le 0.50 \end{aligned} \end{aligned}$$
(5)
where $P_{ef}$ stands for the elasticity probability of capuchins’ motion on the ground which is equal to 9.
The leaders’ position while foraging on the ground using normal walking can be decided as follows:
$$\begin{aligned} \begin{aligned} x_j^i =&x_j^i + v_{j}^{i} \\&i< n/2; \;\;\; 0.5 < rand \le 0.70 \end{aligned} \end{aligned}$$
(6)
The leaders’ position while swaying on trees can be decided as follows:
$$\begin{aligned} \begin{aligned} x_j^i =&F_j + \tau P_{bf}\times sin(2\theta )\\&i< n/2; \;\;\; 0.7 < rand \le 0.8 \end{aligned} \end{aligned}$$
(7)
where $\tau$ is a dominant parameter defined in Eq. 8.
$$\begin{aligned} \tau = 2 e^{-21 (\frac{k}{K})^2} \end{aligned}$$
(8)
The leaders’ position while climbing trees can be decided as follows:
$$\begin{aligned} \begin{aligned} x_j^i=&F_j + \tau P_{bf} (v_j^i-v_{j-1}^i)\\&i< n/2; \;\;\; 0.80 < rand \le 1.0 \end{aligned} \end{aligned}$$
(9)
where $v_{j-1}^i$ is the former velocity of capuchin i at dimension j.
The random movement of leaders while foraging can be decided as follows:
$$\begin{aligned} \begin{aligned} x_j^i =&\tau \times \left[ lb_{j} + rand \times (ub_{j}-lb_{j})\right] \\&i < n/2; \;\;\; rand \le Pr \end{aligned} \end{aligned}$$
(10)
where Pr denotes the random search probability of the leaders that has a value of 0.1, and $ub_{j}$ and $lb_{j}$ stand for the upper and lower limits of the search space at dimension j.

The followers’ positions can be updated as per Eq. 11:

$$\begin{aligned} x_j^i = \frac{1}{2}\left( {\acute{x}}_j^i + x_j^{i-1} \right) \;\;\;\;\; n/2 \le \;i\le n \end{aligned}$$

(11)

where $x_j^{i-1}$ and $x_j^{i}$ stand for the former and current position of the followers at dimension j, respectively, and $\acute{x}_j^i$ represents the current leaders’ position at dimension j.

Each new solution for each capuchin’s position is assessed using a pre-defined fitness criterion. The optimization process of CSA can be implemented through iterative steps, whereby capuchins’ positions are evaluated and updated. These steps are reiterated at each iteration, through which the convergence behavior can be got when the maximum number of iterations is realized. Algorithm 1 presents a short illustration of the iterative steps of CSA.

As CSA has affirmed its reliability and efficacy in addressing a lot of broadly well-known real-world problems [27, 28], we concluded that CSA could be an appropriate alternative algorithm to avail as a feature selection method.

Proposed Algorithms of CSA

Issues of CSA

Although CSA can search for optimal solutions while solving optimization problems, its search ability is limited by its original mathematical model and the defects of the velocity update model, where these flaws often lead to local optima in addressing complex optimization problems [27, 28]. The reason is that capuchins in CSA update their velocities toward food sources by relying on constant social and cognitive models for locomotion and repetitive foraging. However, these fixed values for such key parameters cannot guarantee that CSA can escape stagnation or that it is not confined to local optima. In addition, CSA challenges another issue of feeble exploration and exploitation competencies. This is obviously faced as a result of updating the original position of the capuchins in CSA, which does not take into account the control of key parameters of the velocity model during the iterative process. Therefore, there must be some strategies that help update the velocity model of the capuchins as well as fine-tuning of the key parameters in CSA. Furthermore, the exploration aptitude of CSA is insufficient in the inception search stage, so its modest exploitation causes the difficulty of finding the global solution in the late search stage. In this case, local optima is usually received. Thus, a reasonable compromise must be made between exploration and exploitation in order to enhance the search capacity of CSA. For this purpose, a new mechanism is needed to reinforce the swarming behavior among capuchins, in which the best ones play a spirited role in leading others. In this, the best capuchins can help other capuchins to avoid premature convergence once they are bounded into local optima, and without any capability to prevent or deal with this circumstance. To deal with the aforementioned issues in CSA, an update was made to the velocity model of the basic CSA that uses adaptive social and cognitive models, with the goal of improving the performance score of CSA. The following subsections provide detailed descriptions of the proposed variants of CSA, referred to as exponential CSA (ECSA), power CSA (PCSA), and S-shaped CSA (SCSA).

Exponential Model of CSA (ECSA)

During the iterative process of CSA, the characteristics of the population distribution vary not only with iteration number but with the iterative state as well. For example, at a premature stage, the capuchins may be dispersed in different areas of the search space, and thus, the distribution of the population is scattered. In the proposed ECSA, two steps including adaptation of the inertia weight and controlling the acceleration coefficients were carried out for the velocity model as shown below:

Adaptation of the Inertia Weight

The inertia weight $\rho$ in CSA is used to balance global and local search potentials. For optimal performance, the value of $\rho$ is expected to be large in the case of exploration and small in the case of exploitation. However, it is not necessarily true to reduce $\rho$ in CSA simply with time. Thus, an iterative factor f was proposed to be used in the inertia weight $\rho$ to take part some properties with $\rho$. In this, the factor f is also relatively large during the exploration case and becomes relatively small in the convergence condition. Thus, it would be useful to enable $\rho$ to put up with the iterative state using a sigmoid mapping $\rho (f)$: $\Re ^{+}\rightarrow \Re ^{-}$. Here, the inertia weight $\rho$ shown in Eq. 12 was proposed to be utilized in the velocity model of ECSA.

$$\begin{aligned} \rho (f) = \frac{1}{1.5e^{-2.6*f}} \in \left[ 0.4, 0.9\right] \;\;\;\;\;\;\;\;\;\;\;\; \forall f\in \left[ 0, 1\right] \end{aligned}$$

(12)

In this work, $\rho$ is initialized to 0.9. Since $\rho$ is not necessarily monotonic over time, but monotonic with the iterative factor f, $\rho$ will, thus, adapt to the search environment illustrated by f. In the case of jumping out or exploration case, large f and $\rho$ will benefit the global search, as noted earlier. On the contrary, when f is small, an exploitation case or convergence case is identified, and thus, $\rho$ goes down to benefit the local search. In view of this, the movements of the capuchins in the proposed ECSA model are implemented by a new integrated velocity-updating model with adaptation of the inertia weight throughout the iterative process. To be more specific, this new inertia weight was proposed to help update the capuchin’s capuchins to move adaptively toward food.

Control of the Acceleration Coefficients

In addition to the inertia weight, the acceleration coefficients $a_1$ and $a_2$ are also important parameters in CSA that control the overall velocity of the capuchins. Accordingly, an adaptive control can be devised for these coefficients on the basis of the following idea. Parameter $a_{1}$ represents the “self-cognition” that attracts the capuchins to their own historical best positions, helping to explore local niches and to preserve the diversity of capuchins. This parameter reveals how much confidence a capuchin has in itself. Parameter $a_{2}$ represents the “social effect” that drives the capuchins to converge towards the current globally best area, which aids in rapid convergence. This parameter divulges how much trust a capuchin has in its neighbors. Braik et al. [27] stated in their original work of CSA that the implemented experiments showed that both the “cognitive-only” model and the “social-only” model are essential for the success of CSA, for which a constant value of 1.50 was used for each of the acceleration coefficients. However, it is anticipated that using ad hoc values of $a_{1}$ and $a_{2}$ instead of a constant value of 1.50 for different problems could result in better performance. To do so, the exponential models introduced in [47] were used to define the cognitive and social models in order to originate an enhanced variant of CSA, referred to as exponential CSA (ECSA). The exponential growth function shown in Eq. 13 was proposed to represent the cognitive model of the capuchins in ECSA.

$$\begin{aligned} a_1 (t; \beta _0, \beta _1) = \beta _0 (1 - e^{-\beta _1 t}) \end{aligned}$$

(13)

where $t=\frac{k}{k^2}$, the parameter $\beta _0$ denotes the initial estimate of the cognitive parameter, and the parameter $\beta _1$ represents the final estimate of the social parameter approximately achievable by the capuchins at the end of the ECSA’s iterative process.

It is important to note that the exponential function $a_2$ is derived from the exponential function $a_1$ as settled in Eq. 14.

$$\begin{aligned} a_2(t; \beta _0, \beta _1) = \frac{\partial a_1(t; \beta _0, \beta _1)}{\partial t} \end{aligned}$$

(14)

According to Eqs. 13 and 14, the exponential function of the social model of the capuchins in ECSA is established in Eq. 15.

$$\begin{aligned} a_2 (t; \beta _0, \beta _1)= \beta _0 \beta _1 e^{-\beta _1 t} \end{aligned}$$

(15)

To estimate the parameters $\beta _0$ and $\beta _1$ for the functions of cognitive and social parameters, there are several conventional and intelligent methods mentioned in the literature [27]. One of the common traditional estimation methods is the least square estimation method [48]. This method has many issues with estimation accuracy and needs a large number of measurements to be able to give a good estimation of parameters. Other methods include the use of meta-heuristics in estimating the parameters [27]. These methods may demand senior computational efforts.

The parameters $\beta _0$ and $\beta _1$ of the exponential models used for $a_1$ and $a_2$ were selected through the use of experimental design by examining the proposed ECSA on feature selection problems. For all of the feature selection problems solved in this work, $\beta _0$ and $\beta _1$ are equal to 2.0 and 1.0, respectively. These values presented a high level of efficiency in solving feature selection problems as presented in the results section. However, optimal values are often obtained only empirically, perhaps, not the “best” values. Thereby, these parameters can be adapted to other problems as it is demanded.

The values of $a_1$ and $a_2$ are updated exponentially at each iteration loop of ECSA. In Fig. 1, the curve of the cognitive parameter of ECSA shows that the tendency of this parameter diminishes in an exponential manner. This has an impact on the conduct of the capuchins in ECSA which can be shifted towards more exploration and exploitation. Due to that, the capuchins can finish their foraging by locating the food at the end of their wanderings. This may also avoid local optimal solutions.

As can be observed in Fig. 1, the ECSA is proposed with time-varying acceleration coefficients, where a larger $a_{1}$ and a smaller $a_{2}$ were initially set and gradually reversed during the search process. As per this manner, it is expected that ECSA could divulge better overall performance than the basic CSA. This may be due to the time-varying of $a_{1}$ and $a_{2}$ that can balance the global and local search capabilities, which means that the adaptation of $a_{1}$ and $a_{2}$ can be encouraging in improving the performance of the basic CSA. As the iterative process of ECSA continues, the capuchins would clump together and converge into a locally or globally optimal region. Thus, the population distribution information would be various from that in the premature stage. In other words, the curve in Fig. 1 that represents the cognitive parameter expands exponentially until the capuchins realize and find the position of food sources. In this way, exploration and exploitation stages of the basic CSA are ameliorated, and the capuchins can eventually find food and not lose other comrade capuchins in the group while foraging.

Bounds of the Acceleration Coefficients

As discussed earlier, the aforementioned adjustments for inertia weight and acceleration coefficients should not be too troublesome. Thus, the maximum increase or decrease between two iterations is bounded by

$$\begin{aligned} \left| a_i(t+1) - a_i(t)\right| \le \delta , \;\;\;\;\;\;\;\; i = 1, 2 \end{aligned}$$

(16)

where $\delta$ is called the “acceleration rate.”

Experiments revealed that a uniformly created random value for $\delta$ in the range of [0.05, 0.1] performed better in most of the feature selection problems under study. Note that 0.5 was used for $\delta$, where it is recommended to make “slight” changes.

Power Model-Based CSA (PCSA)

The inertia weight $\rho$ used in PCSA is the same as that used in ECSA which is given in Eq. 12. The difference between PCSA and ECSA models is the functions used to control the acceleration coefficients of the velocity of these models. Anyway, the cognitive and social coefficients of PCSA are adapted using the power model described in [49]. This is why this algorithm of CSA is referred to as power CSA (PCSA). This model is based upon the heterogeneous Poisson process model. The mathematical function formulated in Eq. 17 was utilized to carry out $a_1$ in the velocity model in PCSA.

$$\begin{aligned} a_1 (t; \beta _0, \beta _1)= \beta _0 t^{\beta _1} \end{aligned}$$

(17)

where $t=\frac{k}{K}$. It is important to note that the power model of $a_2$ is derived from the power model of $a_1$ as defined in Eq. 14. According to Eqs. 17 and 14, the power function of the social model of the capuchins in PCSA can be defined as shown in Eq. 18.

$$\begin{aligned} a_2 (t; \beta _0, \beta _1)= \beta _0\beta _1 t^{\beta _1 - 1} \end{aligned}$$

(18)

Equations 17 and 18 were employed to update $a_1$ and $a_2$ during the iterative process of PCSA. It is important to know that Eq. 18 was used to find $a_2$ in this model. A pool of values in the range from 0 to 5 were applied to each of $\beta _0$ and $\beta _1$. For all of the feature selection problems tackled in this work, $\beta _0$ and $\beta _1$ are equal to 2.0 and 0.1, respectively. These parameters were selected by experimental testing of a large subset of test datasets of varied complexities, where this value reported the best accuracy of the proposed PCSA. However, these parameters can be adapted for other problems as it is needed. The values of $a_1$ and $a_2$ were updated in PCSA, in non-linear form, as displayed in Fig. 2.

It is evident from Fig. 2 that the proclivities of $a_1$ and $a_2$ of PCSA are decreasing and increasing non-linearly, respectively. This has an impact on the search conduct of PCSA which can be moved towards more exploration or exploitation in a faster base when compared to CSA that utilizes constant values for the parameters $a_1$ and $a_2$.

It is clear from Eqs. 17 and 18 that when t is small, $a_1$ has an extreme value and swiftly lessens to a minimum value. On the other hand, the value of $a_2$ is small when t is small, and it progressively augments towards its maximum value. In this context, the capuchins in PCSA can find a food source at the end of their foraging activity. In detail, $a_1$ starts from a large value and declines little by little to a small value to denote that the capuchins find a source of food. Conversely, $a_2$ starts with a small value and gradually expands to a maximum value to indicate that the capuchins ultimately became aware of the location of the food source. This scenario of using power functions for $a_1$ and $a_2$ can ameliorate exploration and exploitation as presented in the evaluation results. The maximum increase or decrease between two successive iterations of the acceleration coefficients in PCSA is bounded using Eq. 16.

Delayed S-Shaped Model-Based CSA (SCSA)

The inertia weight $\rho$ used in SCSA is the same as that used in ECSA and PCSA which is defined in Eq. 12. The difference between the former proposed algorithms of CSA and SCSA is the models used to control the acceleration coefficients of the velocity model of SCSA. In any case, the acceleration coefficients of SCSA are adapted using the S-shaped model introduced in [50]. This is why this algorithm of CSA is called S-shaped CSA (SCSA). The S-shaped mathematical growth model used to define the parameter $a_1$ is presented as given in Eq. 19.

$$\begin{aligned} a_1(t; \beta _0, \beta _1) = \beta _0 \left( 1- \left( 1+ \beta _1 t\right) e^{-\beta _1 t}\right) \end{aligned}$$

(19)

where $t=\frac{K}{k}$.

Equation 14 was used to derive $a_2$ from $a_1$ in terms of iterations as defined in 19, where $a_2$ was got as presented in Eq. 20.

$$\begin{aligned} a_2 (t; \beta _0, \beta _1) = \beta _0 \beta _1^2 t e^{-\beta _1 t} \end{aligned}$$

(20)

For all of the feature selection problems addressed in this work, $\beta _0 and \beta _1$ are equal to 2.0 and 1.0, respectively. These parameters were determined by empirical testing of a large number of feature selection problems, where these parameters were changed several times until a trustworthy solution was acquired by the proposed SCSA. The growth functions representing the parameters $a_1$ and $a_2$ are updated in a non-linear shape as displayed in Fig. 3.

In Fig. 3, the curve of $a_1$ in SCSA shows that the trend of this parameter is gradually decreasing in a non-linear manner. Briefly, the parameters $\beta _0$ and $\beta _1$ in Eqs. 13, 15, 17, 18, 19, and 20 can be fine-tuned for other problems as demanded. The three new algorithms of the basic CSA were proposed to lay out a suitable setting for $a_1$ and $a_2$ in order to enhance the exploration and exploitation features of CSA. These proposed algorithms are anticipated to fulfill efficient convergence and ameliorate the performance of the parent CSA in solving feature selection problems under study. Besides, the proposed algorithms of CSA could deliver outstanding potential to reliably evade stagnation in local optima regions and help them to find the global optima.

Briefly, in the original mechanism of updating capuchins’ velocity as presented in Eq. 3, the values of $a_1$ and $a_2$ are constants during the iterative process of CSA. This points out that exploration and exploitation processes in CSA are based on a mathematical model that relies on stationary parameters. This has a big impact on global and local search abilities that can only give sensible exploration and exploitation without strict structure. In the proposed ECSA, PCSA, and SCSA, the parameters $a_1$ and $a_2$ were applied as interactive operators to foster exploration and exploitation capabilities of these proposed versions of CSA. With a set of different values for $a_1$ and $a_2$, these proposed algorithms can switch between global and local searches to promote the convergence performance of CSA to realize optimality.

Supportive Positioning Update Process

For further exploration and exploitation, the positions of leaders and followers in ECSA, PCSA, and SCSA are then updated as per the mechanism proposed in Eq. 21.

$$\begin{aligned} x^{i}_{k+1} = {\left\{ \begin{array}{ll} x^{i}_{k}+ v^{i}_{k} \;\;\;\;\; rand \ge 0.5 \\ F_j + \lambda \left[ lb_{j} + r \left( ub_{j}-lb_{j}\right) \right] \;\; rand<0.5 \end{array}\right. } \end{aligned}$$

(21)

where $x^{i}_{k+1}$ and $x^{i}_{k}$ denote the next and present positions of the ith capuchin at the next and current iterations, respectively, $v^{i}_{k}$ stands for the present velocity of the ith capuchin at iteration k, r and rand are uniformly random values generated in the interval from 0 to 1, and $\lambda$ is defined as a function of iterations as drafted in Eq. 22.

$$\begin{aligned} \tau = \mu _0 e^{-(\mu _1 k/K)^{\mu _2}} \end{aligned}$$

(22)

the parameters $\mu _0$, $\mu _1$, and $\mu _1$ are constant values used to automatically update the parameter $\lambda$ at each iteration. These parameters are useful for strengthening exploration and exploitation conducts. For all of the test problems subsequently addressed in this work, $\mu _0$, $\mu _1$, and $\mu _2$ are set to 2, 4, and 2, respectively. These constants were captured by pilot testing of a bunch of test problems. However, they can be refined to suit other problems that those problems require.

The parameter $\lambda$ is defined as a function of iterations k to control the random movement of capuchins iteratively and thus decreases with the number of iterations. Specifically, this parameter was proposed for dynamic system optimization to secure convergence by diminishing the search speed as well as to enhance exploration and exploitation of the proposed algorithms. This parameter can enable capuchins to scout more search space and exploit each region while looking for food or other capuchins’ food. This is to arrive at an efficient convergence process, which can further improve the performance of ECSA, PCSA, and SCSA in solving feature selection problems. In this light, it is anticipated that the combination of the parameter $\lambda$ with ECSA, PCSA, and SCSA will intensify the exploration capacity of these versions and bring them closer to the optimal solution.

The first case of Eq. 21 (i.e., when $rand \ge 0.5$) was suggested to allow capuchins to advance toward food sources. The second case of Eq. 21 (i.e., when $rand< 0.5$) was suggested to empower capuchins to scout several random positions in the search domain to improve local and global search capabilities and to strike a sufficient balance between exploration and exploitation. This gives capuchins in ECSA, PCSA, and SCSA a great deal of ability to explore every potential position in the search space. In sum, Eq. 21 was combined with the mathematical models of ECSA, PCSA, and SCSA during their search iterations to address a number of FS problems from various domains, where the optimum feature subset characterizing the dataset is chosen. This is performed to get better classification efficiency by these versions than can be got with the basic CSA. In addition, the length of the feature subset is anticipated to be reduced with these versions over that realizable by the basic binary CSA.

To summarize, the proposed exponential, power, and S-shaped models for the cognitive and social models of the proposed ECSA, PCSA, and SCSA were utilized in these algorithms to improve the mobility of capuchins. In this work, the proposed binary ECSA, PCSA, and SCSA were applied to identify the most significant features from medium, small, and high-dimensional datasets in binary search spaces, on top of reducing the redundant and irrelevant features.

Complexity Analysis of ECSA, PCSA, and SCSA

Time complexity of an optimization algorithm can be drafted using a function that relates the algorithm’s running time to the size of the optimization problem. Given this, Big-O notation can be used. The time complexity analysis of these optimization methods basically relies on the following steps: problem definition, initialization procedure, population update, fitness evaluation, and selection method. The computational time of the fitness assessment is highly dependent on the particular optimization problems. In doing so, the general computational complexity of each of ECSA, PCSA, and SCSA is the same and can be computed as follows:

$$\begin{aligned} \begin{aligned} \mathcal {O} \left( ECSA\right) =&\mathcal {O} \left( problem \; def.\right) + \mathcal {O} \left( init.\right) \\&+ \mathcal {O}\left( K \left( population \; update\right) \right) \\&+ \mathcal {O}\left( K \left( fitness \; eval.\right) \right) \\&+ \mathcal {O}\left( K \left( selection \; procedure\right) \right) \end{aligned} \end{aligned}$$

(23)

As Eq. 23 implies, the time complexity of ECSA, PCSA, and SCSA mainly depends on the total number of iterations (K), the population size (n), the dimension of the problem under study (d), and the cost of the fitness function (c). Also, the S-shaped transfer function used to get binary versions of these proposed algorithms is intended to update the solutions. In specific form, the general computational complexity of ECSA, PCSA, and SCSA can be formulated in the worst case as follows:

$$\begin{aligned} \begin{aligned} \mathcal {O} \left( ECSA\right)&= \mathcal {O}\left( 1\right) + \mathcal {O}\left( nd\right) + \mathcal {O}\left( VKnd\right) \\&+ \mathcal {O}\left( VKnc\right) + \mathcal {O}\left( VKn^2d\right) \end{aligned} \end{aligned}$$

(24)

where V stands for the number of assessment experiments.

The number of iterations (K) is often greater than the population size (n), the problem dimension (d), and the cost of the fitness function (c). In this regard, the main parameters K and n are essential in evaluating the complexity issue of optimization algorithms. Also, as $nd \ll VKnd$ and $nd \ll VKcn$, so the components 1 and nd can be ruled out from the time complexity given in Eq. 24. Consequently, the general time complexity of ECSA, PCSA, and SCSA can be viewed as follows:

$$\begin{aligned} \mathcal {O} (ECSA) \cong (VKnd + VKnc + VKn^2d) \end{aligned}$$

(25)

Proposed Algorithms for Feature Selection

The proposed ECSA, PCSA, and SCSA algorithms aspire to extend the exploration and exploitation characteristics of CSA beyond enhancing CSA to deal with complex search spaces for challenging feature selection problems. The proposed binary FS algorithms aim to find the optimal subset of attributes for classification tasks by locating the most relevant attributes and abolishing the unimportant attributes. In this work, a wrapper-based FS approach was intended using four different binary algorithms to address a pool of benchmark feature selection datasets that vary in complexity, number of attributes, and characteristics. However, the CSA was originally introduced to handle continuous optimization problems, whereby each search agent in CSA updates its position based on its current position, the position of the best individual found so far, and the position of all other remaining search agents [27]. Thus, the proposed algorithms of CSA can only deal with continuous search spaces. Anyway, as these algorithms are tailored to solve FS problems, where the search spaces of such problems can be delineated by binary values due to their familiar nature. Hence, there is a need to use binary versions of the algorithm, CSA, ECSA, PCSA, and SCSA, to evolve a search strategy for FS problems of binary search spaces. In light of this, some operators of these methods demand to be altered to create binary versions of them, which can provide output in binary forms as either “*0” or “*1.” This allows the search agents of these algorithms (i.e., capuchins) to have binary solutions while solving FS problems. As per this, the capuchins can change their position in the search space during the iterative procedural loops, where the movements of the capuchins are associated with values of “*1” and “*0.” Considering this, an S-shape transfer function (TF) was used to transform the continuous CSA, ECSA, PCSA, and SCSA into binary ones, referred to as BCSA, BECSA, BPCSA, and BSCSA, as described in the “S-Shape Transfer Function” section, with the objective function of these algorithms in the “Fitness Function” section.

S-Shape Transfer Function

As per the assertions conducted in [51], one of the most practical ways of transforming an optimization method that deals with continuous problems to a method that can address binary problems is to create a binary version of the purposed optimization method. To do this without modifying the basis of the algorithm, a transfer function is the most widely norm used in this area. This strategy aims to establish the probability that an element in the feature subset, $x^i$, will be constrained in a binary form that can be either “*0” or “*1.” An element value of “*0” presents that the corresponding feature is not chosen, while a value of “*1” points out that the feature was chosen. In this study, an S-shape TF was utilized to conduct the transformation task. The S-shape TF that was applied to transform the position of the capuchins (i.e., search agents) in the proposed algorithms of CSA from continuous to binary is presented in Eq. 26:

$$\begin{aligned} S\left( x_{m, j}^{k}\right) = \frac{1}{1+exp^{- x_{m, j}^{k}}} \end{aligned}$$

(26)

where S refers to the transfer vector, $S\left( x_{m, j}^{k}\right)$ which represents the probability value of the S-shape value, $x_{m, j}^{k}$ stands for the jth element in the solution x (i.e., search agents) at the mth dimension, and k denotes the current iteration.

The values of the elements in the solution x are converted to either “*1” or “*0” using Eq. 27.

$$\begin{aligned} x_{m, j}^{k} ={\left\{ \begin{array}{ll} 1 &{} \qquad rand < S\left( x_{m, j}^{k}\right) ,\\ 0 &{} \qquad rand \ge S\left( x_{m, j}^{k}\right) \end{array}\right. } \end{aligned}$$

(27)

where rand stands for a function that generates a uniform random number in the range [0, 1].

Fitness Function

A wrapper-based FS method needs a learning algorithm to be involved in evaluation of the selected subset of features. Here, the k-Nearest Neighbor (k-NN) classifier [52] was used to get a sense of the classification accuracy of the obtained solutions. In addition, each optimization problem must be solved using a fitness function, which is essential to any wrapper-based FS approach. While drafting a FS method, two key considerations must be made: How to express the solution and evaluate it. A binary vector with a length equal to the number of features in the dataset is used in this study to create a feature subset. Two discrepancy objectives are imposed on the feature selection problem as a multi-objective optimization problem as follows: (1) lessen the amount of features that are chosen, and 2) problem the k-NN classifier’s classification accuracy. It is broadly known that as the number of features selected in a solution diminishes, the quality of the solution improves, by which the solution with the lowest number of features associated with the largest classification value is the best solution which is required to be achieved. These two opposite goals should be deemed into one objective function in the proposed FS algorithms. By this, these goals were implemented into one objective function for each solution in the iterative process of the proposed BCSA, BECSA, BPCSA, and BSCSA that was computed using the k-NN classifier as presented below:

$$\begin{aligned} fitness=\alpha \zeta _{k}+\beta \frac{|R|}{|N|} \end{aligned}$$

(28)

where $\zeta _{k}$ is the rate of classification error got by k-NN, |R| and |N| are the number of selected and original features, respectively, and $\alpha$ and $\beta$ are the weights of the classification quality and selection ratio of the chosen features, respectively, where they are two counteractive parameters in the interval [0, 1], in which $\beta$ is the complement of $\alpha$.

The fitness function presented in Eq. 28 that considers the trade-off between the selected features in each solution vector (i.e., minimization) and the classification accuracy rate of the learning classifier (i.e., maximization) is used to assess the selected subsets of the proposed BECSA, BPCSA, and BSCSA as FS methods.

The proposed BECSA, BPCSA, and BSCSA as FS methods are evaluated using the fitness function shown in Eq. 28, which takes into account the trade-off between the classification accuracy rate of the learning classifier (i.e., maximization) and the selected features in each solution vector (i.e., minimization). The schematic diagram of the proposed work is presented in Fig. 4.

Experimental Results and Discussions

The effectiveness and robustness of the proposed algorithms of CSA and basic CSA in solving FS problems are studied and analyzed in this section. First, a brief description of the datasets used for the evaluation tasks is found in the “Dataset Description” section. Then, the parameter settings of the proposed algorithms and the characteristics of the system used to run the algorithms are provided in the “Parameter Settings” section, while the evaluation metrics are formulated in the “Performance Measures” section. The results of evaluation and analysis of the proposed algorithms are summarized in the “Evaluation of the Proposed Methods” section. Finally, the outcomes of the fundamental CSA and the best developed algorithm compared to other algorithms are presented in the “Performance Comparison with Other Methods” section.

Dataset Description

The performance of the proposed algorithms of CSA is evaluated using 24 benchmark datasets of different number of features and varying complexity. These datasets were extracted from patients’ medical diagnoses. It should be noted that these datasets were collected from the UCI repository, KEEL repository, Kaggle, and another well-known medical website.^{Footnote 1} Table 1 presents a brief description of the first 23 datasets used in the current study.

Table 1 A brief description of the first 23 datasets

Cognitively Enhanced Versions of Capuchin Search Algorithm for Feature Selection in Medical Diagnosis: a COVID-19 Case Study

Abstract

Similar content being viewed by others

Improved versions of snake optimizer for feature selection in medical diagnosis: a real case COVID-19

Intelligent Health Care System Using Modified Feature Selection Algorithm

Feature Selection with a Binary Flamingo Search Algorithm and a Genetic Algorithm

Introduction

Related Works

Swarm Intelligence-Based FS

Evolutionary Algorithms-Based FS

Physics-Based Algorithms-Based FS

Hybrid Filter-Wrapper Model-Based FS

Basic Capuchin Search Algorithm

Proposed Algorithms of CSA

Issues of CSA

Exponential Model of CSA (ECSA)

Adaptation of the Inertia Weight

Control of the Acceleration Coefficients

Bounds of the Acceleration Coefficients

Power Model-Based CSA (PCSA)

Delayed S-Shaped Model-Based CSA (SCSA)

Supportive Positioning Update Process

Complexity Analysis of ECSA, PCSA, and SCSA

Proposed Algorithms for Feature Selection

S-Shape Transfer Function

Fitness Function

Experimental Results and Discussions

Dataset Description

Parameter Settings

Performance Measures

Evaluation of the Proposed Methods

Discussion of the Results

Limitations of the Proposed Methods

Sensitivity Analysis

Performance Comparison with Other Methods

Statistical Test

Conclusion and Future Works

Data Availability

Notes

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Ethical Approval

Informed Consent

Conflict of Interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation