1 Introduction

Back in 2003, the amount of generated data was around five exabytes. Nowadays, the same amount of data, and even more, is produced within two days [1]. This rapid increase in the volume, velocity and variety of data raises challenges and, at the same time, opportunities. Dealing with such data is a challenge, but there are opportunities to utilize the data for beneficial applications [2].

In order to perform data mining, data are first pre-processed [3], which involves cleaning and preparing the data to best meet the requirements of input for later stages. One possible pre-processing step is Feature Selection (FS) [3], which is a method of choosing a subset of features of a dataset that can best represent the data accurately without redundancy, noise, or repetition. FS is used in a wide number of applications, including data classification [4,5,6], data clustering [7,8,9], image processing [10,11,12,13], and text categorization [14, 15].

Generally speaking, FS techniques are either based on an evaluation criterion or on a search strategy. Evaluation criterion-based methods can be further classified as either filters or wrappers. The main difference between these two is the absence or existence (respectively) of a learning algorithm in the process to evaluate feature subsets. Chi-Square [16], Gain Ratio [17], Information Gain [18], support vector machines [19], ReliefF [20, 21], and hybrid ReliefF [22, 23] are filter methods. They depend upon correlations between features and classes in the dataset. Wrapper FS methods [24], on the other hand, utilize learning algorithms. A disadvantage of wrapper FS methods is their high computational cost; however, they often give more precise results.

Due to its huge search space, the FS problem has been shown to be NP-Hard [25, 26]. Thus, it is costly and time-consuming to employ exact methods to find a solution. However, approximate solutions can be obtained with search strategies such as sequential forward, sequential backward, random, and heuristic search [27]. Further, metaheuristic algorithms often lead to efficient implementations of various FS methods.

Metaheuristic algorithms use heuristic strategies or guidelines in optimization algorithms to solve complex optimization problems (e.g., FS problem) in real time. Unlike single-purpose algorithms, metaheuristic algorithms can be used for many different optimization problems [27,28,29,30,31,32,33]. One major category of metaheuristic algorithms is Swarm Intelligence (SI), where creature swarms are the main inspiration (e.g., ants, flocks, bees) [34]. SI algorithms have been tested with various optimization problems, including FS. For instance, the authors of [35] utilize the powerful SI algorithm Grey Wolf Optimizer (GWO) with an FS problem, and the results reported a respectable performance. Similarly, the Antlion Optimizer (ALO) [36] has been successfully used as a wrapper for a FS strategy, and the Whale Optimization Algorithm (WOA) has been utilised in several different implementations of FS algorithms [37,38,39,40], as has Particle Swarm Optimization (PSO) [41], Artificial Bee Colony (ABC) [42], Ant Colony Optimization (ACO) [43, 44], Gravitational Search Algorithm [45], and the Salp Swarm Algorithm (SSA) [46,47,48].

Indeed, the difficulty of tackling the FS problem increases considerably with the dimensionality of the original problem. For instance, when the FS data has n features, its search space contains \(2^{n}\) different solutions. Thus, any metaheuristic algorithm used to tackle such an FS problem often requires modification to work well given the complex nature of the FS search space. This is also reflected in the No Free Lunch (NFL) theorem [49], which states that no superior algorithm can achieve the best performance for all optimization problems or even for the same optimization problem with different instances. Therefore, research opportunities are still available to introduce new/modified metaheuristic algorithms for FS problems.

Besides the previously mentioned SI algorithms, metaheuristic algorithms can imitate a physical rule, an evolutionary phenomenon, or a human-based technique [50]. To this end, Seyedali Mirjalili proposed a metaheuristic algorithm called the Sine Cosine Algorithm (SCA) [50] in 2016. SCA is a population-based algorithm inspired by the sine and cosine trigonometric functions. The simplicity, robustness and efficiency of the algorithm are SCA’s main advantages. Those characteristics have motivated others to implement SCA for different optimization problems. For example, truss structure optimization is an architecture-based optimization problem [51] where SCA has been applied. SCA has also been adapted to support the travelling salesperson problem [52], text categorization [53], image segmentation [54], object tracking [55], unit commitment [56], optimal design of a shell and tube evaporator [57], abrupt motion tracking [58], and parameter optimization for support vector regression [59].

Because real-world problems are complex and have constraints, researchers have attempted to enhance SCA in a number of different ways. Firstly, SCA operators have been modified to deal with particular problems [60,61,62,63]. Alternatively, SCA has been hybridized with i) local-based algorithms [52, 64, 65], ii) population-based algorithms [66, 67], and iii) operators from other optimization algorithms [65, 68]. For instance, in [62] the SCA exploration and exploitation phases were managed by a nonlinear conversion parameter. In addition, to help avoid local optima, the position update equation was modified. Another example of SCA hybridization is improving exploitation utilizing the Nelder-Mead simplex concept and the Opposition-based learning (OBL) searching strategy [64]. Further, the diversification of SCA has been enhanced by integrating SCA with a random mutation and a Gaussian local search technique [65]. Quite recently, Al-betar et al. [69] introduced a memetic version of SCA to solve the economic load dispatch problem. In this approach, adaptive β-hill climbing [70] was hybridized with the optimization framework of SCA to better balance exploration and exploitation.

SCA was initially proposed for continuous decision variables. However, with a mapping function (transforming a continuous search space to binary), a binary SCA (BSCA) version was introduced in [71], where it was implemented for an FS optimization problem, and verified to be an efficient technique. The performance, accuracy, capability, and variety of decision variables’ types are the factors that motivated us to conduct the research described in this paper. We propose three versions of the Improved Binary Sine Cosine Algorithm (i.e., IBSCA1, IBSCA2, and IBSCA3) for the FS problem, in which different approaches of exploration and exploitation are conducted. Consequently, this leads to the following contributions:

  • We apply Opposition Based Learning (OBL) in IBSCA1 to ensure a diverse population of solutions. The use of OBL is expected to expand the search region and improve the solution’s approximation.

  • IBSCA2 builds on IBSCA1 and includes Variable Neighborhood Search (VNS) and Laplace distribution to explore the search space using several mutation methods (swap, insert, inverse, or random mutation). One of the advantages of VNS is that the mutated solution may break out of a local optimum.

  • IBSCA3 builds on IBSCA2 and enhances the best candidate solution using Refraction Learning (RL). RL is a novel opposition learning approach that is based on the principle of light refraction. It is expected to improve the ability of IBSCA3 to jump out of local optima.

  • The three exploration techniques are applied in an incremental manner, where IBSCA3 implements all of the three exploration techniques. Our purpose here is to show that the incremental integration of each exploration method gradually improves the performance of IBSCA and eventually leads to a strong optimization algorithm (IBSCA3).

  • The candidate solutions produced by the optimization process of SCA and RL are continuous. Therefore, we used the V3 transfer function to convert the values of continuous decision variables into binary ones. V3 was selected based on extensive simulations on eight binary transfer functions (4 S-shaped and 4 V-shaped transfer functions). The experimental results indicated that V3 is the most viable transfer function.

  • We evaluate the variations of IBSCA utilizing 19 well-known datasets (18 FS datasets from UCI repository and a COVID-19 dataset). IBSCA3 is found to be the most efficient version of IBSCA (Section 5.2).

  • The performance of IBSCA3 was evaluated and compared to 10 popular binary algorithms (Section 5.3). The overall simulation results indicate that IBSCA3 outperformed all the compared algorithms in terms of accuracy and number of features selected over most of the datasets.

  • We compared IBSCA3 to 10 state-of-the-art algorithms that adopt OBL-enhanced methods, VNS and Laplace distribution (Section 5.4). We found that IBSCA3 produces the best results among the results of the compared algorithms.

  • We compared IBSCA3 to seven popular variations of SCA (Section 5.5). The experimental results indicate that IBSCA3 is the most accurate algorithm.

The cumulative improvements proposed for IBSCA are all included in IBSCA3: the method diversifies the search through Opposition Based Learning (OBL), intensifies it through Variable Neighborhood Search (VNS), and escapes local optima through Refraction Learning (RL). By means of these improvements, a superior method (i.e., IBSCA3) is introduced for the FS problem.

In general, the overall simulation results indicate that IBSCA3 outperforms the compared algorithms, based on accuracy and number of features selected, over almost all tested datasets. Note that there are two main differences between IBSCA3 and the other hybrid optimization algorithms that attempt to solve the FS problem. First, IBSCA3 is the only hybrid algorithm that combines OBL, RL, VNS and Laplace distribution in a single algorithm. Second, IBSCA3 is the first such algorithm to include Laplace distribution inside VNS.

The rest of the paper is organized as follows: SCA optimization problem implementations and versions are highlighted in Section 2. Section 3 then reviews the binary Sine Cosine algorithm and the objective function used. The newly proposed Improved Binary SCA with multiple exploration and exploitation approaches (IBSCA) for solving the FS problem is presented in Section 4. For the purpose of evaluation, the algorithms’ performances over different experiments are compared and discussed in Section 5. Lastly, Section 6 summarises the work and presents potential future research avenues.

2 Related work

Several discrete variations of SCA have been developed to solve the FS problem [48, 61, 72,73,74,75,76,77,78]. This section examines recently proposed variations of the SCA for global optimization and solving the FS problem.

El-kenawy and Ibrahim [72] introduced a binary hybrid optimization algorithm (Binary SC-MWOA) that combines the SCA algorithm with a modified Whale Optimization algorithm. Binary SC-MWOA uses the sigmoid function to convert the continuous candidate solutions generated by the optimization operators of the SC and whale optimization algorithms into binary solutions suitable for the FS problem. Binary SC-MWOA was evaluated over 10 UCI repository datasets and compared to a number of popular optimization algorithms including the Grey Wolf Optimizer (GWO) [79], Whale Optimization Algorithm (WOA) [80] and memetic firefly algorithm. Binary SC-MWOA was able to find an optimal subset of features with the lowest classification error.

Neggaz et al. [48] presented a new hybrid optimization algorithm for FS called ISSAFD that combines the optimization operators of the SC algorithm and the Disrupt Operator of the Salp Swarm Optimizer (SSA). ISSAFD optimizes followers’ positions in the SSA algorithm using sinusoidal mathematical functions similar to those in SCA operators. The disrupt operator diversifies the population of candidate solutions in the algorithm. The performance of ISSAFD was compared to many optimization algorithms including SSA, SCA, binary GWO (bGWO), PSO, ALO, and Genetic Algorithm (GA) over four well-known datasets. The simulation results suggested that ISSAFD was more accurate, had higher sensitivity, and chose fewer features than the other tested FS algorithms.

Hussain et al. [73] suggested an algorithm to solve continuous optimization problems and the FS problem called SCHHO that integrates the SCA algorithm in the Harris Hawks Optimization (HHO) algorithm. The goal of SCHHO is to use SCA as an exploration method in HHO. In addition, the exploitation ability of HHO is improved in SCHHO by having candidate solutions adjust dynamically to help avoid staying in local optima. As reported in [73], SCHHO performs much better than popular optimization algorithms, including Dragonfly algorithm (DA), grasshopper optimization algorithm (GOA), GWO, WOA, and SSA.

The wrapper-based Improved SCA (ISCA) [61] adds an Elitism strategy to SCA as well as a mechanism to update the best solution. The experimental results in [61] suggest that ISCA provides more accurate results and fewer features than GA, PSO and the original SCA algorithm.

Abd Elaziz et al., [74] proposed SCADE, an algorithm that combines the differential evolution (DE) algorithm with the SCA algorithm. DE’s optimization operators are used at each iteration of SCA to improve its population of solutions. This helps the SCA algorithm avoid local optima. SCADE’s performance was assessed over eight UCI datasets with comparison to three popular algorithms (social spider optimization (SSO), ABC and ACO [74]), with SCADE obtaining the best results.

Abualigah and Dulaimi [75] introduced the hybrid SCA and GA algorithm (SCAGA) for solving the FS problem. In SCAGA, the genetic optimization operators (crossover and mutation) are used to improve the optimization process of SCA and balance between its exploration and exploitation of candidate solutions. SCAGA was compared to SCA, PSO, and ALO using 16 UCI datasets. SCAGA was found to be a better feature-selection method than the other tested algorithms in terms of the maximum obtained accuracy and minimal obtained features.

Sindhu et al., [77] proposed an algorithm named Improved Biogeography Based Optimization (IBBO) for solving the FS problem. IBBO attempts to improve the optimization process of Biogeography Based Optimization (BBO) by employing the optimization operators of SCA after the migration operator of BBO. The performance of IBBO was compared to the performance of popular optimization algorithms such as BBO, SCA, GA, PSO, and ABC using four popular datasets. The simulation results suggest that IBBO is more accurate and selects fewer features compared to the other FS algorithms.

SCA may get stuck in sub-optimal regions during its optimization process. This is because its exploration operators (i.e., the two trigonometric functions of SCA) are unable to efficiently explore the search space. Abd Elaziz et al., [76] proposed Opposition-based SCA (OBSCA), which is a variation of SCA that uses the OBL technique to improve the performance of SCA. In OBSCA, OBL selects the best candidate solutions and generates their opposite solutions in an attempt to lead to more accurate solutions. OBSCA was compared in [76] to several optimization algorithms including SCA, Harmony Search (HS), GA, and PSO using standard optimization test functions and real-world engineering problems. OBSCA performed competitively compared to the other algorithms.

Kumar and Bharti [78] proposed the Hybrid Binary PSO and SCA algorithm (HBPSOSCA). In this algorithm, a V-shaped transfer function converts continuous candidate solutions into binary solutions. The effectiveness of HBPSOSCA was compared in [78] to binary PSO, modified BPSO with chaotic inertia weight, binary moth flame optimization algorithm, binary DA, binary WOA, binary SCA, and binary ABC using 10 standard benchmark functions and seven real-world datasets. The conducted experiments showed that HBPSOSCA exhibited better performance in most of the tested cases.

ASOSCA [81] is a hybrid optimization algorithm based on the Atom Search Optimization (ASO) algorithm and the SCA algorithm. It is basically used for automatic clustering. In ASOSCA, SCA is used to improve the quality of candidate solutions (i.e., reduce the number of features and improve accuracy of the solutions) in ASO. The performance of ASOSCA was compared in [81] to other optimization methods (e.g., SCA, ASO, PSO) using 16 clustering datasets and different cluster validity indexes. ASOSCA performed better than the other tested algorithms.

The Artificial Algae Algorithm (AAA) is a metaheuristic for solving continuous optimization problems [82]. It was originally inspired by the living behaviors of microalgae, a photosynthetic species. Turkoglu et al. [83] proposed eight binary versions of the AAA algorithm for solving the FS problem. Each binary version of AAA uses a different transfer function (four V-shaped and four S-shaped transfer functions). The performance of the binary versions of AAA was compared to the performance of seven well-known optimization algorithms (BBA, binary CS, binary Firefly algorithm, binary GWO, binary Moth flame algorithm, binary PSO, binary WOA [83]) using the UCI datasets. The experimental results indicate that the binary versions of AAA outperform the other tested algorithms.

The Horse herd Optimization Algorithm (HOA) is a metaheuristic that simulates the survival behaviour of a pack of horses in solving NP-hard optimization problems [84]. Awadallah et al. [85] proposed fifteen binary versions of HOA (BHOA) for solving the FS problem. The fifteen variations of BHOA were created by combining three popular crossover operators (one-point, two-point and uniform operators) with three transfer-function categories (S-shaped, V-shaped and U-shaped transfer functions). The versions of BHOA were tested and evaluated against each other using 24 real-world datasets, and the experimental findings suggest that the best version of BHOA is the one combining an S-shaped transfer function with one-point crossover.

The Black Widow Optimization (BWO) algorithm is a new population-based optimization algorithm that mimics the mating process of black-widow spiders to solve continuous optimization problems [86]. However, the BWO algorithm converges slowly to solutions when attempting to solve hard optimization problems. Therefore, an enhanced version of BWO (SDABWO) was proposed in [87] to improve the convergence behaviour of BWO and solve the FS problem. Three techniques were integrated in SDABWO. First, the spouses of male spiders are chosen based on a computational procedure that takes into consideration the weight of female spiders and the distance between spiders. Second, the mutation operators of differential evolution are used in the mutation phase of SDABWO in order to escape from local optima. Lastly, the three key parameters of SDABWO (procreating rate, cannibalism rate, and mutation rate) are adjusted dynamically over the course of its simulation process. SDABWO was compared to five well-established optimization algorithms (GWO, PSO, DE, BOA, HHO) using 12 datasets from the UCI repository. The experimental results indicate that SDABWO outperforms the other compared algorithms.

The chimp optimization algorithm (ChOA) is an optimization algorithm that is inspired by the behaviour of individual chimps in their group hunting for prey [88]. This algorithm was originally proposed for solving continuous optimization problems. The binary chimp optimization algorithm (BChOA) for solving the FS problem was introduced in [89]. BChOA has two variations, which result from combining ChOA with the one-point crossover operator and two transfer-function categories (S-shaped and V-shaped transfer functions). The two versions of BChOA were compared to six popular metaheuristics (GA, PSO, BA, ACO, firefly algorithm, and flower pollination) and the results revealed that the two versions of BChOA perform better than the other tested algorithms.

The Hunger Games Search Optimization (HGSO) algorithm is an optimization algorithm for continuous mathematical problems. It was inspired by the anxiety of prey animals about being eaten by their predators [90]. Devi et al. [91] presented two binary versions of the HGSO algorithm for the FS problem, which use V-shaped and S-shaped transfer functions to convert continuous solutions to binary ones. Binary HGSO was compared to well-known optimization algorithms (e.g., binary GWO and BSCA) using 16 datasets from the UCI repository. The simulation results demonstrated that the binary HGSO versions are more accurate and select fewer features than the other tested algorithms.

In summary, many of the hybrid SCA variations in this section, including Binary SC-MWOA, ISSAFD, SCHHO, SCADE, HBPSOSCA and SCAGA, have internal parameters that require fine tuning and use iterative-based optimization operators inside their optimization loops (e.g., the crossover and mutation operators in SCAGA). In general, when compared to traditional optimization algorithms, hybrid methods use more computations (e.g., ASOSCA, HBPSOSCA, SCHHO). We are encouraged to use SCA in this new work because the candidate solutions in SCA can easily be converted to binary solutions using the transfer function described in Section 4.3.

3 Binary version of sine cosine algorithm for FS

The Sine Cosine Algorithm (SCA) [50], summarized in code in Algorithm 1 and pictorially in Fig. 1, iteratively optimizes a population of candidate solutions using basic trigonometric functions. A candidate solution is usually made of m decision variables X =< x1,x2,...,xm >, each initially generated randomly between the lower (LB) and upper (UB) bound for the variable. Once an initial population of candidate solutions has been randomly generated, SCA uses the problem’s fitness function to calculate a fitness value for each candidate solution. The iterative optimization process of SCA then begins, and the decision variables of each candidate solution \({X^{t}_{i}}\) are updated as follows:

$$ x^{t+1}_{i}= \left\{\begin{array}{ll} x^{t}_{i} + r_{1} \times \sin(r_{2}) \times |r_{3} {P^{t}_{i}} - {x^{t}_{i}}|, & r_{4}< 0.5 \\ x^{t}_{i} + r_{1} \times \cos(r_{2}) \times |r_{3} {P^{t}_{i}} - {x^{t}_{i}}|, & r_{4}\geq 0.5 \end{array}\right. $$
(1)

where r1, r2, r3 and r4 are random numbers and \({P^{t}_{i}}\) is the position of the destination point for \({x^{t}_{i}}\) at iteration t. In detail, r1 controls the range of the trigonometric functions in (1) and thereby balances exploration and exploitation. The value of r1 is selected at each iteration of SCA as follows:

$$ r_{1}= a-t\frac{a}{T} $$
(2)

where a is a constant, t is the iteration number and T is the maximum number of iterations. r2 ∈ [0,2π] specifies the distance and direction of the movement related to the destination. r3 ∈ [0,2] determines the weight of the destination point \({P^{t}_{i}}\). The fourth parameter r4 ∈ [0,1] is a number used to randomly choose one of the two options in (1).
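
As an illustration, a minimal NumPy sketch of the position update in (1) together with the schedule in (2) might look as follows; the array shapes, the default value of a, and the function name are our own assumptions for this example, not part of the original SCA description.

```python
import numpy as np

def sca_update(X, P, t, T, a=2.0, rng=None):
    """One SCA iteration per Eqs. (1)-(2): move every decision variable of every
    candidate solution toward the destination point P.
    X: (n_solutions, n_vars) array of continuous positions; P: (n_vars,) destination."""
    rng = rng or np.random.default_rng()
    r1 = a - t * (a / T)                                  # Eq. (2): shrinking range
    n, m = X.shape
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=(n, m))       # distance/direction of the move
    r3 = rng.uniform(0.0, 2.0, size=(n, m))               # weight of the destination point
    r4 = rng.uniform(0.0, 1.0, size=(n, m))               # chooses the sine or cosine branch
    sine_move = X + r1 * np.sin(r2) * np.abs(r3 * P - X)
    cosine_move = X + r1 * np.cos(r2) * np.abs(r3 * P - X)
    return np.where(r4 < 0.5, sine_move, cosine_move)     # Eq. (1)
```

In the binary variant discussed below, the continuous values produced by such an update are subsequently mapped to {0, 1} by a transfer function.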

Fig. 1 The flowchart of SCA algorithm

Algorithm 1 SCA pseudo-code.

The FS problem is a binary optimization problem. A hypercube represents its search space, and a bit flip in the candidate vector changes the candidate position in the search space (X = {x1,x2,...,xm}). However, given that SCA was originally designed for continuous optimization problems, a mapping function is needed. The transfer function (TF) proposed by [92] is utilized to map a candidate continuous value to its corresponding binary value. In this paper, the TF is used following the approach described in [93].

In more detail, the TF is used as follows. First, the probability of flipping a bit is calculated using (3), where \({v_{i}^{d}}(t)\) refers to the dth dimension of the ith step (velocity) vector at the current iteration t. Next, the decision value is updated based on (4), in which a random number r ∈ [0,1] is generated and, if the probability of flipping \(T({v_{i}^{d}}(t))\) is greater than r, then a bit flip takes place on the ith element of the position vector (Xi(t + 1)). This TF is called V-shaped and is visualized in Fig. 2.

$$ T({v_{i}^{d}}(t))=\left| \frac{{v_{i}^{d}}(t)}{\sqrt{1+({v_{i}^{d}}(t))^{2}}} \right| $$
(3)
$$ X_{i}(t+1)=\left\{\begin{array}{ll} \neg X_{i}(t), & r<T({v_{i}^{d}}(t)) \\ X_{i}(t), & r\geq T({v_{i}^{d}}(t)) \end{array}\right. $$
(4)
Fig. 2 V-shaped Transfer function

3.1 Objective function

In every optimization problem, there must be an objective function, which is an evaluation function used to measure a solution’s effectiveness. In the case of the FS optimization problem, a wrapper (optimizer) aims to i) minimize the number of selected features, and ii) maximize the classification accuracy. Therefore, the developed objective function is as illustrated in (5). The focus is to minimize the classification error rate and the selection ratio, where the classification error rate is denoted as ERR(D) and the selection ratio is calculated by dividing the number of selected features (|R|) by the total number of features (|N|). α ∈ [0,1] is the weight assigned to the classification error rate, and β = 1 − α is the weight assigned to the selection ratio [94].

$$ Fitness = \alpha \times ERR(D) + \beta \times \frac{|R|}{|N|} $$
(5)

4 Proposed algorithm: an improved binary sine cosine algorithm with multiple exploration and exploitation approaches for feature selection

We present three versions of our binary optimization algorithm called Improved Binary SCA with multiple exploration and exploitation approaches (IBSCA) which can be used to solve FS problems. Algorithm 2 and the flowchart in Fig. 3 present the details of this approach. Three exploration techniques are applied in an accumulative manner to the three versions of IBSCA (IBSCA1, IBSCA2, IBSCA3), where IBSCA3 uses all of the three exploration techniques. The three versions of IBSCA are as follows:

  • IBSCA1: OBL is used as the exploration method.

  • IBSCA2: Builds on IBSCA1 by additionally using the VNS method combined with the Laplace distribution to explore the search space using several mutation methods.

  • IBSCA3: Builds on IBSCA2 by additionally using Refraction Learning to improve the current best candidate solution at each iteration of the optimization loop of SCA.

Algorithm 2 Improved Binary SCA with multiple exploration and exploitation approaches (IBSCA).

Fig. 3 The flowchart of IBSCA

4.1 Representation of candidate solutions

A candidate solution for an FS problem with m features is a vector of m binary decision variables. Given a candidate solution X, xi = 1 means that the ith feature is included in X, whereas xi = 0 means that it is not. Table 1 shows an example candidate solution consisting of 10 decision variables X =< x1 = 0,x2 = 1,x3 = 1,...,x10 = 1 >.

Table 1 A sample binary candidate solution
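
For illustration only, the relationship between such a binary vector and the indices of the selected features can be expressed in a few lines of Python; the concrete values of the middle entries are arbitrary and merely mimic the shape of the example in Table 1.

```python
# Binary candidate solution with 10 decision variables (cf. Table 1);
# a value of 1 at position i means feature i+1 is selected.
X = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1]

selected = [i + 1 for i, bit in enumerate(X) if bit == 1]  # 1-based feature indices
print(selected)       # [2, 3, 5, 8, 10]
print(len(selected))  # |R|, the number of selected features
```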

4.2 Population initialization

The performance of optimization algorithms can be improved by a diversified initial population of solutions [95,96,97]. One possible way to create a diverse initial population is by using the opposition-based learning (OBL) approach. OBL is an intelligent method developed from the observation that considering opposite candidate solutions can lead to improved search times [98]. It can be applied to the decision variables in machine learning, optimization and search algorithms. For example, if X = 〈x1,x2...,xm〉 is a candidate solution with m decision variables, the opposite candidate solution Xo is as follows:

$$ X^{o}= \langle {x_{1}^{o}}, {x_{2}^{o}}, ..., {x_{m}^{o}}\rangle, \text{ where } {x_{i}^{o}}= LB_{i} + UB_{i} - x_{i} $$
(7)

where LBi is the lower bound for variable i and UBi is its upper bound.

The initialization stage is similar in all versions of IBSCA. In this stage, the first half of the population is generated randomly. The remainder of the population is generated by applying OBL to the first half (Line 1 in Algorithm 2). The use of OBL is expected to expand the search region and improve the solution’s approximation.
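
A minimal sketch of this initialization step, assuming binary decision variables so that the OBL opposite of (7) with LB = 0 and UB = 1 reduces to a bit complement, could look as follows (the function name and defaults are ours):

```python
import numpy as np

def init_population_with_obl(n_solutions, n_features, rng=None):
    """Generate half of the population randomly and the other half as OBL
    opposites of the first half (Eq. (7) with LB = 0 and UB = 1)."""
    rng = rng or np.random.default_rng()
    half = n_solutions // 2
    first_half = rng.integers(0, 2, size=(half, n_features))  # random 0/1 solutions
    opposite_half = 1 - first_half                             # x^o = LB + UB - x = 1 - x
    return np.vstack([first_half, opposite_half])
```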

OBL can also be applied in the initialization stage of other optimization algorithms (e.g., Cuckoo Search [96, 99], Grey Wolf Optimizer [100], Whale Optimization [101]). As can be seen in Section 5.2, the performance of IBSCA using only OBL is slightly better than the performance of BSCA, which makes it a good base on which to later combine VNS, the Laplace distribution, and RL to strongly improve IBSCA’s performance.

4.3 Discretization strategy

Candidate solutions produced by the optimization process of SCA and RL are continuous. Therefore, we use two-step transfer functions to convert the continuous decision variables into binary ones (lines 8 and 10).

Table 2 shows eight binary transfer functions (4 S-shaped and 4 V-shaped transfer functions). We conducted extensive simulations to verify the efficiency of these transfer functions and found that V3 was the most viable transfer function. The experimental results in [93, 102] confirm our conclusion about V3. Thus, V3 is adopted in our experiments.

Table 2 S-shaped and V-shaped transfer functions

In V3, each decision variable \({x^{j}_{i}}\) in candidate solution \(X_{i}=<{x_{i}^{1}},{x_{i}^{2}}, ..., {x_{i}^{m}}>\) at iteration t is used to calculate the probability of altering \({x^{j}_{i}}\) to 0 or 1. The probability is calculated as follows:

$$ T({x^{j}_{i}}(t)) = |{x^{j}_{i}}(t)/\sqrt{1+({x^{j}_{i}}(t))^{2}}| $$
(8)

Then, \({x^{j}_{i}}(t)\) is set to 0 or 1 as follows:

$$ {x^{j}_{i}}(t+1)= \left\{ \begin{array}{rl} 1-{x^{j}_{i}}(t),\qquad & r< T({x^{j}_{i}}(t)) \\ {x^{j}_{i}}(t), \qquad & r\ge T({x^{j}_{i}}(t)) \end{array}\right. $$
(9)

where r ∈ [0,1] is generated randomly. The chance of flipping the new value \({x^{j}_{i}}(t+1)\) increases as the value \(T({x^{j}_{i}}(t))\) increases.
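
As an example, a hedged NumPy sketch of this two-step discretization with V3 is shown below; it assumes that the continuous values produced by the SCA/RL update drive the flip probability of (8) while the previous binary solution is what gets flipped by (9), which is one common reading of the two-step scheme.

```python
import numpy as np

def v3_binarize(X_cont, X_prev_bin, rng=None):
    """Two-step discretization with the V3 transfer function.
    X_cont: continuous values produced by the SCA/RL update.
    X_prev_bin: current binary solution(s); a bit is flipped with probability T."""
    rng = rng or np.random.default_rng()
    T = np.abs(X_cont / np.sqrt(1.0 + X_cont ** 2))        # Eq. (8)
    r = rng.uniform(0.0, 1.0, size=np.shape(X_cont))
    return np.where(r < T, 1 - X_prev_bin, X_prev_bin)     # Eq. (9): flip where r < T
```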

4.4 Fitness function

In wrapper FS methods, we seek to minimize the number of selected features while maximizing classification accuracy. These two conflicting goals should be taken into account in the fitness function. We adopted the following fitness function to be used in our proposed algorithm:

$$ F(X)= \alpha \times ERR(D) + \beta \times \frac{|R|}{|N|} $$
(10)

where F(X) is the fitness function of candidate solution X, ERR is the error rate obtained by a k-Nearest Neighbor classifier using X, |R| is the number of features in X, |N| is the total number of features in the dataset, α is the weight for ERR and β = 1 − α is the weight for the selection ratio (|R|/|N|).
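
For concreteness, one possible wrapper-style implementation of (10) with a scikit-learn kNN classifier is sketched below; the values of α and k are illustrative placeholders rather than the tuned settings of Table 3, and returning 1.0 for an empty feature subset is our own convention.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(X_bin, X_train, y_train, X_test, y_test, alpha=0.99, k=5):
    """Wrapper fitness of Eq. (10): weighted sum of the kNN error rate and the
    feature-selection ratio."""
    selected = np.flatnonzero(X_bin)
    if selected.size == 0:          # no features selected: worst possible fitness
        return 1.0
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train[:, selected], y_train)
    err = 1.0 - accuracy_score(y_test, clf.predict(X_test[:, selected]))
    return alpha * err + (1.0 - alpha) * selected.size / len(X_bin)
```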

4.5 Optimization loop

The optimization loop of IBSCA starts at Line 3 in Algorithm 2 and ends at Line 15. The first step is to evaluate each candidate solution using the fitness function (Section 4.4). Then, the random parameters of the algorithm are initialized (r1, r2, r3 and r4) and the best solution is determined (P = X). Afterwards, all the candidate solutions are updated using (1) and the two-step transfer function (Section 4.3) is applied to the updated solutions to generate their binary equivalents. In Line 9, RL is applied to the best solution X as described in Section 4.5.1 and then the result is converted to a binary solution using the two-step transfer function. Finally, a combination of variable neighborhood search and the Laplace distribution (Lines 11-14) is applied to a randomly selected solution from the current population, as described in Section 4.5.2.

4.5.1 Refraction learning

IBSCA3 applies RL to the current best solution to improve it. In this section, we describe RL and then show how it can be used in IBSCA3.

The refraction of light is caused by a light ray hitting an interface between two different media (e.g., air and water). The ray bends as its velocity changes when it crosses the boundary between the two media. RL is an OBL method based on the principle of light refraction. The one-dimensional spatial refraction-learning process for the global optimum X* at iteration t is illustrated in Fig. 4 [95, 103].

Fig. 4 Refraction Learning for the Global Optimal x

The opposite of X* can be calculated using refraction learning as follows:

$$ X^{\prime*} =(\text{LB}+\text{UB})/2 + (\text{LB}+\text{UB})/(2k\eta)-X^{*}/(k\eta), $$
(11)

where η is the refraction index, given by:

$$ \eta = \frac{\sin \theta_{1} }{\sin \theta_{2}}, $$
(12)

where \(\sin \theta_{1}= ((\text{LB}+\text{UB})/2-X^{*})/h\) and \(\sin \theta_{2}= (X^{\prime*}-(\text{LB}+\text{UB})/2)/h^{\prime}\).

In the above equations, X* represents the incidence point (original candidate solution) while \(X^{\prime*}\) is the refraction point (opposite candidate solution). O denotes the center point of the search interval [LB, UB], h denotes the distance between X* and O, \(h^{\prime}\) denotes the distance between \(X^{\prime*}\) and O, and k is a scaling factor.

In general, (11) can handle n decision variables as follows:

$$ x^{\prime*}_{j} =(\text{LB}_{j}+\text{UB}_{j})/2 + (\text{LB}_{j}+\text{UB}_{j})/(2k\eta)-x^{*}_{j}/(k\eta), $$
(13)

where \(x^{*}_{j}\) and \(x^{\prime *}_{j}\) are the jth decision variable of X and \(X^{\prime *}\), respectively, and LBj and UBj are the lower and upper bounds of the jth decision variable, respectively.

In IBSCA3, (11) is applied to the best solution yet discovered (Line 9 in Algorithm 2).
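
A hedged sketch of this step is given below; the parameters k (scaling factor) and η (refraction index) are exposed as arguments because their values are not fixed in this excerpt, and with k = η = 1 the formula reduces to the standard OBL opposite.

```python
import numpy as np

def refraction_learning(X_best, LB, UB, k=1.0, eta=1.0):
    """Opposite of the best solution via Eq. (13), applied element-wise.
    LB and UB are arrays of per-variable bounds; with k = eta = 1 the result
    equals the standard OBL opposite LB + UB - X_best."""
    centre = (LB + UB) / 2.0
    return centre + centre / (k * eta) - X_best / (k * eta)
```

In Algorithm 2, the continuous result of this step is then converted back to a binary solution by the two-step transfer function (Line 10).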

4.5.2 Variable neighborhood search with Laplace distribution

Two versions of IBSCA (IBSCA2 and IBSCA3) employ a combination of the Laplace distribution and VNS method. In this section, we first explain the Laplace distribution and VNS method and then show how they are applied in these algorithms.

Variable Neighborhood Search (VNS) is a powerful metaheuristic for solving combinatorial optimization problems. The primary goal when using VNS is to enhance a candidate solution by performing a series of operations (e.g., mutation) on a solution. This nearby solution may break out of a local optimum. The optimization process of VNS is iterative and moves between adjacent solutions in an attempt to identify a better candidate [97, 104].

The Laplace distribution is suitable for stochastic modeling because it is stable under geometric, rather than ordinary, summation [105, 106]. The Laplace distribution’s density function is given by:

$$ f(x)= \frac{1}{2b} e^{-\frac{|x-a|}{b}}, $$
(14)

where \(-\infty <x<\infty\). The cumulative distribution function of the Laplace distribution is then:

$$ F(x)= \left\{\begin{array}{ll} \frac{1}{2} e^{\frac{x-a}{b}}, & x \leq a \\ 1-\frac{1}{2} e^{-\frac{x-a}{b}}, & x>a \end{array}\right. $$
(15)

where \(a \in \mathbb{R}\) is the location parameter and b > 0 is the scale parameter.

IBSCA2 and IBSCA3 employ a combination of the Laplace distribution and VNS method (lines 11 to 14 in Algorithm 2). In detail, these algorithms randomly pick a candidate solution \({x_{i}^{t}}\) at iteration t from the current population of solutions. They then generate a random number r ∈ [0,1] using the Laplace distribution. r is then used as a probability to select one of four operations on the selected candidate solution (swap, insert, inverse, or random mutation), as follows:

$$ x^{t+1}_{i} = \left\{\begin{array}{ll} \text{Apply swap operator to } x^{t}_{i}, & 0\le r < 0.25\\ \text{Apply insert operator to } x^{t}_{i}, & 0.25\le r < 0.5\\ \text{Apply inverse operator to } x^{t}_{i}, & 0.5\le r < 0.75\\ \text{Apply random operator to } x^{t}_{i}, & 0.75\le r < 1.0 \end{array}\right. $$
(16)

The swap operator randomly selects two decision variables in the candidate solution (say xi and xj) and then exchanges the values of xi and xj, as illustrated in Fig. 5.

Fig. 5 Swap operator between x3 and x6

The insert operator randomly selects two decision variables (say xi and xj) in the candidate solution and then shifts the values between xi+1 and xj−1 down one position, inserting xi into position j−1, as illustrated in Fig. 6.

Fig. 6 Insert operator between x2 and x9

The inverse operator, shown in Fig. 7, randomly selects two decision variables (xi and xj) in the candidate solution and then inverses the order of values from xi to xj.

Fig. 7 Inverse operator between x3 and x6

The random operator, shown in Fig. 8, randomly selects a number of decision variables (say p) in the candidate solution and then flips the binary value of each selected decision variable.

Fig. 8 Random operator for x3, x6 and x9
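
The following sketch illustrates one way to implement the operator selection of (16) together with the four operators described above; since the excerpt does not state how a Laplace sample is mapped into [0, 1], the sketch uses the CDF of (15) for that purpose, and the number of variables flipped by the random operator (p = 3) is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng()
# x is assumed to be a one-dimensional NumPy array of 0/1 decision variables.

def swap(x):
    """Exchange the values of two randomly chosen decision variables."""
    y = x.copy()
    i, j = rng.choice(len(x), size=2, replace=False)
    y[i], y[j] = y[j], y[i]
    return y

def insert(x):
    """Remove the value at a random position i and reinsert it near a random position j."""
    y = list(x)
    i, j = sorted(rng.choice(len(x), size=2, replace=False))
    y.insert(j, y.pop(i))
    return np.array(y)

def inverse(x):
    """Reverse the order of the values between two random positions."""
    y = x.copy()
    i, j = sorted(rng.choice(len(x), size=2, replace=False))
    y[i:j + 1] = y[i:j + 1][::-1]
    return y

def random_flip(x, p=3):
    """Flip the bits of p randomly chosen decision variables (p is a placeholder)."""
    y = x.copy()
    idx = rng.choice(len(x), size=min(p, len(x)), replace=False)
    y[idx] = 1 - y[idx]
    return y

def vns_laplace_mutation(x, a=0.5, b=0.2):
    """Select one of the four neighbourhood moves of Eq. (16) using a Laplace(a, b)
    draw mapped into [0, 1] through the CDF of Eq. (15)."""
    s = rng.laplace(loc=a, scale=b)
    r = 0.5 * np.exp((s - a) / b) if s <= a else 1.0 - 0.5 * np.exp(-(s - a) / b)
    if r < 0.25:
        return swap(x)
    elif r < 0.5:
        return insert(x)
    elif r < 0.75:
        return inverse(x)
    return random_flip(x)
```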

4.6 Computational complexity of IBSCA

The purpose of this section is to show the detailed computational complexity of IBSCA. We assume that the cost of any basic vector operation is O(1), and we denote the maximum number of iterations MaxItr as M.

The computational complexity of IBSCA (Algorithm 2) can be calculated as follows:

  • In Line 1(a), the generation of n/2 candidate solutions using a random generation function requires O(n/2) operations.

  • In Line 1(b), the generation of n/2 opposite candidate solutions using OBL (7) requires O(n/2) operations.

  • Line 2 requires O(1) operations.

  • The internal operations inside the while loop (lines 3 to 15) are as follows:

    • The number of operations required to evaluate the fitness of the candidate solutions is O(n) operations (Line 4).

    • Updating the best candidate solution so far (P = X) requires O(n) operations (Line 5).

    • Generating four random numbers requires O(1) operations (Line 6).

    • Updating the candidate solutions using (1) requires O(n) operations (Line 7).

    • Applying the two-step transfer function (Section 4.3) to the updated candidate solutions requires O(n) operations (Line 8).

    • Applying RL to the best solution X requires O(1) operations (Line 9).

    • Applying the two-step transfer function (Section 4.3) to the updated solution using RL requires O(1) operations (Line 10).

    • Selecting a random solution from the current population of solutions (say \({X_{i}^{t}}\)) requires O(1) operations (Line 11).

    • Generating a random number r ∈ [0,1] based on the Laplace distribution requires O(1) operations (Line 12).

    • Selecting one of four moves based on the value of r requires O(1) operations (Line 13).

    • Line 14 requires O(1) operations.

  • Overall, the cost of the operations in the while loop (lines 3 to 15) is O(M(n + n + 1 + n + n + 1 + 1 + 1 + 1 + 1 + 1)), where M is the maximum number of iterations. This can be reduced to O(M.n).

  • The total number of operations in IBSCA (lines 1 to 16) is O(n/2 + n/2 + 1 + M.n + 1). This can be reduced to O(M.n) because M.n is greater than n + 2.

In summary, the computational complexity of IBSCA is O(M.n).

5 Experiments

In this section, we first demonstrate the performance of the three variations of IBSCA when solving the FS problem. The detailed characteristics of the used datasets are presented in Section 5.1. Section 5.2 provides a comparison of the convergence behavior of the original Binary Sine Cosine Algorithm (BSCA) [107] to the convergence behaviors of the three variations of IBSCA over the UCI datasets. Section 5.3 shows the performance of IBSCA3 in comparison to other well known FS algorithms.

Table 3 illustrates the parameter settings of our proposed approach. The values of the parameters of all of the algorithms were finely tuned based on several experiments; thus, the algorithms in this section were compared to each other based on their best parameter settings. Since optimization algorithms are stochastic in nature, we executed each algorithm for 30 independent runs. We executed our experiments on a Windows 7 computer with an Intel Core i7-3517U CPU @ 1.90GHz 2.40GHz and 8.0 GB memory.

Table 3 Parameters Settings

5.1 Datasets properties

The performance of IBSCA was evaluated using nineteen datasets (18 from the UCI repository [108] and a real-world COVID-19 dataset¹). Table 4 provides a description of these datasets in terms of their dimensions, number of instances, and number of classes. All datasets were split randomly into 80% training instances and 20% testing instances [38], and the k-nearest neighbors (KNN) classifier is used. The KNN technique is a supervised machine learning method for solving classification and regression problems [102].
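
By way of illustration, this evaluation protocol (an 80/20 split followed by kNN classification) can be reproduced with scikit-learn as sketched below; the synthetic data and k = 5 are placeholders, not the paper's exact configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative 80/20 split followed by kNN classification on synthetic data.
data = np.random.rand(200, 30)                 # 200 instances, 30 features
labels = np.random.randint(0, 2, size=200)     # binary class labels
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.20, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```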

Table 4 Datasets description

5.2 Convergence behavior of BSCA vs three variations of IBSCA

Figures 9, 10 and 11 show the convergence behavior of BSCA, IBSCA1, IBSCA2 and IBSCA3 over the UCI datasets. In each chart of these figures, the x-axis represents the iteration number, and the y-axis represents the fitness value. The convergence charts show that IBSCA3 converges faster to good solutions than all of the other algorithms for all of the datasets. The superiority of IBSCA3 is mainly because it uses three exploration techniques. First, it uses OBL when initializing the population to improve quality and diversity. Second, it integrates VNS and the Laplace distribution to explore the search space using multiple mutation methods. Third, it uses RL to search the neighborhood of the best candidate solutions for better solutions.

Fig. 9 Convergence behavior of BSCA, IBSCA1, IBSCA2 and IBSCA3 over the datasets: Breastcancer, BreastEW, CongressEW, Exactly, Exactly2 and HeartEW

Fig. 10 Convergence behavior of BSCA, IBSCA1, IBSCA2 and IBSCA3 over the datasets: IonosphereEW, KrvskpEW, Lymphography, M-of-n, penglungEW and SonarEW

Fig. 11 Convergence behavior of BSCA, IBSCA1, IBSCA2 and IBSCA3 over the datasets: SpectEW, Tic-tac-toe, Vote, WaveformEW, WineEW and Zoo

The second best performing algorithm was IBSCA2, which uses two exploration techniques compared to the three used by IBSCA3. IBSCA1 was the third best performing algorithm; it uses only one exploration technique. BSCA exhibits the worst convergence behavior of the compared algorithms, which may be because it does not use any additional exploration techniques.

5.3 Performance analysis of IBSCA3 compared to baseline algorithms

In this section, we present a comparison between IBSCA3 and other binary versions of the baseline algorithms: BSCA, Random based Binary Dragonfly Algorithm (RBDA) [102], Linear based Binary Dragonfly Algorithm (LBDA) [102], Quadratic based Binary Dragonfly Algorithm (QBDA) [102], Sinusoidal based Binary Dragonfly Algorithm (SBDA) [102], Binary Gray Wolf Optimizer (BGWO) [109], Binary Gravitational Search Algorithm (BGSA) [109], and Binary Bat Algorithm (BBA) [109]. These algorithms were compared according to their classification accuracy, number of selected features, and their best fitness values. We also compared IBSCA to Coronavirus Herd Immunity Optimizer (CHIO) [110] and Coronavirus Herd Immunity Optimizer-Greedy Crossover (CHIO-GC) [110]. Table 5 shows the parameter settings of these algorithms, as in [102, 110].

Table 5 Parameter settings of the baseline algorithms

Table 6 shows the average value and standard deviation of the results obtained by the proposed IBSCA3 algorithm, and the other compared algorithms, in terms of average classification accuracy. IBSCA3 outperforms the other algorithms and obtains the best classification accuracy on all the UCI and COVID-19 datasets.

Table 6 Average and standard deviation of classification accuracy for the proposed IBSCA3 algorithm in comparison to existing algorithms

Table 7 presents the average number of selected features for the tested algorithms. IBSCA3 outperforms the other tested algorithms on 14 out of 18 datasets. This is better than the second best algorithm (SBDA algorithm) which outperforms the remaining compared algorithms on 11 out of 18 datasets.

Table 7 Average and standard deviation of average selected features for the proposed IBSCA3 algorithm in comparison to existing algorithms

Table 8 illustrates the best fitness values obtained by the tested algorithms. We can observe that IBSCA3 shows superior performance over the other algorithms. It obtains the best fitness values on all datasets.

Table 8 Average and standard deviation of the best fitness value for the proposed IBSCA3 algorithm in comparison to existing algorithms

In summary, the enhanced version of the Binary Sine Cosine algorithm outperformed the other algorithms on all of the tested datasets, with IBSCA3 providing the highest classification accuracy and the lowest fitness values for all datasets with different dimensions, and the lowest average number of selected features in most cases. The overall results indicate that IBSCA3 converges faster than the other algorithms to the most accurate solutions with the fewest features.

The original SCA employs a random update method to update the solutions in the algorithm. This negatively affects the ability of SCA to balance between the exploration and exploitation of the search space. In contrast, IBSCA3 improves exploration and exploitation in the original SCA by employing several techniques. First, it employs an OBL approach to improve the diversity of initial population. Second, it integrates the VNS and Laplace distribution to explore the search space using multiple mutation methods. Third, it uses RL to search the neighborhood of the best candidate solutions for better ones. The overall results indicate that IBSCA3 improves the performance and convergence behavior of the original SCA in solving the FS problem.

5.4 Performance analysis of IBSCA3 compared to state-of-the-art algorithms that adopt OBL-enhanced methods, VNS and Laplace distribution

In this section, we demonstrate a comparison between IBSCA3 and other new algorithms that incorporate OBL into their basic structure. These algorithms are: Improved Salp Swarm Algorithm based on opposition based learning and novel local search algorithm for feature selection (ISSA) [111], Improved Harris Hawks Optimization using elite opposition-based learning and novel search mechanism for feature selection (IHHO) [112], and New feature selection methods based on opposition-based learning and self-adaptive cohort intelligence for predicting patient no-shows (OSACI) [113]. We also compare IBSCA3 with other new algorithms that employ similar methods (VNS and Laplace distribution): A variable neighborhood search algorithm for human resource selection and optimization problem in the home appliance manufacturing industry (VNS-HRS) [114], Improving feature selection performance for classification of gene expression data using Harris Hawks Optimizer with variable neighborhood learning (VNLHHO) [115], Improved equilibrium optimization algorithm using elite opposition-based learning and new local search strategy for feature selection in medical datasets (IEOA) [116], Dynamic salp swarm algorithm for feature selection (DSSA) [117], Semi-supervised feature selection with minimal redundancy based on local adaptive (SFS-LARLRM) [118] and Binary optimization using hybrid grey wolf optimization for feature selection (BGWOPSO) [119]. Table 9 shows the parameter settings of these algorithms, as in [111,112,113,114,115,116,117,118,119].

Table 9 Parameter settings of ISSA, IHHO, OSACI, VNS-HRS, VNLHHO, IEOA, DSSA, SFS-LARLRM and BGWOPSO

Table 10 shows a comparison of the average classification accuracy achieved by the proposed IBSCA3 algorithm, BSCA and the other algorithms that incorporate OBL, VNS and the Laplace distribution. In Table 10, we report the average value and standard deviation of the results. IBSCA3 delivers the best classification accuracy on all of the UCI and COVID-19 datasets except one, where it is second best.

Table 10 Average and standard deviation of classification accuracy for the proposed IBSCA3 algorithm in comparison to BSCA and the other algorithms that incorporate OBL, VNS and Laplace distribution

5.5 Performance analysis of IBSCA3 compared to state-of-the-art SCA algorithms

A comparison of IBSCA3 with other SCA variants is presented in this section. These variants include: An efficient hybrid sine-cosine Harris Hawks Optimization for low and high-dimensional feature selection (SCHHO) [73], A novel feature selection method for data mining tasks using hybrid Sine Cosine Algorithm and Genetic Algorithm (SCAGA) [75], A Hybrid Feature Selection Framework Using Improved Sine Cosine Algorithm with Metaheuristic Techniques (MetaSCA) [120], A novel hybrid BPSO–SCA approach for feature selection (BPSO–SCA) [78], Boosting Salp Swarm Algorithm by Sine Cosine algorithm and Disrupt Operator for Feature Selection (ISSAFD), and An improved sine cosine algorithm to select features for text categorization (ISCA) [121]. Table 11 shows the parameter settings of these algorithms, as in [72, 73, 75, 78, 120, 121].

Table 11 Parameter settings of SCHHO, SCAGA, MetaSCA, BPSO–SCA, ISSAFD and ISCA

Table 12 displays the average classification accuracy of the proposed IBSCA3 algorithm, BSCA and the other state-of-the-art SCA algorithms. In Table 12, we report the average value and standard deviation of the results. IBSCA3 consistently outperforms the other algorithms when applied to the UCI and COVID-19 datasets, achieving the best classification accuracy overall.

Table 12 Average and standard deviation of classification accuracy for the proposed IBSCA3 algorithm in comparison to BSCA and the other SCA variants algorithms

5.6 Performance analysis of IBSCA3 compared to other new nature-inspired metaheuristic algorithms

This section shows a comparison between IBSCA3 and other new nature-inspired metaheuristic algorithms, including: A novel Binary Farmland Fertility Algorithm (BFFAG) [122], African vultures optimization algorithm (AVOA) [123] and Artificial gorilla troops optimizer (GTO) [124]. Table 13 shows the parameter settings of these algorithms, as in [122,123,124].

Table 13 Parameter settings of BFFAG, AVOA and GTO

A comparison of the average classification accuracy achieved by the proposed IBSCA3 algorithm, BSCA and the other new nature-inspired metaheuristic algorithms is shown in Table 14, where we report the average value and standard deviation of the results. IBSCA3 delivers the best classification accuracy on all of the UCI and COVID-19 datasets.

Table 14 Average and standard deviation of classification accuracy for the proposed IBSCA3 algorithm in comparison to BSCA and the other new nature-inspired metaheuristic algorithms

Consequently, the overall results across all sets of experiments indicate the strength of the IBSCA3 algorithm in improving the performance and convergence behavior of the original SCA when solving the FS problem.

5.7 Runtime performance comparison of IBSCA3 to existing algorithms

Tables 15, 16, 17 and 18 provide the running time comparison of IBSCA3, BSCA, and the other algorithms described in Tables 6, 10, 12, and 14, respectively. The results are given in milliseconds, representing an average of 30 independent runs. For each algorithm in the tables, the values represent the run time required to obtain the results after 100 iterations. As shown in the tables, IBSCA3 is faster than the other algorithms when applied to all datasets.

Table 15 Runtime Performance Comparison for the proposed IBSCA3 algorithm in comparison to existing algorithms
Table 16 Runtime Performance Comparison for the proposed IBSCA3 algorithm in comparison to the other algorithms that incorporate OBL, VNS and Laplace distribution
Table 17 Runtime Performance Comparison for the proposed IBSCA3 algorithm in comparison to the other SCA variants algorithms
Table 18 Runtime Performance Comparison for the proposed IBSCA3 algorithm in comparison to the other new nature-inspired metaheuristic algorithms

The experiments were conducted using an Intel Core i7-3517U, 1.90 GHz CPU with 16 GB RAM running 64-bit Windows. All the algorithms were implemented using Python programming language.

5.8 Statistical test results

An investigation of the significance of the results in Tables 6, 10, 12, and 14 has been conducted. We applied both Friedman’s test and Wilcoxon’s test [125] to the classification accuracy in the tables with α = 0.05. Tables 19, 20, 21 and 22 present the results of the Friedman’s test. The best ranks in each row are highlighted in bold. The average ranks of the algorithms were as follows (best to worst): In Table 19: IBSCA3, SBDA, LBDA, RBDA, QBDA, BSCA, BGWO, BGSA, and BBA. In Table 20: IBSCA3, VNS-HRS, IHHO, BGWOPSO, OSACI, ISSA, VNLHHO, SFS-LARLRM, IEOA, and DSSA. In Table 21: IBSCA3, ISSAFD, SCHHO, BPSO-SCA, SCAGA, MetaSCA, and ISCA. In Table 22: IBSCA3, GTO, AVOA, and BFFAG.

Table 19 Friedman’s test when comparing IBSCA3 with existing algorithms based on classification accuracy (Table 6)
Table 20 Friedman’s test when comparing IBSCA3 with the other algorithms that incorporate OBL, VNS and Laplace distribution based on classification accuracy (Table 10)
Table 21 Friedman’s test when comparing IBSCA3 with the other SCA variants algorithms based on classification accuracy (Table 12)
Table 22 Friedman’s test when comparing IBSCA3 with the other new nature-inspired metaheuristic algorithms based on classification accuracy (Table 14)

It is clear from the results that IBSCA3 achieves the best rank over 12 datasets, and competitive results for the other datasets. Therefore, IBSCA3 is the best in terms of the average of ranks among the other compared algorithms.

We also conducted Wilcoxon’s test with α = 0.05, as summarized in Tables 23, 24, 25 and 26, to evaluate the data in Tables 6, 10, 12, and 14, respectively. Our purpose here is to evaluate the significance of the classification accuracy values of IBSCA3 compared to the other algorithms in the tables. The reported p-values indicate that the classification accuracy values of IBSCA3 are statistically significant compared to those of the other algorithms.

Table 23 Wilcoxon’s test results when comparing IBSCA3 with existing algorithms based on classification accuracy (Table 6)
Table 24 Wilcoxon’s test results when comparing IBSCA3 with the other algorithms that incorporate OBL, VNS and Laplace distribution based on classification accuracy (Table 10)
Table 25 Wilcoxon’s test results when comparing IBSCA3 with the other SCA variants algorithms based on classification accuracy (Table 12)
Table 26 Wilcoxon’s test results when comparing IBSCA3 with the other new nature-inspired optimization algorithms based on classification accuracy (Table 14)

In addition, we used the Mann-Whitney U test to compare IBSCA3 against all other algorithms. Based on the results, IBSCA3 produces significant results compared to the other algorithms except for IHHO (0.28014), VNS-HRS (0.35758), BGWOPSO (0.0536), AVOA (0.39532), and GTO (0.65272).
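
For reference, all three tests can be run with SciPy as sketched below; the accuracy arrays are random placeholders standing in for the per-dataset mean accuracies reported in Tables 6, 10, 12, and 14.

```python
import numpy as np
from scipy.stats import friedmanchisquare, mannwhitneyu, wilcoxon

# Placeholder per-dataset accuracies for three algorithms (19 datasets each).
rng = np.random.default_rng(0)
acc_ibsca3 = rng.uniform(0.80, 1.00, size=19)
acc_alg_a = rng.uniform(0.75, 0.98, size=19)
acc_alg_b = rng.uniform(0.75, 0.98, size=19)

print(friedmanchisquare(acc_ibsca3, acc_alg_a, acc_alg_b))           # Friedman rank test
print(wilcoxon(acc_ibsca3, acc_alg_a))                               # paired Wilcoxon signed-rank test
print(mannwhitneyu(acc_ibsca3, acc_alg_a, alternative="two-sided"))  # Mann-Whitney U test
```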

Accordingly, the statistical analysis gives evidence that the modifications included in IBSCA3 improve its search strategy compared to the original SCA algorithm, allowing it to achieve the highest accuracy for most of the datasets.

6 Conclusion and future work

This paper introduced three versions of a binary optimization algorithm by the name of Improved Binary Sine Cosine Algorithm with multiple exploration and exploitation approaches (IBSCA) for solving the Feature Selection problem. All versions of IBSCA (IBSCA1, IBSCA2, IBSCA3) employ an opposition-based learning approach in their initialization stage to generate a diverse population of candidate solutions. IBSCA2 and IBSCA3 use a combination of the variable neighborhood search and Laplace distribution to explore the search space using several mutation methods. Further, IBSCA3 improves the best candidate solution using Refraction Learning, which is a novel opposition learning approach that is based on the principle of light refraction. All versions of IBSCA use two-step transfer functions to convert continuous decision variables into binary ones.

The three versions of IBSCA were compared with each other using 18 FS datasets from UCI repository and one COVID-19 dataset. These datasets are suitable for comparison because the numbers of features, objects and classes in these datasets vary significantly. IBSCA3 was found to be the most efficient version of IBSCA. Furthermore, the performance of IBSCA3 was evaluated and compared to several popular binary algorithms (RBDA, LBDA, QBDA, SBDA, BGWO, BGSA, BBA, CHIO, CHIO-GC, ISSA, IHHO, OSACI, VNS-HRS, VNLHHO, IEOA, DSSA, SFS-LARLRM, BGWOPSO, SCHHO, SCAGA, MetaSCA, BPSO–SCA, ISSAFD, ISCA, BFFAG, AVOA, GTO) using the 18 FS datasets from UCI repository and a COVID-19 dataset. The overall simulation results indicate that IBSCA3 outperformed all comparative algorithms in terms of accuracy and number of features selected over most datasets.

It is worth mentioning that the performance of IBSCA is affected by the limitations of its constituent methods. First, OBL and RL tend to generate good solutions at the beginning of the optimization process, but the generated solutions may converge to sub-optimality as the optimization process progresses [98]. Moreover, every optimization problem requires an OBL strategy suited to its structure; in other words, there are no clear guidelines for designing OBL strategies for different optimization problems [126, 127]. Second, if the VNS method is applied too frequently, the population of solutions could be spread over a larger area than necessary [128].

In the future, we are interested in conducting two research studies based on IBSCA3. We are going to apply IBSCA3 to multi-agent cooperative reinforcement learning [129, 130] based on the models described in [131, 132]. We also plan to incorporate the island model [96, 133,134,135,136,137] with IBSCA3 to further improve its performance over the FS problem. Applying the proposed methods on other FS applications can also be addressed in future work.