1 Introduction

Some types of cancer are considered genetic diseases [1] that occur when one or more cells begin to mutate in an uncontrolled manner. A sequence of such mutations may spread cancer to other cells [2]. Microarray technology is one of the methods used to monitor gene expression levels in several tissues of the body [1]. In the biomedical field, cancer identification based on microarray gene expression data is a complex task because these datasets are high-dimensional with few observations: the number of features is vast (and most of them are irrelevant), whereas the number of observations is limited, usually fewer than 100 [3, 4]. Predictive models built on such datasets are unstable and prone to overfitting. Feature (or gene) selection methods are widely used to tackle this challenge by identifying the essential genes that lead to high predictive results [2]. Feature selection methods can be divided into filter-based, embedded-based, wrapper-based, ensemble-based, and hybrid-based methods [3].

On the one hand, in filter-based methods, each subset of features is assessed based on intrinsic data characteristics [5]. Several studies addressed the gene selection problem using this type of method; for example, Rafael Arias-Michel and co-workers [5] modified the approximated Markov blanket to consider the relationships among input data features. The correlation-based feature selection method was used to evaluate the relationships, while the fast correlation method was applied as a search approach. The experiments were conducted with support vector machines (SVM) with a linear kernel function and Naïve Bayes (NB). In the same context, the experts in [6] developed a two-stage filter selection method using Spearman's correlation and distributed feature selection. This method was tested using decision tree (DT), k-nearest neighbor (kNN), SVM, and NB. In 2019, the authors in [7] used Relief and the least absolute shrinkage and selection operator (LASSO) as feature selection approaches. The selected subsets of features were evaluated using three classifiers: multilayer perceptron networks (MLP), random forest (RF), and SVM. The combination of LASSO and SVM outperformed the other developed approaches. The benefits of filter-based methods are their rapid and simple computation and their ability to cope easily with high-dimensional datasets. In contrast, their major drawback is that they disregard the interaction with the classification methods, which may result in poor predictive results [4].

On the other hand, wrapper-based methods, such as meta-heuristic algorithms, assess the usefulness of features by optimizing the error rate or accuracy during the training phase of a classifier [8]. For instance, Aiguo Wang and co-workers [9] suggested an efficient wrapper method by introducing the Markov blanket approach to reduce the required wrapper evaluation time. Similarly, the authors in [10] used a binary learning-based optimization algorithm to select the relevant features, and discriminant analysis, SVM, DT, and kNN were used as classifiers. In [11], the authors developed a firefly-based feature selection method to identify the most valuable genes. SVM with leave-one-out cross-validation was used as a classifier. Another study on this topic eliminated the redundant features from microarray gene expression datasets using a binary bat algorithm as a wrapper method, with an extreme learning machine as a classifier. The authors in [2] introduced the cuckoo search algorithm as a method for gene selection aided by a memory-based technique to identify the most valuable genes. In [12], the author used a simulated annealing optimizer as a wrapper algorithm, and RF, DT, and SVM were used as machine learning methods. These methods achieve superior predictive results but require high computational costs compared with filter-based methods.

Contrarily, embedded-based methods are comparable to wrapper-based methods in that they also rely on classifiers. In embedded methods, feature selection is conducted during the classifier's training [13]. Embedded-based methods require less computational time than wrapper-based methods because they examine each feature subset during the algorithm's learning process and avoid repeated execution [14]. For example, the authors in [15] developed clustering-guided sparse structural learning as an embedded feature selection method, which was also used as a classifier. Similarly, the authors in [16] developed an embedded method by combining a particle swarm optimization (PSO) algorithm with a C4.5 classifier. In another work [17], a support vector machine Bayesian T-test recursive feature elimination algorithm was developed for gene selection. The selected genes were evaluated using an SVM classifier. In [18], the authors used a weighted feature selection algorithm embedded in the bacterial colony optimization algorithm to select the most relevant features. This method decreased the computational time and enhanced the search ability. This study used an SVM and a kNN with k = 5 as classifiers. However, the primary disadvantage of embedded-based methods is that the selection process relies on the classifier used. As a result, the selected subset of genes may lead to poor results when it is used as input to other classifiers.

On the contrary, ensemble-based methods address a variety of problems. The main idea behind ensemble methods is to combine the results of two or more feature selection methods to produce a more stable feature subset [3]. The main benefits of these methods are that they are stable, reduce overfitting, and are more scalable to high-dimensional spaces. To cite an example, the authors in [19] developed an ensemble-based method that combined both embedded and filter methods. Consistency-based filter, ReliefF, correlation-based feature selection, and information gain were the filters used, whereas feature selection–perceptron and support vector machine recursive feature elimination were used as embedded methods. The classification method used in this work was an ensemble classifier that integrated the instance-based learning method, C4.5, and NB. Similarly, the authors in [20] proposed an ensemble method by integrating two wrapper methods: the cuckoo optimization algorithm and the genetic algorithm (GA). SVM and MLP were used as classifiers to evaluate the proposed method. In the same context, in [21], maximizing global diversity, error-correcting output codes, and maximizing local diversity were integrated to form a hierarchical ensemble-based feature selection method. The resulting features were evaluated using SVM and kNN. The authors in [22] developed a gene selection method that combined G-Forest and GA. RF was used to assess the selected features. The main drawbacks of ensemble-based methods are the considerable computation time and memory they consume.

Conversely, hybrid-based feature selection methods have been proposed to exploit the benefits of two or more feature selection methods, commonly filter- and wrapper-based, and to address the severe flaws of the previously mentioned methods [3]. For instance, the authors in [23] developed a hybrid method that combined GA with mutual information maximization, and the selected features were evaluated using SVM. Similarly, ReliefF and PSO were used as a two-phase feature selection method. In [24], the authors used ReliefF with a recursive binary gravitational search algorithm for gene selection. The expert in [25] proposed a hybrid method that integrated the analysis of variance (ANOVA), Ejaya, and the forest optimization algorithm. The picked subset of genes was evaluated using an SVM classifier. Contextually, in [26], Relief and a stacked autoencoder were developed as a hybrid gene selection method, while SVM and convolutional neural networks were used for classification. Moreover, in [27], the authors developed a hybrid method that integrated information gain and the barnacles mating optimizer, and the resulting subset of genes was evaluated using SVM. The experts in [28] developed a hybrid method that combined minimum redundancy maximum relevance (mRMR) and the moth flame optimization (MFO) algorithm with quantum computation.

Another study on this topic, [29], proposed a hybrid method that integrated an ensemble of Chi-square, ReliefF, and information gain with PSO to select the most relevant genes. In [30], the authors used mRMR and Manta ray foraging optimization for gene selection, and SVM was used as a classifier. A two-phase gene selection method was developed in [31], using Pasi Luukka’s filter-based feature ranking algorithm in the first stage to remove irrelevant genes. In the second stage, an enhanced version of the whale optimization algorithm (WOA), the altruistic whale optimization algorithm (AltWOA), was applied. This version applied the idea of altruism to the whale population. The experts in [13] developed a two-phase feature selection method. Firstly, an ensemble of several filter-based methods, including the Chi-square test, information gain ratio, and ReliefF, was used. Secondly, a recursive flower pollination search algorithm was used to obtain the final subset of genes. In [1], the experts developed a new hybrid method that consisted of two stages. Firstly, one-class SVM was used for anomaly detection. Secondly, a guided GA was developed to find the final subset of genes. The final subset of genes was evaluated using an SVM classifier.

Lastly, swarm optimization algorithms have attracted more attention because they achieve the highest performance in addressing various issues, such as wind energy optimization [32], sustainable energy [33], appointment scheduling problems [34, 35], and breast cancer diagnosis [36]. In gene selection, these algorithms were used as a wrapper-based or a stage in hybrid-based gene selection method. The commonly used swarm algorithms in the literature are artificial bee colony (ABC) [37, 38], harmony search algorithm (HSA) [39, 40], flower pollination search algorithm [13, 41, 42], GA [43,44,45], WOA [31, 46], MFO [28, 47], and PSO [29, 48,49,50].

As a result of the interaction among genes in microarray gene expression data, the expanded search space, and the stochastic nature of swarm algorithms, most of them are exposed to the local optima issue and may experience degraded performance. Thus, there is an opportunity to enhance the search strategies to effectively explore the high-dimensional space, avoid local optima, and exploit the global solution more reliably, which is needed to tackle the gene selection problem properly. The gene selection issue can be successfully tackled by combining two or more swarm algorithms or by altering and improving existing ones. The main drawbacks of swarm algorithms stated in the literature are their slow convergence rate, their tendency to get stuck in local optima, the impact of the algorithm's parameters on its performance, and the poor balance between the exploration and exploitation phases.

Consequently, this paper studies the most recent swarm algorithms and their characteristics in order to use one of them in developing a new hybrid-based gene selection method that addresses the gene selection issue. Specifically, the spider wasp optimizer (SWO) [51] is a recently developed swarm optimization algorithm that simulates what spider wasps accomplish in nature when they hunt, build nests, and mate. The SWO algorithm has many novel updating mechanisms; therefore, it can address a variety of optimization problems with various exploration and exploitation strategies. The SWO algorithm has several benefits. The stability of its performance was evaluated using 23 classical test functions, the CEC2014, CEC2017, and CEC2020 benchmark functions, and two engineering design problems. SWO was compared with several optimization algorithms, including recently published and commonly used ones. It outperforms the artificial gorilla troops optimizer, sine–cosine algorithm, slime mold algorithm, equilibrium optimizer, gray wolf optimizer, fox optimizer, African vultures optimization algorithm, whale optimization algorithm, and marine predators algorithm (MPA).

Even though the SWO algorithm has yielded favorable results, it is not entirely impervious to the shortcomings that swarm algorithms may experience. Like any meta-heuristic, it has weaknesses that can degrade functionality, lead to slow convergence, or cause it to get stuck in local optima.

Motivated by the necessity of feature selection for more rapid computation and accurate classification results, this paper introduces an updated SWO version, known as RSWO-MPA, that utilizes the MPA during initialization. Afterward, this updated version recursively uses SWO to decrease the number of chosen features. In each invocation, the SWO algorithm operates on the reduced dataset produced by the previous invocation. In addition, a two-phase gene selection method is proposed. In the first phase, ReliefF is used to remove redundant genes. In the second phase, RSWO-MPA is proposed as a wrapper algorithm to address the limitations of existing gene selection algorithms.

Key contributions of this study are as follows:

  • Develop a new version of SWO named a recursive spider wasp optimizer guided by marine predators algorithm (RSWO-MPA).

  • Propose a two-stage feature selection method for gene expression analysis.

  • Use ReliefF filter-based method with RSWO-MPA as a wrapper-based feature selection method.

  • Assess RSWO-MPA using eight benchmark microarray gene expression datasets to prove its efficacy.

  • Compare the proposed RSWO-MPA with seven commonly used and recently developed algorithms. These algorithms include Kepler optimization algorithm (KOA), Harris hawks optimization (HHO), social ski-driver optimization (SSD), whale optimization algorithm (WOA), ABC, original MPA, and original SWO.

  • Demonstrate that the proposed RSWO-MPA outperforms other state-of-the-art gene selection methods.

The rest of the paper is structured as follows: Sect. 2 presents an overview of the SWO algorithm used in the proposed method. Section 3 introduces the proposed method. The experimental findings obtained from the proposed method to tackle gene selection are discussed in Sect. 4. Section 5 presents the conclusion of this paper and discusses future work.

2 Materials and methods

This section discusses the materials and methods required to implement the proposed method.

2.1 Spider wasp optimizer (SWO)

SWO is a newly proposed swarm optimization algorithm that mimics what female spider wasps do in nature when they hunt, build nests, and mate; it was proposed in 2023 by Mohamed Abdel-Basset and co-workers [51]. In the following lines, we briefly explain the behaviors of these wasps that are imitated in the SWO algorithm. Firstly, searching behavior targets the food/prey at the beginning of the optimization steps to obtain a spider suitable for larval development. Secondly, nesting behavior mimics the process of pulling the prey to a nest of a size suitable for the egg and prey. Thirdly, mating behavior emulates the characteristics of the offspring produced by hatching the egg, using a uniform crossover operator between the male and female wasps with a distinct probability referred to as the crossover rate (CR). The following subsections provide a mathematical model of these behaviors.

2.1.1 Creation of the initial population

Each spider wasp is considered a solution in the current generation. Equation 1 shows the encoding of this solution as an N-dimensional vector.

$$\begin{aligned} \overrightarrow{SpW} = [p_1,p_2,...,p_N] \end{aligned}$$
(1)

A random population that consists of M vectors (i.e., solutions) can be created between the predefined upper bound \(\textit{ub}\) and lower bound \(\textit{lb}\) using Eq. 2.

$$\begin{aligned} SpW_{popu} = \begin{bmatrix} spw_{1,1} & spw_{1,2} & spw_{1,3} & \cdots & spw_{1,N}\\ spw_{2,1} & spw_{2,2} & spw_{2,3} & \cdots & spw_{2,N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ spw_{M,1} & spw_{M,2} & spw_{M,3} & \cdots & spw_{M,N} \end{bmatrix}, \end{aligned}$$
(2)

where \(SpW_{popu}\) refers to the initial population of spider wasps. Equation 3 is applied to create any random solution in the search area.

$$\begin{aligned} \overrightarrow{SpW_p^g} = \overrightarrow{lb}+ \overrightarrow{rand} \times (\overrightarrow{ub}- \overrightarrow{lb}), \end{aligned}$$
(3)

where g indicates the index of the current generation, and p refers to the index within the population (p = 1, 2, ..., M). \(\overrightarrow{rand}\) denotes an N-dimensional vector with random initial values between 0 and 1.
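For illustration, the population initialization of Eqs. 1–3 can be sketched in Python/NumPy as follows; the function and parameter names are ours and not from the authors' implementation:

```python
import numpy as np

def init_population(M, N, lb, ub, rng=None):
    """Create M random N-dimensional spider-wasp solutions (Eq. 3).

    lb and ub may be scalars or length-N arrays of per-dimension bounds.
    """
    rng = np.random.default_rng(rng)
    lb = np.broadcast_to(np.asarray(lb, dtype=float), (N,))
    ub = np.broadcast_to(np.asarray(ub, dtype=float), (N,))
    # SpW_p = lb + rand * (ub - lb), with rand ~ U(0, 1) per dimension
    return lb + rng.random((M, N)) * (ub - lb)
```

Each row of the returned M×N matrix corresponds to one solution vector \(\overrightarrow{SpW}\) of Eq. 1, and the matrix itself corresponds to \(SpW_{popu}\) of Eq. 2.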

2.1.2 The attitudes of seeking prey and nesting

The female wasps randomly search for food/spiders within the search regions to obtain the most suitable spider; this is referred to as the searching or exploration phase. Thereafter, the spider wasp surrounds the spider and hunts it by flying or running. This phase is known as the surrounding and chasing phase. In the final phase, the spider wasp pulls the paralyzed prey into the pre-prepared nest and lays an egg on its abdomen.

2.1.3 Exploration stage: searching stage

This phase mimics the behavior of the spider wasps in finding the best prey to feed their larvae. In this phase, the spider wasp randomly searches the search area with a fixed step, as explained earlier, to obtain prey that will be suitable for its offspring. Equation 4 models this behavior.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} =\overrightarrow{SpW_p^{g}} +const_1* (\overrightarrow{SpW_x^{g}} -\overrightarrow{SpW_y^{g}}), \end{aligned}$$
(4)

where \(\overrightarrow{SpW_p^{g+1}}\) is the updated position of each female wasp, moved with a static motion (\(const_1\)) along the current direction. g refers to the index of the current generation, and p is the index within the population (p = 1, 2, ..., M). \(\overrightarrow{SpW_x^{g}}\) and \(\overrightarrow{SpW_y^{g}}\) are two random solutions used to identify the direction of exploration followed by the female wasps, and x and y represent their indices. The following formula is used to compute \(const_1\).

$$\begin{aligned} const_1 = |rand_{norm}|*rand_1, \end{aligned}$$
(5)

where \(rand_{norm}\) is a random number that follows the normal distribution, while \(rand_1\) indicates a random number between 0 and 1.

Female wasps occasionally lose the path of prey that has fallen from the orb web. Therefore, they inspect the whole area surrounding the precise location where the prey fell. To model this behavior, a new formula with a distinct exploration strategy was developed to allow the SWO algorithm to search the area surrounding the fallen prey with a smaller step size than that of Eq. 4.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} =\overrightarrow{SpW_{curr}^g} +const_2* (\overrightarrow{lb} + \overrightarrow{rand_2} *(\overrightarrow{ub} -\overrightarrow{lb})), \end{aligned}$$
(6)

where \(\overrightarrow{SpW_{curr}^g}\) is a randomly selected solution with index curr. \(const_2\) is a static motion used to specify the direction of search, and \(\overrightarrow{rand_2}\) indicates a vector of random numbers between 0 and 1. lb and ub indicate the lower and upper boundaries, respectively. \(const_2\) is computed using Eq. 7.

$$\begin{aligned} const_2 =\frac{1}{1+e^{lr}} * cos(2 \pi lr), \end{aligned}$$
(7)

where lr is a random number between -2 and 1.

Equations 4 and 6 are used together to explore the search regions and discover the most encouraging locations. Eventually, the selection between these equations to produce the updated position for the female wasp is executed randomly as described in Eq. 8.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = {\left\{ \begin{array}{ll} Equation \, 4 &{} rand_3 < rand_4 \\ Equation \, 6 &{} otherwise\end{array}\right. }, \end{aligned}$$
(8)

where \(rand_3\) and \(rand_4\) are random numbers in [0,1].
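A minimal sketch of the exploration updates in Eqs. 4–8, assuming the notation defined above (all function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def explore_step(pop, p, lb, ub, rng):
    """One exploration update for wasp p, choosing between Eq. 4 and Eq. 6."""
    M, N = pop.shape
    x, y = rng.choice(M, size=2, replace=False)         # two random solutions
    const1 = abs(rng.normal()) * rng.random()           # Eq. 5
    lr = rng.uniform(-2.0, 1.0)                         # random number in [-2, 1]
    const2 = 1.0 / (1.0 + np.exp(lr)) * np.cos(2 * np.pi * lr)      # Eq. 7
    if rng.random() < rng.random():                     # Eq. 8: random selection
        new = pop[p] + const1 * (pop[x] - pop[y])       # Eq. 4: coarse search
    else:
        curr = pop[rng.integers(M)]                     # random solution SpW_curr
        new = curr + const2 * (lb + rng.random(N) * (ub - lb))      # Eq. 6: finer search
    return np.clip(new, lb, ub)                         # keep within the bounds
```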

2.1.4 Exploration and exploitation stage: tracking and getting free stage

After locating the spider, the spider wasp seeks to attack it in the middle of the web. However, the spider may drop to the ground to escape from the wasp. Afterward, there are two scenarios. In the first scenario, the spider wasp tracks the fallen prey, catches it, and puts it in the pre-prepared nest. This scenario is modeled in Eq. 9. In the second scenario, the wasp cannot catch the dropped prey. Equation 11 models this behavior, and the trade-off between the two scenarios is made randomly, as formulated in Eq. 13.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = \overrightarrow{SpW_p^{g}} +C *| \overrightarrow{rand_5} * \overrightarrow{SpW_x^{g}} -\overrightarrow{SpW_p^{g}}|, \end{aligned}$$
(9)

where g is the index of the current generation, p refers to the index within the population, \(\overrightarrow{SpW_x^{g}}\) is a random solution, and x represents its index. \(\overrightarrow{rand_5}\) denotes a vector of random values in the interval [0,1]. C is a distance-controlling parameter that determines how fast the wasp moves; it begins at 2 and linearly decreases to 0. Equation 10 is used to compute C.

$$\begin{aligned} C = (2-2*(g/g_{max}))*rand_6, \end{aligned}$$
(10)

where \(g_{max}\) indicates the maximum number of generations, and \(rand_6\) denotes a random value in the interval [0,1].

In the second scenario, the distance between the spider wasp and the spider gradually increases. This phase is initially exploitation and turns into exploration as the distance increases.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = \overrightarrow{SpW_p^{g}}*\overrightarrow{vec}, \end{aligned}$$
(11)

where \(\overrightarrow{vec}\) indicates a vector whose values are randomly generated between -v and v. Equation 12 computes the value of v; this mechanism gradually raises the distance between the spider wasp and the prey.

$$\begin{aligned} v= 1-(g/g_{max}), \end{aligned}$$
(12)
$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = {\left\{ \begin{array}{ll} Equation \, 9 &{} rand_3 < rand_4 \\ Equation \, 11 &{} otherwise\end{array}\right. }, \end{aligned}$$
(13)

At the beginning of the optimization procedure, all the wasps use the exploration strategy to globally explore the search area of the optimization problem and locate the most promising region that might include the near-optimal solution. As the iterations pass, the algorithm uses the following (tracking) and escaping mechanisms to explore and exploit the region close to the current wasps, hoping to avoid getting stuck in local optima. Eventually, the balance between both stages is adapted based on Eq. 14.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = {\left\{ \begin{array}{ll} Equation \, 8 &{} rand_p < v \\ Equation \, 13 &{} otherwise\end{array}\right. }, \end{aligned}$$
(14)

where \(rand_p\) indicates a random number in the interval of 0 and 1.
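Under the same assumptions, the tracking and escaping scenarios of Eqs. 9–13 can be sketched as follows, with g and \(g_{max}\) denoting the current and maximum generation (names are illustrative):

```python
import numpy as np

def follow_or_escape(pop, p, g, g_max, rng):
    """Tracking/escaping update for wasp p at generation g (Eqs. 9-13)."""
    M, N = pop.shape
    v = 1.0 - g / g_max                                  # Eq. 12: shrinks over time
    if rng.random() < rng.random():                      # Eq. 13: pick a scenario
        x = rng.integers(M)                              # random solution SpW_x
        C = (2.0 - 2.0 * g / g_max) * rng.random()       # Eq. 10: decreases to 0
        new = pop[p] + C * np.abs(rng.random(N) * pop[x] - pop[p])  # Eq. 9: track
    else:
        vec = rng.uniform(-v, v, size=N)                 # values between -v and v
        new = pop[p] * vec                               # Eq. 11: escape
    return new
```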

2.1.5 Exploitation stage: nesting attitude

Female wasps drag the immobilized spider to an already prepared nest. Spider wasps exhibit a variety of nesting behaviors, such as forming cells in the earth, constructing mud nests on leaves or rocks, and utilizing existing cavities such as spider or beetle burrows. Given these diverse nesting behaviors, the SWO algorithm emulates them using two distinct equations. The first equation (Eq. 15) entails pulling the spider toward the best spider candidate found so far, considered the optimal location for nest-building, where the incapacitated spider is deposited and an egg laid on its abdomen.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} =\overrightarrow{SpW^*} + cos(2 \pi lr)*(\overrightarrow{SpW^*}-\overrightarrow{SpW_p^{g}}), \end{aligned}$$
(15)

where the term \(\overrightarrow{SpW^*}\) denotes the optimal solution found so far.

The second equation aims to construct a spider’s nest at the location of one female spider selected randomly from the group while considering a separate step size to prevent two nests from being built in the same spot. This equation was created in the following manner:

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = \overrightarrow{SpW_x^{g}} +rand_3 *|\wp |*(\overrightarrow{SpW_x^{g}}-\overrightarrow{SpW_p^{g}}) + (1-rand_3)*\overrightarrow{V}*(\overrightarrow{SpW_y^{g}}-\overrightarrow{SpW_z^{g}}), \end{aligned}$$
(16)

where \(rand_3\) is a randomly generated number within the range [0,1]. The value of \(\wp\) is generated using the Lévy flight method. x, y, and z are indices representing three solutions randomly chosen from the population. \(\overrightarrow{V}\) is a binary vector that helps decide when to apply a step size to avoid creating two nests at the same spot; it is calculated using Eq. 17.

$$\begin{aligned} \overrightarrow{V} ={\left\{ \begin{array}{ll}1 &{} \overrightarrow{rand_4} > \overrightarrow{rand_5}\\ 0 &{} otherwise \end{array}\right. }, \end{aligned}$$
(17)

where \(\overrightarrow{rand_4}\) and \(\overrightarrow{rand_5}\) indicate two vectors representing random values in interval [0,1].

According to the formula given in Eq. 18, Eqs. 15 and 16 are exchanged in a random manner.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = {\left\{ \begin{array}{ll}Equation \, 15 &{} rand_3 <rand_4\\ Equation \, 16&{} otherwise\end{array}\right. }, \end{aligned}$$
(18)

Ultimately, the balance between the prey-seeking and nesting behaviors is attained through the use of Eq. 19. All spider wasps search for their respective spiders at the beginning of the optimization procedure and then pull their paralyzed prey into the pre-arranged nests.

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = {\left\{ \begin{array}{ll}Equation \, 14 &{} p <M*v\\ Equation \,18&{} otherwise\end{array}\right. }, \end{aligned}$$
(19)
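The nesting updates of Eqs. 15–18 can be sketched as follows; the Lévy step (\(\wp\)) is generated here with Mantegna's algorithm, one common implementation choice, and all names are illustrative:

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(rng, beta=1.5):
    """A Levy-distributed step via Mantegna's algorithm (one common choice)."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0.0, sigma) / abs(rng.normal()) ** (1 / beta)

def nesting_step(pop, p, best, rng):
    """Nesting update for wasp p, choosing between Eq. 15 and Eq. 16."""
    M, N = pop.shape
    if rng.random() < rng.random():                       # Eq. 18: random selection
        lr = rng.uniform(-2.0, 1.0)
        new = best + np.cos(2 * np.pi * lr) * (best - pop[p])         # Eq. 15
    else:
        x, y, z = rng.choice(M, size=3, replace=False)    # three distinct solutions
        r3 = rng.random()
        V = (rng.random(N) > rng.random(N)).astype(float)             # Eq. 17
        new = (pop[x] + r3 * abs(levy_step(rng)) * (pop[x] - pop[p])  # Eq. 16
               + (1 - r3) * V * (pop[y] - pop[z]))
    return new
```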

2.1.6 Mating behavior

The SWO algorithm also takes into account the mating behavior of wasps. One of the key features of spider wasps is their ability to determine the gender of their offspring, which is governed by the size of the host on which an egg is deposited: small hosts yield male offspring, while larger hosts yield female offspring. In this approach, every spider wasp represents a potential solution in the current iteration, and a spider wasp egg signifies a newly created candidate solution for that iteration. Equation 20 is used to generate these spider wasp eggs (i.e., new solutions).

$$\begin{aligned} \overrightarrow{SpW_p^{g+1}} = Crossover(\overrightarrow{SpW_p^{g}},\overrightarrow{SpW_m^{g}},CrRa), \end{aligned}$$
(20)

where Crossover is an operator that performs uniform crossover between the solutions \(\overrightarrow{SpW_p^{g}}\) and \(\overrightarrow{SpW_m^{g}}\) with a probability called the crossover rate (CrRa). The vectors \(\overrightarrow{SpW_p^{g}}\) and \(\overrightarrow{SpW_m^{g}}\) correspond to the female and male spider wasps, respectively. The SWO algorithm generates male spider wasps that are distinct from the female wasps using Eq. 21.

$$\begin{aligned} \overrightarrow{SpW_m^{g+1}} = \overrightarrow{SpW_p^{g}} + e^{l}*|B|*\overrightarrow{vec_1}+(1-e^{l})*|B_1|*\overrightarrow{vec_2}, \end{aligned}$$
(21)

where B and \(B_1\) are two randomly generated values that follow a normal distribution, and e is the exponential constant. Additionally, the formula includes the vectors \(\overrightarrow{vec_1}\) and \(\overrightarrow{vec_2}\), which are calculated using the following equations:

$$\begin{aligned} \overrightarrow{vec_1}= & {} {\left\{ \begin{array}{ll} \overrightarrow{x_a}-\overrightarrow{x_i} &{} f(\overrightarrow{x_a}) < f(\overrightarrow{x_i}) \\ \overrightarrow{x_i}-\overrightarrow{x_a} &{} otherwise\end{array}\right. }, \end{aligned}$$
(22)
$$\begin{aligned} \overrightarrow{vec_2}= & {} {\left\{ \begin{array}{ll} \overrightarrow{x_b}-\overrightarrow{x_c} &{} f(\overrightarrow{x_b}) < f(\overrightarrow{x_c}) \\ \overrightarrow{x_c}-\overrightarrow{x_b} &{} otherwise\end{array}\right. }, \end{aligned}$$
(23)

These formulas involve randomly selecting three solutions from the population using indices a, b, and c, which are distinct from each other as well as from the current solution i. Crossover is then employed to combine genetic material from the two parent wasps, resulting in an offspring (or egg) that inherits traits from both parents. The balance between the tracking and mating behaviors is determined by a trade-off rate (TR).
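The uniform crossover of Eq. 20 can be sketched as follows (a generic implementation; the authors' operator may differ in detail):

```python
import numpy as np

def uniform_crossover(female, male, cr, rng):
    """Uniform crossover between a female and a male wasp (Eq. 20).

    Each position is inherited from the male parent with probability cr
    (the crossover rate) and from the female parent otherwise.
    """
    mask = rng.random(female.shape) < cr
    return np.where(mask, male, female)
```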

2.1.7 Reduce the population size and reserving the memory

Once a female spider wasp lays its egg on the host's abdomen, it seals the nest and departs the area to avoid detection. This indicates that the female's role in the optimization steps is largely fulfilled, and the remaining wasps can take over its function evaluations for the rest of the procedure, leading to potentially improved outcomes. During the iterations, certain wasps in the swarm are eliminated to grant more function evaluations to the remaining wasps. This also decreases population diversity and speeds up convergence toward a near-optimal solution. The size of the current population at each evaluation is adjusted using the following equation:

$$\begin{aligned} M =M_{min} + (M-M_{min}) \times v, \end{aligned}$$
(24)

Equation 24 involves setting a minimum population size (\(M_{min}\)) to prevent getting trapped in local optima at various phases of the optimization process. In addition, SWO incorporates a memory-saving technique to retain each wasp's best spider position for use in subsequent generations. Essentially, each solution generated by a wasp is compared to its equivalent in the previous generation, and if it is a better fit, it replaces the current one. Algorithm 1 provides the pseudo-code of the SWO algorithm.
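The population reduction of Eq. 24 can be sketched as follows, assuming v decreases linearly with the generation count as in Eq. 12 (names are illustrative):

```python
def reduced_size(M, M_min, g, g_max):
    """Population size at generation g (Eq. 24)."""
    v = 1.0 - g / g_max              # shrinks linearly from 1 to 0
    return int(M_min + (M - M_min) * v)
```

For example, with M = 100 and \(M_{min}\) = 20, the size decreases linearly from 100 at the first generation to 20 at the last one.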

Algorithm 1: Pseudo-code of the SWO algorithm

3 Methodology

This section explains the proposed methodology in detail. Section 3.1 outlines the framework of the RSWO-MPA optimization algorithm. In Sect. 3.2, the proposed hybrid gene selection method based on the RSWO-MPA is explained. Finally, Sects. 3.3 and 3.4 introduce the important settings for RSWO-MPA.

3.1 The proposed RSWO-MPA

The SWO algorithm is a new optimization algorithm that can deal with challenging problems, such as gene selection, by selecting the most valuable genes with high results. However, it cannot guarantee that it will select few genes. To address this issue, RSWO leverages recursion, as used in computer science, to determine a smaller set of genes without sacrificing accuracy. Firstly, the MPA is used to reduce the search space and obtain the initial solution (i.e., genes). The MPA has an expanded foraging strategy, specifically the Lévy and Brownian movements of ocean predators, together with the optimal encounter rate policy in the biological interaction between predator and prey; it follows the laws that inherently govern optimal foraging techniques and the encounter rate policy between predator and prey in marine ecosystems [52]. Secondly, RSWO uses this solution as input to search for an even smaller subset of genes in subsequent stages. This iterative process does not harm classification accuracy and reduces the search space at each step, resulting in the selection of a smaller number of highly predictive genes. The recursive procedure stops when further reduction of the gene subset leads to reduced prediction results.

The key stages of the proposed RSWO-MPA are summarized as follows:

  1. 1.

    Initialization stage: In the first run of RSWO, MPA is used to obtain the initial optimal genes (\(\overrightarrow{OG_{initial}}\)) to reduce the search space of SWO.

  2.

    Recursive stage: In this stage, the SWO is run recursively. The first invocation of the SWO algorithm uses \(\overrightarrow{OG_{initial}}\) to obtain a reduced microarray gene expression dataset (\(D_{reduced}\)), and the obtained fitness (BF) is stored in bestFitness. Afterward, each invocation of SWO runs on the current \(D_{reduced}\) dataset to select optimal genes (\(\overrightarrow{SpW^*}\)), which are used to further reduce the dimension of \(D_{reduced}\). The obtained fitness is compared with bestFitness, and the higher value updates bestFitness. Then, the SWO is rerun using the new reduced dataset.

  3.

    Termination stage: The recursive process continues until an \(\overrightarrow{SpW^*}\) is obtained that reduces the prediction results (i.e., BF), or until the number of selected features becomes one.

Figure 1 shows the flowchart of the proposed RSWO-MPA.

Fig. 1

Flowchart of the proposed RSWO-MPA algorithm
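The recursive procedure described above can be sketched in a few lines. The sketch below is illustrative only: `stub_swo` and `stub_fitness` are simple stand-ins we invented for the real SWO optimizer and the SVM-based fitness, and the interface is an assumption.

```python
import numpy as np

def rswo(X, y, swo, fitness, genes=None, best=-np.inf):
    """Recursively rerun the optimizer on an ever-smaller gene subset
    until fitness drops or a single gene remains."""
    if genes is None:
        genes = np.arange(X.shape[1])
    picked = swo(X[:, genes], y)            # indices into `genes`
    new_genes = genes[picked]
    f = fitness(X[:, new_genes], y)
    if f < best or len(new_genes) >= len(genes):
        return genes, best                  # reduction hurt fitness or stalled
    if len(new_genes) == 1:
        return new_genes, f                 # cannot shrink further
    return rswo(X, y, swo, fitness, new_genes, f)

def stub_swo(Xs, ys):
    # stand-in for SWO: keep the half of the genes with the highest variance
    k = max(1, Xs.shape[1] // 2)
    return np.argsort(Xs.var(axis=0))[-k:]

def stub_fitness(Xs, ys):
    # stand-in for SVM accuracy: mean |correlation| between genes and labels
    return float(np.mean([abs(np.corrcoef(Xs[:, j], ys)[0, 1])
                          for j in range(Xs.shape[1])]))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 16))
X[:, 0] *= 5                                # one strongly informative gene
y = (X[:, 0] > 0).astype(int)

genes, score = rswo(X, y, stub_swo, stub_fitness)
print(len(genes), 0 in genes)               # the informative gene survives
```

Each recursion level halves the candidate set in this toy version, so the search space shrinks quickly; the real RSWO-MPA instead lets SWO itself decide the subset selected at each invocation.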

3.2 The proposed gene selection method

The proposed gene selection model comprises three phases: data preprocessing, gene selection, and classification. The following subsections explain the phases of the developed gene selection method. Algorithm 2 summarizes these phases.

3.2.1 Data preprocessing phase

Data preprocessing plays a significant role in the performance of machine learning algorithms. Unfortunately, gene expression datasets are not always clean, so examining and enhancing data quality is a crucial step, because low-quality input data have a significant influence on machine learning algorithms. The following steps explain the applied data preprocessing methods.

  1.

    Splitting: A stratified train–test split is used to divide each dataset into two sets, one for training and one for testing. Stratification ensures that each set preserves the class proportions. The dataset is split into 75% for training and 25% for testing. In addition, k-fold cross-validation with k = 3 is applied to the training set to tune the meta-parameters of the SVM.

  2.

    Imputation: Missing gene values are imputed using the kNN algorithm: the missing values of each instance are replaced by the mean values of its k nearest neighbors from the training set. Two observations are considered near if their existing gene values are close. If an instance has no class label, it is removed rather than imputed.

  3.

    Normalization: Normalization aims to distribute weights fairly among genes and to balance the model’s sensitivity to their magnitudes. This step has a significant impact on several classifiers, such as kNN and SVM. In this research, the gene values are scaled to the range [0, 1] using a min–max normalizer.
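The three preprocessing steps can be sketched with scikit-learn; the library choice, neighbor count, and synthetic data below are our assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 50))              # synthetic "gene expression"
X[rng.random(X.shape) < 0.05] = np.nan      # simulate missing gene values
y = rng.integers(0, 2, size=100)

# 1. stratified 75/25 split keeps the class proportions in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 2. kNN imputation: mean of the k nearest training observations
imputer = KNNImputer(n_neighbors=5)
X_tr = imputer.fit_transform(X_tr)
X_te = imputer.transform(X_te)

# 3. min-max normalization to [0, 1], fitted on the training set only
scaler = MinMaxScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)
```

Fitting the imputer and scaler on the training set only, then applying them to the test set, avoids information leakage.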

3.2.2 Gene selection phase

A two-stage hybrid gene selection method is designed that combines a filter-based with a wrapper-based method. In the first stage, ReliefF, an efficient and accurate filter-based method, is used to select the most relevant genes. ReliefF reduces the dimensionality of the dataset, efficiently identifying a subset of promising genes for further evaluation and leading to faster convergence in the second (wrapper) stage. In this stage, the top 100 genes are selected, as suggested in previous studies [6, 27, 29]. In the second stage, the proposed RSWO-MPA is used as a wrapper method to explore the reduced search space (containing 100 genes) and select the most beneficial genes. The RSWO-MPA evaluates each candidate subset of genes with an SVM classifier, whose accuracy serves as the fitness function. The meta-parameters of the SVM are tuned with a grid search algorithm.
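A minimal sketch of the wrapper-stage evaluation: a grid-searched SVM scored with 3-fold cross-validation on a candidate gene subset. The parameter grid and the synthetic data are our assumptions; the paper does not list the exact grid values.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fitness(X, y, gene_idx):
    """Mean 3-fold CV accuracy of a grid-searched SVM on the chosen genes."""
    grid = GridSearchCV(SVC(),
                        {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
                        cv=3)
    grid.fit(X[:, gene_idx], y)
    return grid.best_score_

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 100))
y = (X[:, 3] + X[:, 7] > 0).astype(int)     # genes 3 and 7 carry the signal
informative = fitness(X, y, [3, 7])
noise = fitness(X, y, [40, 41])
print(informative > noise)                  # informative subset scores higher
```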

3.2.3 Classification phase

Following the execution of the two-stage gene selection method, an optimized SVM is used to evaluate the effectiveness of the developed approach. SVM has been reported to achieve superior performance compared with other classifiers, such as logistic regression, DT, RF, NB, MLP, and kNN, for cancer classification based on microarray gene expression datasets [3, 6, 7].

Algorithm 2

Pseudo-code of proposed gene selection method

3.3 Solution representation

Overall, to use an optimization algorithm, such as SWO or RSWO-MPA, as a feature selection method, the search space is typically represented either as a set of possible feature indices or as binary solutions. In this paper, each candidate solution (i.e., spider wasp) is represented by a decimal vector with d items, \(\overrightarrow{SpW_p} =(g_1,g_2,...,g_d)\), where d is the problem dimension (i.e., the number of genes given as input to the optimizer) and g is a gene. In \(\overrightarrow{SpW_p}\), each \(g_i\) has a decimal value that represents the index of a gene in the dataset, where \(\overrightarrow{SpW_p}\) denotes the candidate solution in continuous space at iteration p. \(fun_1 ()\) is used to round each gene value, and \(fun_2 ()\) is applied to eliminate duplicated gene indices. \(\overrightarrow{SpW_p^{new}}\) represents a candidate solution in decimal space. Figure 2 depicts a diagrammatic example of mapping a continuous solution to a decimal one.

Fig. 2

A graphical example of feature representation
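The mapping can be sketched as follows; the function names \(fun_1\) and \(fun_2\) follow the text, while their implementations (rounding with clipping to the index bounds, then order-preserving deduplication) are our assumption:

```python
import numpy as np

def fun_1(spw, lb=0, ub=99):
    # round each continuous value and clip it to a valid gene index
    return np.clip(np.rint(spw).astype(int), lb, ub)

def fun_2(indices):
    # drop duplicate gene indices while preserving their first occurrence
    _, first = np.unique(indices, return_index=True)
    return indices[np.sort(first)]

spw = np.array([12.7, 3.2, 13.1, 3.4, 87.9])
print(fun_2(fun_1(spw)))   # -> [13  3 88]
```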

3.4 Fitness function and its evaluation

The fitness function is a key factor in designing optimization methods. It assesses how well each solution performs during optimization. Discovering the best gene subset in wrapper-based gene selection methods is challenging because it requires identifying a subset with minimal genes and maximal accuracy; a superior solution has both high classification accuracy and fewer genes. An effective fitness function must therefore balance these two competing goals. This paper employs a fitness function, depicted in Fig. 3, that considers both accuracy and the number of selected genes (length(\(\overrightarrow{SpW_p^{new}}\))). The fitness is computed as the average accuracy of k-fold cross-validation with k = 3. The accuracy and gene count obtained from this function are compared to those of the best global solution (\(\overrightarrow{SpW^*}\)) and its corresponding fitness (BF).

Fig. 3

A flowchart of fitness computation
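Based on Fig. 3, the comparison against the global best can be read as a simple lexicographic rule: higher accuracy wins, and at equal accuracy the smaller gene subset wins. The tie-breaking below is our reading of the flowchart, not a stated formula:

```python
def better(acc, n_genes, best_acc, best_n):
    # higher accuracy wins; at equal accuracy, fewer genes win
    return acc > best_acc or (acc == best_acc and n_genes < best_n)

print(better(0.95, 10, 0.95, 14))   # equal accuracy, fewer genes -> True
print(better(0.93, 5, 0.95, 14))    # lower accuracy loses -> False
```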

4 Experimental results and discussion

4.1 Experimental setup

The proposed method was implemented in Python. To assess how well the RSWO-MPA method works, eight publicly available high-dimensional benchmark microarray datasets covering various disease types were used. Table 1 presents the characteristics of the gene expression datasets employed in this study.

Table 1 List of publicly available microarray datasets used in this paper and corresponding URLs

4.2 Parameter settings

The experiments were conducted on an Intel(R) Core(TM) i7-10750H processor operating at 2.60 GHz with 16.0 GB of memory. This study used a population size of 20, a maximum of 150 iterations, a lower bound of 0, and an upper bound of 99. The RSWO-MPA was compared with other swarm optimization algorithms and with cutting-edge gene selection approaches based on the average classification accuracy and the average number of selected genes over 20 independent runs. Table 2 shows the common parameter settings for the proposed and the other swarm algorithms; the default parameters were used for all swarm algorithms.

Table 2 Parameter configuration for RSWO-MPA and other swarm algorithms which are used for comparison

4.3 Assessment criteria

The proposed algorithm is assessed on each dataset using several metrics, namely accuracy, the number of selected genes [26, 28, 29, 61], and a t-test, to maintain consistency with prior studies.

4.4 RSWO-MPA results and discussion

We conducted a four-stage experimental analysis. In the first stage, we compared the effectiveness of the proposed RSWO-MPA with various filter-based algorithms in Sect. 4.4.1. In the second stage, we compared the performance of RSWO-MPA with other swarm algorithms in Sect. 4.4.2. We applied several statistical metrics in the third stage to validate the proposed algorithm in Sect. 4.4.3. Finally, we compared the effectiveness of our algorithm with the state-of-the-art gene selection algorithms in Sect. 4.4.4.

4.4.1 Comparison of proposed RSWO-MPA with existing filter-based methods

In Table 3, various filter-based algorithms are compared with the proposed RSWO-MPA method based on the performance of the SVM trained with them. These algorithms are ReliefF, Fisher score, information gain, and minimum redundancy maximum relevance (mRMR). The best results are highlighted in bold. The proposed RSWO-MPA method outperforms the other filter-based methods on six of the eight datasets; on DS 6 and DS 8, all methods achieved 100% accuracy, but the filter-based methods selected more genes than the proposed method.

Figure 4 shows the average accuracy over all datasets for several filter-based feature selection methods. As shown in Fig. 4, ReliefF achieved the best average accuracy across the eight datasets, followed by mRMR. Therefore, ReliefF is selected for the first phase of the proposed gene selection method.

Table 3 The performance of raw data, the proposed RSWO-MPA method, and other commonly used filter-based methods, in terms of accuracy
Fig. 4

The average prediction results of several filter-based feature selection methods over all datasets

4.4.2 Comparison of proposed RSWO-MPA with other swarm algorithms

The proposed RSWO-MPA was compared with seven swarm optimization algorithms, comprising both recent and well-known algorithms: KOA, HHO, SSD, WOA, ABC, the original MPA, and the original SWO. This stage of the experiments compares the RSWO-MPA with the other swarm algorithms on two factors: accuracy and the number of selected features.

Table 4 displays the number of selected features along with the corresponding fitness values; the best results are highlighted in boldface. Figure 5 shows the average accuracy and number of selected features over twenty independent runs across all datasets. The RSWO-MPA achieved the best accuracy among all algorithms while using the fewest features. As illustrated, the performance of the RSWO-MPA on the gene selection problem is satisfactory, since its results surpass those of the other algorithms on the given datasets.

Table 4 Average performance of several swarm algorithms over 20 independent runs
Fig. 5

The average performance of various swarm algorithms over all datasets, averaged over 20 independent runs

4.4.3 Statistical measurements

Table 5 compares the p-values obtained via a parametric, two-sample t-test between RSWO-MPA and alternative algorithms with respect to their accuracy and the number of selected genes. This analysis aims to determine if there are any notable variations between the RSWO-MPA and others in terms of these two factors.

A p-value less than 0.05 (5%) is considered statistically significant, meaning there is less than a 5% probability that the observed difference between the two algorithms occurred by chance, whereas a p-value greater than 0.05 indicates insufficient evidence of a difference. A p-value of 1 indicates no statistical difference between the two groups being compared [62]. The NaN p-values in this table arise when the standard deviation of one of the groups is 0, which happens when all values in that group are identical; a zero standard deviation makes the denominator of the t-statistic zero, resulting in a division by zero when computing the p-value.
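The behavior described above can be reproduced with SciPy's two-sample t-test; the accuracy values below are synthetic, not taken from Table 5:

```python
import numpy as np
from scipy import stats

acc_a = np.array([0.98, 0.97, 0.99, 0.98, 0.97])   # runs of one algorithm
acc_b = np.array([0.91, 0.93, 0.92, 0.90, 0.92])   # runs of a competitor

t, p = stats.ttest_ind(acc_a, acc_b)
print(p < 0.05)            # clearly separated means: significant difference

# two zero-variance groups yield a NaN p-value (division by zero)
t2, p2 = stats.ttest_ind(np.full(5, 1.0), np.full(5, 1.0))
print(np.isnan(p2))
```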

We first assess the p-value for accuracy between the proposed algorithm and each competitor. If there is no statistically significant difference, we then compare the number of selected genes. Bold highlighting indicates a significant difference at p < 0.05, suggesting that RSWO-MPA performs better than the compared algorithm; the remaining results show similar or worse performance than the compared algorithms. Based on this table, the proposed RSWO-MPA outperforms the other algorithms.

Table 5 A comparison of p-values obtained through the t-test between RSWO-MPA and other algorithms in terms of their accuracy and the number of genes they select

Table 6 displays the statistical performance measurements for the developed optimization algorithms, precisely their means and standard deviations over 20 independent runs. The best outcomes are highlighted in boldface. Generally, RSWO-MPA outperforms the other algorithms regarding average mean accuracy with the lowest standard deviation across all datasets.

Table 6 Statistical measures, including mean and standard deviation (Std.), calculated from the results using a stopping criterion of 150 iterations. All swarm optimization algorithms used an optimized SVM to obtain the accuracy (fitness) values over 20 independent runs

4.4.4 Comparison with the state-of-the-art feature selection methods

Table 7 shows a comparative analysis of the classification accuracy for all datasets used in the study against some cutting-edge methods. The selected publications belong to various types of feature selection algorithms; the best results are emphasized in bold. The proposed method recorded improved performance, achieving 100%, 94.51%, 98.13%, 95.63%, 100%, 100%, 92.67%, and 100% on the DS 1, DS 2, DS 3, DS 4, DS 5, DS 6, DS 7, and DS 8 datasets, respectively. The datasets with the highest accuracy were DS 1, DS 5, DS 6, and DS 8, each achieving 100%, followed by DS 3 with 98.13%. Overall, the proposed gene selection method based on RSWO-MPA achieved much better performance than the other methods, with a significant reduction in the number of selected genes, particularly for DS 1, DS 2, DS 3, and DS 4.

Table 7 Comparison between the proposed RSWO-MPA method with some state-of-the-art algorithms, where the accuracy is given in percentage (%)

4.4.5 The drawbacks of the suggested RSWO-MPA method

Although the proposed algorithm demonstrates greater accuracy and selects fewer genes than other state-of-the-art algorithms, some limitations must be addressed in future research. These limitations can be summarized as follows:

  • Computational Complexity: Hybrid gene selection and hybrid swarm optimization methods can be computationally expensive, particularly for large-scale datasets, such as microarray gene expression.

  • Longer Execution Time, Smaller Gene Subset: RSWO-MPA may require a longer time to execute than the original SWO, but this is generally acceptable because it selects a smaller subset of genes.

  • Parameter Tuning Challenges: Hybrid swarm optimization algorithms require careful tuning of various parameters to achieve optimal performance. This can be challenging, as the best combination of parameter settings may vary depending on the dataset and the specific problem being addressed.

  • Limited Generalizability: Because hybrid gene selection and hybrid swarm optimization methods often rely on specific assumptions and modeling strategies, their generalizability to other problems/domains may be limited.

Overall, despite these limitations, hybrid gene selection and hybrid swarm optimization methods offer promising avenues for improving gene selection in bioinformatics applications, provided their specific limitations are carefully considered.

4.4.6 Health-care implications

Implementing gene selection methods in health-care systems offers profound implications for health-care management and societal well-being. These methods enable health-care leaders and policymakers to make informed, strategic decisions aimed at improving disease prediction and personalizing medical treatments. By identifying genetic predispositions to diseases, doctors can treat patients early. Moreover, gene detection facilitates the development of new medical protocols and therapies, allowing for a more effective allocation of health-care resources.

5 Conclusion and future work

To conclude, this paper proposed a novel gene selection method to address the challenges of high dimensionality and overfitting in biological datasets such as microarray gene expression. The developed method consists of two phases: ReliefF is employed as a filter in the first phase to reduce the number of genes, and the proposed RSWO-MPA is used in the second phase to identify the most informative genes. The methodology was thoroughly evaluated on eight microarray gene expression datasets and compared to seven existing meta-heuristic algorithms. According to the experimental results, the developed method outperformed all compared algorithms and methods, including state-of-the-art methods, in terms of accuracy, number of selected features, and stability across all datasets used. These findings demonstrate the potential utility of the developed method for addressing gene selection challenges in biological research.

As a future direction, our goal is to assess the efficacy of the suggested methods on various dataset modalities. Additionally, we aim to construct a fusion model that can address multi-omics datasets such as RNA and DNA, not exclusively gene expression data. Future studies might concentrate on investigating the effectiveness of incorporating deep learning techniques. Deep learning has shown remarkable successes in various fields, including image and speech recognition, natural language processing, and bioinformatics. Therefore, integrating deep learning algorithms such as convolutional neural networks or recurrent neural networks with traditional feature selection techniques may lead to improved performance in gene selection analysis. Additionally, exploring other optimization algorithms or meta-heuristic approaches could be another direction for future research to enhance the efficiency and effectiveness of gene selection methods.