1 Introduction

Due to the exponential growth in the amount of data processed and stored by information systems, retrieving relevant information has become harder. Moreover, the stored data may include many attributes that are unimportant or irrelevant. FS methods aim to pick an Optimal Feature Subset (OFS), which reduces overfitting and the computational time of machine learning models by eliminating redundant features while maintaining high classification performance [1,2,3]. Finding the OFS in a broad search space is a multi-objective problem, i.e., minimizing the number of selected features while maximizing accuracy [4,5,6,7].

Nature-inspired methods draw on natural phenomena, and most MH optimization methods for FS problems belong to this family. Their inspiration sources are broken down into three types [8]: swarm-based algorithms, evolutionary-based algorithms, and physics-based algorithms. All these methods share two principles, exploration and exploitation [9]. In the first phase, the algorithm tries to find new regions in the search space. In the latter phase, the algorithm looks around the solutions obtained in the first phase to discover the best candidate.

Some frequently used MH techniques for FS include Particle Swarm Optimization (PSO) [10], Multi-Verse Optimizer (MVO) [11], Whale Optimization Algorithm (WOA) [12], Salp Swarm Algorithm (SSA) [13], Genetic Algorithm (GA) [14], Gray Wolf Optimizer (GWO) [15], AEO [16], DMOA [17], Snake Optimizer [18], Fick’s Law Algorithm (FLA) [19], and Jellyfish Search [20]. Moreover, MH algorithms can be combined to achieve better results for FS problems. Examples include Simulated Annealing (SA) added to Harris Hawks Optimization (HHO) [21], Ant Lion Optimization (ALO) added to the Sine Cosine Algorithm (SCA) [22], Bird Swarms (BS) added to the Gorilla Troops Optimizer (GTO) [23], the Reptile Search Algorithm (RSA) added to Ant Colony Optimization (ACO) [24], RSA added to the Snake Optimizer (SO) [25], an evolutionary mean-shift algorithm for dynamic multimodal function optimization [26], a group-based synchronous-asynchronous grey wolf optimizer [27], and many others [28,29,30]. Despite the success of MH algorithms in solving the FS problem, new and improved approaches are still needed. MH methods treat FS as a multi-objective optimization problem, increasing classification accuracy with the smallest possible number of selected features. Several works have explored and investigated MH methods to effectively search a given space and obtain the best global solutions [31,32,33,34,35].

The AEO method has shown great potential for choosing an OFS in several applications, such as triple-diode photovoltaic models [36], image segmentation [37], economic dispatch [38], and agriculture feeders [39]. However, it shows a strong ability in the exploration stage but poor exploitation because of its stochastic nature [40]. On the other hand, DMOA is a recent MH method with a high exploitation capability [17]. Therefore, an efficient FS-based approach, namely AEO-DMOA, is presented, which merges the strength of AEO in exploration with that of DMOA in exploitation to target optimum solutions. AEO is applied during the first half of the defined number of iterations to discover better solutions in the search space, while DMOA identifies the best candidate around the obtained region in the remaining iterations. The contributions of this work are summarized as follows:

  • AEO-DMOA, a hybrid approach, is developed to provide better performance through an improved equilibrium between exploration and exploitation of the search space for the FS problem.

  • AEO-DMOA is evaluated on seven UCI datasets, twenty-eight benchmark test functions, eighteen CEC2017 functions, and ten CEC2019 test functions.

  • The applicability of the developed AEO-DMOA is compared with that of other competitive MH approaches.

The rest of this article is structured as follows: Sect. 2 briefly reviews AEO and DMOA, followed by a description of the introduced AEO-DMOA in Sect. 3. Section 4 presents the experimental analysis and statistical comparison of the AEO-DMOA with other well-known MH methods on the tested datasets and test functions. Section 5 discusses the findings, and Sect. 6 concludes this article.

2 Methods

2.1 Artificial Ecosystem-Based Optimization (AEO)

AEO is an MH method motivated by the natural ecosystem’s energy flow [16]. AEO uses three operators to achieve optimal solutions, as described below.

2.1.1 Production

In this operator, the producer represents the worst individual in the population. It is therefore updated with respect to the best individual, considering the upper and lower boundaries of the given search space, so that it can guide other individuals towards new regions. The operator replaces the previous producer with a new individual generated between the best individual \({x}_{best}\) (based on fitness) and a randomly produced position \({x}_{rand}\) in the search space. This operator can be given as,

$${x}_{i}\left(t+1\right)= \left(1-\alpha \right){x}_{best}\left(t\right)+{\alpha x}_{rand}\left(t\right)$$
(1)
$$\alpha =\left(1-\frac{t}{T}\right){r}_{1}$$
(2)
$${x}_{rand}={r}_{2}\left(UB-LB\right)+LB$$
(3)

where \({x}_{rand}\left(t\right)\) guides the other individuals to broadly explore the search space in subsequent iterations, \({x}_{i}\left(t+1\right)\) leads the remaining individuals to exploit intensively around \({x}_{best}\left(t\right)\), and \(\alpha \) is a linear weight coefficient that drives the randomly positioned individual towards the best individual position \({x}_{best}\left(t\right)\) over the pre-defined maximum number of iterations \(T\). Two random numbers \({r}_{1}\) and \({r}_{2}\) are sampled from a uniform distribution over [0, 1]. For a given search space, \(UB\) and \(LB\) represent the upper and the lower extreme values.
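For concreteness, the production operator can be sketched in Python as follows. This is a minimal illustration under our own naming and array conventions, not the authors' implementation; `rng` is a NumPy random generator.

```python
import numpy as np

def aeo_production(x_best, t, T, lb, ub, rng):
    """Production operator, Eqs. (1)-(3): replace the producer (the worst
    individual) with a point between the best individual and a random
    position in the search space."""
    r1, r2 = rng.random(), rng.random()
    alpha = (1.0 - t / T) * r1          # Eq. (2): linear weight coefficient
    x_rand = r2 * (ub - lb) + lb        # Eq. (3): random position in [LB, UB]
    return (1.0 - alpha) * x_best + alpha * x_rand   # Eq. (1)

# Example call with illustrative values
rng = np.random.default_rng(42)
producer = aeo_production(x_best=np.zeros(5), t=10, T=100,
                          lb=-np.ones(5), ub=np.ones(5), rng=rng)
```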

2.1.2 Consumption

This operator starts after the production operator: each consumer obtains food energy by eating the producer, a randomly chosen consumer with more energy, or both. A Levy-flight-like random walk, called the Consumption Factor (CF), is employed to enhance exploration capability, and it is defined as follows:

$$CF=\frac{1}{2}\frac{{v}_{1}}{\left|{v}_{2}\right|}, {v}_{1},{v}_{2}\in N\left(\mathrm{0,1}\right)$$
(4)

where \(N\left(\mathrm{0,1}\right)\) denotes the normal distribution with zero mean and unit standard deviation.

Different types of consumers adopt different consumption behaviors to update their positions. These strategies include the following (a combined sketch is given after the list):

  • 1. Herbivore behavior: A herbivore consumer would eat only the producer and can be formulated as:

    $${x}_{i}\left(t+1\right)={x}_{i}\left(t\right)+CF.\left({x}_{i}\left(t\right)-{x}_{1}\left(t\right)\right), i\in \left[2,\dots P\right]$$
    (5)
  • 2. Carnivore behavior: A carnivore consumer would only eat another consumer with energy higher than itself. Mathematically, it can be modeled as follows:

    $${x}_{i}\left(t+1\right)\,=\,{x}_{i}\left(t\right)+CF.\left({x}_{i}\left(t\right)-{x}_{rand\in \left(0, 2i-1\right)}\left(t\right)\right), i\in \left[3,\dots P\right]$$
    (6)
  • 3. Omnivore behavior: An omnivore consumer can eat the producer or a random consumer with more energy than itself. This behavior can be presented as:

    $${x}_{i}\left(t+1\right)={x}_{i}\left(t\right)+CF\left({r}_{2}\left({x}_{i}\left(t\right)-{x}_{1}\left(t\right)\right)+\left(1-{r}_{2}\right)\left({x}_{i}\left(t\right)-{x}_{rand\in \left(0, 2i-1\right)}\left(t\right)\right)\right), i\in \left[3,\dots P\right]$$
    (7)
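As referenced above, the three consumption behaviors (Eqs. 4–7) can be combined into a single sketch. The population ordering (index 0 as the producer, a higher-energy consumer drawn from `pop[1:i]`) and the behavior-selection rule (borrowed from Eqs. 16–18 of Sect. 3) are our assumptions for illustration only, not the authors' code.

```python
import numpy as np

def aeo_consumption(pop, i, rng):
    """Consumption update for individual i (0-based), Eqs. (4)-(7).
    pop[0] plays the role of the producer x_1; a random consumer with
    more energy is drawn from pop[1:i]."""
    v1, v2 = rng.standard_normal(2)
    cf = 0.5 * v1 / abs(v2)                       # Eq. (4): consumption factor
    x_i, x_prod = pop[i], pop[0]
    r = rng.random()
    if r >= 2 / 3 or i < 2:                       # herbivore, Eq. (5)
        return x_i + cf * (x_i - x_prod)
    x_j = pop[rng.integers(1, i)]                 # a higher-energy consumer
    if r >= 1 / 3:                                # carnivore, Eq. (6)
        return x_i + cf * (x_i - x_j)
    r2 = rng.random()                             # omnivore, Eq. (7)
    return x_i + cf * (r2 * (x_i - x_prod) + (1 - r2) * (x_i - x_j))
```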

2.1.3 Decomposition

In this final phase, an ecosystem agent dissolves: the decomposer breaks down the remains of dead individuals to provide the growth nutrients required by producers. The decomposition operator can be expressed as:

$${x}_{i}\left(t+1\right)={x}_{P}\left(t\right)+De(e{ . x}_{P}\left(t\right)-h.{x}_{rand\in \left(0, 2i-1\right)}\left(t\right)), i\in \left[1,\dots P\right]$$
(8)

where \(De=3u\), \(u\in N\left(0, 1\right)\), \(e={r}_{3}\cdot randi\left(\left[1, 2\right]\right)-1\), and \(h=2{r}_{3}-1\); here \(e\), \(h\), and \(De\) are weight coefficients designed to model decomposition behavior.
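A corresponding sketch of the decomposition operator, under the same illustrative conventions, could read as follows; our reading of the notation takes \(x_P\) as the decomposer's position and \(x_i\) as the individual being decomposed.

```python
import numpy as np

def aeo_decomposition(x_P, x_i, rng):
    """Decomposition update, Eq. (8)."""
    De = 3.0 * rng.standard_normal()              # De = 3u, u ~ N(0, 1)
    r3 = rng.random()
    e = r3 * rng.integers(1, 3) - 1.0             # e = r3 * randi([1, 2]) - 1
    h = 2.0 * r3 - 1.0                            # h = 2 r3 - 1
    return x_P + De * (e * x_P - h * x_i)
```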

2.2 Dwarf Mongoose Optimization Algorithm (DMOA)

DMOA is another MH method, introduced to simulate the dwarf mongoose's prey-size limitation, social organization, semi-nomadic life, and other behaviors [17]. DMOA begins by initializing a set of random candidate mongoose populations between the maxima (UB) and minima (LB) of the given problem. The optimization process comprises the following phases:

2.2.1 Alpha Group

When initializing the population, the fitness probability of each individual is computed by:

$$\alpha =\frac{{fit}_{i}}{{\sum }_{i=1}^{n}{fit}_{i}}$$
(9)

For \(bs\) babysitters, the alpha group contains the remaining mongooses and is kept on track by the vocalization \(peep\) of the alpha female. The sleeping mound is initialized to \(\alpha \) when every mongoose sleeps. To generate a candidate food position, DMOA employs the following:

$${X}_{i+1}={x}_{i}+phi+peep$$
(10)

where \(phi\) is sampled randomly from a uniform distribution over [0, 1].

After every iteration, the sleeping mound \(sm\) can be given as:

$${sm}_{i}=\frac{{fit}_{i+1}-{fit}_{i}}{max\left\{{fit}_{i+1},{fit}_{i}\right\}}$$
(11)

The Average (Avg) value of the \(sm\) can be represented as:

$$Avgsm=\frac{{\sum }_{i=1}^{n}{sm}_{i}}{n}$$
(12)
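A minimal sketch of one alpha-group step (Eqs. 9–12) follows; the `objective` callable and the vectorized layout are our assumptions, not part of the original DMOA implementation.

```python
import numpy as np

def dmoa_alpha_step(X, fit, peep, objective, rng):
    """One alpha-group step, Eqs. (9)-(12). 'objective' is an assumed
    callable returning the fitness of a position vector."""
    alpha = fit / fit.sum()                 # Eq. (9): fitness probabilities
                                            # (used to select the alpha female)
    phi = rng.random(X.shape)               # phi ~ U(0, 1)
    X_new = X + phi + peep                  # Eq. (10): candidate food positions
    fit_new = np.array([objective(x) for x in X_new])
    sm = (fit_new - fit) / np.maximum(fit_new, fit)   # Eq. (11): sleeping mound
    avg_sm = sm.mean()                      # Eq. (12): average sleeping mound
    return X_new, fit_new, sm, avg_sm
```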

The DMOA then moves to the next phase, the scouting phase, in which the sleeping mound or the next food source is assessed once the babysitter-exchange condition is satisfied.

2.2.2 Scout Group

This phase looks for the next sleeping mound; exploration is guaranteed because the mongooses do not return to a previous sleeping mound. The overall performance of the mongooses decides this movement: sufficiently extensive foraging will discover a new \(sm\). The scout mongoose update can be presented as:

$${X}_{i+1}=\left\{\begin{array}{l}{X}_{i}-CF*phi*rand*\left[{X}_{i}-\overrightarrow{M}\right]\,\, if\, {Avgsm}_{i+1}>{Avgsm}_{i}\\ {X}_{i}+CF*phi*rand*\left[{X}_{i}-\overrightarrow{M}\right] \,\,otherwise \end{array}\right. $$
(13)

where \(CF={\left(1-\frac{iter}{{max}_{iter}}\right)}^{2\frac{iter}{{max}_{iter}}}\) and \(\overrightarrow{M}={\sum }_{i=1}^{n}\frac{{X}_{i}\cdot {sm}_{i}}{{X}_{i}}\). Here \({X}_{i}\) is a vector determining the movement of the mongoose to the new \(sm\), and \(CF\), which decreases with the number of iterations, controls the group's collective-volitive movement. The \(rand\) is sampled from a uniform distribution over [0, 1], and \(\overrightarrow{M}\) is the vector specifying the mongoose's movement to the new \(sm\).
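The scout-group movement of Eq. (13) can be sketched as below; the formula for \(\overrightarrow{M}\) is transcribed as written in the text, so positions containing zeros would need guarding in practice.

```python
import numpy as np

def dmoa_scout_step(X, sm, avg_sm_prev, avg_sm_cur, it, max_iter, rng):
    """Scout-group movement towards the next sleeping mound, Eq. (13)."""
    cf = (1.0 - it / max_iter) ** (2.0 * it / max_iter)  # collective-volitive parameter
    M = (X * sm[:, None] / X).sum(axis=0)     # movement vector, as written in the text
    step = cf * rng.random() * rng.random() * (X - M)    # CF * phi * rand * [X - M]
    if avg_sm_cur > avg_sm_prev:              # first case of Eq. (13)
        return X - step
    return X + step                           # second case of Eq. (13)
```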

2.2.3 The Babysitters

The young mongooses are cared for by supporting members of the group, the babysitters. A regular rotation of babysitters allows the rest of the group, led by the alpha female (mother), to forage daily; at midday and in the evening, the alpha female returns to suckle the young. The population size decides the babysitter count, and the babysitter percentage affects DMOA by effectively shrinking the foraging population. The babysitter exchange parameter resets the food-source and scouting information previously held by the replacing family members. In the next iteration, the average weight of the alpha group is reduced by setting the fitness weight of the babysitters to zero, which hinders group movement and emphasizes exploitation.
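One possible reading of the babysitter exchange, expressed as a sketch; the reset-and-zero-weight mechanics (re-initializing the last \(bs\) rows and zeroing their fitness weight) are our interpretation of the description above.

```python
import numpy as np

def babysitter_exchange(X, fit, C, L, bs, lb, ub, rng):
    """Babysitter rotation: once counter C reaches the exchange parameter L,
    the babysitters' positions are re-initialized (their food-source and
    scouting information is lost) and their fitness weight is zeroed."""
    if C >= L:
        X[-bs:] = rng.uniform(lb, ub, size=(bs, X.shape[1]))  # reset babysitters
        fit[-bs:] = 0.0                                       # zero fitness weight
        C = 0
    return X, fit, C
```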

3 Proposed Method

This section presents the developed AEO-DMOA for FS. The primary idea of the developed AEO-DMOA is to split the number of iterations into two halves, applying AEO in the first half to explore the entire search area and tighten the search-area boundaries, while DMOA is applied in the second half to exploit the AEO-optimized search area and obtain the best solution. This sequential implementation helps both methods by alleviating the chance of being trapped in local optima and maintaining an appropriate balance between exploration and exploitation during optimization.

Firstly, the hyper-parameters of AEO and DMOA, such as the maximum number of iterations (\(T\)) and the number of candidate solutions (\(N\)), are initialized. The upper (\(UB\)) and lower (\(LB\)) boundaries of each feature dimension of the given search space are calculated. All \(N\) candidate solutions are initialized uniformly in the range [− 1, 1], as described earlier in Eq. (1). The Fitness Value (FV) is calculated for each candidate solution using a K-Nearest Neighbor (KNN) classifier with K = 5 and a Euclidean distance measure. The candidate solution with the smallest FV is stored as the global-best solution. The FV can be calculated as follows:

$$FV=\lambda \times (1-AC)+(1-\lambda )\times \frac{{SF}_{i}}{M}$$
(14)

where \(\lambda \) is a weight that controls the relative importance of classification performance and the fraction of selected features, \(AC\) is the accuracy of the KNN, \({SF}_{i}\) is the number of features selected by the candidate solution, and \(M\) is the dimensionality of the original dataset. The value of \(\lambda \) ranges from 0 (no importance given to classification accuracy) to 1 (no importance given to the number of selected features) and is set to 0.99 in this work, as suggested in the literature. Each feature's selection indicator is obtained by thresholding the corresponding position of the candidate solution as follows:

$${SF}_{i}=\left\{\begin{array}{l}1 \,\,if {x}_{i}>0.5\\ 0 \,\,otherwise\end{array}\right.$$
(15)

It must be noted that the threshold of 0.5 used to select features is chosen empirically. The exact threshold value used during training does not affect feature selection, as the MH algorithm adapts the positions of important features above the threshold; extreme threshold values, however, make it difficult for the MH algorithm to separate important features from redundant ones. Hence, as the literature suggests, a threshold value of 0.5 is used [24].
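Putting Eqs. (14) and (15) together, the FV of a candidate solution can be computed as in the following sketch; the hold-out split used to estimate the KNN accuracy is our assumption, since the paper specifies only the classifier settings and \(\lambda = 0.99\).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def fitness_value(position, X_data, y, lam=0.99, threshold=0.5):
    """FV of one candidate solution, Eqs. (14)-(15)."""
    mask = position > threshold               # Eq. (15): selection indicator
    if not mask.any():                        # guard: at least one feature needed
        return 1.0
    Xtr, Xte, ytr, yte = train_test_split(X_data[:, mask], y, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(Xtr, ytr)
    ac = knn.score(Xte, yte)                  # classification accuracy AC
    return lam * (1.0 - ac) + (1.0 - lam) * mask.sum() / X_data.shape[1]  # Eq. (14)
```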

The optimization starts with the AEO method. At the first iteration, the Production phase of the AEO updates the worst candidate solution (i.e., the one with the largest fitness value) towards the best individual (i.e., the one with the smallest fitness value) while respecting \(UB\) and \(LB\). After this phase, one of the three consumption behaviors of the AEO algorithm is applied based on a random number (\(rand\)) in the range 0–1, as follows:

$$ If\,rand\, \ge \,\frac{2}{3},\,then\,Herbivore\,phase, $$
(16)
$$ else\,if\,\frac{1}{3}\, \le \,rand\, < \frac{2}{3},\,then\,Carnivore\,phase, $$
(17)
$$ else\,if\,rand\, < \,\frac{1}{3},\,then\,Omnivore\,phase. $$
(18)

This process repeats until all candidate solutions are processed. Then the Decomposition phase, Eq. (8), of the AEO algorithm is applied, breaking down dead candidate solutions into growth nutrients for producers. Finally, the FV is calculated for all candidate solutions at the end of the iteration, and the global-best solution is updated if any candidate solution attains a smaller FV. The whole process is repeated in subsequent iterations until half the maximum number of iterations (\(T/2\)) is reached.
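One AEO iteration of this first half can thus be sketched as follows, reusing the operator sketches of Sect. 2.1; the worst-first population ordering and the `evaluate` callable are our conventions for illustration.

```python
import numpy as np

def aeo_iteration(pop, fit, t, T, lb, ub, evaluate, rng):
    """One AEO iteration (first half of AEO-DMOA); 'evaluate' is the
    FV function of Eq. (14)."""
    order = np.argsort(fit)[::-1]             # sort worst-first: pop[0] = producer
    pop, fit = pop[order], fit[order]
    x_best = pop[-1].copy()                   # smallest FV: best individual
    pop[0] = aeo_production(x_best, t, T, lb, ub, rng)   # production phase
    for i in range(1, len(pop)):              # consumption, Eqs. (5)-(7)/(16)-(18)
        pop[i] = aeo_consumption(pop, i, rng)
    for i in range(len(pop)):                 # decomposition, Eq. (8)
        pop[i] = aeo_decomposition(x_best, pop[i], rng)
    fit = np.array([evaluate(p) for p in pop])   # re-evaluate FVs
    return pop, fit
```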

The algorithm switches from AEO to DMOA after \(T/2\) iterations. Each DMOA iteration starts by updating the AEO-optimized candidate solutions using the alpha group, as in Eqs. (9) and (10), and increments the counter \(C\) (which is 0 at the first execution of DMOA). If counter \(C\) is less than the babysitter exchange parameter \(L\), the average value of the sleeping mound \(Avgsm\) is calculated as in Eqs. (11) and (12). If the average sleeping-mound value of the previous iteration \(Avgsm_{t-1}\) is less than that of the current iteration \(Avgsm_{t}\), the exploration phase of DMOA is applied to update the candidate solutions; otherwise, the exploitation phase of DMOA is applied. At the end of each iteration, the candidate solutions' FVs are calculated and the global-best solution is updated. When the counter \(C\) reaches the babysitter exchange parameter \(L\), the babysitter group is updated, as in Eqs. (10) and (11), and \(C\) is reset to 0. The process then continues by updating the global-best solution. Figure 1 provides the process flow of the AEO-DMOA, and its pseudocode is reported in Algorithm 1.

Fig. 1
figure 1

Flowchart of the developed AEO-DMOA approach

The optimization stops upon reaching the defined number of iterations \(T\). The global-best solution available at the end of all iterations is taken as the optimum solution. As discussed earlier for Eq. (15), features with positions larger than 0.5 are added to the OFS. A classifier is then trained with the OFS as input and the desired class as output. During the testing phase, only the features in the OFS are retained while the others are rejected.
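The overall control flow can be summarized in the following sketch; `dmoa_iteration` is a hypothetical wrapper around the DMOA phase sketches of Sect. 2.2, mirroring `aeo_iteration` above.

```python
import numpy as np

def aeo_dmoa(X_data, y, N=20, T=100, seed=0):
    """High-level control flow of AEO-DMOA (a sketch only)."""
    rng = np.random.default_rng(seed)
    dim = X_data.shape[1]
    lb, ub = -np.ones(dim), np.ones(dim)      # positions initialized in [-1, 1]
    pop = rng.uniform(lb, ub, size=(N, dim))
    evaluate = lambda p: fitness_value(p, X_data, y)   # FV from Eq. (14)
    fit = np.array([evaluate(p) for p in pop])
    best_fv, g_best = fit.min(), pop[fit.argmin()].copy()
    for t in range(T):
        if t < T // 2:                        # exploration half: AEO
            pop, fit = aeo_iteration(pop, fit, t, T, lb, ub, evaluate, rng)
        else:                                 # exploitation half: DMOA
            pop, fit = dmoa_iteration(pop, fit, t, T, lb, ub, evaluate, rng)
        if fit.min() < best_fv:               # keep the global-best solution
            best_fv, g_best = fit.min(), pop[fit.argmin()].copy()
    return g_best > 0.5                       # OFS mask via the 0.5 threshold
```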

4 Experimental Results

The developed AEO-DMOA is applied to solve the problem of FS on seven UCI datasets, twenty-eight benchmark test functions, eighteen CEC2017 functions, and ten CEC2019 functions. The results are compared with those of other MH methods and provided in this section.

Algorithm 1: Pseudocode of the developed AEO-DMOA

4.1 Experimental Setup

The efficiency of the AEO-DMOA is compared against other MH approaches: PSO [10], MVO [11], WOA [12], SSA [13], AEO [16], and DMOA [17]. The parameter settings for these methods follow their original works and are shown in Table 1. The common parameters in this study are selected empirically and set as follows: population size = 20, \(T\) = 100, and each method is independently run 20 times. The experiments are performed on a 3.13 GHz Windows 10 machine with 32 GB RAM and implemented in Python with Scikit-learn.

Table 1 Parameter settings for different MH algorithms

4.2 Datasets Descriptions

The efficiency of the AEO-DMOA is validated using seven datasets, whose characteristics are given in Table 2. Six of the seven datasets are binary classification problems, while the seventh is a multiclass classification problem.

Table 2 The datasets description

4.3 Experimental Results and Discussion

In this section, the results of the developed AEO-DMOA are presented. Several evaluation metrics are employed, including accuracy, OFS, and the best, worst, Avg, and standard deviation (SD) of the fitness values, to examine the effectiveness of the AEO-DMOA. To provide a fair comparison, the Friedman ranking test is utilized. The accuracy results of the AEO-DMOA are provided in Table 3. According to this table, the developed AEO-DMOA attained the best outcomes on all datasets except the Churn dataset, where DMOA ranked first.

Table 3 Comparative analysis using the accuracy of AEO-DMOA and other MH methods

The numbers of features selected by AEO-DMOA and the other MH methods are compared in Table 4. According to this table, AEO-DMOA picked the fewest features in five out of seven datasets. For the Churn dataset, AEO selected the fewest features, and for the KrvskpEW dataset, DMOA did; in both datasets, AEO-DMOA ranked second. This analysis shows the capability of the developed AEO-DMOA to select salient features while reducing the search area.

Table 4 Comparative analysis using average OFS of AEO-DMOA and the MH methods

Table 5 summarizes the comparative performance of AEO-DMOA and the other MH methods in terms of the best, worst, Avg, and SD of the fitness values. For each dataset, ranks are assigned to each MH method based on its fitness values, prioritized by minimum average, SD, best, and worst fitness values. As per Table 5, the AEO-DMOA gained the first rank in five out of seven datasets, indicating better performance due to the switching between the two algorithms. On the IonosphereEW dataset, DMOA showed the best fitness-value statistics, followed by AEO. For the KrvskpEW dataset, DMOA showed the best fitness-value performance, followed by AEO and AEO-DMOA, while WOA attained the best Avg and SD. PSO obtained the best SD on both the Breastcancer and SpectEW datasets. Overall, the results demonstrate the developed AEO-DMOA's ability to balance the exploration and exploitation phases.

Table 5 Comparative performance analysis of the fitness values from different MH methods

An MH method that reaches a very low fitness value in the smallest number of iterations performs best. The characteristic average convergence curves over 100 iterations for the introduced AEO-DMOA and the other MH methods are shown in Fig. 2, with the number of iterations on the horizontal axis and fitness values averaged over 20 independent runs on the vertical axis. The figure shows that AEO-DMOA converges faster than the others on five out of seven datasets. For the IonosphereEW and KrvskpEW datasets, the proposed AEO-DMOA performed slightly worse than AEO and DMOA.

Fig. 2
figure 2

Convergence analysis of the AEO-DMOA and other MH methods for a Breastcancer, b Churn, c IonosphereEW, d KrvskpEW, e SpectEW, f Vote, and g Zoo datasets

4.4 Benchmark Functions

The developed AEO-DMOA is also applied to solve common global optimization problems, using twenty-eight benchmark test functions, eighteen CEC2017 functions, and ten CEC2019 test functions. The AEO-DMOA is compared with several other methods, and the results are given in this section.

4.4.1 CEC2017

To assess the effectiveness of the AEO-DMOA approach, three groups of test functions with various characteristics are used; these functions are widely used in the literature to test the effectiveness of different optimization methods [41]. Functions \({f}_{1}\) to \({f}_{7}\) are unimodal functions, which have a single extreme point in the search domain. Functions \({f}_{8}\) to \({f}_{13}\) are multimodal functions, which have more than one extremum, and \({f}_{14}\) to \({f}_{18}\) are multimodal functions with fixed dimensions. These functions, along with their details, are given in Table 6. Figures 3, 4, and 5 show the search landscapes of the unimodal, multimodal, and fixed-dimension multimodal functions, respectively.

Table 6 List of CEC2017 benchmark functions
Fig. 3
figure 3

Search landscape of the CEC2017 unimodal functions

Fig. 4
figure 4

Search space of the CEC2017 multimodal functions

Fig. 5
figure 5

Search space of the CEC2017 multimodal functions with fixed-dimension

The statistical results for the eighteen functions are provided in Table 7. Each MH algorithm is ranked based on the minimum average fitness value, followed by the minimum SD of the fitness values. The results show that AEO-DMOA achieves the best results against its competitors on five out of seven unimodal test functions, four out of six multimodal test functions, and four out of five fixed-dimension multimodal test functions. WOA is the second-best method after AEO-DMOA, followed by AEO, PSO, DMOA, MVO, and SSA.

Table 7 The results of different feature selection methods on CEC2017 functions (dimension = 30)

Using the Friedman rank test, the AEO-DMOA ranked first on twelve of the tested functions. WOA ranked first on F2, F6, F9, and F10, while PSO ranked first on F16 and F17. The superior performance of the AEO-DMOA over the other MH methods validates its priority as an FS method. The results also indicate that AEO-DMOA possesses high exploitation as well as a high exploration capability. Moreover, the statistical rank tests show that AEO-DMOA provides a statistically significant improvement over the other methods.

Figure 6 depicts the convergence behavior of the AEO-DMOA method and the other MH algorithms. The developed method attains a smaller fitness value than the other methods on ten CEC2017 functions, comprising five unimodal functions (F1, F3, F4, F5, and F7), two multimodal functions (F8 and F11), and three fixed-dimension multimodal functions (F14, F16, and F17). Considering all tested functions, the developed method shows the best performance, followed by SSA, WOA, and DMOA, while PSO is the worst. For the F2, F6, F12, F13, and F15 functions, the developed AEO-DMOA obtained the second-best solution.

Fig. 6
figure 6figure 6

Convergence curves of the AEO-DMOA and other methods using the tested CEC2017 functions (dimension = 30)


Table 8 The results of different feature selection methods on CEC2017 functions (dimension = 50)

A similar performance analysis is carried out for all MH algorithms by increasing the dimension of the CEC2017 functions to 50. The comparative analysis supports the earlier claim that the developed AEO-DMOA is the best feature selector among all. AEO shows the second-best performance, followed by WOA, PSO, SSA, MVO, and DMOA. It must be noted that increasing the dimensionality of the CEC functions clearly decreased the DMOA performance, but the hybridization with AEO prevents a corresponding decrease for the developed AEO-DMOA, which proves the cooperative relationship between the two algorithms. Figure 7 shows the convergence plots of all MH algorithms for the higher-dimensional CEC2017 functions. CEC2017 function F16 showed the same average performance for all MH algorithms, as shown in the convergence plot; hence its ranking is based on SD.

Fig. 7
figure 7figure 7

Convergence curves of the AEO-DMOA and other methods using the tested CEC2017 functions (dimension = 50)

4.4.2 CEC2019

The robustness of the developed AEO-DMOA is further verified using the CEC2019 test functions [42]. CEC2019 comprises ten functions, each with its own dimension and search range, as shown in Table 9.

Table 9 CEC2019 benchmark functions

The comparison between the AEO-DMOA and the other methods on the CEC2019 functions is provided in Table 10. From this table, the AEO-DMOA outperformed the other MH methods on five out of ten tested functions, demonstrating its worthy performance. As in the earlier CEC function evaluations, WOA and DMOA are second best, followed by MVO, AEO, SSA, and PSO. The developed AEO-DMOA ranks first on F2, F3, F4, F7, and F8; WOA on F5 and F10; PSO on F1; SSA on F6; and DMOA on F9, indicating that AEO-DMOA is significantly better than all other competing methods overall.

Table 10 Results of the AEO-DMOA using CEC2019 functions

To further test the ability of the developed AEO-DMOA method to find high-quality solutions on the CEC2019 functions, its convergence behavior is plotted in Fig. 8. The AEO-DMOA attains the smallest fitness value among the compared methods on seven functions: F1, F2, F3, F4, F8, F9, and F10. The developed method shows the worst performance on functions F5, F6, and F7. However, the overall ranking shows that the developed AEO-DMOA performs best, followed by AEO, WOA, and DMOA, while SSA ranks last on CEC2019.

Fig. 8
figure 8

Convergence behaviour for CEC2019 functions

5 Discussion

One of the main goals of an efficient FS method is to identify the optimal number of features required for the machine learning task and to avoid selecting too many or too few features. When too many features are selected, the probability of selecting redundant and irrelevant features increases, and prediction accuracy therefore decreases. On the other hand, when too few features are selected, they cannot represent all the information in the original features [40]. Because the developed AEO-DMOA incorporates the number of selected features into its fitness function, it showed better performance while selecting fewer final features.

Exploration of the search space and exploitation of the best solutions found are two conflicting objectives that must be taken into account when using MH methods. From the results provided above, AEO-DMOA demonstrated a better balance between exploration and exploitation as well as a better convergence speed.

Based on the previous results and discussion, the developed AEO-DMOA has a high ability to explore the feasible region containing the optimal solution. However, the time complexity of AEO-DMOA still needs improvement when applied to high-dimensional data.

6 Conclusion and Future Work

This paper introduces an FS-based approach, named AEO-DMOA, based on a hybridization of the AEO and DMOA methods to improve the exploration and exploitation capabilities. The optimization divides the defined number of iterations into two parts: AEO is employed in the first half, while DMOA is employed in the remaining iterations. The efficiency of the AEO-DMOA is investigated using seven datasets collected from the UCI repository, and an extensive study is performed on twenty-eight benchmark test functions, eighteen CEC2017 functions, and ten CEC2019 functions. The simulation and statistical results show that AEO-DMOA is competitive with other well-known MH methods in terms of accuracy, the number of selected features, and fitness values. In addition, the AEO-DMOA provides reliable performance on high-dimensional functions. The developed AEO-DMOA method can be used in other applications such as renewable energy, signal processing, and big data, and it can be adopted for solving other complex optimization problems such as vehicle routing, timetabling, and engineering design.