1 Introduction

The introduction of technological developments and their widespread application in fields such as engineering, science, biology, medicine, and industry has generated a vast amount of data [16]. This exponential growth has led to the creation of high-dimensional datasets with thousands of attributes encompassing various types of information. However, many of these features are characterized by noise, redundancy, or irrelevant information [19]. Such features can mislead the learning system or cause it to overfit the data [7].

Feature Selection (FS) or Dimensionality Reduction plays a crucial role as a preprocessing phase in improving the performance of learning algorithms, specifically in terms of classification accuracy. It is an effective technique that prioritizes features by filtering out irrelevant ones, thereby reducing computing costs and enhancing generalization ability [74]. FS is commonly categorized into two types: wrapper and filter methods [6, 17]. Wrapper methods evaluate subsets of features using a learning algorithm, for example by measuring classification or regression accuracy, whereas filter methods rely solely on the intrinsic relationships between features during the evaluation process. There also exists a third category, embedded methods, which combines elements of both wrapper and filter techniques [57]: classification models are trained on the dataset, and the information learned by the model is used to evaluate the relevance of each attribute. Wrapper methods tend to outperform filter methods in terms of classification accuracy since they assess the quality of feature subsets using feedback from the learning process. However, wrapper methods can be computationally expensive compared to filters, and their effectiveness depends on the chosen learning technique.

Another common challenge in the design of an FS algorithm is the search for an optimal subset of features. Traditional exhaustive strategies, such as breadth-first and depth-first search, do not scale to large datasets: a dataset with X features yields \(2^X\) candidate feature subsets to evaluate [50], leading to significant computational complexity. In wrapper-based methods, where a learning algorithm is applied to each individual subset, this complexity is further magnified. Moreover, FS is an NP-hard optimization problem, since its objective is to minimize the number of selected features without compromising classification accuracy. To tackle this challenge, FS algorithms can be formulated as multiobjective optimization tasks to strike a balance between the two conflicting objectives. Alternatively, both objectives can be integrated into a single-objective optimization problem, as demonstrated in the existing literature on FS [80]. To address these complexities, metaheuristic algorithms have emerged as suitable and powerful solutions for FS scenarios [52, 65]. Metaheuristic algorithms exhibit dynamic behavior and excel at global search, making them well-suited for tackling challenging optimization problems. Recent surveys have highlighted the significance and potential of metaheuristic algorithms in solving FS problems [64]. However, there is still ample research potential in this area, and further work is needed to achieve advancements and improvements.

In recent years, there has been a growing implementation of various metaheuristic techniques to address FS problems. Genetic Algorithm (GA) [73], Genetic Programming (GP) [55], and Differential Evolution (DE) [25] are among the earliest algorithms used for FS. Additionally, swarm-based optimization techniques such as Ant Colony Optimization (ACO) [21] and Artificial Bee Colony (ABC) [12] have been employed. More recently, several population-based techniques have emerged, including the Arithmetic Optimization Algorithm (AOA) [4], Reptile Search Algorithm (RSA) [3], Harris Hawks Optimization (HHO) [28], Aquila Optimizer (AO) [5], Chameleon Swarm Algorithm (CSA) [51], Salp Swarm Algorithm (SSA) [77], Whale Optimization Algorithm (WOA) [29], and Henry Gas Solubility Optimization (HGSO) [57]. A significant concern with metaheuristic algorithms is finding the right balance between the exploitation and exploration stages to avoid being trapped in local optima or failing to converge correctly; the random nature of the solution-finding process contributes to these issues. Furthermore, when applied to large datasets, these techniques may exhibit performance inadequacies compared to their performance on smaller datasets. To enhance performance, hybrid techniques and modifications to existing algorithms can be implemented, and researchers have presented hybridizations of multiple methods and search techniques [22, 49]. For example, a combination of Social Spider Optimization (SSO) and Opposition-Based Learning (OBL) has been proposed [30], and a hybrid method integrating Thermal Exchange Optimization (TEO) and the Seagull Optimization Algorithm (SOA) has been developed [31]. Additionally, the combination of Tabu Search (TS) and Chemical Reaction Optimization (CRO) has been employed to solve FS problems [81]. These techniques have demonstrated improved classification accuracy and effective FS, as reported in the existing literature.

One of the recent population-based metaheuristic algorithms, introduced by Yang et al., is the Hunger Games Search (HGS) [82]. Inspired by the cooperative behavior of animals and their hunting strategies, HGS has shown superior performance compared to many well-known optimization techniques [82]. This improvement can be attributed to the integration of multiple stochastic operators in the HGS algorithm, enabling it to navigate multi-modal search scenarios and avoid local solutions effectively. Extensive research in the literature demonstrates that HGS consistently delivers excellent results across various challenging optimization tasks. For example, it has been successful in finding optimal parameters for hybrid microgrid systems (HMG) that utilize localized renewable energy resources over individual AC or DC microgrids [69], in predicting blast-induced ground vibration (BIGV) intensity in mine blasting through optimal parameter selection for machine learning algorithms [58], and in achieving optimal values in the modeling of proton exchange membrane fuel cells (PEMFCs) [23]. However, like other metaheuristic algorithms, HGS is not without limitations: common issues include slow convergence, entrapment in local optima, and an imbalance between exploration and exploitation [27]. Additionally, the performance of the algorithm depends on appropriate parameter selection. To address these drawbacks, researchers have explored various methods that have shown promising results in enhancing the performance of metaheuristic algorithms, such as Orthogonal Learning (OL) [79] and the Local Escaping Operator (LEO) [8]. These approaches have attracted significant attention from researchers in the development of metaheuristic algorithms.

Parkinson’s Disease (PD) is the second most prevalent neurodegenerative disorder after Alzheimer’s disease, affecting approximately 3% of individuals over the age of 65 [60]. PD is a progressive condition whose symptoms worsen over time. While there is currently no cure for PD, early detection plays a crucial role in providing appropriate therapy and delaying disease progression to a critical stage [35]. Diagnosis of PD typically involves an assessment by a neurologist and movement disorder specialist, who analyze the patient’s detailed medical history and conduct repeated examinations of body movements, such as gait analysis [63]. However, this process is time-consuming and uncomfortable, particularly for elderly patients, who make up a significant portion of PD cases. Recent research suggests that vocal disorders, which manifest in approximately 90% of early-stage PD patients, can serve as a quantitative indicator for early detection of PD [1, 83]. Machine learning (ML) and deep learning (DL) models have been developed to predict PD based on voice data [14, 53, 62, 67]. These techniques leverage statistical data and aim to provide effective learning-based solutions for data-intensive problems. Various methods, including Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), Artificial Neural Networks (ANN), and Random Forest (RF), have been employed in these studies to diagnose and predict Parkinson’s disease. Furthermore, some studies have explored the performance of heuristic and metaheuristic algorithms, while others have investigated the accuracy of ML algorithms in diagnosing PD [24, 71]. However, ML algorithms often exhibit reduced performance when faced with high-dimensional feature sets, as is the case with many PD datasets [86]. The data collected from PD patients often has a high dimensionality, posing challenges for ML approaches in accurately recognizing the disease. Therefore, addressing the issue of high dimensionality is crucial for the accurate recognition of PD.

1.1 Motivation

In accordance with the No Free Lunch (NFL) principle [78], no single algorithm can solve all possible objective functions effectively: an optimizer may perform well in one area of optimization but struggle in another. Researchers in scientific domains embrace this principle as motivation to propose new and innovative algorithms. As mentioned earlier, the recently developed HGS algorithm has garnered attention; however, HGS is not immune to inefficiencies arising from unbalanced exploitation and exploration stages, which can result in entrapment in local optima. To address these issues, a modified HGS (mHGS) is proposed that incorporates five different strategies: (1) a modified production operator (mPO), (2) a modified variation control (mVC), (3) a modified Local Escaping Operator (mLEO), (4) modified cooperative communication and foraging behavior, and (5) a modified Transition Factor (mTF). The first two techniques aim to enhance exploration abilities, while the third and fourth operators focus on improving exploitation. The final strategy is employed to strike a balance between the exploration and exploitation phases. By employing these strategies, mHGS aims to overcome the limitations posed by unbalanced exploration and exploitation, thus improving its overall performance in optimization tasks.

In this paper, we utilize the mHGS algorithm to address the problem of detecting Parkinson’s disease from phonation. Speech and voice disorders are prevalent in 90% of Parkinson’s disease patients, often accompanied by swallowing difficulties [75]. These axial symptoms have a negative impact on patients’ social communication, quality of life, overall disability level, and treatment costs. Detecting speech changes at an early or even pre-symptomatic stage of Parkinson’s disease provides valuable biomarkers for neurodegeneration. Therefore, the recognition of speech disorders in Parkinson’s disease research is of great significance, extending beyond the assessment of a single point on a neurological scale such as the UPDRS [36]. Speech is a highly complex motor skill that relies on the coordinated functioning of approximately 100 different muscles. Consequently, speech is exceptionally sensitive to brain diseases and lesions, with speech changes often being the sole manifestation of underlying neurological pathologies. Moreover, specific patterns of speech disorders can provide essential clues for accurate assessment of the disease process and its pathophysiology [9]. This paper makes the following contributions:

  • Introducing an enhanced technique to address the dimensionality reduction problem using the Hunger Games Search (HGS) Algorithm and the local escaping operator.

  • Developing the mHGS approach as a wrapper FS algorithm for both large and small dimensionality reduction tasks, employing 18 different UCI datasets.

  • Comparing the performance of mHGS with well-known and popular swarm intelligence algorithms, including the Gradient-based optimizer (GBO) [8], Grasshopper Optimization Algorithm (GOA) [46], Gray Wolf Optimizer (GWO) [44], Whale Optimization Algorithm (WOA) [43], Salp Swarm Algorithm (SSA) [45], Harris Hawks Optimizer (HHO) [26], Ant Lion Optimizer (ALO) [41], and the conventional HGS algorithm.

  • Applying the mHGS method to address Parkinson’s disease phonation across both large and small datasets.

The paper is organized as follows: Sect. 2 provides an overview of previous works on metaheuristics for FS. Section 3 presents the foundational concepts of the Hunger Games Search (HGS) algorithm, together with the Production Operator, the Gradient Search Operator, and the Local Escaping Operator. Sections 4 and 5 introduce the proposed modified HGS (mHGS) algorithm and its adaptation for solving FS problems, respectively. Section 6 presents the experimental results obtained from the conducted experiments. Finally, Sect. 7 concludes the paper.

2 Literature review

In this section, we provide a detailed discussion of some of the recent algorithms that have been utilized for solving FS problems, exploring their strengths and limitations in the context of FS.

2.1 Related work on feature selection

In 1995, Kennedy and Eberhart introduced Particle Swarm Optimization (PSO), an efficient evolutionary algorithm inspired by the swarm intelligence observed in birds and fish [33]. PSO has gained significant attention in recent years for solving feature reduction and selection problems. The algorithm offers advantages such as simplicity and fast convergence speed, but it also has limitations related to population diversity and local optima. To address these limitations, researchers have explored the integration of PSO with other algorithms to enhance its performance in FS problems. For instance, Moradi and Gholampour [48] proposed a hybrid method called HPSO-LS, which combines PSO with a local search strategy. The local search strategy embedded within PSO guides the search toward salient features by considering their correlation information, so as to select a feature set with low correlation and high importance. The efficacy of the proposed technique was demonstrated through experiments conducted on 13 multi-dimensional classification datasets. By utilizing correlation information and leveraging the strengths of both PSO and the local search strategy, HPSO-LS showcases the potential of hybrid approaches to enhance the performance of PSO in FS problems.

In another PSO study, Mistry et al. [47] combined a micro-GA with PSO for robust FS in facial emotion recognition. A micro-GA and a Gaussian mutation in the velocity update equation were used to solve the PSO algorithm’s premature convergence problem, with the goals of improving PSO exploration and avoiding local optima. The researchers tested their FS method against GA- and PSO-based methods, and their strategy outperformed the established methods in selecting characteristics in facial emotion identification tests. In short, combining the micro-GA with PSO addressed premature convergence and increased exploration capacity, improving performance over GA- and PSO-based FS approaches. More recently, a hybrid PSO with a Spiral-Shaped Mechanism (HPSO-SSM) was introduced by Chen et al. [20] for FS. Their wrapper-based method selects a robust feature subset for recognition tasks. The HPSO-SSM algorithm improves on PSO in three ways. First, a logistic map sequence is added to the search process, which increases diversity and exploration and ensures that more solutions are examined. Second, two parameters are added to the position update formula, which facilitates faster convergence and the discovery of the optimal solution. Third, a spiral-shaped local search operator refines the search in the vicinity of the best-known solution. A comparison was made between HPSO-SSM and wrapper- and filter-based methods using a kNN classifier and 20 well-known datasets. The analyses demonstrated that the proposed strategy successfully selected a robust feature subset and yielded enhanced classification performance compared to previous FS methods.

The Artificial Bee Colony (ABC) algorithm is based on the intelligent behavior exhibited by bee populations during food search, mimicking their collective foraging behavior [32]. It is commonly employed for solving optimization problems, and various ABC-based hybrid methods have been introduced in the literature to tackle FS problems. Zorarpacı and Ozel [87] proposed a hybrid wrapper FS method that combines the ABC algorithm with the Differential Evolution (DE) algorithm. Before fusing the two methods, modifications were made to the onlooker bees in ABC and to the mutation operator in DE; these alterations addressed the slow convergence rate of ABC and enabled binary transformation using a binary operator in DE. The proposed hybrid algorithm was employed with decision tree and J48 algorithms, and experiments conducted on 15 datasets from the UCI repository demonstrated reduced computational time and improved results after FS. Similarly, Mendiratta et al. [40] tackled FS by combining the ABC algorithm with PSO for automatic speech recognition signals. The main objective of their work was to enhance the exploration of the ABC algorithm by incorporating the PSO search strategy after the tasks performed by the employed and onlooker bees. The proposed hybrid model, used in a wrapper-based approach with SVM, achieved an average accuracy of 92% on three speech datasets. Shunmugapriya and Kanmani [72] utilized ant and bee colony optimization algorithms to develop a hybrid FS technique. The proposed technique improved the performance of the employed bees in ABC and addressed stagnation issues in the ACO algorithm. In this hybrid approach, ABC adopted the exploration capability of the ACO algorithm, while ACO utilized the exploitation mechanism of ABC. Initially, ACO generated a set of solutions that were subsequently used by the working bees for directed search in ABC; this allowed ACO to exploit only feasible solutions in each new iteration, eliminating redundant solutions. The hybrid technique thus effectively harnessed the strengths of both algorithms. For the classification phase, decision tree and J48 algorithms were implemented, and promising results were obtained on datasets with 4 to 120 features.

In various studies, several hybrid techniques based on the Whale Optimization Algorithm (WOA) have been developed for feature reduction. For instance, Mafarja et al. [37] hybridized WOA with the Gray Wolf Optimizer (GWO) for FS in conjunction with the k-NN classifier. The researchers aimed to address the limitations of both methods, such as premature stagnation and convergence to local optima. They observed that the exploration scheme of GWO was inefficient and sought to improve it using WOA. The proposed hybrid method was evaluated on 18 well-known datasets obtained from the UCI repository, and the researchers demonstrated that their approach yielded better results compared to using GWO and WOA individually. In another study [38], Mafarja and Mirjalili focused on enhancing the exploitation of WOA by integrating it with Simulated Annealing (SA). They incorporated SA into WOA using two different strategies: the first performed a local search by embedding SA as a module within each WOA iteration, while the second applied SA to search for the best solution after WOA completed its search. The proposed model was tested on 18 datasets for classification tasks, with a maximum feature size of 325. Results were collected using the k-NN classifier for feature assessment, and the researchers concluded that their approach outperformed other optimization techniques.

The Butterfly Optimization Algorithm (BOA) replicates butterfly mating and food-hunting behavior [11]. Arora and Anand proposed the S-shaped binary Butterfly Optimization Algorithm (S-bBOA) for FS in classification tasks [10]. A two-objective fitness function improved recognition accuracy while reducing the number of selected features, and sigmoid transfer functions kept solutions in the binary space. S-bBOA’s adaptive strategy accelerated convergence across iterations and balanced exploration and exploitation, yielding more accurate estimates of the optimal solution. S-bBOA outperformed other optimization approaches; however, it did not account for feature relevancy, redundancy, or the class label during FS. To address this, Sadeghian et al. [66] introduced the Information Gain binary BOA (IG-bBOA), embedded in a three-phase Ensemble Information Theory-based binary Butterfly Optimization Algorithm (EIT-bBOA), which improved recognition performance, better captured the relationship between class labels and features, and reduced the number of selected features. A more recent BOA study [84] made several improvements: (1) using Differential Evolution (DE), (2) adding a greedy mechanism that adds or eliminates features based on their significance, (3) incorporating butterfly fragrance into the transfer function to increase flexibility, and (4) employing Evolution Population Dynamics (EPD). These improvements reduced the unpredictability of BOA’s initialization and local search, balancing exploitation and exploration throughout the search process. The resulting Optimization and Extension of binary BOA (OEbBOA) increased BOA’s convergence speed and global search potential.

The Dragonfly Algorithm (DA) [42] is a methodology developed to address optimization problems, inspired by the swarming behavior of dragonflies in nature. Mafarja [39] introduced a binary variant of the algorithm, called Binary DA (BDA), specifically designed for solving discrete problems in FS tasks. While BDA demonstrated good performance in various benchmark tests, it may encounter challenges related to local optima due to its strong exploitation, which can make it difficult for BDA to identify the global best solution. To overcome these limitations and enhance the performance of BDA, different techniques have been employed. Too and Mirjalili [76] proposed the Hyperlearning Binary DA (HLBDA) for FS tasks. Their approach was tested on various FS datasets, including a health prediction dataset of COVID-19 patients, and the results demonstrated the effectiveness of the proposed model in optimization tasks. Another DA-based FS approach was introduced by Chantar et al. [18], who combined the DA with Simulated Annealing (SA) to select robust features and improve classification; SA was applied to the optimal solution found by BDA. The performance of the proposed technique was evaluated on commonly used datasets obtained from the UCI repository, and the combination of both algorithms led to improved recognition accuracy compared to other approaches.

In addition to the previously stated hybrid FS approaches, many other hybridizations have been proposed for robust FS that benefit from the advantages and strengths of different algorithms: for example, opposition-based learning (OBL) with social spider optimization (SSO) [30], thermal exchange optimization (TEO) with the seagull optimization algorithm (SOA) [31], tabu search (TS) with chemical reaction optimization (CRO) [81], harmony search (HS) with the mayfly algorithm (MA) [15], the Salp Swarm Algorithm (SSA) with a local search (LS) algorithm [Tubishat et al. 2021], and Harris hawks optimization (HHO) with SA [2].

2.2 Related work on Parkinson’s disease

There have been numerous studies focused on finding the optimal subset of features for accurate classification of Parkinson’s disease using various machine learning (ML) and deep learning (DL) methods. These studies aim to monitor or diagnose the disease by identifying key factors contributing to its pathology. Sharma et al. [70] proposed the Antlion Optimization (ALO) algorithm for Parkinson’s disease recognition. They utilized the ALO method to reduce the feature set and achieved a maximum recognition rate of 95.91% by feeding the optimized subset to decision tree, random forest, and k-NN classifiers. In another study [71], the authors developed a system for Parkinson’s disease recognition using Parkinson’s voice, Parkinson’s speech, and HandPD datasets obtained from the UCI repository. They modified the classical Gray Wolf Optimization (GWO) algorithm for feature reduction and employed random forest, k-NN, and decision tree classifiers for classification. The maximum classification accuracy achieved on the speech dataset was 93.87%. Olivares et al. [61] proposed an optimized model for Parkinson’s disease diagnosis using an enhanced version of the BAT algorithm. They selected only 23 features from the UCI Parkinson’s disease classification dataset and fed these features into the input layer of a model with 23 neurons. The proposed technique achieved a maximum classification accuracy of 96.74%, with a 3.27% loss.

Naranjo et al. [56] proposed an expert method for Parkinson’s disease recognition using voice analysis. They recorded the voices of 80 individuals, including Parkinson’s patients, and extracted 44 features from five different classes using waveform matching algorithms. The resulting dataset consisted of 240 rows and 44 columns. In another study by Lahmiri and Shmuel [34], voice disorder patterns were utilized for diagnosing Parkinson’s disease. The authors applied eight different pattern ranking techniques to select features and employed a nonlinear Support Vector Machine (SVM) for classification. Using the Wilcoxon statistic feature rank technique, they achieved a recognition rate of 92.21%. Nissar et al. [59] developed a Parkinson’s disease recognition model based on voice signals. They performed FS using Recursive Feature Elimination (RFE) and minimum Redundancy Maximum Relevance (mRMR) approaches. The recognition task was conducted using eight classifiers, and the integration of XGBoost with RFE yielded the highest accuracy of 95.39%. In another study [54], researchers proposed a diagnostic method for Parkinson’s disease by analyzing the voice condition of patients. They introduced a novel FS technique that considered multiple feature evaluations. The performance of the proposed model was evaluated using five different classifiers, and a maximum accuracy of 99.49% was achieved using a random forest classifier.

3 Background

3.1 Hunger games search algorithm (HGS)

HGS, inspired by the hunger-driven behaviors and preferences of animals, is rooted in the fundamental concept of “Hunger” as a key motivator for animals’ behaviors, decisions, and activities [82]. In the HGS algorithm, the notion of hunger is incorporated into the search process by assigning adjustable weights to each search phase, simulating the impact of hunger. This approach aligns with the logical reasoning and adaptive strategies observed in various species, which are crucial for survival and successful acquisition of food resources. By incorporating the concept of hunger, HGS enhances the exploration and exploitation capabilities of the algorithm, enabling it to navigate solution spaces and improve overall performance efficiently.

The primary equation of the HGS algorithm for individual cooperative communication and foraging behavior is as follows:

$$\begin{aligned} {S_i^{t+1}}=\left\{ \begin{array}{c} \text{ Strategy}_{1}: {S_i^t} \cdot (1+{\text {randn}}(1)), r_{1}<l \\ \text{ Strategy}_{2}: {\overrightarrow{\omega _{1}}} \cdot {S_{b}}+\overrightarrow{R} \cdot {\overrightarrow{\omega _{2}}} \cdot \left| S_{b}-{S_i^t}\right| , r_{1}>l, r_{2}>E \\ {\text {Strategy}}_{3}: {\overrightarrow{\omega _{1}}} \cdot S_{b}-\overrightarrow{R} \cdot {\overrightarrow{\omega _{2}}} \cdot \left| S_{b}-{S_i^t}\right| , r_{1}>l, r_{2}<E \end{array}\right. \end{aligned}$$
(1)

where \(\vec {R}\) is a random vector in the interval \([-a, a]\); \(r_{1}\) and \(r_{2}\) are numbers generated randomly in the range [0, 1]; \({\text {randn}}(1)\) is a normally distributed random number; t refers to the current iteration; \(\overrightarrow{S_{b}}\) refers to the location of the best agent among all optimal agents; \(\overrightarrow{S_i^t}\) refers to the location of each individual; l is a control parameter; and \(\overrightarrow{\omega _{1}}\) and \(\overrightarrow{\omega _{2}}\) are the hunger weights, calculated as:

$$\begin{aligned}{} & {} \overrightarrow{\omega _{1}}=\left\{ \begin{array}{l} {H}_{{i}} \times \frac{{N}}{{SH}} \times {r}_{4}, r_{3}<{l} \\ 1, \quad r_{3}>{l} \end{array}\right. \end{aligned}$$
(2)
$$\begin{aligned}{} & {} {\overrightarrow{\omega _{2}}}=(1-exp (-\mid {H}_i- {SH} \mid )) \times r_{5} \times 2 \end{aligned}$$
(3)

where N refers to the population size, \(r_{3}\), \(r_{4}\), and \(r_{5}\) are random numbers \(\in [0, 1]\), SH refers to the sum of the hunger feelings of all individuals, and \(H_{i}\) refers to the hunger of the i-th individual, which can be calculated from the following equation:

$$\begin{aligned} H_{i}=\left\{ \begin{array}{l} 0, \quad {Fit}_{i}={Fit}_{{b}} \\ {H}_{{i}}+{H}_{\rm new}, {\text {otherwise}} \end{array}\right. \end{aligned}$$
(4)

where \(Fit_{i}\) indicates the fitness of individual i. In every generation, the hunger value of the best agent is set to 0. \(H_{\rm new}\) can be calculated as follows:

$$\begin{aligned}{} & {} TH=\frac{Fit_i-Fit_b}{Fit_{\rm worst}-Fit_b} \times r_{6} \times 2 \times (UB - LB) \end{aligned}$$
(5)
$$\begin{aligned}{} & {} H_{\rm new}=\left\{ \begin{array}{l} L H \times (1+r), \quad T H<L H \\ T H, {\text {otherwise}} \end{array}\right. \end{aligned}$$
(6)

where \(Fit_i\) refers to the fitness of the i-th individual, \(Fit_b\) refers to the best individual fitness, \(Fit_{\rm worst}\) refers to the worst individual fitness, \(r_{6}\) is a random number \(\in [0, 1]\), and UB and LB refer to the upper and lower boundaries, respectively.

The E formula is calculated as follows:

$$\begin{aligned} E={Sech}(|Fit_i-Fit_b|) \end{aligned}$$
(7)

where \(i=1,2,\ldots ,n\); \(Fit_i\) refers to the fitness value of each individual; \(Fit_b\) is the best fitness obtained in the current iteration; and Sech is the hyperbolic secant function \((Sech (x)=\frac{2}{e^{x}+e^{-x}})\). The procedure of the HGS algorithm is described in Algorithm 1, with \(\vec {R}\) computed as follows:

$$\begin{aligned}{} & {} \vec {R}=2 \times {Shrink} \times { rand }-{Shrink} \end{aligned}$$
(8)
$$\begin{aligned}{} & {} {Shrink}=2 \times \left( 1-\frac{t}{t_{\rm max}}\right) \end{aligned}$$
(9)

where rand is a number generated randomly in the interval [0, 1], and \(t_{\rm max}\) refers to the maximum number of iterations.

Algorithm 1: Pseudocode of the HGS algorithm
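For concreteness, the cooperative-communication update of Eqs. (1)–(3) and (7)–(9) can be sketched in Python as follows. This is an illustrative reconstruction from the equations above, not the authors’ implementation; in particular, drawing the random weight factors per dimension is an assumption the text does not fix.

```python
import numpy as np

def hgs_update(S, fit, S_b, hunger, l, t, t_max, LB, UB):
    """One HGS position update over the population (Eqs. 1-3, 7-9).

    S: (N, Dim) positions; fit: (N,) fitness values; S_b: best position;
    hunger: (N,) hunger values H_i; l: fixed control parameter.
    """
    N, Dim = S.shape
    fit_b = fit.min()
    shrink = 2.0 * (1.0 - t / t_max)                     # Eq. (9)
    SH = hunger.sum() + 1e-12                            # guard against SH == 0
    S_new = np.empty_like(S)
    for i in range(N):
        E = 1.0 / np.cosh(fit[i] - fit_b)                # sech(|Fit_i - Fit_b|), Eq. (7)
        R = 2.0 * shrink * np.random.rand(Dim) - shrink  # Eq. (8)
        r1, r2, r3 = np.random.rand(3)
        w1 = (hunger[i] * N / SH * np.random.rand(Dim)   # Eq. (2)
              if r3 < l else np.ones(Dim))
        w2 = 2.0 * (1.0 - np.exp(-abs(hunger[i] - SH))) * np.random.rand(Dim)  # Eq. (3)
        if r1 < l:                                       # Strategy 1: random self-scaling
            S_new[i] = S[i] * (1.0 + np.random.randn())
        elif r2 > E:                                     # Strategy 2: orbit around the best
            S_new[i] = w1 * S_b + R * w2 * np.abs(S_b - S[i])
        else:                                            # Strategy 3: move toward the best
            S_new[i] = w1 * S_b - R * w2 * np.abs(S_b - S[i])
    return np.clip(S_new, LB, UB)
```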

3.2 The production operator (PO)

The worst solution \((S_{\rm worst}^{t})\) refers to the solution with the worst fitness value at iteration t in the AEO algorithm [85]; it is known as the producer within the algorithm. The updating process of this solution is guided by the search space limits and the best solution \((S_b^{t})\), referred to as the decomposer. This update influences other agents in the population, such as herbivores and omnivores, to explore different locations [13]. The production operator in AEO generates a new individual by combining the best individual \((S_b^{t})\) with an arbitrary individual in the search space \((S_{\rm rand}^t)\), aiming to strike a balance between exploration and exploitation. The interaction between the best and random solutions leads to the formation of a new solution, as represented by Eq. (10).

$$\begin{aligned} S^{t+1}_{\rm worst}=(1-\alpha ) S_{b}^t+\alpha S^t_{ {\rm rand }} \end{aligned}$$
(10)

where \(S_{\rm rand }^t=LB+{\text {rand}}(0,1) \times (UB-LB)\), \(\alpha =\left( 1-\frac{{ t }}{ { t_{\rm max}}}\right) \cdot r_{1}\), and \(r_{1} \in [0, 1]\).
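A minimal sketch of this operator under the same notation (the helper name is illustrative, not taken from the papers’ code):

```python
import numpy as np

def production(S_b, LB, UB, t, t_max):
    """AEO production operator, Eq. (10): replace the worst individual by a
    blend of the best solution S_b and a random point in the search space."""
    S_rand = LB + np.random.rand(*np.shape(S_b)) * (UB - LB)
    alpha = (1.0 - t / t_max) * np.random.rand()   # r1 drawn from [0, 1]
    return (1.0 - alpha) * S_b + alpha * S_rand
```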

3.3 Gradient search operator (GSO)

GSO is an operator initially introduced in the Gradient-based Optimizer (GBO) algorithm [8]. It is designed to introduce stochastic behavior into the optimization process, aiding exploration of the search space and avoiding entrapment in local optima. In addition to GSO, the Direction of Movement (DM) is integrated into the GBO algorithm; DM guides a suitable local search trend, thereby enhancing the convergence speed of GBO.

Algorithm 2: Position update based on the gradient search operator (GSO)

To update the position of the current vector, the following equations are used, as shown in Algorithm 2.

$$\begin{aligned}{} & {} S 1_{i}^{t}=S_{i}^{t}-GSO+{rand} \times \lambda _{2} \times \left( S_{{b }}-S_{i}^{t}\right) \end{aligned}$$
(11)
$$\begin{aligned}{} & {} S 2_{i}^{t}=S_{ {b }}-GSO+{rand} \times \lambda _{2} \times \left( S_{r 1}^{t}-S_{r 2}^{t}\right) \end{aligned}$$
(12)
$$\begin{aligned}{} & {} S 3_{i}^{t}=S_{i}^{t}-\lambda _{1} \times \left( S 2_{i}^{t}-S 1_{i}^{t}\right) \end{aligned}$$
(13)

The new solution \(S_{i}^{t+1}\) at the following iteration \((t+1)\) is defined by combining the positions \((S1_{i}^{t})\), \((S2_{i}^{t})\), \((S3_{i}^{t})\), and the current position \((S_{i}^{t})\), as follows:

$$\begin{aligned} S_{i}^{t+1}=r_{a} \times \left( r_{b} \times S 1_{i}^{t}+\left( 1-r_{b}\right) \times S2_{ i}^{t}\right) +\left( 1-r_{a}\right) \times S 3_{i}^{t} \end{aligned}$$
(14)
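The update of Eqs. (11)–(14) can then be sketched as follows. The GSO term and the adaptive coefficients \(\lambda _1\) and \(\lambda _2\) are defined in the GBO paper [8] and are treated as given inputs here, and \(r_a\) and \(r_b\) are assumed to be uniform random numbers in [0, 1]:

```python
import numpy as np

def gbo_position_update(S_i, S_b, S_r1, S_r2, gso, lam1, lam2):
    """GBO position update (Eqs. 11-14). `gso` is the precomputed gradient
    search term; S_r1 and S_r2 are two randomly chosen solutions."""
    S1 = S_i - gso + np.random.rand() * lam2 * (S_b - S_i)      # Eq. (11)
    S2 = S_b - gso + np.random.rand() * lam2 * (S_r1 - S_r2)    # Eq. (12)
    S3 = S_i - lam1 * (S2 - S1)                                 # Eq. (13)
    ra, rb = np.random.rand(2)
    return ra * (rb * S1 + (1.0 - rb) * S2) + (1.0 - ra) * S3   # Eq. (14)
```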

3.4 Local escaping operator (LEO)

LEO is the second operator introduced in the GBO by [8]. LEO serves as a local search mechanism that enhances the GBO’s capability to explore various locations in real-world problems. By updating the positions of solutions, the LEO operator improves the convergence behavior of the algorithm. It enables the avoidance of local optima trapping and accelerates the convergence process.

LEO creates high-performing potential solutions (\(S _{LEO}\)) by combining several components: the optimal position \(S_{b}\), the solutions \(S1^t_{i}\) and \(S2^t_{i}\), two randomly generated solutions \(S^t_{r1}\) and \(S^t_{r2}\), and a newly produced solution \(S^t_{k}\). The solution \(S_{LEO}\) is obtained using Eq. (15):

$$\begin{aligned} S_{i}^{t+1}=\left\{ \begin{array}{ll} S_{i}^{t}+k_{1} \times \left( \nu _{1} \times S_{b}-\nu _{2} \times S_{k}^{t}\right) +k_{2} \times \lambda _{1} \times \left( \nu _{3} \times \left( S_{i}^{t}-S1_{i}^{t}\right) + \nu _{2} \times \left( S_{r1}^{t}-S_{r2}^{t}\right) \right) / 2 &{} \text{ if } \text{ rand } <0.5 \\ S_{b}^{t}+k_{1} \times \left( \nu _{1} \times S_{b}-\nu _{2} \times S_{k}^{t}\right) +k_{2} \times \lambda _{1} \times \left( \nu _{3} \times \left( S2_{i}^{t}-S1_{i}^{t}\right) + \nu _{2} \times \left( S_{r1}^{t}-S_{r2}^{t}\right) \right) / 2 &{} \text{ otherwise } \end{array}\right. \end{aligned}$$
(15)

where \(k_1\) is a random number between \(-1\) and 1, \(k_2\) is a normally distributed random number, and \(\nu _1\), \(\nu _2\), and \(\nu _3\) are three randomly generated numbers specified as follows:

$$\begin{aligned} \begin{aligned}&\nu _{1}=F_{1} \times 2 \times { rand }+\left( 1-F_{1}\right) \\&\nu _{2}=F_{1} \times { rand }+\left( 1-F_{1}\right) \\&\nu _{3}=F_{1} \times { rand }+\left( 1-F_{1}\right) \end{aligned} \end{aligned}$$
(16)

Note that \(F_1\) is a binary parameter (\(F_1=1\) if \(\zeta _1<0.5\), and 0 otherwise), where \(0\le \zeta _1\le 1\) is a random number. The parameter \(\lambda _1\) balances the exploitation and exploration processes; it is controlled by the sine-based function \(\phi\) and calculated from the following equation:

$$\begin{aligned} \begin{aligned}&\lambda _{1}=2 \times {rand} \times \phi -\phi \\&\phi =\left| \theta \times \sin \left( \frac{3 \pi }{2}+\sin \left( \theta \times \frac{3 \pi }{2}\right) \right) \right| \\&\theta =\theta _{\min }+\left( \theta _{\max }-\theta _{\min }\right) \times \left( 1-\left( \frac{t}{t_{\max }}\right) ^{3}\right) ^{2} \end{aligned} \end{aligned}$$
(17)

where \([\theta _{\rm min}, \theta _{\rm max}]\) equals [0.2, 1.2], t is the current iteration, and \(t_{\rm max}\) is the maximum number of iterations. \(S_k^t\) is computed using Eq. (18).

$$\begin{aligned} S_{k}^{t}= {\left\{ \begin{array}{ll} S_{\rm rand} &{} \text{ if } \zeta _{2}<0.5 \\ S_{p}^{t} &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$
(18)

where \(S_{{\rm rand }}=LB+{\text {rand}}(0,1) \times (UB-LB)\), \(\zeta _2\) is a random number in [0, 1], and \(S_p^t\) is a solution selected at random from the population, \(p\in \{1, \ldots , n\}\).
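Read together, Eqs. (15)–(18) amount to the following sketch, with \(S_k^t\) chosen per Eq. (18) beforehand. This is an interpretation of the formulas above; the reference formulation is in the GBO paper [8]:

```python
import numpy as np

def leo(S_i, S_b, S1, S2, S_r1, S_r2, S_k, lam1):
    """Local escaping operator, Eq. (15), with nu_1..nu_3 from Eq. (16)."""
    F1 = 1.0 if np.random.rand() < 0.5 else 0.0    # binary switch on zeta_1
    nu1 = 2.0 * F1 * np.random.rand() + (1.0 - F1)
    nu2 = F1 * np.random.rand() + (1.0 - F1)
    nu3 = F1 * np.random.rand() + (1.0 - F1)
    k1 = 2.0 * np.random.rand() - 1.0              # uniform in [-1, 1]
    k2 = np.random.randn()                         # normally distributed
    pull = k1 * (nu1 * S_b - nu2 * S_k)            # attraction/repulsion term
    if np.random.rand() < 0.5:
        return S_i + pull + k2 * lam1 * (nu3 * (S_i - S1) + nu2 * (S_r1 - S_r2)) / 2.0
    return S_b + pull + k2 * lam1 * (nu3 * (S2 - S1) + nu2 * (S_r1 - S_r2)) / 2.0
```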

4 Modified HGS (Proposed method)

4.1 Architecture of mHGS

In this section, the modified HGS (mHGS) algorithm is discussed in detail, with a focus on improving its exploitation capability and accelerating the exploration process. Furthermore, strategies are explored to prevent the algorithm from being trapped in locally optimal solutions or encountering premature convergence. These enhancements are crucial for ensuring that the mHGS algorithm searches effectively for the global optimum and avoids suboptimal solutions. The modified version of HGS incorporates five key components, elaborated below:

  • Modified production operator (mPO).

  • Modified variation control (mVC).

  • Modified Local Escaping Operator (mLEO).

  • Modified Transition Factor (mTF).

  • Modified Cooperative communication and foraging behavior.

  1. Modified production operator (mPO): To augment the exploration stage of the HGS algorithm, the AEO production operator was incorporated, as explained in Sect. 3.2, with further modifications. The original operator replaced the worst individual with a combination of the best solution and a random solution. In contrast, the modified production operator updates certain search agents based on a random rate. This operator significantly enhances the exploration phase and helps prevent stagnation in local optima by comparing the original solution with the generated solution and selecting the better of the two. The mathematical model for this phase is defined as follows:

    $$\begin{aligned} S_i^{t+1}=\left\{ \begin{array}{c} (1-a) S_i^{t}+a S_{ {\rm rand }} \quad { rand }<0.6 \\ S_i^{t} \quad { \text {otherwise} } \end{array}\right. \end{aligned}$$
    (19)

    where a is a random number between 0 and 1, and \(S_{{\rm rand }}=LB+{\text {rand}}(0,1) \times (UB-LB)\).

  2. Modified variation control (mVC): E is the variation control for all locations, as defined in Eq. (7). Owing to the hyperbolic secant function, most values of E are close to zero, which increases the probability of ignoring Strategy 3 and thus of leaving parts of the search space unexplored. To overcome this problem, a new variation control is defined in Eq. (20):

    $$\begin{aligned} E= 1-\frac{Fit_i}{Fit_{b}} \end{aligned}$$
    (20)

    This new parameter is based on the ratio between the fitness of the current solution \(Fit_i\) and the best fitness \(Fit_b\). It increases the probability of exploring the search space via \(Strategy_3\).

  3. Modified Local Escaping Operator (mLEO): The exploitation process is significantly improved by the LEO, which is derived from the GBO algorithm. In the basic LEO, the agents update their positions using the parameter \(\delta\), defined as:

    $$\begin{aligned} \delta =2 \times {\text {rand}} \times \left( \left| \frac{S_{r1}^{t}+S_{r2}^{t}+S_{r3}^{t}+S_{r4}^{t}}{4}-S_{i}^{t}\right| \right) \end{aligned}$$
    (21)

    By analyzing Eq. (21), it can be observed that \(\delta\) is a scaled distance between the current solution and the average of four randomly selected solutions. To further enhance the exploitation capability, it is more effective to utilize the nearest best solutions. Therefore, the parameter \(\delta\) is modified according to Eq. (22), where the four random agents are replaced with the four best agents.

    $$\begin{aligned} \delta =2 \times \text{ rand } \times \left( \left| \frac{S_{{b } 1}+S_{ {b } 2}+S_{ {b }3}+S_{ {b } 4}}{4}-S_{i}^{t}\right| \right) \end{aligned}$$
    (22)
  4. Modified Transition Factor (mTF): In the original HGS, cooperative communication and foraging behavior play a crucial role. The original HGS employed three strategies to search for the optimal solution, controlled by the variation control (E) and a fixed parameter (l) set to 0.03. Our proposed enhancement updates (E) using the Shrink parameter, as defined by Eq. (9) and restated as follows:

    $$\begin{aligned} {Shrink}=2 \times \left( 1-\frac{t}{t_{\rm max}}\right) \end{aligned}$$
    (23)

    In the original HGS, the variation control parameter (E) was based solely on fitness. In our enhancement, the mTF (Shrink) parameter instead accounts for the elapsed iterations of the algorithm, yielding a better balance between the exploration and exploitation phases of the optimization process.

  5. Modified cooperative communication and foraging behavior: As stated in Eq. (1), the initial version of HGS deployed three strategies: Strategy 1, Strategy 2, and Strategy 3. Examining this formula, we observe that in Strategies 2 and 3 the solution changes its position relative to the population’s best option, whereas in Strategy 1 the agent moves randomly, which can cause it to fall into local optima. In the proposed mHGS, the first strategy (Strategy 1) is therefore removed from the central equation. In addition, the novel transition factor (mTF, i.e., Shrink) is integrated to ensure a good balance between the exploration and exploitation processes, enabling seamless transitions between Strategies 2 and 3, with the Shrink value depending on the iteration count rather than on fitness (see the sketch after this list):

    $$\begin{aligned} {S_i^{t+1}}=\left\{ \begin{array}{c} \text{ Strategy}_{2}: {\overrightarrow{\omega _{1}}} \cdot {S_{b}}+\overrightarrow{R} \cdot {\overrightarrow{\omega _{2}}} \cdot \left| S_{b}-{S_i^t}\right| , \quad Shrink>rand \\ \text{ Strategy}_{3}: {\overrightarrow{\omega _{1}}} \cdot S_{b}-\overrightarrow{R} \cdot {\overrightarrow{\omega _{2}}} \cdot \left| S_{b}-{S_i^t}\right| , \quad \text{ otherwise } \end{array}\right. \end{aligned}$$
    (24)
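Putting the five modifications together, one iteration of mHGS can be sketched as follows. The names fitness_fn and S_best4 (the four best agents) are illustrative helpers, not names from the paper; the mVC term E and the mLEO distance \(\delta\) are shown where they are computed, while their full wiring into the mLEO move of Eq. (15) is not expanded here:

```python
import numpy as np

def mhgs_step(S, fit, fitness_fn, hunger, S_best4, t, t_max, LB, UB):
    """One mHGS iteration combining Eqs. (19), (20), (22), (23) and (24)."""
    N, Dim = S.shape
    S_b, fit_b = S[fit.argmin()].copy(), fit.min()
    shrink = 2.0 * (1.0 - t / t_max)                   # mTF, Eq. (23)
    SH = hunger.sum() + 1e-12
    for i in range(N):
        # (1) mPO with greedy acceptance, Eq. (19): perturb some agents
        if np.random.rand() < 0.6:
            a = np.random.rand()
            S_rand = LB + np.random.rand(Dim) * (UB - LB)
            cand = (1.0 - a) * S[i] + a * S_rand
            cand_fit = fitness_fn(cand)
            if cand_fit < fit[i]:                      # keep the better solution
                S[i], fit[i] = cand, cand_fit
        # (2) mVC, Eq. (20): ratio-based variation control
        E = 1.0 - fit[i] / (fit_b + 1e-12)
        # (3) mLEO distance toward the four best agents, Eq. (22); E and
        # delta feed the mLEO move of Eq. (15), not expanded in this sketch
        delta = 2.0 * np.random.rand() * np.abs(S_best4.mean(axis=0) - S[i])
        # (4)+(5) two-strategy foraging update gated by mTF, Eq. (24)
        R = 2.0 * shrink * np.random.rand(Dim) - shrink
        w1 = hunger[i] * N / SH * np.random.rand(Dim)
        w2 = 2.0 * (1.0 - np.exp(-abs(hunger[i] - SH))) * np.random.rand(Dim)
        if shrink > np.random.rand():                  # Strategy 2
            S[i] = w1 * S_b + R * w2 * np.abs(S_b - S[i])
        else:                                          # Strategy 3
            S[i] = w1 * S_b - R * w2 * np.abs(S_b - S[i])
    return np.clip(S, LB, UB), fit
```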

4.2 Time complexity of mHGS

This subsection examines the computational complexity of the developed optimizer, mHGS. The overall computational complexity of mHGS can be attributed to three main factors: initializing the solutions, evaluating the fitness functions, and updating the solutions. Initializing the solutions has complexity O(N), where N represents the number of individuals in the population. Updating the solutions costs \(O(T \times N)+O(T \times N \times Dim)\), where T represents the total number of iterations and Dim is the dimension size of the given problem; this encompasses the exploration of the best locations and the update of the locations of all individuals in each iteration. Taking all factors into account, the total computational complexity of the proposed mHGS can be expressed as \(O(N \times (T \times Dim+1))\).
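Summing the three contributions makes the stated bound explicit; the update term dominates, since \(T \le T \times Dim\):

$$\begin{aligned} O(N) + O(T \times N)+O(T \times N \times Dim) = O\left( N \times (1 + T + T \times Dim)\right) = O\left( N \times (T \times Dim+1)\right) \end{aligned}$$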

5 mHGS for FS

In this section, the core framework of FS using the mHGS algorithm is presented. The FS process is divided into three phases, which are outlined as follows:

5.1 Initialization

The first phase of the mHGS process generates a random population of N agents, with each solution representing a subset of features to be evaluated. The population \(X_i^0\) is produced at random by Eq. (25).

$$\begin{aligned} X_i^0= LB+{rand}(0,1) \times (UB-LB) \end{aligned}$$
(25)

The upper and lower bounds (UB and LB) for each individual i in this study are set so that each component lies in the range [0, 1]. Before the fitness evaluation procedure, an additional step called “conversion to binary” is required to choose a subset of features. As a result, using Eq. (26), each solution \(X^0_i\) must be transformed into a binary vector \(X^{b}_i\):

$$\begin{aligned} X^{b}_i=\left\{ \begin{array}{ll} 1 &{} \text {if} \ X^0_i>0.5\\ 0 &{} \text {otherwise}\end{array}\right. \end{aligned}$$
(26)

To better understand the conversion procedure, let’s consider the solution \(X_i\), which consists of five elements: [0.2, 0.6, 0.33, 0.7, 0.01]. The conversion procedure, as described in Eq. (26), is performed to construct a binary vector: \(X_i^b = [0, 1, 0, 1, 0]\). In this binary vector, ’1’ indicates that a feature is selected, while ’0’ indicates that a feature is neglected. It can be inferred from this conversion that the first, third, and last features in the original dataset are considered irrelevant and should be unselected, while the remaining features are regarded as relevant and should be selected.
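This thresholding is a one-liner; the following minimal sketch reproduces the worked example (the helper name is hypothetical):

```python
import numpy as np

def to_binary(X, threshold=0.5):
    """Convert a continuous position into a binary feature mask, Eq. (26)."""
    return (np.asarray(X) > threshold).astype(int)

print(to_binary([0.2, 0.6, 0.33, 0.7, 0.01]))  # -> [0 1 0 1 0]
```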

The objective function is utilized to calculate the quality of each solution \(X_i^b\) and assess the selected feature subset. Equation (27) is employed to determine the objective value for the \(i\)-th agent.

$$\begin{aligned} Fit_i=0.99\times (1-Acc_i)+0.01\times \frac{d^*_i}{D} \end{aligned}$$
(27)

The k-NN classifier is utilized as an evaluator in this work to assess the performance of the FS process. In Eq. (27), \(Acc_i\) denotes the classification accuracy obtained by the k-NN method on the testing set (so the error rate is \(1-Acc_i\)), \(d^*_i\) is the number of selected features, and D is the total number of features in the dataset. It is worth mentioning that a hold-out strategy with a ratio of 0.8:0.2 is employed for the evaluation.
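A sketch of this wrapper objective, assuming scikit-learn for the k-NN evaluator and the 0.8:0.2 hold-out described above (the function and argument names are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, k=5, test_size=0.2):
    """Wrapper fitness of Eq. (27): 0.99 * (1 - Acc) + 0.01 * d*/D."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():                   # an empty subset is infeasible
        return 1.0
    X_tr, X_te, y_tr, y_te = train_test_split(X[:, mask], y, test_size=test_size)
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
    return 0.99 * (1.0 - acc) + 0.01 * mask.sum() / mask.size
```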

5.2 Updating solutions

The mHGS algorithm focuses on enhancing the exploration phase and improving the local search step. The modified production operator, inspired by the AEO algorithm, is introduced to prevent stagnation in local optima. For the local search, the modified local escaping operator, inspired by the GBO algorithm, is designed to select and retain the best four solutions. Additionally, a new transition factor is developed to strike a balance between intensification and diversification by choosing between the two strategies of cooperative communication and foraging behavior. Once the mHGS algorithm completes the updating process, the fitness function is evaluated using Eq. (27) to determine the best solution, which is then returned. This process is repeated until the specified number of iterations is reached; the classification setup used for the evaluation is described in Sect. 5.3.

The proposed method, mHGS, follows the flowchart described in Fig. 1. To ensure robustness, the experiments are repeated 30 times, as commonly done in related literature.

5.3 Classification

After completing the updating process of mHGS, the algorithm returns the best solution obtained. In this paper, the dataset is divided into two parts: a training set and a testing set, with proportions of 80% and 20%, respectively. The accuracy of the suggested algorithm is evaluated using the k-NN algorithm with a value of \(k = 5\) on the testing set. To ensure reliable results, the experiments are repeated 30 times, following the common practice found in many papers in the literature.

Fig. 1: Flowchart of the proposed mHGS-based FS method

6 Experimental evaluation and discussion

6.1 Performance measures

To evaluate and test the performance of the proposed method (mHGS), various measures are utilized, including:

  • Average accuracy \((\mu _{Acc})\): This metric refers to the rate of correctly classified data. The suggested algorithm and the compared ones are each executed thirty times, and \(\mu _{Acc}\) is calculated using the following metric:

    $$\begin{aligned} {\mu _{Acc}} = \frac{1}{{{M}}}\frac{1}{N_{s}}\sum \limits _{k = 1}^{{M}}\sum \limits _{r = 1}^{{N_{s}}} ({C_{r}}=={L_{r}}) \end{aligned}$$
    (28)

    where M refers to the number of runs, \(N_{s}\) is the number of samples in the test dataset, and \({L_{r}}\) and \({C_{r}}\) refer to the reference class label of sample r and the classifier output label, respectively.

  • Average fitness value \((\mu _{Fit})\): This metric evaluates the algorithm performance by relating the reduction of the selection ratio to the minimization of the classification error rate, as in Eq. (29):

    $$\begin{aligned} {\mu _{Fit}} = \frac{1}{{{M}}}\sum \limits _{k = 1}^{{M}} {Fit_{*}^k} \end{aligned}$$
    (29)

    where M refers to the number of runs and \(Fit_{*}^k\) refers to the best fitness value obtained in the k-th run.

  • Average selected attributes size \((\mu _{Size})\): This metric shows the mean number of selected features and is calculated as in Eq. (30):

    $$\begin{aligned} {\mu _{Size} = \frac{1}{{{M}}}\sum \limits _{k = 1}^{{M}} {{d_{*}^k}}} \end{aligned}$$
    (30)

    The overall selection ratio, which equals the ratio between the number of selected features \(d_{*}\) and the total number of features D in the original dataset, is computed using Eq. (31):

    $$\begin{aligned} Overall_{SelectRatio} = \frac{1}{{{M}}}\sum \limits _{k = 1}^{{M}} \frac{ {d_{*}^k}}{D} \end{aligned}$$
    (31)

    where M refers to the number of runs, \(d_{*}^k\) is the number of features selected by the best individual in the k-th run, and D refers to the total number of attributes in the original data.

  • Average CPU time \((\mu _{CPU})\): This metric computes the average CPU time for each algorithm, as shown in Eq. (32):

    $$\begin{aligned} \mu _{CPU} = \frac{1}{{{M}}}\sum \limits _{k = 1}^{{M}} {T_{*}^k} \end{aligned}$$
    (32)

    where M refers to the number of runs and \(T_{*}^k\) refers to the time taken to perform the k-th run.

  • Standard deviation (STD): This metric is used to evaluate each algorithm’s quality and to analyze the dispersion of the results over multiple runs. Equation (33) is used to calculate the STD.

    $$\begin{aligned} ST{D_{Y}} = \sqrt{\frac{1}{{{M}}}\sum \limits _{k = 1}^{{M}} {{{\left( {Y_{*}^k - {\mu _{Y}}} \right) }^2}}} \end{aligned}$$
    (33)

    Note that the \(STD_{Y}\) is calculated for all metrics: fitness, accuracy, time, and number of selected features (a short aggregation sketch follows this list).
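Given per-run records, the measures above reduce to simple averages; a minimal aggregation sketch (the array names are illustrative):

```python
import numpy as np

def summarize_runs(accs, best_fits, best_sizes, times, D):
    """Aggregate M runs into the reported measures, Eqs. (28)-(33).
    accs[k] is the test accuracy of run k (the inner mean of Eq. 28)."""
    best_sizes = np.asarray(best_sizes, dtype=float)
    return {
        "mu_Acc":      np.mean(accs),             # Eq. (28)
        "mu_Fit":      np.mean(best_fits),        # Eq. (29)
        "mu_Size":     np.mean(best_sizes),       # Eq. (30)
        "SelectRatio": np.mean(best_sizes / D),   # Eq. (31)
        "mu_CPU":      np.mean(times),            # Eq. (32)
        "STD_Fit":     np.std(best_fits),         # Eq. (33), the 1/M variant
    }
```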

6.2 Experiment 1: FS for large and small dimensionality benchmarks

To confirm the effectiveness of the proposed method mHGS, two sets of experiments were conducted on eighteen datasets obtained from the UCI repository. The experiments were performed using MATLAB 2017b on a computer system with an Intel Core™ i7 (3.40 GHz) CPU, 32 GB RAM, and the Windows 10 operating system.

The first set of experiments focused on solving FS problems, while the second set focused on a real-world application, specifically Parkinson’s disease detection. Various measures were used to validate the results, including mean fitness, mean accuracy, mean number of selected features, overall selection ratio, and standard deviation. Furthermore, the significance of the improvement achieved by the mHGS method was evaluated using Wilcoxon’s test, comparing it against other established methods such as GBO, HGS, WOA, HHO, GOA, GWO, SSA, and ALO.

The obtained results were analyzed in terms of Accuracy, Fitness Function, number of selected features, Precision, Recall, and F-score, providing a comprehensive evaluation of the performance of the proposed mHGS method compared to the other methods.

6.2.1 Datasets and parameter setup

Table 1 presents the details of the datasets used in the experiments, including the data categories, number of samples, and the size of features. The datasets are categorized into three groups: small-dimension datasets with a small number of features in the first category, medium-dimension datasets in the second category with a moderate number of features, and high-dimension datasets in the last category with a large number of features. The high-dimension datasets pose a greater challenge due to their complexity and the larger number of features.

Table 2 provides the parameter values used for all the comparative algorithms employed in the experiments. These parameters were adjusted to ensure a fair comparison between the algorithms.

Table 1 Descriptions of the used datasets
Table 2 Setting of mHGS and other compared algorithms

6.2.2 Experiment 1: Results and analysis

In this section, the results of the comparative methods are presented and analyzed. Table 3 displays the average fitness function values obtained by each method. It is evident that the proposed mHGS method consistently outperforms the other comparative methods. In almost all tested cases, mHGS achieves the best results in terms of fitness function. The statistical analysis using the Wilcoxon ranking test confirms the superiority of mHGS, as it outperforms the other methods in 13 cases, achieves similar results in 3 cases, and performs worse in 5 cases.

Table 3 The obtained results using average fitness function for all the comparative methods

Table 4 presents the accuracy values obtained by each method, showcasing the performance of the proposed mHGS method compared to other comparative methods. The results demonstrate that mHGS consistently achieves promising and competitive accuracy values across different datasets. In most cases, mHGS outperforms the other methods, especially for both high and low dimensional datasets. The statistical analysis using the Wilcoxon ranking test further confirms the superiority of mHGS, as it outperforms the other methods in 13 cases, achieves similar results in 7 cases, and performs worse in 4 cases. Additionally, the reported results of the Std values indicate that the proposed method exhibits stable optimization processes in solving the dimension reduction problem. These findings highlight the effectiveness and robustness of the proposed mHGS method in achieving high accuracy in FS tasks.

Table 4 The obtained results using average accuracy for all the comparative methods

Table 5 displays the results in terms of the number of features selected by each method, providing insights into the performance of the proposed mHGS method and the other comparative methods. The results indicate that mHGS frequently achieves the smallest number of features among the tested methods, showcasing its effectiveness in FS, regardless of the dataset’s dimensionality. The Wilcoxon ranking test shows that mHGS outperforms the other methods in nine cases, ties in none, and performs worse in nine cases. Moreover, the reported Std values validate the stability of the optimization processes employed by mHGS in solving various dimension reduction problems. Overall, the proposed mHGS method demonstrates its capability to handle complex problems effectively and delivers promising results in comparison to well-established methods in the field.

Table 5 The obtained results using the size of features for all the comparative methods

6.2.3 Convergence analysis

Within this section, Fig. 2 illustrates the convergence curves of the comparative FS methods. It is evident that the incorporation of the HGS algorithm with the LEO operator enhances the convergence speed in finding optimal solutions for the given problems. This improvement in convergence can be observed in datasets such as Exactly, Vote, HeartEW, Prostate Tumors, SonarEW, and others. Notably, these significant levels of convergence are predominantly achieved in high-dimensional problems.

This section focuses on the convergence analysis of the proposed mHGS algorithm for FS problems on the selected datasets. The analysis investigates the convergence behavior of mHGS through the examination of convergence curves. The effectiveness of mHGS in FS is influenced by its classification performance, which in turn depends on the algorithm’s convergence and execution. The convergence curves illustrate the relationship between the number of iterations and the achieved classification accuracy. It is observed that the optimal fitness values obtained by mHGS correspond closely to the optimal accuracy rates, demonstrating the effective trade-off achieved by the search strategies employed in mHGS.

Fig. 2 Convergence curve of mHGS against other competitors

Figure 3 presents boxplots comparing the performance of the proposed mHGS method with the other comparative methods. These subfigures illustrate the distribution of the obtained results and their stability across multiple runs for each problem; a narrow range of values indicates that a method performs consistently well on the given problem. The boxplots of the proposed mHGS method show consistently similar results with minimal variation for each problem, whereas the boxplots of the other methods show higher variability. This indicates that the proposed mHGS method is more stable and consistent than the other methods.
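Boxplots of this kind are drawn from the final result of each independent run per method. A short sketch with placeholder values (the run count, score ranges, and method set are illustrative only):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Final fitness per method over 30 independent runs (placeholder values)
results = {
    "mHGS": rng.normal(0.05, 0.005, 30),   # tight spread -> stable
    "HGS":  rng.normal(0.08, 0.020, 30),   # wider spread -> less stable
}
plt.boxplot(list(results.values()), labels=list(results.keys()))
plt.ylabel("Final fitness over 30 runs")
plt.show()
```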

Fig. 3 Boxplots of mHGS against other competitors

6.2.4 Wilcoxon’s rank-sum test

In this section, Wilcoxon’s rank-sum test is conducted to assess the significance of the improvements achieved by the proposed mHGS method over the other comparative methods. The p-values are reported together with the number of win (“W”), tie (“T”), and loss (“L”) cases. The results show that the proposed mHGS method outperforms several well-established methods (HGS, WOA, GOA, HHO, GWO, SSA, ALO, and GBO). For instance, against HGS, mHGS achieved the best results in 12 cases, similar results in 2 cases, and worse results in 4 cases; against WOA, it achieved the best results in 17 cases, similar results in 1 case, and worse results in none. These results highlight the superior performance of the proposed mHGS method on the selected problems.
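For reference, such a per-dataset comparison can be reproduced with SciPy’s rank-sum test. In the sketch below, the per-run result arrays are placeholders and the 0.05 threshold is the conventional significance level:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)
mhgs_runs = rng.normal(0.95, 0.01, 30)   # placeholder accuracies, 30 runs
hgs_runs  = rng.normal(0.92, 0.02, 30)

stat, p = ranksums(mhgs_runs, hgs_runs)
# Classify the comparison as a win ("W"), tie ("T"), or loss ("L")
if p >= 0.05:
    outcome = "T"                        # no significant difference
else:
    outcome = "W" if mhgs_runs.mean() > hgs_runs.mean() else "L"
print(f"p = {p:.4f}, outcome = {outcome}")
```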

Table 6 Statistical study based on Wilcoxon’s rank-sum test

6.3 Experiment 2: Parkinson’s disease detection

In this section, the proposed mHGS method is further tested to demonstrate its effectiveness in a real-world application, namely detecting Parkinson’s disease from phonation data.

6.3.1 Datasets description

The Parkinson’s Disease Speech Vocal Dataset, the large dataset used in this study, was collected at the Department of Neurology, Cerrahpasa Faculty of Medicine, Istanbul University, Turkey. The dataset, obtained from the UCI machine learning repository [68], comprises speech signals from 188 patients diagnosed with Parkinson’s disease (107 males and 81 females) and 64 healthy subjects (23 males and 41 females). The dataset exhibits class imbalance, with a majority-to-minority class ratio of 2.93. The age range for healthy subjects is 41–82 years, while for Parkinson’s patients it is 33–87 years. Recordings were captured at a sampling rate of 44.1 kHz, and participants were instructed to sustain the vowel “ah” three times. Given that speech abnormalities are early indicators of Parkinson’s disease, analyzing speech characteristics is crucial. The PD speech dataset comprises 753 features: 752 speech characteristics and 1 gender feature.
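For readers reproducing this setup, the dataset can be fetched from the UCI repository as a CSV. The following is a tentative loading sketch: the file name pd_speech_features.csv and the id/class column names are assumptions based on the repository’s usual layout, not guaranteed by this paper.

```python
import pandas as pd

# Hypothetical local copy of the UCI "Parkinson's Disease Classification"
# data; the file name and the 'id'/'class' column names are assumptions.
df = pd.read_csv("pd_speech_features.csv")

X = df.drop(columns=["id", "class"], errors="ignore")
y = df["class"]

# Majority-to-minority ratio reported in the text (188 PD vs. 64 healthy)
counts = y.value_counts()
print(f"imbalance ratio: {counts.max() / counts.min():.2f}")
```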

The Parkinson’s small dataset used in the study by Naranjo et al. [56] aimed to distinguish individuals with Parkinson’s disease (PD) from healthy individuals through the analysis of replicated voice recordings. The dataset consisted of 40 individuals with PD and 40 healthy individuals; each participant completed a brief questionnaire and provided three recordings of sustained /a/ phonation, for a total of 240 voice recordings. The study received approval from the Bioethical Committee of the University of Extremadura, and all participants provided informed consent. From each recording, 44 acoustic features were extracted, yielding a 44-dimensional vector per recording. These features were grouped into nine categories according to their formulation and relevance to the study, four of which contained a single feature.

6.3.2 Experiment 2: Results and analysis

Table 7 presents the results of the proposed mHGS method and its competitors in terms of fitness for Parkinson’s phonation disease. The proposed method achieved the best results (in both average and best values) on the large and small datasets alike. The Std values indicate that the proposed method produced consistent results across multiple runs, highlighting its stability and reliability, and the narrow spread of the obtained results reinforces its effectiveness in solving real-world problems by accurately selecting the most relevant features. Regarding classification accuracy, as depicted in Table 8, the proposed method also demonstrated consistently high accuracy rates, in line with the fitness values. This indicates strong discriminatory power, resulting in accurate classification of Parkinson’s disease from phonation data. Furthermore, Table 9 shows that the proposed method consistently selected the smallest number of features while achieving higher accuracy rates, demonstrating its efficiency in identifying the features most informative for Parkinson’s disease diagnosis.

Table 7 mHGS performance against other competitors in terms of fitness for Parkinson’s phonation disease
Table 8 mHGS performance against other competitors in terms of accuracy for Parkinson’s phonation disease
Table 9 mHGS performance against other competitors in terms of size of selected features for Parkinson’s phonation disease

Table 10 presents the performance of the proposed mHGS method compared to other competitors in terms of the recall measure for Parkinson’s phonation disease. The table clearly demonstrates that the proposed method achieved better results (both in terms of average and best values) for both the large and small dimensional datasets. The Std values indicate that the proposed method consistently produced similar results for the same case across multiple runs, reflecting its stability and reliability. The narrow distribution of the obtained results further indicates the efficacy of the proposed method in solving real-world problems by accurately selecting the most important features. Moving on to precision, as shown in Table 11, the proposed mHGS method consistently achieved results that align with the recall values. This indicates that the proposed method has a remarkable ability to achieve higher precision values, highlighting its accuracy in classifying instances related to Parkinson’s phonation disease. Furthermore, according to the F-score results in Table 12, the proposed mHGS method consistently obtained excellent results by achieving higher F-score values. The F-score combines precision and recall, providing a comprehensive evaluation of the proposed method’s performance. Additionally, the statistical measures in Table 13 demonstrate that the proposed method outperformed all the comparative methods for both the small and large datasets. When compared to existing comparable methods, it is evident that the proposed mHGS method performs exceptionally well in addressing this class of real-world problems.
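As a reminder of how these measures relate, the F-score is the harmonic mean of precision and recall, F = 2PR/(P + R). A small scikit-learn sketch with purely illustrative labels (1 = Parkinson’s, 0 = healthy):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Illustrative true labels and predictions (1 = Parkinson's, 0 = healthy)
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

p = precision_score(y_true, y_pred)   # TP / (TP + FP)
r = recall_score(y_true, y_pred)      # TP / (TP + FN)
f = f1_score(y_true, y_pred)          # 2PR / (P + R)
print(f"precision={p:.2f}, recall={r:.2f}, F-score={f:.2f}")
```

On imbalanced data such as the large PD speech dataset, these measures are more informative than raw accuracy because they are sensitive to errors on the minority class.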

Table 10 The performance of mHGS against other competitors in terms of recall for Parkinson’s phonation disease
Table 11 The performance of mHGS against other competitors in terms of precision for Parkinson’s phonation disease
Table 12 The performance of mHGS against other competitors in terms of F-score for Parkinson’s phonation disease
Table 13 Statistical test for the large and small Parkinson’s phonation disease datasets

In Fig. 4, the convergence curve of the proposed mHGS method is compared to its competitors on the Parkinson’s phonation disease datasets. The combination of HGS and LEO in the proposed method clearly improves the convergence speed toward optimal solutions, and the method converges rapidly in most cases. The purpose of this experiment is to analyze the convergence behavior of mHGS through its convergence curves; the classification efficiency of mHGS in FS depends on the algorithm’s search behavior and convergence. The curves relate the number of iterations to the best classification accuracy achieved so far, and they show that the optimal fitness values align well with the optimal accuracy rates. This indicates that the proposed method consistently achieves high accuracy while converging toward the optimal solution.

In Fig. 5, the proposed mHGS method is compared to the other approaches using boxplots. These subfigures illustrate the distribution of the obtained results and their consistency over multiple runs for each task; a narrow range of values indicates that a method solves the given problem consistently, reflecting algorithmic stability. The boxplots of the proposed method show nearly identical outcomes for each task, and the narrow spread of its results demonstrates the effectiveness of the proposed strategy. In contrast, the other comparative approaches exhibit wider spreads, indicating greater variability in their results.

Fig. 4 Convergence curve for mHGS against other competitors—Parkinson’s phonation disease

Fig. 5 Boxplots for mHGS against other competitors—Parkinson’s phonation disease

7 Conclusion

This paper introduces a novel FS method called the modified Hunger Games Search (mHGS) algorithm. mHGS improves the performance of the original Hunger Games Search by incorporating the local escaping operator (LEO), which helps the search escape local optima, accelerates convergence, and maintains a balanced search strategy.

To validate the effectiveness of the mHGS method, two sets of experiments are conducted. The first uses a comprehensive collection of benchmark dimensionality reduction problems consisting of 18 datasets of varying sizes and dimensions. The second considers a real-world problem, Parkinson’s disease detection from phonation data, to demonstrate the method’s applicability to real-world applications. The results of the mHGS approach are evaluated in terms of fitness, accuracy, number of selected features, precision, recall, and F-score, and compared to well-known algorithms, namely the classical HGS, GBO, GOA, GWO, WOA, SSA, HHO, and ALO. The results indicate that the mHGS method outperforms the other algorithms.

In future work, the proposed mHGS method can be applied to other problems such as engineering design, text mining, task scheduling, image segmentation, and other complex benchmarks. Further research could also explore combining the mHGS method with other metaheuristic components to enhance its capabilities.