Abstract
Clustering analysis is essential for obtaining valuable information from a predetermined dataset. However, traditional clustering methods tend to fall into local optima and depend heavily on the quality of the initial solution. To address these defects, a novel clustering method called gradient-based elephant herding optimization for cluster analysis (GBEHO) is proposed. A well-defined set of heuristics is introduced to select the initial centroids instead of selecting random initial points. First, the elephant herding optimization (EHO) algorithm is combined with the gradient-based optimizer (GBO) to assign initial cluster centers across the search space. Second, to overcome the imbalance between exploration and exploitation in the original EHO, the initialized population is improved by introducing Gaussian chaos mapping. In addition, two operators, i.e., random wandering and mutation operators, are set to adjust the location update strategy of the agents. Nine synthetic and real-world datasets are adopted to evaluate the effectiveness of the proposed algorithm against other metaheuristic algorithms. The results show that the proposed algorithm ranks first among the 10 algorithms. It is also extensively compared with state-of-the-art techniques using four evaluation criteria: accuracy rate, specificity, detection rate, and F-measure. The obtained results indicate the excellent performance and greater stability of GBEHO.
1 Introduction
Machine learning can be divided into supervised and unsupervised learning, depending on whether the examples are trained with or without labels [1]. Supervised learning focuses on prediction, mainly by considering the model complexity as well as the variance and bias between samples. The main task is to obtain the corresponding response variables when observations are made on the predictor variables. In contrast, unsupervised learning focuses on observation. Unlike supervised learning, response variables are not available in unsupervised learning. Consequently, the chief task is to determine the underlying characteristics of the input variables. Specifically, a cluster analysis is a representative technique used in unsupervised learning.
1.1 Literature review
The concept of a cluster analysis was first introduced by Driver and Kroeber in 1932 [2]. Later, Zubin and Tryon brought it to the field of psychology. Clustering techniques are currently widely used in many fields, such as data mining [3], image segmentation [4], wireless communication [5], outlier detection [6], agricultural production [7], and e-commerce [8]. Different from classification techniques, the classes to be divided in clustering are unknown. The correlation, distribution, and variability among the data need to be analyzed from the sample data. In other words, the process divides the samples into different groups by weighing the similarity measures between them, where each group is called a cluster [9]. Homogeneity and separability are two important metrics used in cluster analyses. The former indicates the similarity between objects in the same cluster, and the latter implies the difference of objects between different clusters. The purpose of clustering is to maximize the homogeneity of the same cluster and the heterogeneity of different clusters [10]. Driven by these two concepts, various types of clustering methods have been introduced. Interestingly, different conclusions can be drawn depending on the method used.
Clustering algorithms can be broadly classified into two categories, namely, hierarchical and partitional methods [11]. Hierarchical clustering methods assume a hierarchical structure between clusters and recursively find nested clusters. Their advantage is that the entire clustering process can be completed at once without requiring a priori knowledge. However, this approach is computationally intensive. The main methods include DIANA, BIRCH, CURE, and CHAMELEON. The partitional method, on the other hand, finds all clusters simultaneously as a partition of the data instead of imposing a hierarchical structure [12]. Specifically, the dataset is divided into a fixed number of clusters based on a specific criterion. These clusters are disjoint, i.e., each object belongs to only one cluster. This type of method is insensitive to the input dataset, easy to operate, and computationally simple. However, its scalability is poor, and most methods fall into local optima when the dimensions of the data objects increase [13].
Due to its ease of implementation, simplicity, and efficiency, k-means clustering has become one of the most widely used clustering methods [14]. It assigns all samples to the closest clusters by minimizing the sum of squared errors, finding an approximate solution in a greedy manner. However, due to the nature of gradient descent, k-means often converges to a local minimum of the objective function. For the same reason, the quality of k-means for solving clustering problems depends heavily on the initial solution [15]. If the initial solution is not chosen properly, the algorithm can converge slowly and may produce empty clusters. Under this circumstance, the probability of falling into a local optimum will increase [16]. With further research, many k-means variants have emerged to overcome this problem. For example, Bortoloti et al. [17] proposed supervised kernel-density-estimation k-means, called SKDEKMeans. The kernel density was used to estimate a better representation of the distribution so that a balance between majority and minority clusters was achieved. A k-centroid initialization algorithm (PkCIA) was then proposed by Manochandar et al. [18]. The eigenvector of the new matrix was adopted as an index for computing initial cluster centroids. On this basis, the high sensitivity of the original algorithm to the initial solution can be reduced. An I-k-means-plus was proposed by Hassan [19]. In this approach, the quality of the solution is improved by removing or splitting clusters in each iteration. It was experimentally demonstrated that the clustering process was accelerated with a relatively higher accuracy. Huang et al. [20] developed a robust deep k-means model to learn the hidden attributes. The objective function is transformed into a more tractable form so that the optimization problem can be tackled more easily while obtaining robust final results. An entropy-based initialization algorithm was proposed by Chowdhury et al. [21]. In their method, an entropy-based objective function was defined to complete the initialization process. Meanwhile, by using a number of cluster validity indexes, the proper number of clusters for different datasets can be calculated. Therefore, the performance of the proposed algorithm was enhanced. Zhao et al. [22] proposed another novel variant of k-means to perform top-down hierarchical clustering. It exhibited a faster speed while maintaining a lower clustering error.
Data clustering has been widely used in the real world for mining valuable information. It has long been applied in areas such as object detection and segmentation patterns, medical risk assessment, energy exploration and development, IoT applications, and anomaly detection [23].
In the real world, datasets are mostly vague, complex, and large. Meanwhile, their labels and attributes are often difficult to obtain. In particular, it is almost impossible to cluster data with varied shapes, sizes, and densities [24]. In this case, an accurate and efficient estimation of the initial centroids without a priori information is urgently required [25].
Since the aims of clustering are to maximize the similarity within the same cluster and dissimilarity across clusters, it can be considered an optimization problem [26]. In optimization problems, it is often necessary to maximize or minimize some objective function (the function used to evaluate the quality of the solution). In the entire process, various difficulties, such as constraints, multiple objectives, uncertainties, and local optimum traps, need to be solved. Optimization algorithms are one of the most powerful tools to address these problems. These methods treat the problem as a black box and search for the best solution through predefined steps. Traditional optimization methods include the dynamic programming algorithm (DPA), stochastic search, steepest descent, and Newton’s method [27]. The drawback of these methods is that they are usually limited by the size of a particular problem and the given dataset.
Inspired by natural phenomena and biological evolutionary behaviors, many simple and easy-to-implement metaheuristic algorithms have emerged in recent years for solving global optimization problems, such as Monarch Butterfly Optimization (MBO) [28], Slime Mould Algorithm (SMA) [29], Moth Search Algorithm (MSA) [30], Hunger Games Search (HGS) [31], Harris Hawks Optimization (HHO) [32], and others. The recent emergence of metaheuristic algorithms has developed a simple yet powerful data abstraction and analysis tool for researchers [33]. Currently, the popular research trend is to combine clustering algorithms with metaheuristics, thus ensuring a greater probability of achieving optimal clustering [34]. Chen et al. [35] proposed a new algorithm called QALO-K. k-means was optimized with a quantum-inspired ant lion to enhance the clustering performance and reach the global optimum. In addition, three clustering algorithms, GA-PFKM, PSO-PFKM, and SCA-PFKM, were proposed by Kuo et al. [36] to address the problem where fuzzy k-mode algorithms are sensitive to the initial solution. Nayak et al. [37] combined fuzzy c-means (FCM) with chemical reaction optimization (CRO) to achieve the global best solution. Aggarwal and Singh [38] introduced a nature-inspired algorithm for optimizing the k-means++ algorithm, aimed at overcoming the tendency to fall into local optima. Lakshmi et al. [39] mixed the crow search algorithm (CSA) with k-means, and the quality of the solutions obtained on the benchmark dataset was significantly improved. Due to the defect that traditional clustering methods usually perform poorly when dealing with high-dimensional optimization problems, Yang and Sutrisno [40] proposed a clustering-based SOS (CSOS) algorithm. The combination of local and global searches was achieved through cross-cluster interactions between elite individuals, thus enhancing the clustering efficiency. 
Note that fuzzy c-means (FCM) tends to fall into local minima when facing complex problems. Verma et al. [41] proposed a hybrid FCM and particle swarm optimization (PSO) algorithm (Hybrid FCM-PSO), in which the global optimization property of PSO is used to search for cluster centers. In [42], an Automatic Clustering Local Search HMS (ACLSHMS) algorithm was proposed for image segmentation; it incorporates a local search operator aimed at optimizing the cluster configuration. In addition, given the effectiveness of unsupervised learning for medical image diagnosis, Mittal et al. [43] proposed a novel k-means-based improved gravitational search algorithm clustering (KIGSA-C) method for diagnosing medical images of coronavirus (COVID-19).
Considering the relevance of clustering methods to most real-world problems, there is a need to modify the current algorithms to improve the clustering performance and expand the range of applications. Cluster analysis is an open field. Researchers [2, 25, 44] encourage the application of new metaheuristic algorithms in combination with traditional clustering methods to efficiently solve complex clustering problems.
Elephant herding optimization (EHO) [45] is a novel metaheuristic algorithm proposed by Wang et al. in 2016. The algorithm has a strong global optimization capability and few control parameters [46]. Consequently, it is simple and efficient for clustering. Unfortunately, EHO still has defects, such as a lack of exploitation ability, slow convergence, and a tendency to fall into local optima. Li et al. [47] proposed an improved EHO algorithm (IMEHO) that introduced a global speed strategy and a novel learning strategy to update the speed and position of search agents. Experiments showed that the algorithm can find a better solution. Ismaeel et al. [48] proposed three EHO variants, EEHO15, EEHO20, and EEHO25, based on the γ-value. The purpose was to overcome the problem of an unreasonable convergence to the origin. Huseyin [49] proposed a binary version of EHO. Mostafa et al. [50] presented a study of parameters in EHO. Three versions of EHO with cultural-based, alpha-tuning, and biased initialization were proposed to improve the exploration and exploitation capabilities. However, none of the above algorithms has been applied to clustering, and their performance in clustering analysis has not been verified.
According to the no free lunch (NFL) theorem, a metaheuristic algorithm that performs well on one specific problem cannot be expected to perform well on all optimization problems [51]. This motivates researchers to add new modules and mechanisms to enhance the performance of metaheuristic algorithms. It has been reported that such hybrid algorithms can obtain a global optimal solution more efficiently than a single metaheuristic algorithm [52]. Motivated by these observations, a gradient-based elephant herding optimization (GBEHO) is proposed in this paper for cluster analysis. EHO is combined with a gradient-based optimizer (GBO) [53] to further improve the convergence efficiency and exploitation capability. In addition, random wandering and mutation operators are introduced to improve the ability of the algorithm to jump out of the local optimum and increase the convergence accuracy.
1.2 Contribution and organization of the paper
Overall, although many researchers have made great contributions to enhancing the performance of clustering algorithms, there are still limitations. This paper makes six main contributions:
1. A novel hybrid metaheuristic algorithm, GBEHO, is proposed for cluster analysis, which can automatically determine the best cluster centers.
2. Certain modifications are made to address the problem of falling into the local optimum. First, Gaussian chaotic mapping is introduced for initialization to generate high-quality initial populations. Second, a random wandering operator is designed to optimize the update strategy of the matriarch position. Third, a mutation operator is adopted to change the update strategy of other agents in the EHO. This prevents premature convergence and enhances the ability of the algorithm to jump out of the local optimum.
3. To prevent premature convergence and enhance the balance between exploration and exploitation, EHO is combined with GBO. A framework is developed to fuse the advantages of both algorithms, and the resulting clustering centers are evaluated using a greedy selection strategy.
4. A set of ablation experiments is designed to verify the effect of the mutation probability PSR on the performance of the algorithm. The experiments are conducted on 23 recognized benchmark functions and tested statistically. The results show that the newly added operators are effective in improving EHO, and that the optimization is most effective when PSR = 0.2.
5. The analysis of the different modules illustrates that the combined strategy is effective. Experiments are carried out on synthetic and real-world datasets. GBEHO is compared with nine other metaheuristics and clustering algorithms: k-means, particle swarm optimization (PSO), differential evolution (DE), genetic algorithm (GA), cuckoo search algorithm (CS), gravitational search algorithm (GSA), bat algorithm (BA), a quantum-inspired ant lion optimized hybrid k-means algorithm (QALO-K), and a hybrid grey wolf optimizer with tabu search (GWOTS). The experimental results show that GBEHO has a superior clustering accuracy and higher stability.
6. Comparative experiments are conducted on five datasets with four other state-of-the-art techniques, namely, CSOS, Hybrid FCM-PSO, ACLSHMS, and KIGSA-C. A variety of measures, namely, accuracy rate, specificity, detection rate, and F-measure, are adopted to evaluate the clustering effect. The experiments show that GBEHO is an effective algorithm for clustering analysis.
The remainder of this paper is organized as follows: Section 2 briefly introduces the principles of cluster analysis, EHO, and GBO. Section 3 provides a specific description of the novel concepts and design process. Section 4 presents the experiments and analyzes the results. Discussions are given in Section 5. Finally, conclusions are summarized, and future research directions are proposed in Section 6.
2 The basic theory
2.1 Principle of clustering
Clustering is the process of organizing datasets and objects into different clusters based on certain rules [54]. In short, all data points are grouped into different clusters by comparing their similarity. Suppose there exists a set of n objects \(U = \{x_1, x_2, \ldots, x_n\}\), each with m features, so that \(U \in \mathbb{R}^{n \times m}\). The hard assignment follows the principle of dividing objects into K clusters \(C = \{C_1, C_2, \ldots, C_K\}\). No intersection is allowed between any two clusters. This can be expressed as follows:
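In the usual hard-partition formulation, which is assumed here because it matches the description above (non-empty, pairwise disjoint clusters that together cover the dataset), the conditions can be written as:

```latex
\[
C_i \neq \emptyset,\quad i = 1,\ldots,K; \qquad
C_i \cap C_j = \emptyset,\quad i \neq j; \qquad
\bigcup_{i=1}^{K} C_i = U .
\]
```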
During this process, the similarity between objects in a cluster plays the most significant role in the clustering result [55]. The main way to measure the similarity in clusters is to calculate the distance between data points, such as the Mahalanobis distance [56], cosine distance [57], Pearson correlation measure [58], Jaccard measure [59], or Dice coefficient measure [60]. The most common is the Euclidean distance [61]. For two data points \(x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}\) and \(x_j = \{x_{j1}, x_{j2}, \ldots, x_{jm}\}\) in m dimensions, the Euclidean distance is shown as follows:
Generally, the smaller the intracluster distance or the larger the intercluster distance, the better the clustering performance [62]. In this paper, the sum of squared errors (SSE) is chosen as the objective function. SSE should be minimized in each iteration, which can be expressed as follows:
where \(d(x, g_i)^2\) denotes the squared distance from the sample point x to the center of mass \(g_i\) of cluster \(C_i\).
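The SSE objective described above can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance and nearest-center hard assignment; the function name `sse` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def sse(data, centers):
    """Return the SSE and the hard assignment of each point to its nearest center."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    diff = data[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Hard assignment: each point belongs to exactly one cluster.
    labels = np.argmin(sq_dist, axis=1)
    total = float(np.sum(sq_dist[np.arange(len(data)), labels]))
    return total, labels

# Toy data (assumption): two well-separated groups of two points each.
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])
total, labels = sse(data, centers)
```

Each point in the toy data lies at distance 1 from its center, so the four squared distances sum to 4.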
2.2 EHO
EHO is a population-based algorithm proposed to simulate the nomadic life characteristics of elephants. In EHO, three principles are followed. (i) The population of all agents is divided into a specific number of clans. (ii) Each clan is led by a female individual, called a matriarch, representing the best-positioned agent in each iteration. (iii) The worst agent in each iteration represents the male elephant, who, once reaching adulthood, leaves its clan to live alone. EHO sets up the clan operator and the separation operator to model the above behavior.
2.2.1 The clan operator
For the search agent j in clan \(c_i\), its position must be modified according to the relationship with the clan leader, which can be expressed by:
where \(x_{best,c_i}\) is the position of the best agent in clan \(c_i\), \(x_{c_i,j}\) and \(x_{new,c_i,j}\) are the current and new positions of search agent j in clan \(c_i\), respectively, and α and rand are both random numbers in [0,1]. Unlike the position updates of the other members, the position of the clan leader is adjusted based on the current positions of all agents in the clan. This can be modeled by (8).
where \(x_{center,c_i}\) denotes the central position of all agents in clan \(c_i\), which is calculated by:
where β ∈ [0,1] controls the extent to which \(x_{center,c_i}\) acts on \(x_{new,c_i,j}\), and \(n_{c_i}\) is the number of agents in clan \(c_i\).
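The clan operator can be sketched as follows. This is a hedged illustration that assumes the update forms commonly given for EHO: members move toward the clan best by a step scaled by α and a random factor, and the best agent is reset to β times the clan center. The function name `clan_update`, the fixed values α = 0.5 and β = 0.1, and the minimization convention are assumptions for illustration.

```python
import numpy as np

def clan_update(clan, fitness, alpha=0.5, beta=0.1, rng=None):
    """One clan-operator step for a single clan (rows are agents)."""
    rng = np.random.default_rng() if rng is None else rng
    best = int(np.argmin(fitness))  # assumption: lower fitness is better
    r = rng.random(clan.shape)      # random factors in [0, 1)
    # Members move toward the current clan best.
    new_clan = clan + alpha * (clan[best] - clan) * r
    # The best agent is replaced by beta times the clan center.
    new_clan[best] = beta * clan.mean(axis=0)
    return new_clan

clan = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
fitness = np.array([3.0, 1.0, 2.0])
new_clan = clan_update(clan, fitness, rng=np.random.default_rng(0))
```

Because the leader update depends only on the clan center, it carries no information about the leader's previous position, which is the weakness the later random wandering operator is designed to address.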
2.2.2 Separating operator
The separating operator imitates the life characteristics of male elephants. On reaching adulthood, male elephants leave their current clan, which is represented by the following equation:
where r is a random number between [0,1], and \(x_{\max}\) and \(x_{\min}\) are the upper and lower bounds of the individual position, respectively.
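A minimal sketch of the separating operator follows, assuming that the worst agent of a clan is redrawn uniformly between the lower and upper bounds. The exact form of the paper's equation is not reproduced here; the function name `separate` and the minimization convention are assumptions.

```python
import numpy as np

def separate(clan, fitness, x_min, x_max, rng=None):
    """Replace the worst agent of a clan with a random position inside the bounds."""
    rng = np.random.default_rng() if rng is None else rng
    worst = int(np.argmax(fitness))  # assumption: higher fitness is worse
    new_clan = clan.copy()
    r = rng.random(clan.shape[1])    # one random number per dimension
    new_clan[worst] = x_min + (x_max - x_min) * r
    return new_clan

clan = np.array([[1.0, 1.0], [9.0, 9.0], [2.0, 2.0]])
fitness = np.array([1.0, 5.0, 2.0])
new_clan = separate(clan, fitness, x_min=0.0, x_max=10.0, rng=np.random.default_rng(0))
```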
2.2.3 Elitism strategy
To protect the best elephant individuals from being lost, EHO adopts an elitism strategy. At the beginning of each iteration, the best m elephant individuals are saved. After the iteration is completed, the fitness values of the worst m elephants are compared with those of the saved individuals, and the better agents are preserved. In this way, it is ensured that the quality of the later population is not worse than that of the earlier one.
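The elitism step can be sketched as below. This is an illustrative reading of the description, assuming minimization and pairing the worst current agent with the best saved elite; the function name `apply_elitism` is hypothetical.

```python
import numpy as np

def apply_elitism(pop, fit, elite_pop, elite_fit):
    """Overwrite the worst agents with saved elites whenever the elite is better."""
    pop, fit = pop.copy(), fit.copy()
    n_elite = len(elite_fit)
    worst_idx = np.argsort(fit)[::-1][:n_elite]  # worst agents first
    elite_order = np.argsort(elite_fit)          # best elites first
    for w, e in zip(worst_idx, elite_order):
        if elite_fit[e] < fit[w]:                # keep whichever is better
            pop[w] = elite_pop[e]
            fit[w] = elite_fit[e]
    return pop, fit

pop = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
fit = np.array([5.0, 1.0, 9.0])
elite_pop = np.array([[0.0, 0.0]])
elite_fit = np.array([0.5])
new_pop, new_fit = apply_elitism(pop, fit, elite_pop, elite_fit)
```

In this toy case the agent with fitness 9 is replaced by the saved elite with fitness 0.5, so the best fitness in the population cannot get worse from one iteration to the next.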
2.2.4 Pseudocode of EHO
Based on the above description, the process of EHO can be summarized, and the pseudocode is shown in Algorithm 1.
2.3 GBO
GBO is a population-based algorithm guided by gradient information. In GBO, the search direction is controlled by Newton’s method. Additionally, two main operators and a set of vectors are adopted to explore the search space.
2.3.1 Gradient search rule (GSR)
The gradient search rule (GSR) is extracted from Newton’s method to control the direction of the vector search. To ensure a balance between exploration and exploitation during the iterations and accelerate the convergence, a series of vectors are introduced as follows:
where \(\beta_{\max}\) and \(\beta_{\min}\) are taken as 1.2 and 0.2, respectively, m and M represent the current and the maximum number of iterations, respectively, and rand denotes a random number between [0,1]. The value of α varies with the iterations and can be used to control the convergence rate. Early in the iteration, the value of α is large, thus allowing the algorithm to increase the diversity and converge quickly to the region where it hopes to find the optimal solution. Later in the iteration, the value decreases. Therefore, the algorithm can better exploit the explored regions. On this basis, the expression of GSR is as follows:
where xworst and xbest represent positions of the worst and the best agents, and ε is a small number in the range of [0,0.1]. The proposed GSR is capable of a random search, which enhances the exploration ability of GBO and the ability to jump out of the local optimum. Δx is calculated by the following expression:
where rand(1 : N) denotes N random numbers between [0,1] and step is the step size. xbest represents the global optimal agent, and \({x_{n}^{m}}\) denotes the mth dimension of the nth agent. r1,r2,r3,r4 are different integers randomly selected from [1, N].
Moreover, a motion parameter DM is set for a local search to improve the exploitation capabilities. The expression is shown as follows:
where rand denotes a random number between [0,1], and ρ2 is the parameter that controls the step size and is represented as follows:
Ultimately, the current location of the search agent (\({x_{n}^{m}}\)) can be updated by GSR and DM in the following way:
According to (14) and (18), (20) can also be expressed as follows:
where \(yp_{n}^{m} = y_{n}^{m} + \Delta x\), \(yq_{n}^{m} = y_{n}^{m} - \Delta x\), and \(y_{n}^{m}\) is a newly generated variable determined by the average of \(x_{n}^{m}\) and \(z_{n+1}^{m}\). According to Newton’s method, \(z_{n+1}^{m}\) is formulated by:
where Δx is specified by (15), and xworst and xbest denote the current worst and best agents, respectively. After replacing the current vector \(x_{n}^{m}\) in (21) with xbest, a new vector \(X2_{n}^{m}\) can be obtained with the following expression.
Based on (21) and (23), the new solution \(x_{n}^{m+1}\) can be expressed as:
where ra and rb are random numbers between [0,1].
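The adaptive coefficient α introduced at the start of this subsection can be illustrated numerically. The sketch below assumes the schedule commonly stated for GBO, in which β decays from \(\beta_{\max}\) to \(\beta_{\min}\) and α is a sinusoidal function of β; these exact forms are an assumption and are not reproduced from this paper's equations. The names `gbo_alpha` and `gbo_rho` are hypothetical.

```python
import math
import random

def gbo_alpha(m, M, beta_min=0.2, beta_max=1.2):
    """Adaptive coefficient alpha at iteration m of M (assumed GBO schedule)."""
    beta = beta_min + (beta_max - beta_min) * (1.0 - (m / M) ** 3) ** 2
    return abs(beta * math.sin(1.5 * math.pi + math.sin(1.5 * math.pi * beta)))

def gbo_rho(alpha, rand=None):
    """Random coefficient drawn symmetrically from [-alpha, alpha]."""
    rand = random.random() if rand is None else rand
    return 2.0 * rand * alpha - alpha

alpha_start = gbo_alpha(0, 100)    # large early on: favors exploration
alpha_end = gbo_alpha(100, 100)    # small at the end: favors exploitation
```

Under these assumptions α starts close to 1 and ends well below 0.2, which matches the qualitative behavior described in the text.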
2.3.2 Local escaping operator (LEO)
The local escaping operator (LEO) is set to retune the resulting solution so that the algorithm can move away from local optima, improving the probability of finding the optimal solution. A solution with superior performance (\(X_{LEO}^{m}\)) is introduced in the LEO, which is represented as:
where pr is a predetermined threshold, with pr = 0.5, f1 is a random number between [-1,1], and f2 is a random number that conforms to the standard normal distribution. u1,u2,u3 are respectively represented by:
where L1 is a binary parameter of 0 or 1, and μ1 is a random number between [0,1]. When μ1 < 0.5, L1 = 1; otherwise, L1 = 0. In summary, the resulting solution \({x_{k}^{m}}\) is expressed as follows:
where \({x_{p}^{m}}\) is a randomly selected solution from the population, p ∈ [1,2,……N]. L2 is a binary parameter of 0 or 1, and μ2 is a random number between [0,1]. When μ2 < 0.5, L2 = 1; otherwise, L2 = 0. xrand is the newly generated solution in the following manner.
3 The proposed algorithm
3.1 Motivation
Traditional clustering algorithms (e.g., k-means), whose degree of validity depends on the initial solution, may fall into local optima when dealing with complex problems. Therefore, in this paper a new method is developed for data clustering. The method applies the concept of metaheuristics to automatically estimate the initial clustering centers and enhance the ability of the algorithm to escape from local optima.
The ability to balance exploration and exploitation is a concern of all metaheuristic algorithms [63]. The analysis of EHO reveals that the worst-positioned agents are only randomly modified by (10). This approach lacks a mutation mechanism, which makes the exploitation capacity insufficient and thus leads to slow convergence. Furthermore, the best-positioned agents are adjusted by (8). Once the population has fallen into a local optimum, this update becomes ineffective and also reduces the diversity of the population. In addition, the exploitation capability of EHO is relatively weak, which increases the probability of falling into a local optimum [64]. By combining EHO with GBO, the search direction during the iteration can be guided to avoid becoming trapped in a local optimum, resulting in a better solution. The local escaping operator (LEO) in GBO can improve the diversity of the population and avoid excessive stagnation. In this case, the proposed algorithm can make full use of the gradient information so that the search efficiency of the algorithm can be improved [65].
Based on the above reasons, several modifications are performed. First, Gaussian chaos mapping is introduced to initialize the population, thus increasing the diversity and ergodicity of the initial population. Next, two operators, random wandering and mutation operators, are adopted to optimize the position of the agents. The aim is to achieve a better balance between exploration and exploitation. Furthermore, EHO is combined with GBO to enhance the exploitation capability by introducing GSR and LEO operators. Together, these modifications address both the initialization and the position update stages of EHO.
3.2 Methodology
Since the algorithm is based on a metaheuristic, the search agents need to be represented first. The representation of the individuals must be adapted to the specific clustering problem. If the input dataset \(U = \{x_1, x_2, \ldots, x_n\}\) includes n objects, then each object with m features can be represented as \(x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}\), \(i \in [1, n]\). Since one or more initial clustering centers are generated, the dimensionality \(\dim\) of the algorithm will change based on the number of clusters k, i.e., \(\dim = m \times k\). Therefore, each candidate solution \(C_j\) denotes a set of cluster centers, which can be represented by:
The solution for the initial iteration requires no prior knowledge of the clustering problem and is randomly generated based on the available dataset. To complete the initialization process, upper and lower bounds must be determined for each feature. Namely, the lower bound is represented as \(c_{\min} = \{c_{l1}, c_{l2}, \ldots, c_{lm}\}\), where \(c_{lm} = \min\{c_{1m}, c_{2m}, \ldots, c_{nm}\}\). Similarly, the upper bound is determined as \(c_{\max} = \{c_{u1}, c_{u2}, \ldots, c_{um}\}\), where \(c_{um} = \max\{c_{1m}, c_{2m}, \ldots, c_{nm}\}\).
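The encoding and bounds described above can be sketched as follows. This is a minimal illustration, assuming each candidate solution is a flat vector of length m × k that stacks the k centers, with the per-feature minimum and maximum of the dataset repeated for every center. The helper names `feature_bounds`, `random_solution`, and `decode` are hypothetical.

```python
import numpy as np

def feature_bounds(data, k):
    """Per-feature min/max of the dataset, tiled once per cluster center."""
    lower = np.tile(data.min(axis=0), k)
    upper = np.tile(data.max(axis=0), k)
    return lower, upper

def random_solution(data, k, rng):
    """Draw one candidate (k centers, flattened) uniformly inside the bounds."""
    lower, upper = feature_bounds(data, k)
    return lower + (upper - lower) * rng.random(lower.size)

def decode(solution, k):
    """Reshape a flat solution of length m*k into a (k, m) array of centers."""
    return solution.reshape(k, -1)

data = np.array([[1.0, 5.0], [3.0, 2.0], [2.0, 8.0]])  # n = 3 objects, m = 2 features
solution = random_solution(data, k=2, rng=np.random.default_rng(0))
centers = decode(solution, k=2)                         # shape (2, 2)
```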
3.2.1 Initialization
It is noted that a strong connection exists between the quality of the initial population and the efficiency of the metaheuristic algorithm. Under this circumstance, it is necessary to improve the initialization by suitable methods to obtain a higher quality initial population. In the original EHO, the search process starts from a randomly generated initialized population. Based on that, a priori knowledge of the objective function or constraints is not required. However, it lacks ergodicity and diversity. It has been experimentally demonstrated that chaotic maps have similar properties to randomness but possess better statistical and dynamic properties [66]. Therefore, it is advantageous to use chaotic maps for population initialization in GBEHO.
In this paper, a pre-programmed Gaussian sequence [67] is selected to replace the conventional random number generator, which is represented as follows.
where η(t) and η(t + 1) denote the chaotic values generated at the current and next steps of the sequence, respectively. The initialized population is generated by the Gaussian chaos mapping function, which can explore the space more extensively to obtain better exploration results.
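Chaotic initialization can be sketched as below. The sketch assumes the Gauss (mouse) map often used in chaotic metaheuristics, η(t + 1) = 0 if η(t) = 0 and (1/η(t)) mod 1 otherwise; whether this is exactly the sequence used in the paper is an assumption. The function names and the seed value 0.14159 are hypothetical.

```python
import numpy as np

def gauss_map_sequence(length, seed=0.14159):
    """Generate `length` values in [0, 1) by iterating the Gauss map (assumed form)."""
    values = np.empty(length)
    eta = seed
    for t in range(length):
        # By definition the map sends 0 to 0; otherwise take the fractional part of 1/eta.
        eta = 0.0 if eta == 0.0 else (1.0 / eta) % 1.0
        values[t] = eta
    return values

def chaotic_population(n_agents, lower, upper, seed=0.14159):
    """Scale a chaotic sequence into the search bounds, one row per agent."""
    dim = lower.size
    chaos = gauss_map_sequence(n_agents * dim, seed).reshape(n_agents, dim)
    return lower + chaos * (upper - lower)

lower = np.array([0.0, -5.0])
upper = np.array([1.0, 5.0])
population = chaotic_population(10, lower, upper)
```

In floating-point arithmetic some seeds can collapse the orbit to 0, after which every value stays at 0, so a practical version would likely need a reseeding rule; that rule is left out of this sketch.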
3.2.2 Random wandering operator
It should be emphasized that in the original EHO, the position of the matriarch is determined by the positions of all members in the same clan. Once the algorithm has fallen into a local optimum, the quality of the best solution is difficult to improve. As a result, the populations generated by the clan operator are prone to wandering in place. This limits the ability of the algorithm to jump out of the local optimum. In our view, as the best-positioned agent in each clan, the matriarch should follow a more exploratory update strategy.
One of the most significant rules of metaheuristic algorithms is to maintain a balance between exploration (diversification) and exploitation (intensification). In the exploration stage, agents need to explore the search space sufficiently to identify promising regions for exploitation. During this phase, individuals should have a better stochastic search ability; otherwise, it will lead to premature convergence. In the exploitation phase, agents focus on discovering better solutions in the explored regions. Therefore, the accuracy of individuals in finding the best solution should be optimized so that the algorithm converges to a feasible local or global optimal solution in a limited time. Based on this consideration, the update strategy of the matriarch is adapted as follows:
where \(x_{best,c_i}^{t}\) and \(x_{best,c_i}^{t+1}\) denote the current and latest positions of the matriarch in clan \(c_i\), respectively, \(x_{a,c_i}\), \(x_{b,c_i}\), and \(x_{c,c_i}\) denote individuals randomly selected from clan \(c_i\), rand is a random number between [0,1], and C(σ) denotes the Cauchy distributed random number. It has been proven that a Cauchy distribution-based random walk could contribute to global exploration [68]. The Cauchy distribution function is defined as
where a is the location parameter and b is the scale parameter. In the standard Cauchy distribution, a = 0,b = 1. Meanwhile, the Cauchy density function is as follows
The Cauchy distributed random number C(σ) generated by (35) can be expressed by
It should be noted that in GBEHO the random wandering operator based on the Cauchy distribution replaces the original strategy of updating based on the mean value. This helps agents expand the search area and increases diversity. For the algorithm to run smoothly, bounds should be checked to prevent agents from crossing them. Once a new solution is out of range, the Cauchy mutation is repeated until the new solution lies within the specified range. As the iterations proceed, the step size effectively decreases. Later in the algorithm, GBEHO moves to exploitation. At that time, the clan leader is modified by the position of three random individuals in the population, which contributes to improving the accuracy of discovering the globally optimal solution.
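The Cauchy sampling and the bound check described above can be sketched as follows. The sampler uses the standard inverse-CDF form a + b tan(π(rand − 1/2)). The additive perturbation, the `scale` parameter, the retry limit, and the final clipping fallback are assumptions made for illustration and do not reproduce the paper's matriarch update rule.

```python
import math
import numpy as np

def cauchy_sample(rng, a=0.0, b=1.0):
    """Draw one Cauchy(a, b) number by inverting the distribution function."""
    return a + b * math.tan(math.pi * (rng.random() - 0.5))

def cauchy_step_within_bounds(x, lower, upper, scale, rng, max_tries=100):
    """Perturb x with Cauchy noise, redrawing until the candidate is in range."""
    for _ in range(max_tries):
        step = np.array([cauchy_sample(rng) for _ in range(x.size)])
        candidate = x + scale * step
        if np.all(candidate >= lower) and np.all(candidate <= upper):
            return candidate
    # Fallback (assumption): clip the last candidate if every retry failed.
    return np.clip(candidate, lower, upper)

rng = np.random.default_rng(1)
x = np.array([0.5, 0.5])
new_x = cauchy_step_within_bounds(x, np.zeros(2), np.ones(2), scale=0.1, rng=rng)
```

The heavy tails of the Cauchy distribution occasionally produce very large steps, which is why the redraw loop matters: most candidates stay near x, while the rare long jumps are kept only if they land inside the bounds.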
3.2.3 Mutation operator
Another deficiency of EHO is the lack of a variation mechanism, which is reflected in the following two points. First, most of the agents in a population, excluding the worst individual, are updated based on their relationship with the clan leader, so their independence is poor. This type of mechanism is not conducive to enhancing diversity. For instance, once the algorithm is caught in a local optimum, it has little opportunity to continue exploring. Second, during the search process, a few agents break away from the group led by the matriarch. These agents have a more prominent sense of independence and are able to perform a random search in the search space. However, they follow the matriarch only weakly relative to the rest of the clan. If most of them explore the wrong search direction, the convergence speed of the algorithm slows down and the search efficiency suffers. In the original algorithm, the position of the worst individual is adjusted purely at random, making it difficult to ensure that the search agent is updated to a better position [69].
Similar to mutations in chromosomes, mutation strategies have been widely used in genetic algorithms [70], with the aim of increasing the diversity of the population. To ensure that agents have the opportunity to mutate, a variation probability (PSR) is set. This parameter should take a value in (0,1) so that the number of mutating agents stays within the population size. If PSR is less than 0.2, few individuals undergo mutation, and the operator is unlikely to have a substantial effect. If PSR is greater than 0.8, most of the individuals participate in the mutation, which is contrary to the original intention of the setting. Therefore, to maintain a balance between exploration and exploitation while meeting the diversity enhancement requirements, the value of PSR is determined experimentally according to the clustering effect. It has been experimentally verified that this module has a positive impact on the performance of the algorithm. The ablation experiments are presented in the next section. In GBEHO, the mutation operator is set as shown below:
where \(x_{worst,ci}\) represents the position of the agent to be modified and δ is the variation factor. In this paper, \(\delta = 0.1 \times (X_{\max} - X_{\min})\). \(r_{1}\), \(r_{2}\), \(r_{3}\), and \(r_{4}\) are random numbers uniformly distributed from 0 to 1. \(u_{1}\) is a random variable in [−1,1], and t and Maxiter represent the current and maximum number of iterations, respectively. \(x_{pbest}^{t}\) is the optimal solution at the tth iteration, and \(x_{Gbest}\) stands for the global optimal solution.
3.2.4 Greedy selection strategy
When designing a hybrid framework, there are two critical issues [71]. One is to combine two or more methods into one framework, and the other is to evaluate the best solution from the iterations. In this paper, EHO is set as the basic algorithm because of its ease of implementation and its exploration capability. The obtained solutions are then updated via GBO to enhance the diversity of the population. Compared to EHO, GBO is more advantageous in terms of its exploitation capability due to the gradient search rule (GSR) and the local escaping operator (LEO). Finally, the solutions provided by the search agents are evaluated by a greedy selection strategy. If the fitness of the new agent is better than that of the current one, the current agent is replaced and the new agent takes part in the next iteration. The purpose is to ensure the convergence of GBEHO.
where GBestX represents the global optimal agent, and \({x_{k}^{i}}\) represents the kth agent generated in the ith iteration.
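The greedy selection step can be illustrated with a short sketch. It assumes a minimization problem (lower fitness is better, as with SSE) and represents agents as plain lists of coordinates; the names `greedy_select` and `update_population` are hypothetical and do not come from the original implementation.

```python
def greedy_select(current, candidate, fitness):
    """Keep the candidate only if it is strictly better (lower fitness)."""
    return candidate if fitness(candidate) < fitness(current) else current


def update_population(population, candidates, fitness):
    """Apply greedy selection agent by agent and return the survivors and the best one."""
    survivors = [greedy_select(cur, new, fitness)
                 for cur, new in zip(population, candidates)]
    best = min(survivors, key=fitness)  # global best among the survivors
    return survivors, best
```

Because an agent is never replaced by a worse one, the best fitness in the population is non-increasing over the iterations, which is the convergence property the strategy is meant to secure.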
3.3 Pseudocode of GBEHO
According to the above adjustments, the pseudocode of GBEHO is shown in Algorithm 2. The initialization is performed in line 4 by means of the introduced chaotic mapping. The EHO phase is then completed in lines 7 to 16. In detail, the two proposed operators are applied in lines 11 and 15. In the second stage, the algorithm performs the gradient search rule (GSR) and local escaping operator (LEO) operators, which are shown in lines 17-28. Finally, the clustering process is completed based on the searched clustering centers in lines 33-36. In addition, the flow chart of GBEHO is given in Fig. 1.
3.4 Time complexity
The time complexity of the algorithm can reflect the magnitude of the running time variation with an increase in the input size [72]. The time complexity of the proposed GBEHO is bounded by the number of search agents N, the dimensions of the problem D, and the maximum number of iterations T.
In general, the time complexity of GBEHO can be divided into the following parts: chaos initialization, random wandering, mutation, and the GBO strategy. First, the time spent initializing the population using Gaussian chaos mapping is O(N). Next, the main loop phase with a maximum number of iterations of T is executed. Random wandering with a Cauchy distribution takes O(TN), and the execution of the mutation operator takes O(TN). In addition, the GBO strategy costs O(TND), so the computational complexity of GBEHO is O(TDN + TN), which is dominated by the O(TND) term.
4 Experiments and analysis
In this section, experiments are conducted to verify the validity of GBEHO. All simulations are implemented on a Windows 10 computer with an Intel(R) Core(TM) i5-9300H (2.40 GHz) processor, 16 GB of RAM, and the MATLAB R2019b platform.
4.1 Influence of the parameters
In Section 3, the variation probability PSR is introduced into GBEHO. To verify the sensitivity of the algorithm to this control parameter, seven versions of GBEHO were developed to test the performance under different parameter values on a set of 23 recognized functions [73]. The values of PSR vary in the range of [0.2,0.8] with a step size of 0.1. For the sake of convenience, these sub-algorithms are named GB2, GB3, GB4, GB5, GB6, GB7, and GB8, corresponding to PSR values of 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8, respectively. The test functions include 7 unimodal benchmark functions, 6 expandable multimodal functions, and 10 multimodal functions with fixed dimensions. The basic information of the benchmark functions is listed in Table 1. In this subsection, PSR is the only parameter that changes across GBEHO versions. To isolate the effect of the parameter, the final clustering part of the original algorithm was excluded, and only the final fitness values were calculated. Furthermore, the number of clans c in GBEHO is set to 5. The maximum number of iterations \(t_{\max}\) is set to 500, and the size of population N is set to 10.
To evaluate the performance of each variant, several measurement terms were invoked, including the mean and standard deviation values (std). Given the randomized nature of the heuristic algorithm, it was necessary to compare the experimental results via statistical tests in order to test the validity of the experimental data [74]. Therefore, the Wilcoxon rank-sum test and the Friedman test with a significance level of 5% were adopted, where the p value is an important indicator of the confidence of the results. When p < 0.05, it was determined that there was a statistically significant difference between the two groups of results. In addition, bold indicates the best candidate solution obtained in each function, and NaN indicates that the algorithm performs best on the current function. Moreover, the ranks of the results obtained by different algorithms on each function were also compared. The results obtained when each experiment was completed 30 times are shown in Table 2.
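The pairwise comparison described above can be sketched with a self-contained rank-sum test. This is an illustrative implementation using the usual normal approximation with average ranks for ties and no tie correction of the variance; it is not the routine used in the reported experiments, and the function names are assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0           # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|z|))
    return z, p
```

With 30 runs per algorithm, `rank_sum_test(results_a, results_b)` returns a p value that is compared against the 0.05 threshold; here `results_a` and `results_b` stand for the two lists of final fitness values.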
Box plots summarize the dispersion of a set of data; they can reveal outliers and skewness and can be used to compare algorithms in terms of data symmetry and dispersion [75]. The height of each box reflects the stability, with shorter boxes representing less noise, fewer outliers, and more stable results. The aggregation of the solutions is an important factor in assessing the performance of the algorithm. If an algorithm falls into a local optimum, it will lead to premature convergence, and the quality of the solution will be degraded. Six representative functions are selected from the unimodal benchmark functions, multimodal functions, and fixed-dimension functions, with the box plots of the 7 variants and the original EHO plotted in Fig. 2. As can be observed from the figure, the boxes of GB2 are relatively lower and narrower. Considering the mean, best value, worst value, and standard deviation, GBEHO performs better on different functions when PSR = 0.2.
The results of the 8 algorithms on 23 benchmark functions are presented in Table 2. As clearly shown in the table, the different GBEHO variants achieved higher ranks than EHO. The results show that the quality of the candidate solutions is significantly strengthened by the two operators and the combination with GBO. Indeed, GBEHO obtains a lower mean and standard deviation on most functions, especially GB2. Meanwhile, the performance of the algorithm gradually decreases as the probability of variation increases. Specifically, GB2 surpasses the other algorithms on F1 to F4, F9 to F11, F13 to F15, and F17 to F19 and obtains the highest ranking. GB3 performs best on F10 and F16 and ranks second among all algorithms. Comparatively, the improvement of GBEHO with PSR greater than 0.6 is less pronounced. These results show that the newly added operators further promote the capability of local exploitation. Meanwhile, the balance between exploration and exploitation is well achieved through the combined framework. However, the performance of the variants on some of the functions is relatively insignificant, e.g., F5, F6, F8, F12, F22, and F23. The best solutions on those functions are achieved by the original EHO. This is probably due to the unique characteristics of the different functions that make the modifications of EHO inapplicable. In conclusion, the variation operator can promote the performance of the original algorithm, and the improvement is most pronounced when PSR = 0.2. Therefore, PSR is set to 0.2 in the subsequent experiments.
4.2 Analysis of the modifications
To investigate the impact of the modifications on the performance of the algorithm, a set of comparison experiments is conducted. In this subsection, in addition to GBEHO and EHO, three other methods, namely, Gaussian sequence + EHO (GEHO), mutation operator + EHO (MEHO), and random wandering operator + EHO (RWEHO), are designed. These three strategies are also the core modules of the modifications. Six representative functions are selected to verify the performance of the different variants. The size of population N is set to 30, the maximum number of iterations \(t_{\max}\) is 500, and the number of clans c is 5. All other parameters are kept consistent. To reduce the effects of errors and instabilities, each algorithm is run 30 times, and the final results are the averages over these 30 runs.
Figure 3 shows the convergence curves of the 5 algorithms on the 30-dimensional functions. The convergence efficiency of the 4 variants is significantly better than that of the original EHO. This indicates that the modifications of Gaussian mapping, random wandering, and mutation operators can each improve the convergence efficiency of EHO. Specifically, the Gaussian sequence improves the initialization, which leads to an increased efficiency in the early stages of the algorithm. In addition, the global optimal solutions achieved by MEHO and RWEHO are superior to those of EHO and GEHO. This indicates that the random wandering and mutation operators enhance the diversity of the population. Consequently, exploration and exploitation are promoted, leading to a higher convergence accuracy. In comparison, GBEHO has the best convergence performance. The global optimum is attained around the 300th and 10th generations on the F2 and F9 functions, respectively. The convergence rate on the F11, F14, F15, and F20 functions is also the fastest among the compared algorithms. These results provide strong evidence that the combined effect of the modifications has led to further improvements in the search accuracy and breadth of GBEHO.
In addition, the average, standard deviation, best, and worst values of the different variants on the six benchmark functions are recorded and reported in Table 3. The results in the table are obtained from 30 runs of each algorithm, and the best results on each function are shown in bold. All variants perform better than the original EHO algorithm, indicating that the Gaussian sequence strategy, the random wandering operator, and the mutation operator are each effective. It is also worth noting that GBEHO achieves the most desirable performance overall, with the best average and standard deviation results on F2, F9, F11, F14, and F15. This indicates that the combination of the different strategies can significantly improve exploration and exploitation. In summary, it can be concluded that the modifications of EHO are effective.
4.3 Comparison with other metaheuristic algorithms
To further verify the effectiveness of the algorithm, GBEHO was compared with nine other algorithms, namely, k-means, particle swarm optimization (PSO) [76], differential evolution (DE) [77], genetic algorithm (GA) [70], cuckoo search algorithm (CS) [78], gravitational search algorithm (GSA) [79], bat algorithm (BA) [80], a quantum-inspired ant lion optimized hybrid k-means algorithm (QALO-K) [35], and a hybrid grey wolf optimizer with tabu search (GWOTS) [81].
4.3.1 Parameter settings
For fairness, the parameters of the selected algorithms are preset, as shown in Table 4. It should be noted that the parameters in the table are set according to the recommendations in the works cited above. All parameters not listed in the table are kept consistent across algorithms. Furthermore, the maximum number of iterations \(t_{\max}\) is set to 200, and the size of population N is set to 10. The number of clans c in GBEHO is set to 5.
4.3.2 Datasets
Adán et al. [34] stated that the evaluation of a complete clustering algorithm should include both synthetic and standard real-world datasets. The real-world datasets chosen for the experiments are from the University of California, Irvine (UCI) machine learning repository [82] and include Iris, Wine, Seeds, Breast, Heart, CMC, and Vowel. The synthetic data comprise two artificial datasets: Two-moon and Aggregation [83]. The basic information of the datasets is shown in Table 5.
4.3.3 Comparison of the experimental results
In this section, the various algorithms are compared based on the experimental values of SSE. Each algorithm is run 30 times separately, and the obtained results are shown in Table 6. Best, Worst, Mean, and Std. denote the best, worst, mean, and standard deviation of all the results, respectively. The algorithms produce clearly different values because of the complexity of the datasets. GBEHO provides the lowest SSE values on most datasets. Compared to the basic k-means algorithm, GBEHO achieves better mean values in all cases.
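For reference, the SSE criterion used in this comparison can be computed as in the following sketch. It assumes the common definition, namely the sum of squared Euclidean distances from each data point to its nearest centroid; the function name `sse` and the tuple-based data layout are illustrative choices.

```python
def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared distance to every centroid; the point is assigned to the closest one.
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids)
    return total
```

A candidate solution in a metaheuristic clustering method is a set of centroids, so this value can serve directly as the fitness to be minimized.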
Of the 9 datasets, GBEHO provides the lowest mean SSE on 7: Wine, Seeds, Breast, Heart, CMC, Vowel, and Aggregation. In particular, GBEHO achieves the lowest best and worst values on these datasets. However, due to its inability to accurately identify the manifold structure, GBEHO performed poorly on the Two-moon dataset. The standard deviations of GBEHO are smaller than those of the other algorithms, indicating that the algorithm is more stable in its operation. In general, GBEHO obtains more satisfactory results than the other 9 algorithms. Consequently, these results provide strong evidence that GBEHO solves the clustering problem effectively.
Figure 4 shows the box plots obtained by 9 algorithms on the different datasets. The box plots of GBEHO are the narrowest on all datasets. GBEHO therefore has a more stable clustering ability, and the population diversity is improved by the strategy of mixing EHO and GBO. In addition, GBEHO produced the fewest outlier points, which indicates that GBEHO is robust. These observations indicate that the proposed algorithm can effectively circumvent local minima.
4.3.4 Convergence analysis
Iteration is the act of repeating a set of procedures to achieve the best solution. When all procedures of an algorithm are repeated once, this is called one iteration. The results of each iteration provide the initial value for the next iteration [84]. The convergence curve can reflect the convergence rate and the global search ability during the iteration of the algorithm.
The comparison of convergence curves on different datasets is shown in Fig. 5. All curves are generated from 30 independent runs of the different algorithms. GBEHO reaches stability at the 20th generation on the Iris, Wine, Seeds, Breast, Heart, and Aggregation datasets. Although GBEHO converges more slowly on the Vowel dataset, the quality of the solutions found is higher. The results verify that GBEHO has relatively faster convergence and a superior global search capability. Compared with GBEHO, the performances of PSO, DE, GA, CS, GSA, and BA are slightly worse.
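One plausible way to build such a curve from repeated runs is sketched below. It assumes that each run records one fitness value per iteration and that the plotted curve is the mean of the best-so-far values across runs; the text does not state the aggregation rule, so the averaging and the function name are assumptions.

```python
def mean_convergence_curve(runs):
    """runs[r][t] is the fitness at iteration t of run r (lower is better).

    Returns the best-so-far fitness at each iteration, averaged over all runs.
    """
    n_iter = len(runs[0])
    curves = []
    for run in runs:
        best, curve = float("inf"), []
        for value in run:
            best = min(best, value)  # best fitness seen so far in this run
            curve.append(best)
        curves.append(curve)
    return [sum(c[t] for c in curves) / len(curves) for t in range(n_iter)]
```

The iteration at which the averaged curve flattens gives a rough estimate of when an algorithm "reaches stability" in the sense used above.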
4.3.5 Statistical analysis
In the preceding experiments, there are inevitable chance factors that affect the experimental results. To test the variability between different algorithms, further statistical analysis of the obtained results is needed to obtain more reliable conclusions. Nonparametric tests can be used to check the performance of the algorithms [85]. The Wilcoxon signed-rank test [86] and Friedman test [87] are two well-known techniques. Neither assumes a particular data distribution, and both statistically examine whether a difference exists between groups of results. The experiments in this paper are performed at the 5% significance level.
Table 7 reports the results for the comparison of GBEHO with PSO, GBEHO with DE, GBEHO with GA, GBEHO with CS, GBEHO with GSA, GBEHO with BA, GBEHO with QALO-K, GBEHO with GWOTS, and GBEHO with k-means on the nine groups. If the p-value is less than 0.05, then the result is significantly different. The bold values in the table indicate values greater than 0.05. As observed from the table, except for the values obtained for GBEHO vs. CS on Iris and GBEHO vs. PSO on the Heart dataset, which are greater than 0.05, all other values are less than 0.05, which provides valid evidence against the null hypothesis. The results suggest that the excellent performance of GBEHO is statistically significant, and not achieved by chance.
The results of the Friedman test are shown in Table 8. The obtained values are the average rankings of all algorithms over the experiments. The algorithm with the lowest average rank is considered to be the most efficient. The better average ranking of GBEHO indicates that the proposed algorithm has a competitive advantage, and it also makes the series of experiments more convincing.
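The average ranking that underlies the Friedman test can be computed as in the sketch below. It assumes one score per algorithm per dataset (for example, the mean SSE), with lower scores being better and tied scores sharing their average rank; the function name and input layout are illustrative, and the test statistic itself is not computed here.

```python
def friedman_average_ranks(scores):
    """scores[d][a] is the score of algorithm a on dataset d (lower is better).

    Returns the average rank of each algorithm over all datasets.
    """
    n_alg = len(scores[0])
    totals = [0.0] * n_alg
    for row in scores:
        order = sorted(range(n_alg), key=lambda a: row[a])
        i = 0
        while i < n_alg:
            j = i
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2.0 + 1.0  # tied algorithms share the average rank
            for k in range(i, j + 1):
                totals[order[k]] += avg
            i = j + 1
    return [t / len(scores) for t in totals]
```

An algorithm that is best on every dataset receives an average rank of 1, which is the lower bound of this measure.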
4.3.6 Analysis of the clustering process
In this subsection, three datasets, Iris, Seeds and Aggregation, are selected for visualization and presentation. The original distributions are shown in Fig. 6. Figures 7, 8 and 9 display the clustering visualization results. We know that GBEHO and PSO are the two best algorithms on the Iris dataset. It can be observed in Fig. 7 that both algorithms accurately divide the dataset into three distinct clusters, and both achieve relatively better solutions. In comparison, the centroids found by GBEHO are significantly closer to the real scenario than PSO. This suggests that GBEHO has a better performance. In terms of the iterations, the centroids found by GBEHO are relatively stable at the 20th generation. This indicates that GBEHO has a faster convergence rate and stability. Figure 8 compares the clustering results on the Seeds dataset, where GBEHO and GA are the two superior algorithms. Apparently, GBEHO achieves better positions of centroids in the 20th generation and in the final results. In the 20th generation, GBEHO is able to extract centroids of the bottom leftmost cluster, while GA is unable to. It is clear that GBEHO is able to distinguish blue and green clusters more accurately than GA. The performance on the Aggregation dataset is shown in Fig. 9. For the two clusters on the top left and top, GBEHO obtains more precise clustering centroids. Both GWOTS and GBEHO find the exact centroids on the upper and lower right clusters. However, GBEHO’s delineation in the bottommost cluster is more obvious. Although the black and magenta clusters in Fig. 6c are not accurately distinguished, this is due to the shortcomings of the traditional Euclidean distance. In terms of the overall convergence rate and clustering accuracy, GBEHO is relatively superior.
4.4 Comparison experiments with state-of-the-art techniques
In this subsection, extra experiments are conducted to further validate the performance of the proposed algorithm. Five UCI datasets, namely, Wine, Breast, CMC, Heart, and Vowel, are chosen to evaluate GBEHO with PSR = 0.2 against the reported results of four other recently proposed algorithms, namely, CSOS, Hybrid FCM-PSO, ACLSHMS, and KIGSA-C. Table 9 shows the values of the experimental parameters for the different algorithms. The maximum number of iterations Maxit is set to 500, and the size of population N is set to 30. To eliminate the influence of uncontrollable factors to the greatest extent possible, all algorithms were run 30 times, and the average value was adopted as the final result for comparison.
When clustering a dataset, attention needs to be given to how well the clusters fit the input data. Therefore, it is necessary to validate the results via certain evaluation criteria, which is a fundamental aspect of data clustering. The metrics for evaluating the clustering results are broadly classified into three categories, namely, external metrics, internal metrics, and relative validation [25]. Four evaluation metrics are invoked in the experiments to quantitatively compare the clustering performance, namely, accuracy rate (AR), specificity (SP), detection rate (DR), and F-measure (F1), which are defined in (42)–(45).
where TP is true positive, TN is true negative, FP is false positive, FN is false negative in classification, \(precision = \frac {{TP}}{{TP + FP}}\), \(recall = \frac {{TP}}{{TP + FN}}\) and b = 1.
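The four metrics can be computed from the confusion counts as in the following sketch. It uses the standard textbook definitions (AR as overall accuracy, SP as the true negative rate, DR as the true positive rate, and the F-measure weighted by b), which are assumed here to match the paper's definitions; degenerate cases with a zero denominator are not handled, and the function name is hypothetical.

```python
def clustering_metrics(tp, tn, fp, fn, b=1.0):
    """Return AR, SP, DR and the F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "AR": (tp + tn) / (tp + tn + fp + fn),  # accuracy rate
        "SP": tn / (tn + fp),                   # specificity
        "DR": recall,                           # detection rate
        "F": (1 + b ** 2) * precision * recall / (b ** 2 * precision + recall),
    }
```

With b = 1, the F-measure reduces to the harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN).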
The obtained results are shown in Table 10. Figure 10 shows the comparison of the evaluation metrics of the five algorithms on different datasets. On the Wine dataset, all five algorithms achieve satisfactory results. The reason lies in the simpler structure of the Wine dataset, which allows the different algorithms to achieve more accurate identification. In contrast, on the Breast and Vowel datasets, several algorithms do not perform well due to the more complex structure of the clusters. Specifically, for AR, GBEHO achieves the best performance on Wine and CMC, and ranks 2nd, 3rd and 4th on Breast, Vowel and Heart, respectively. As for SP, GBEHO ranks 1st on Wine and CMC, and ranks 3rd, 3rd and 4th on the Breast, Vowel and Heart datasets, respectively. For DR, GBEHO ranks 1st on the Wine and CMC datasets, and 2nd, 2nd and 3rd on the Breast, Vowel and Heart datasets, respectively. For F1, GBEHO ranks 2nd on the Wine, Breast and CMC datasets, and 4th on Heart and Vowel. Overall, GBEHO performs the best on Wine and CMC. Its performance on Breast ranks 2nd, behind KIGSA-C. On the Heart dataset, CSOS performs the best, Hybrid FCM-PSO is second, and GBEHO ties with KIGSA-C. On the Vowel dataset, GBEHO ranks second, behind only KIGSA-C. From the above analysis, it can be concluded that GBEHO’s performance is competitive in the comparison with the state-of-the-art techniques. In general, the results indicate that GBEHO is a strong choice for clustering and can be regarded as a powerful and effective clustering algorithm.
5 Discussions
Overall, the experimental results are consistent with the hypothesis. The introduction of the two operators and GBO improves the performance of the original EHO. Experiments on benchmark functions and datasets with different types prove that the improvement is significant. The proposed GBEHO is proven to have a higher clustering accuracy by evaluating four metrics, namely, accuracy rate, specificity, detection rate, and F-measure. Therefore, it can be concluded that GBEHO is an effective clustering method that can be used for a cluster analysis of different datasets.
Compared with other metaheuristic-based clustering algorithms, GBEHO shows a more competitive performance and provides more desirable clustering results. GBEHO inherits the advantages of traditional EHO, such as a strong global exploration capability. Meanwhile, the clan operator and separating operator in the original EHO are improved by the random wandering operator and mutation operator, so GBEHO has a stronger local exploitation capability than PSO, DE, GA, etc. Compared with BA and CS, it has better exploration and thus better avoids falling into the local optimum trap, which in turn improves the convergence rate. Moreover, GBEHO provides clustering results that are competitive with those of the state-of-the-art algorithms. However, we observe that GBEHO is subject to several problems. First, the time complexity of GBEHO is higher than that of other classical algorithms, which is caused by the newly added mechanisms. The enhancement in clustering accuracy comes at the cost of an increase in the complexity of GBEHO. Second, with the increase in dimensionality, some metaheuristic algorithms suffer from a weakened stability. A scalability test with expandable dimensions was not performed, so the adaptability of GBEHO to higher dimensions needs to be further examined. Moreover, according to the No Free Lunch (NFL) theorem [51], there is no perfect optimization method, so we do not claim that GBEHO is the best method for every problem. The famous k-means algorithm has gained widespread use and attention since its inception, but this does not mean that k-means is without flaws. On the contrary, k-means is still limited by its dependence on the initial solution and its tendency to fall into a local optimum. For our proposed method, we are more concerned with the accuracy of clustering than with the running time.
As the research work goes further and becomes more detailed, the authors believe that there will be more techniques for improving operational efficiency in the future, such as parallel computing, which will provide better technical support for GBEHO.
6 Conclusions and future work
Traditional clustering methods easily fall into local optima, and the initialization of the center of mass position is a prominent problem. In this paper, an improved version of EHO is proposed for clustering analysis. Chaotic mapping based on Gaussian sequences improves the ergodicity and diversity of the initialized populations. Two operators, random wandering and mutation, are presented to optimize the strategy of updating positions in EHO, thus promoting the population diversity and the ability to jump out of the local optimum. Among them, the former improves the diversity of the population as well as the global exploration ability, and the latter promotes local exploitation at a later stage. In addition, GBO operators contribute to further balancing exploration and exploitation for the sake of determining the best center of mass more accurately. More suitable variable parameters are determined through ablation experiments.
Experiments on artificial and real-world datasets indicated that GBEHO has a better clustering performance than the other metaheuristic algorithms and their variants. The obtained intracluster variance was compared with that of the classical k-means, PSO, DE, and GA algorithms to show its superiority. By analyzing box plots and convergence curves, it was shown that GBEHO has a greater stability and faster convergence. The numerical data were confirmed by statistical analysis. Nonparametric tests were performed to verify significant differences between GBEHO and other algorithms. The visualization graphs of the clustering process demonstrated that GBEHO can find more accurate centroids in fewer iterations. Compared with the other state-of-the-art algorithms, GBEHO achieves competitive results on accuracy rate, specificity, detection rate, and F-measure on five UCI datasets. Taken together, these results confirmed that GBEHO is an effective tool for data clustering.
In future research, we plan to reduce the time complexity of GBEHO through further design and experimentation. GBEHO can also be extended to several application areas, such as intrusion detection, image segmentation, and route planning. In addition, the performance of the hybrid algorithm will continue to be optimized to address sophisticated problems faced in practical engineering. The authors believe that this is an algorithm with great potential, and its application effect is worthy of expectation.
References
Gambella C, Ghaddar B, Naoum-Sawaya J (2020) Optimization problems for machine learning: A survey. Eur J Oper Res 290(3):807–828
Zhou Y, Wu H, Luo Q, Abdel-Baset M (2019) Automatic data clustering using nature-inspired symbiotic organism search algorithm. Knowl-Based Syst 163:546–557
Zhang C, Hao L, Fan L (2019) Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data. Clust Comput 22(2):3001–3010
Mousavirad SJ, Ebrahimpour-Komleh H, Schaefer G (2019) Effective image clustering based on human mental search. Appl Soft Comput 78:209–220
Maheshwari P, Sharma AK, Verma K (2021) Energy efficient cluster based routing protocol for WSN using butterfly optimization algorithm and ant colony optimization. Ad Hoc Netw 110:102317
Zhang J, Yu X, Xun Y, Zhang S, Qin X (2017) Scalable mining of contextual outliers using relevant subspace. IEEE Transactions on Systems, Man, and Cybernetics: Systems 50(3):988–1002
Maione C, Barbosa RM (2019) Recent applications of multivariate data analysis methods in the authentication of rice and the most analyzed parameters: A review. Crit Rev Food Sci Nutr 59(12):1868–1879
Li H-J, Bu Z, Wang Z, Cao J (2019) Dynamical clustering in electronic commerce systems via optimization and leadership expansion. IEEE Transactions on Industrial Informatics 16(8):5327–5334
Huang D, Wang C-D, Wu J-S, Lai J-H, Kwoh C-K (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 32(6):1212–1226
Saeed MM, Al Aghbari Z, Alsharidah M (2020) Big data clustering techniques based on spark: a literature review. PeerJ Computer Science 6:e321
Naouali S, Ben Salem S, Chtourou Z (2020) Clustering categorical data: A survey. International Journal of Information Technology & Decision Making 19(01):49–96
Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31(8):651–666
Liu Y, Liu J, Jin Y, Li F, Zheng T (2020) An affinity propagation clustering based particle swarm optimizer for dynamic optimization. Knowl-Based Syst 195:105711
Bu Z, Li H-J, Zhang C, Cao J, Li A, Shi Y (2019) Graph k-means based on leader identification, dynamic game, and opinion dynamics. IEEE Trans Knowl Data Eng 32(7):1348–1361
Capó M, Pérez A, Lozano JA (2020) An efficient k-means clustering algorithm for tall data. Data mining and knowledge discovery 34(3):776–811
Tian K, Li J, Zeng J, Evans A, Zhang L (2019) Segmentation of tomato leaf images based on adaptive clustering number of k-means algorithm. Comput Electron Agric 165:104962
Bortoloti FD, de Oliveira E, Ciarelli PM (2021) Supervised kernel density estimation k-means. Expert Syst Appl 168:114350
Manochandar S, Punniyamoorthy M, Jeyachitra RK (2020) Development of new seed with modified validity measures for k-means clustering. Computers & Industrial Engineering 141:106290
Ismkhan H (2018) Ik-means-+: An iterative clustering algorithm based on an enhanced version of the k-means. Pattern Recogn 79:402–413
Huang S, Kang Z, Xu Z, Liu Q (2021) Robust deep k-means: An effective and simple method for data clustering. Pattern Recogn 117:107996
Chowdhury K, Chaudhuri D, Pal AK (2021) An entropy-based initialization method of k-means clustering on the optimal number of clusters. Neural Comput & Applic 33(12):6965–6982
Zhao W-L, Deng C-H, Ngo C-W (2018) k-means: A revisit. Neurocomputing 291:195–206
Mahmoudi MR, Akbarzadeh H, Parvin H, Nejatian S, Rezaie V, Alinejad-Rokny H (2021) Consensus function based on cluster-wise two level clustering. Artif Intell Rev 54(1):639–665
Dutta D, Sil J, Dutta P (2019) Automatic clustering by multi-objective genetic algorithm with numeric and categorical features. Expert Syst Appl 137:357–379
Ezugwu AE, Shukla AK, Agbaje MB, Oyelade ON, Jose-Garcia A, Agushaka JO (2020) Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature. Neural Comput & Applic, pp 1–60
Zhu S, Xu L, Goodman ED (2020) Evolutionary multi-objective automatic clustering enhanced with quality metrics and ensemble strategy. Knowl-Based Syst 188:105018
Gupta S, Deep K, Mirjalili S (2020) An efficient equilibrium optimizer with mutation strategy for numerical optimization. Appl Soft Comput 96:106542
Wang G-G, Deb S, Cui Z (2019) Monarch butterfly optimization. Neural computing and applications 31(7):1995–2014
Li S, Chen H, Wang M, Heidari AA, Mirjalili S (2020) Slime mould algorithm: A new method for stochastic optimization. Futur Gener Comput Syst 111:300–323
Wang G-G (2018) Moth search algorithm: a bio-inspired metaheuristic algorithm for global optimization problems. Memetic Computing 10(2):151–164
Yang Y, Chen H, Heidari AA, Gandomi AH (2021) Hunger games search: Visions, conception, implementation, deep analysis, perspectives, and towards performance shifts. Expert Syst Appl 177:114864
Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: Algorithm and applications. Future generation computer systems 97:849–872
Hussain K, Salleh MNM, Cheng S, Shi Y (2019) Metaheuristic research: a comprehensive survey. Artif Intell Rev 52(4):2191–2233
José-García A, Gómez-Flores W (2016) Automatic clustering using nature-inspired metaheuristics: A survey. Appl Soft Comput 41:192–213
Chen J, Qi X, Chen L, Chen F, Cheng G (2020) Quantum-inspired ant lion optimized hybrid k-means for cluster analysis and intrusion detection. Knowl-Based Syst 203:106167
Kuo RJ, Zheng YR, Nguyen TPQ (2021) Metaheuristic-based possibilistic fuzzy k-modes algorithms for categorical data clustering. Inf Sci 557:1–15
Nayak J, Naik B, Behera HS, Abraham A (2017) Hybrid chemical reaction based metaheuristic with fuzzy c-means algorithm for optimal cluster analysis. Expert Syst Appl 79:282–295
Aggarwal S, Singh P (2019) Cuckoo, bat and krill herd based k-means++ clustering algorithms. Clust Comput 22(6):14169–14180
Lakshmi K, Visalakshi NK, Shanthi S (2018) Data clustering using k-means based on crow search algorithm. Sādhanā 43(11):1–12
Yang C-L, Sutrisno H (2020) A clustering-based symbiotic organisms search algorithm for high-dimensional optimization problems. Appl Soft Comput 97:106722
Verma H, Verma D, Tiwari PK (2021) A population based hybrid FCM-PSO algorithm for clustering analysis and segmentation of brain image. Expert Syst Appl 167:114121
Mousavirad SJ, Ebrahimpour-Komleh H, Schaefer G (2020) Automatic clustering using a local search-based human mental search algorithm for image segmentation. Appl Soft Comput 96:106604
Mittal H, Pandey AC, Pal R, Tripathi A (2021) A new clustering method for the diagnosis of COVID19 using medical images. Appl Intell 51(5):2988–3011
Kuo R-J, Zulvia FE (2020) Multi-objective cluster analysis using a gradient evolution algorithm. Soft Comput 24(15):11545–11559
Wang G-G, Deb S, Gao X-Z, Coelho LDS (2016) A new metaheuristic optimisation algorithm motivated by elephant herding behaviour. International Journal of Bio-Inspired Computation 8(6):394–409
Muthusamy H, Ravindran S, Yaacob S, Polat K (2021) An improved elephant herding optimization using sine–cosine mechanism and opposition based learning for global optimization problems. Expert Syst Appl 172:114607
Li W, Wang G-G, Alavi AH (2020) Learning-based elephant herding optimization algorithm for solving numerical optimization problems. Knowl-Based Syst 195:105675
Ismaeel Alaa AK, Elshaarawy IA, Houssein EH, Ismail FH, Hassanien AE (2019) Enhanced elephant herding optimization for global optimization. IEEE Access 7:34738–34752
Hakli H (2020) BinEHO: a new binary variant based on elephant herding optimization algorithm. Neural Comput & Applic 32(22):16971–16991
Elhosseini MA, El Sehiemy RA, Rashwan YI, Gao XZ (2019) On the performance improvement of elephant herding optimization algorithm. Knowl-Based Syst 166:58–70
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE transactions on evolutionary computation 1(1):67–82
Dokeroglu T, Sevinc E, Kucukyilmaz T, Cosar A (2019) A survey on new generation metaheuristic algorithms. Computers & Industrial Engineering 137:106040
Ahmadianfar I, Bozorg-Haddad O, Chu X (2020) Gradient-based optimizer: A new metaheuristic optimization algorithm. Inf Sci 540:131–159
Purushothaman R, Rajagopalan SP, Dhandapani G (2020) Hybridizing gray wolf optimization (GWO) with grasshopper optimization algorithm (GOA) for text feature selection and clustering. Appl Soft Comput 96:106651
Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin C-T (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681
Sokal RR (1966) Numerical taxonomy. Sci Am 215(6):106–117
Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Transactions on neural networks 16(3):645–678
Pearson K, Lee A (1900) Mathematical contributions to the theory of evolution. VIII. On the inheritance of characters not capable of exact quantitative measurement. Part I. Introductory. Part II. On the inheritance of coat-colour in horses. Part III. On the inheritance of eye-colour in man. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 195:79–150
Strehl A, Ghosh J, Mooney R (2000) Impact of similarity measures on web-page clustering. In: Workshop on artificial intelligence for web search (AAAI 2000), vol 58, p 64
Dice LR (1945) Measures of the amount of ecologic association between species. Ecol 26(3):297–302
Ramos-Guajardo AB, Ferraro MB (2020) A fuzzy clustering approach for fuzzy data based on a generalized distance. Fuzzy Sets Syst 389:29–50
Taib H, Bahreininejad A (2021) Data clustering using hybrid water cycle algorithm and a local pattern search method. Adv Eng Softw 153:102961
Bhadoria A, Marwaha S, Kamboj VK (2021) A solution to statistical and multidisciplinary design optimization problems using HGWO-SA algorithm. Neural Comput & Applic 33(8):3799–3824
Khalilpourazari S, Doulabi HH, Çiftçioğlu AO, Weber G-W (2021) Gradient-based grey wolf optimizer with Gaussian walk: Application in modelling and prediction of the COVID-19 pandemic. Expert Syst Appl, pp 114920
Hassan MH, Houssein EH, Mahdy MA, Kamel S (2021) An improved manta ray foraging optimizer for cost-effective emission dispatch problems. Eng Appl Artif Intell 100:104155
Singh NJ, Singh S, Chopra V, Aftab MA, Hussain SM, Ustun TS (2021) Chaotic evolutionary programming for an engineering optimization problem. Appl Sci 11(6):2717
Gandomi AH, Yang X-S (2014) Chaotic bat algorithm. Journal of Computational Science 5(2):224–232
James JQ, Lam AYS, Li VOK (2012) Real-coded chemical reaction optimization with different perturbation functions. In: 2012 IEEE Congress on Evolutionary Computation, IEEE, pp 1–8
Li W, Wang G-G (2021) Elephant herding optimization using dynamic topology and biogeography-based optimization based on learning for numerical optimization. Engineering with Computers, pp 1–29
Holland JH (1975) Adaptation in natural and artificial systems. Ann Arbor 18(3):529–530
Agbaje MB, Ezugwu AE, Els R (2019) Automatic data clustering using hybrid firefly particle swarm optimization algorithm. IEEE Access 7:184963–184984
Li W, Wang G-G (2021) Improved elephant herding optimization using opposition-based learning and k-means clustering to solve numerical optimization problems. Journal of Ambient Intelligence and Humanized Computing, pp 1–32
Yousri D, Mirjalili S, Machado JAT, Thanikanti SB, Fathy A, et al. (2021) Efficient fractional-order modified Harris hawks optimizer for proton exchange membrane fuel cell modeling. Eng Appl Artif Intell 100:104193
Jia H, Sun K, Zhang W, Leng X (2021) An enhanced chimp optimization algorithm for continuous optimization domains. Complex & Intelligent Systems, pp 1–18
Fan Y, Shao J, Sun G, Shao X (2020) A modified salp swarm algorithm based on the perturbation weight for global optimization problems. Complexity, 2020
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95-international conference on neural networks, vol 4, IEEE, pp 1942–1948
Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of global optimization 11(4):341–359
Yang X-S, Deb S (2009) Cuckoo search via Lévy flights. In: 2009 World congress on nature & biologically inspired computing (NaBIC), IEEE, pp 210–214
Rashedi E, Nezamabadi-Pour H, Saryazdi S (2009) GSA: a gravitational search algorithm. Information sciences 179(13):2232–2248
Yang X-S (2010) A new metaheuristic bat-inspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010). Springer, pp 65–74
Aljarah I, Mafarja M, Heidari AA, Faris H, Mirjalili S (2020) Clustering analysis using a novel locality-informed grey wolf-inspired clustering approach. Knowl Inf Syst 62(2):507–539
Asuncion A, Newman D (2007) UCI machine learning repository. Irvine, CA, USA
Duan Y, Liu C, Li S (2021) Battlefield target grouping by a hybridization of an improved whale optimization algorithm and affinity propagation. IEEE Access 9:46448–46461
Tu J, Chen H, Liu J, Heidari AA, Zhang X, Wang M, Ruby R, Pham Q-V (2021) Evolutionary biogeography-based whale optimization methods with communication structure: towards measuring the balance. Knowl-Based Syst 212:106642
Ouaar F, Boudjemaa R (2021) Modified salp swarm algorithm for global optimisation. Neural Comput & Applic, pp 1–26
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information sciences 180(10):2044–2064
Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation 1(1):3–18
Ethics declarations
Conflict of Interests
The authors declare that they have no conflicts of interest.
Cite this article
Duan, Y., Liu, C., Li, S. et al. Gradient-based elephant herding optimization for cluster analysis. Appl Intell 52, 11606–11637 (2022). https://doi.org/10.1007/s10489-021-03020-y