Introduction

During the fourth industrial revolution, the world has entered the artificial intelligence (AI) age. Faced with complex research branches and massive data, it is essential to design effective tools to extract valuable information. As one of the core elements in the development of AI, algorithms continually drive the optimization and innovation of techniques. Brute-force search [1] finds the best possible solution but is very slow even in two or three dimensions. Hill climbing [2] usually finds a local optimum instead of the global optimum. Such algorithms can hardly solve classical NP-complete problems exactly, such as the Hamiltonian cycle problem [3], the traveling salesman problem [4], and the graph coloring problem [5]. This is also the reason that metaheuristic algorithms (MAs) have entered a period of rapid growth. For a specific issue, heuristic algorithms provide approximate solutions in feasible time and space. To adapt to a wide range of issues, MAs are proposed, which are largely independent of the structure of the problem; if necessary, some fine-tuning of internal parameters adapts them to the issue at hand [6].

When designing an MA, two components should be considered: exploration and exploitation. The convergence speed and solution accuracy of the algorithm mainly depend on the balance between these two components. In the initial phase, a well-organized optimizer should thoroughly explore the search space to find diverse solutions. After a smooth transition, the algorithm then uses local information to generate better solutions, which are usually in the neighborhood of the current solutions [7, 8]. Hundreds of MAs have been developed so far. Based on their design inspirations, MAs can be divided into four categories. The physics-inspired MAs use physical laws; they mainly include the gravitational search algorithm (GSA) [9], the multi-verse optimizer (MVO) [10], and thermal exchange optimization (TEO) [11]. The evolution-inspired MAs use Darwinian natural selection; the most well-known algorithms are the genetic algorithm (GA) [12] and differential evolution (DE) [13]. The swarm-inspired MAs mainly simulate the collective behavior of creatures in nature; the most classical algorithm is particle swarm optimization (PSO) [14]. Swarm intelligence (SI) optimization algorithms proposed in recent years also include the ant lion optimizer (ALO) [15], the whale optimization algorithm (WOA) [16], and the marine predators algorithm (MPA) [17]. The last category is the human behavior-inspired MAs, represented by teaching–learning-based optimization (TLBO) [18].

MAs are important in various fields of computational science. However, most MAs suffer from premature convergence, stagnation in local optima, poor convergence speed, etc. The no free lunch (NFL) theorem [19] states that no single algorithm can outperform all others on every optimization problem. Therefore, many researchers have been motivated by the NFL theorem to propose novel algorithms. Some of them have demonstrated very good performance on optimization problems. However, some algorithms rely only on attractive metaphors and are criticized for lack of novelty [20, 21]. In 2020, the chimp optimization algorithm (ChOA) was proposed by Khishe and Mosavi. An overview of ChOA can be found in Sect. "Chimp optimization algorithm". One of the main motivations of this paper is to propose an enhanced ChOA version (named EChOA) that improves the performance of the algorithm. It is not a novel SI algorithm but an enhancement of an existing one, inspired by experimental studies of its behavior. The other main motivation is to further develop an AI-enabled application: to the best of our knowledge, it is the first time that the enhanced ChOA is used to train a multilayer perceptron. The main contributions of this paper are as follows:

  1. HDPM helps to further explore the regions and boundaries of the initial search space.

  2. Spearman's rank correlation coefficient identifies the candidate solutions that need to be improved.

  3. The beetle antennae operator helps to avoid local optima and provides better exploitation.

  4. The performance of EChOA is evaluated on 27 benchmark functions, three engineering design problems, and the training of a multilayer perceptron.

  5. EChOA outperforms contemporary optimization algorithms.

Section "Related works" outlines some related works. Section "Preliminaries" introduces preliminaries, including ChOA and three strategies. In Sect. "Proposed algorithm", the proposed algorithm is described in detail. In Sect. "Function optimization experiments", the experimental results of benchmark functions are given. In Sect. "Engineering optimization experiments" and Sect. "Training multilayer perceptron experiments", the research aspects of solving engineering design problems and training multilayer perceptron are discussed and analyzed, respectively. Finally, Sect. "Conclusions and future research" summarizes the work.

Related works

In the introduction, some well-known algorithms of different categories were listed. In this section, recent work on ChOA and other MAs is briefly reviewed. Some of these algorithms from the literature are later used as comparison algorithms in this paper.

Khishe and Mosavi used ChOA to train a neural network to classify an underwater acoustical dataset. The proposed method obtained better classification accuracy than the ion motion algorithm, gray wolf optimization, and PSO-GSA [22]. It is one of the few works on new applications of ChOA. The MA-ALS algorithm, which combines GA and adaptive local search, was proposed by Arab et al. and achieved improved accuracy and convergence speed; its performance was evaluated on function optimization and optimal controller design [23]. Li et al. developed a novel hybrid algorithm based on PSO and the artificial bee colony (ABC) algorithm and tested it on 13 high-dimensional benchmark functions; the efficiency and robustness of PS-ABC were verified in detail [24]. As another variant of PSO, Berkan Aydilek hybridized the firefly optimization algorithm (FOA) into PSO in 2018 [25]. Through analysis, we believe that the main motivation of [23–25] is to combine one algorithm for the global exploration phase with another for locally refining exploitation by applying reasonable rules.

In 2017, Mirjalili et al. embedded 10 chaotic maps and an adaptive transformation mechanism into GSA to improve its performance. Experimental results demonstrated that the sinusoidal map is the most appropriate for this specific problem [26]. Meanwhile, Khishe et al. used four chaotic maps to improve the performance of the stochastic fractal search (SFS) algorithm. The proposed algorithm trained a multilayer perceptron that was applied to a sonar dataset; the capability to address high-dimensional problems was one of its advantages [27]. To accomplish the same task, an improved biogeography-based optimization (BBO) algorithm based on mutation operators was proposed in [28] and implemented on hardware. Jia et al. proposed a multi-strategy emperor penguin optimizer (MSEPO) based on HDPM, Levy flight, and a thermal exchange operator. The effectiveness of the three suggested strategies was analyzed and verified experimentally; it proved to be an effective multilevel thresholding segmentation algorithm for color satellite images [29].

Khishe and Mosavi redesigned seven spiral shapes in the WOA algorithm; different spiral shapes can directly affect the convergence behavior of the algorithm, and a metaheuristic trainer based on the proposed algorithm was applied to data classification [30]. MAs applied in this field also include the salp swarm algorithm (SSA) [31]. Mousavi et al. recently improved WOA by splitting subpopulations and embedding fractional chaotic maps, and applied this enhanced algorithm (EWOA) to identify parameters of wind–diesel power systems [32]. Shokri-Ghaleh et al. proposed an improved version of the cuckoo optimization algorithm (ULCOA) for non-linear field calibration problems in 2020 [33]. In 2020, Ma et al. improved the step size of the beetle antennae search algorithm to optimize the Huber loss function [34]. An ABC variant fused three kinds of knowledge to improve search capability; this idea can be regarded as an optimization framework for other MAs [35]. In the improved grey wolf optimizer (I-GWO) algorithm, a dimension learning-based hunting (DLH) search strategy was used to enhance information sharing among wolves [36].

Based on the above in-depth literature review, the recent trend for improvement is to combine two algorithms or to apply strategies to the entire solution. In contrast, this paper applies the search gain to only certain components of the solution, besides introducing the mutation operator. In general, this refines the result at a finer granularity.

Preliminaries

Chimp optimization algorithm

ChOA is a novel swarm intelligence MA proposed by Khishe and Mosavi in 2020 [37]. Its intuitive background originates from the hunting behavior of chimps. Chimps perform different actions, according to a division of labor, to find the prey. The standard ChOA algorithm divides the chimp group into four types: attacker, barrier, chaser, and driver. Among them, the attacker is the leader of the population; the other three types of chimps assist in hunting, and their status decreases in turn. The mathematical model is briefly described as follows. Equations (1) and (2) are used to update the position of a chimp; Fig. 1 visualizes this effect:

$$ \begin{gathered} X_{1}(t + 1) = X_{\mathrm{Attacker}}(t) - a_{1} \cdot d_{\mathrm{Attacker}} \\ X_{2}(t + 1) = X_{\mathrm{Barrier}}(t) - a_{2} \cdot d_{\mathrm{Barrier}} \\ X_{3}(t + 1) = X_{\mathrm{Chaser}}(t) - a_{3} \cdot d_{\mathrm{Chaser}} \\ X_{4}(t + 1) = X_{\mathrm{Driver}}(t) - a_{4} \cdot d_{\mathrm{Driver}} \end{gathered} $$
(1)
$$ X_{\mathrm{chimp}}(t + 1) = \frac{X_{1} + X_{2} + X_{3} + X_{4}}{4} $$
(2)

where t represents the current iteration number, and the position of a chimp is updated according to the four stored leader positions (\(X_{\mathrm{Attacker}}\), \(X_{\mathrm{Barrier}}\), \(X_{\mathrm{Chaser}}\), and \(X_{\mathrm{Driver}}\)). The dynamic coefficients a and vectors d are expressed in Eq. (3):

$$ \begin{gathered} a_{1} = 2 \cdot f_{1} \cdot r_{1} - f_{1}, \quad d_{\mathrm{Attacker}} = \left| c \cdot X_{\mathrm{Attacker}}(t) - m \cdot X(t) \right| \\ a_{2} = 2 \cdot f_{2} \cdot r_{1} - f_{2}, \quad d_{\mathrm{Barrier}} = \left| c \cdot X_{\mathrm{Barrier}}(t) - m \cdot X(t) \right| \\ a_{3} = 2 \cdot f_{3} \cdot r_{1} - f_{3}, \quad d_{\mathrm{Chaser}} = \left| c \cdot X_{\mathrm{Chaser}}(t) - m \cdot X(t) \right| \\ a_{4} = 2 \cdot f_{4} \cdot r_{1} - f_{4}, \quad d_{\mathrm{Driver}} = \left| c \cdot X_{\mathrm{Driver}}(t) - m \cdot X(t) \right| \end{gathered} $$
(3)

where the coefficient f decreases nonlinearly from 2.5 to 0 over the iterations, \(c = 2r_{2}\), r1 and r2 are random numbers in [0, 1], and m is a chaotic map vector. Assuming the probability μ is a random number in [0, 1], the chaotic model is used for position updating when \(\mu \ge 0.5\), as shown in Eq. (4); otherwise, Eq. (2) is executed. Algorithm 1 shows the pseudo-code of the ChOA algorithm:

$$ X_{\mathrm{chimp}}(t + 1) = \mathrm{Chaotic\_value} $$
(4)
Fig. 1

Process of position update [37]

Algorithm 1 Pseudo-code of the ChOA algorithm
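To make the mechanics concrete, the following Python sketch implements one iteration of Eqs. (1)–(4) under stated assumptions: a logistic map stands in for the chaotic vector m, a single linearly decaying f replaces the four nonlinear f1–f4 schedules, and the chaotic position of Eq. (4) is mapped into the search range. It is an illustration of the update rule, not the authors' reference implementation.

```python
import numpy as np

def choa_update(X, fitness, t, T, lb, ub, m):
    """One ChOA iteration on a population X of shape (N, D); lower fitness is better.
    m is the current chaotic value in (0, 1); a logistic map is assumed here."""
    leaders = X[np.argsort(fitness)[:4]]      # attacker, barrier, chaser, driver
    f = 2.5 * (1.0 - t / T)                   # f decays from 2.5 toward 0 (linear here)
    m = 4.0 * m * (1.0 - m)                   # next chaotic value (assumption)
    X_new = np.empty_like(X)
    for i, x in enumerate(X):
        if np.random.rand() >= 0.5:           # Eq. (4): chaotic relocation
            X_new[i] = lb + m * (ub - lb)     # mapping chaos to the range is assumed
            continue
        candidates = []
        for leader in leaders:                # Eqs. (1) and (3)
            r1, r2 = np.random.rand(), np.random.rand()
            a = 2.0 * f * r1 - f
            c = 2.0 * r2
            d = np.abs(c * leader - m * x)
            candidates.append(leader - a * d)
        X_new[i] = np.mean(candidates, axis=0)  # Eq. (2): average of the four moves
    return np.clip(X_new, lb, ub), m
```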

Highly disruptive polynomial mutation

Common mutation operators include random mutation, non-uniform mutation, and polynomial mutation. For traditional polynomial mutation (PM), the mutation has no effect when the variable lies on the boundary; HDPM overcomes this disadvantage [38, 39]. The form of the operator is given in Eq. (5):

$$ X_{\mathrm{new}} = X + \delta_{k} \cdot (ub - lb) $$
(5)

where ub and lb represent the upper and lower boundaries of the search space, respectively, X is the parent, and Xnew is the offspring. The coefficient δk is calculated by Eqs. (6)–(8):

$$ \delta_{1} = \frac{X - lb}{ub - lb} $$
(6)
$$ \delta_{2} = \frac{ub - X}{ub - lb} $$
(7)
$$ \delta_{k} = \begin{cases} \left[ 2r + (1 - 2r)(1 - \delta_{1})^{\eta_{m} + 1} \right]^{\frac{1}{\eta_{m} + 1}} - 1, & \text{if } r \le 0.5 \\ 1 - \left[ 2(1 - r) + 2(r - 0.5)(1 - \delta_{2})^{\eta_{m} + 1} \right]^{\frac{1}{\eta_{m} + 1}}, & \text{otherwise} \end{cases} $$
(8)

where r is a random number in [0, 1] and ηm is the mutation index. It can be observed from Eq. (8) that the operator can still make full use of the whole search space even if the variable lies on one of the boundaries. This advantage maintains the diversity of candidate solutions.
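As a minimal sketch, Eqs. (5)–(8) translate into Python as follows, assuming component-wise application with scalar bounds:

```python
import numpy as np

def hdpm(x, lb, ub, eta_m=1.0):
    """Highly disruptive polynomial mutation of a real vector x within [lb, ub]."""
    x = np.asarray(x, dtype=float)
    delta1 = (x - lb) / (ub - lb)                        # Eq. (6)
    delta2 = (ub - x) / (ub - lb)                        # Eq. (7)
    r = np.random.rand(*x.shape)
    exp = 1.0 / (eta_m + 1.0)
    delta_k = np.empty_like(x)
    low = r <= 0.5                                       # Eq. (8), first branch
    delta_k[low] = (2.0 * r[low] + (1.0 - 2.0 * r[low])
                    * (1.0 - delta1[low]) ** (eta_m + 1.0)) ** exp - 1.0
    delta_k[~low] = 1.0 - (2.0 * (1.0 - r[~low]) + 2.0 * (r[~low] - 0.5)
                           * (1.0 - delta2[~low]) ** (eta_m + 1.0)) ** exp
    return np.clip(x + delta_k * (ub - lb), lb, ub)      # Eq. (5)
```

For example, with x = lb (so δ1 = 0), the second branch still produces offsets reaching across the whole range, which is exactly the boundary property noted above.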

Spearman’s rank correlation coefficient

Spearman's rank correlation coefficient is a non-parametric index used to measure the statistical correlation between two series [40]. The two series ui and vi are sorted, and the ranks \(u_{i}'\) and \(v_{i}'\) represent the positions of the sorted ui and vi, respectively. The difference between them is \(d_{i} = u_{i}' - v_{i}'\), \(i = 1, 2, \ldots, n\). Spearman's rank correlation coefficient ρ is calculated as follows. Its value lies in the range [− 1, 1]: 0 means that the two series have no correlation, (0, 1] means positive correlation, [− 1, 0) means negative correlation, and a higher absolute value means a stronger correlation:

$$ \rho = 1 - \frac{6\sum_{i = 1}^{n} d_{i}^{2}}{n(n^{2} - 1)} $$
(9)

where n represents the dimension of the series.
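A direct sketch of Eq. (9), assuming no ties in either series (the case for which this closed form holds; scipy.stats.spearmanr covers the general case):

```python
import numpy as np

def spearman_rho(u, v):
    """Spearman's rank correlation coefficient of two equal-length series, Eq. (9)."""
    u, v = np.asarray(u), np.asarray(v)
    n = len(u)
    rank_u = np.argsort(np.argsort(u))   # 0-based ranks; the offset cancels in d
    rank_v = np.argsort(np.argsort(v))
    d = rank_u - rank_v
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```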

Beetle antennae operator

When a beetle (see Fig. 2) is preying, it senses the smell of nearby food using its two antennae and seeks the area with the strongest smell. If the antenna on one side receives a stronger concentration of smell, the beetle turns to that side; otherwise, it turns to the other side. Based on this sense of smell, the beetle antennae search algorithm was formally proposed by Jiang and Li [41]. First, the random direction vector \(\vec{b}\) is normalized to the following form:

$$ \vec{b} = \frac{\mathrm{rnd}(n, 1)}{\left\| \mathrm{rnd}(n, 1) \right\|} $$
(10)

where rnd(·) represents a random function, and n represents the dimension of the search space. The two search behaviors in Eq. (11) simulate the beetle exploring the left and right areas with its two antennae:

$$ \begin{gathered} X_{r}(t) = X(t) + d(t) \cdot \vec{b} \\ X_{l}(t) = X(t) - d(t) \cdot \vec{b} \end{gathered} $$
(11)

where \(X\) represents the original position of the beetle, \(X_{r}\) the position obtained by exploring the right area, \(X_{l}\) the position obtained by exploring the left area, and \(d\) the distance between the two antennae. The update rule is as follows:

$$ d(t) = \frac{\delta (t)}{C} $$
(12)

where \(\delta\) represents the step size and \(C\) the attenuation rate, with \(C = 2\) [42]. \(\delta\) is calculated in Eq. (13):

$$ \delta (t) = K \cdot \delta (t - 1)$$
(13)

where \(K\) is the attenuation rate of \(\delta\), with \(K = 0.95\) [41]. Based on the above search behaviors, the new position update model of the beetle is given in Eq. (14):

$$ X(t + 1) = X(t) + \delta(t) \cdot \vec{b} \cdot \mathrm{sign}\big(f(X_{r}(t)) - f(X_{l}(t))\big) $$
(14)

where \(\mathrm{sign}(\cdot)\) represents the sign function.
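The following sketch puts Eqs. (10)–(14) together for a single step; the zero-centered Gaussian direction is one common choice for rnd(·), and the sign convention follows Eq. (14) as written:

```python
import numpy as np

def beetle_step(x, f, delta, C=2.0, K=0.95):
    """One beetle antennae update of position x for objective f.
    delta is the current step size; C and K follow [41, 42]."""
    b = np.random.randn(len(x))
    b /= np.linalg.norm(b)                   # Eq. (10): random unit direction
    d = delta / C                            # Eq. (12): antenna distance
    x_r = x + d * b                          # Eq. (11): right antenna probe
    x_l = x - d * b                          # Eq. (11): left antenna probe
    x_new = x + delta * b * np.sign(f(x_r) - f(x_l))  # Eq. (14)
    return x_new, K * delta                  # Eq. (13): decayed step size for next call
```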

Fig. 2

Beetle with long antennae [43]

Proposed algorithm

Although the native ChOA algorithm divides chimps into different social levels for cooperative hunting, it lacks population diversity in the initialization phase. Besides, it also falls into local optima easily in the final refined search phase. In this section, three strategies, namely HDPM, Spearman's rank correlation coefficient, and the beetle antennae operator, are introduced to enhance the performance of the native ChOA algorithm. The enhanced version is named EChOA. The pseudo-code is given in Algorithm 2, and the flowchart is shown in Fig. 3. The details of the proposed algorithm are explained as follows.

Fig. 3

Flowchart of EChOA

First, the HDPM strategy is introduced to enhance population diversity in the initialization phase. A well-organized optimizer should achieve an appropriate balance between exploration and exploitation: high exploration steps are carried out at the beginning of the search, and more exploitation steps are required in the last phase. By introducing the HDPM strategy, the exploration capability of the algorithm is enhanced. Second, the Spearman's rank correlation coefficient of the driver chimps with respect to the attacker chimp is calculated. In the native ChOA algorithm, the attacker chimp, as the leader, plays an important role in guiding the population. However, it can be observed from Eq. (2) that the final position is updated not only by the attacker chimp but also by the driver chimps, which have lower social rank and fitness. By calculating the Spearman's rank correlation coefficient, it can therefore be determined whether a driver chimp is near to or far from the attacker chimp; for a driver chimp far from the attacker chimp, we change its position updating method to improve its fitness. Finally, when the Spearman's rank correlation coefficient is less than or equal to 0, the two series are negatively correlated or uncorrelated, and the beetle antennae operator is introduced to improve the less fit chimp (the driver chimp). As described in Sect. "Beetle antennae operator", the less fit chimp gains a visual capability: its position updating method is improved using Eq. (14), so that it uses its left and right eyes to observe the environment on both sides and determine the direction of the next step. The purpose of this improvement is to prevent the driver chimp from falling into a local optimum due to poor performance, and it is considered to boost the exploitation trend. A compact sketch of how the three strategies fit into one iteration follows Algorithm 2 below.

Algorithm 2 Pseudo-code of the EChOA algorithm
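Under our reading of the text, one iteration of EChOA could be sketched as below, reusing choa_update, spearman_rho, and beetle_step from the sketches above; HDPM is applied to the initial population before the loop starts. The exact bookkeeping of the authors' implementation may differ.

```python
import numpy as np

def echoa_iteration(X, f, t, T, lb, ub, m, delta, K=0.95):
    """One EChOA iteration: ChOA update, then Spearman-guided refinement of the
    driver-level (less fit) chimps via the beetle antennae operator."""
    fitness = np.array([f(x) for x in X])
    X, m = choa_update(X, fitness, t, T, lb, ub, m)
    fitness = np.array([f(x) for x in X])
    order = np.argsort(fitness)
    attacker = X[order[0]]
    for i in order[3:]:                        # skip attacker, barrier, chaser
        if spearman_rho(X[i], attacker) <= 0:  # uncorrelated or negative: refine
            X[i], _ = beetle_step(X[i], f, delta)
    return np.clip(X, lb, ub), m, K * delta    # decay the shared step size
```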

Function optimization experiments

In this section, the performance of the proposed EChOA algorithm is evaluated on function optimization. The native ChOA algorithm and five state-of-the-art algorithms from 2017 to 2021, including EO [44], LFD [45], HFPSO [25], CGSA10 [26], and I-GWO [36], are used as competitors to verify the improvements of the proposed algorithm. Some of these comparison algorithms are recently proposed native ones, and some are improved ones with good performance. Table 1 shows the parameter settings of the different algorithms; the values used are the same as those in the corresponding references. Besides, the maximum number of iterations is 500, and the population size is 30. All the experiments were carried out in MATLAB R2016a on a computer with an Intel(R) Core(TM) i5-1035G1 CPU @ 1.00 GHz running Microsoft Windows 10. Each experiment is run 30 times independently.

Table 1 Parameter settings of different algorithms

Impact of introduced strategies

Due to the mutual influence among the strategies, in this subsection, 12 classical benchmark functions [46] are used to verify the effectiveness of the introduced strategies. The formulas of the functions, their classes, and the ranges of variables are given in Table 2. The number of variables is 30 and the global optimum is 0 for all functions. The settings of the different strategy combinations are shown in Table 3, where 1 and 0 indicate whether or not the corresponding strategy is introduced. The average fitness results obtained are shown in Table 4. Note that the best values are highlighted in boldface in all following tables. Based on these results, the three enhanced versions all outperform the native algorithm. EChOA outperforms ChOA1 on all 12 classical benchmark functions and ChOA2 on 11 functions. On the one hand, ChOA1 and ChOA2, respectively, reach a higher level in terms of search breadth and search accuracy compared with ChOA. On the other hand, the performance of EChOA is comprehensively improved by the effective combination of the introduced strategies. After this validation, EChOA is selected as the final optimized version of the native algorithm.

Table 2 Details of 12 classical benchmark functions
Table 3 Settings of different strategy combinations
Table 4 Average fitness results of different combinations on 12 classical benchmark functions

Qualitative analysis

Figure 4 shows the qualitative results of testing on F05, F07, and F13. Figure 4b shows the two-dimensional scatter plot of the search history of EChOA. It can be seen that chimps are scattered across the entire search space in the early phase and concentrate rapidly in the later phase. When addressing different functions, the density of the distribution differs significantly. This demonstrates that EChOA can adjust the balance between exploration and exploitation. Figure 4c shows the trajectory of the first chimp. There are some sudden changes in the early optimization phase, which reveals that EChOA does not fall into a local optimum early and has good exploration capability. The amplitude of these fluctuations gradually decreases over the iterations; finally, the chimp becomes stable and converges to the optimal position. Figure 4d shows the convergence curves of ChOA and EChOA. For functions such as F05, F07, and F13, EChOA has a better fitness curve over the 500 iterations than ChOA. EChOA can find a smaller solution, which demonstrates that excellent convergence capability is one of the features of EChOA.

Fig. 4

a Parameter space. b Search history of EChOA. c Trajectory of EChOA in the first chimp. d Convergence curve of ChOA and EChOA

Analysis of scalability

Scalability is an important indicator for a newly designed algorithm. By testing different dimensions of the benchmark functions, we can effectively judge the influence of dimension expansion on the execution efficiency of the algorithm. Following common practice, the 12 classical benchmark functions are tested in 30, 50, and 100 dimensions. The experimental results are presented in Table 5. It can be seen from the comparison that as the number of dimensions increases, the optimization performance of both algorithms gradually decreases. The main reason is that the dimension of the data reflects the number of factors to be optimized: the larger the dimension, the more complex the search space. On the other hand, Table 5 also shows that the optimization results of EChOA are always better than those of ChOA, and the gap becomes more and more significant as the number of dimensions increases. When addressing high-dimensional problems, EChOA can maintain good exploitation while still exploring effectively.

Table 5 Average fitness results of ChOA and EChOA in different dimensions on 12 classical benchmark functions

Comparison with other algorithms on classical benchmark functions

Table 6 gives the average (Avg) and standard deviation (Std) of the fitness obtained by each algorithm on the 12 classical benchmark functions. It can be seen from the table that EChOA gives the best results in general. For example, in the case of F01, the average fitness values are 1.6170E − 23, 3.0993E − 07, 3.2812E − 05, 9.4112E + 04, 1.7304E − 28, 7.7223E − 06, and 1.2202E − 36 for EO, LFD, HFPSO, CGSA10, I-GWO, ChOA, and EChOA, respectively. To verify the stability of the proposed algorithm, the Std indicator is also used; a lower Std indicates better stability. From Table 6, it can be found that EChOA gives lower values compared to the other algorithms, which shows the better consistency and stability of the proposed algorithm. In essence, the results on the unimodal benchmark functions reflect the high exploitation capability of the proposed algorithm, while, considering the characteristics of the multimodal benchmark functions, it can be said that EChOA has a robust capability to avoid local optima.

Table 6 Fitness results of different algorithms on 12 classical benchmark functions

Comparison with other algorithms on CEC2017 benchmark functions

In this subsection, 15 benchmark functions from CEC2017 [47] are used to further evaluate the performance of the proposed EChOA algorithm. The names, classes, and global optima of the functions are given in Table 7. The number of variables is 10 and their range is [− 100, 100] for all functions. Compared with the classical benchmark functions, these functions, with their constraints and high computational cost, present challenging landscapes.

Table 7 Details of 15 CEC2017 benchmark functions

Table 8 gives the average (Avg) and standard deviation (Std) of the fitness obtained by each algorithm on the CEC2017 functions. Because these functions have complex shapes and many local optima, they approximate the search spaces of real-world problems. In terms of Avg, EChOA obtains the best results on 73.33% of the functions. For F09, the I-GWO algorithm ranks first, and the proposed algorithm obtains the second-best result. In terms of Std, EChOA gives lower values on 66.67% of the functions. These results suggest that the proposed algorithm can transition from high exploration to more exploitation; in other words, a smooth relationship is achieved between them. The convergence curve shows the trend of the fitness value. Figure 5 gives the convergence curves on F05, F07, F09, F25, F26, and F29. Comparing the convergence curves, EChOA can effectively find a better minimum, and effective updates are guaranteed both in the early and late iterations.

Table 8 Fitness results of different algorithms on 15 CEC2017 benchmark functions
Fig. 5

Convergence curves of different algorithms on F05, F07, F09, F25, F26, and F29

The computational complexity qualitatively describes the runtime of an algorithm. The computational complexity of EChOA is derived from four factors: the number of chimps N, the number of dimensions D, the maximum number of iterations T, and the cost of one function evaluation F. The initialization has complexity O(N). The complexity of the function evaluations is O(T × N × F). The complexity of the attacker selection is O(T × N). The complexity of updating the less fit chimps' positions with the beetle antennae operator is O(T × (N − 3) × D × 3), where D × 3 accounts for their left, right, and new positions. The complexity of updating the attacker's, barrier's, and chaser's positions is O(T × 3 × D). The overall computational complexity of EChOA is therefore O(N(1 + T(F + 1 + 3D)) − 6TD), i.e., O(T × N × (F + D)). From the quantitative analysis results, Table 9 gives the average runtime obtained by each algorithm on the CEC2017 functions. To conclude more intuitively, the ranking of the runtime of each algorithm in most cases is as follows: LFD > EChOA > ChOA > CGSA10 > I-GWO > EO > HFPSO. From the results, EChOA consumes more time than ChOA and ranks second to last. One explanation is that the introduced strategies add work to the native algorithm; the high time consumption of ChOA itself is also a main reason. To improve the accuracy of the solutions, we sacrifice some runtime. It can also be concluded that the main limitation of EChOA remains its computational complexity, which needs to be reduced.
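For transparency, the stated terms sum as follows (a bookkeeping sketch; the last step drops constants and lower-order terms):

$$ \begin{aligned} O_{\mathrm{EChOA}} &= O(N) + O(TNF) + O(TN) + O\big(3TD(N - 3)\big) + O(3TD) \\ &= O\big(N(1 + T(F + 1 + 3D)) - 6TD\big) = O\big(TN(F + D)\big) \end{aligned} $$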

Table 9 Runtime results of different algorithms on 15 CEC2017 benchmark functions

A full statistical analysis of the results based on different performance measures follows. The Wilcoxon rank-sum test tests the hypothesis that two independent samples come from identical distributions and returns the p-value of the test; the null hypothesis is that there is no significant difference between the two samples. In this paper, the level of significance is set to 0.05, so p < 0.05 leads to rejection of the null hypothesis. The other non-parametric statistical test, the Friedman test, is used to detect significant differences between the behaviors of two or more algorithms; it returns an additional structure including the ranking, the statistic, and the p-value [48]. Table 10 gives the exact p-values of the Wilcoxon rank-sum test. From Table 10, it can be observed that the p-values are less than 0.05 for all three types of functions, so EChOA shows significant improvements over the other algorithms. Besides p-values, Table 11 also gives rankings and Chi-square (Chi-sq) statistics for the Friedman test. EChOA obtains the first rank compared to the other algorithms. With 6 degrees of freedom and a significance level of 0.05, the critical value of the test statistic is 12.592; the calculated Chi-square is greater than 12.592, which also proves that there are significant differences between the proposed algorithm and the other comparison algorithms.
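Both tests are available in SciPy. A minimal sketch, assuming results maps each algorithm name to its 30 per-run best fitness values on one function (a hypothetical data layout):

```python
from scipy.stats import ranksums, friedmanchisquare

def significance_report(results, baseline="EChOA", alpha=0.05):
    """Pairwise Wilcoxon rank-sum tests against the baseline, plus a Friedman test."""
    base = results[baseline]
    for name, values in results.items():
        if name == baseline:
            continue
        _, p = ranksums(base, values)          # Wilcoxon rank-sum p-value
        verdict = "significant" if p < alpha else "not significant"
        print(f"{baseline} vs {name}: p = {p:.4g} ({verdict})")
    stat, p = friedmanchisquare(*results.values())
    print(f"Friedman test: chi-sq = {stat:.3f}, p = {p:.4g}")
```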

Table 10 Experimental results of Wilcoxon rank-sum test
Table 11 Experimental results of Friedman test

Analysis of control parameters

In this subsection, the sensitivity of the control parameters of EChOA is analyzed experimentally, as shown in Tables 12 and 13. The experimental design is consistent with the previous subsection. From the details of EChOA, the main parameters are the mutation index ηm and the attenuation rate of the step size K. The parameter ηm controls the diversity of the population, and the parameter K determines the step length of the search. In Table 13, the best ranking is achieved when ηm = 1 and K = 0.95. Besides, the results do not increase or decrease regularly with the changes of ηm and K. This is because a small value of ηm attempts to maintain a large diversity of solutions, and a smaller step size is needed to obtain a more accurate optimum; however, too large a diversity in the early phase is not beneficial to later convergence, and too small a step size leads to more computational consumption. The ranking result is therefore a trade-off between them.

Table 12 Settings of different control parameters
Table 13 Average ranking of Friedman test

Engineering optimization experiments

In this section, to highlight the applicability of the proposed EChOA algorithm, three engineering design problems are selected for further testing: gear train design, welded beam design, and speed reducer design. The descriptions and mathematical models of all the engineering problems are provided in detail below. All problems are implemented in MATLAB, with constraints handled through a penalty function. Each algorithm runs independently 30 times on each problem, with a chimp population of 30 and 500 iterations. Finally, the corresponding evaluations for the different problems are given.
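The constraint handling can be sketched as a static penalty wrapper, one simple stand-in for the penalty-function approach mentioned above; the weight is an arbitrary assumption:

```python
def penalized(objective, constraints, weight=1e6):
    """Wrap an objective with quadratic penalties for constraints of the form g(y) <= 0."""
    def wrapped(y):
        violation = sum(max(0.0, g(y)) ** 2 for g in constraints)
        return objective(y) + weight * violation
    return wrapped
```

Any of the compared algorithms can then minimize the wrapped objective directly over the box-constrained variables.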

Design of gear train

This is an unconstrained optimization problem proposed by Sandgren. Figure 6 shows the gear train with four variables (y1, y2, y3, and y4). The aim is to bring the gear ratio (\(\frac{{y_{2} y_{3} }}{{y_{1} y_{4} }}\)) as close as possible to 1/6.931 by minimizing the squared deviation. The mathematical model is designed as follows:

$$ \min F(y_{1} ,y_{2} ,y_{3} ,y_{4} ) = \left( {\frac{1}{6.931} - \frac{{y_{2} y_{3} }}{{y_{1} y_{4} }}} \right)^{2} $$
(15)

where \(12 \le y_{1} ,y_{2} ,y_{3} ,y_{4} \le 60\).
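Eq. (15) translates directly into code; a minimal sketch (the tooth counts are physically integers, but the continuous relaxation is what is optimized here):

```python
def gear_train(y):
    """Gear train objective, Eq. (15): squared deviation from the target ratio 1/6.931."""
    y1, y2, y3, y4 = y
    return (1.0 / 6.931 - (y2 * y3) / (y1 * y4)) ** 2
```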

Fig. 6

Gear train [49]

The results for the variables and fitness are reported in Table 14. EChOA obtains the best fitness, corresponding to the optimal solution (43.7742, 19.0186, 16.9050, 49.1866). Compared with the other algorithms, EChOA can be considered more suitable for solving this design problem.

Table 14 Experimental results of gear train design

Design of welded beam

As its name suggests, this problem deals with designing a welded beam to minimize the fabrication cost. The minimization is subject to constraints on shear stress, bending stress in the beam, buckling load on the bar, end deflection of the beam, and side constraints. The design has four parameters: the thickness of the weld (y1), the length of the clamped bar (y2), the height of the bar (y3), and the thickness of the bar (y4), as shown in Fig. 7. The mathematical formulation is as follows:

$$ \min F(y_{1} ,y_{2} ,y_{3} ,y_{4} ) = 1.10471y_{1}^{2} y_{2} + 0.04811y_{3} y_{4} (14.0 + y_{2} ) $$
(16)
$$ \begin{gathered} \text{s.t.} \quad \tau(y_{1}, y_{2}, y_{3}, y_{4}) - \tau_{\max} \le 0, \quad \sigma(y_{1}, y_{2}, y_{3}, y_{4}) - \sigma_{\max} \le 0 \\ \delta(y_{1}, y_{2}, y_{3}, y_{4}) - \delta_{\max} \le 0, \quad y_{1} - y_{4} \le 0 \\ P - P_{c}(y_{1}, y_{2}, y_{3}, y_{4}) \le 0, \quad 0.125 - y_{1} \le 0 \\ 1.10471 y_{1}^{2} y_{2} + 0.04811 y_{3} y_{4} (14.0 + y_{2}) - 5.0 \le 0 \end{gathered} $$
(17)

where \(0.1 \le y_{1} ,y_{4} \le 2,0.1 \le y_{2} ,y_{3} \le 10\).

$$ \begin{gathered} \tau(y_{1}, y_{2}, y_{3}, y_{4}) = \sqrt{(\tau')^{2} + 2\tau'\tau''\frac{y_{2}}{2R} + (\tau'')^{2}} \\ \tau' = \frac{P}{\sqrt{2} y_{1} y_{2}}, \quad \tau'' = \frac{MR}{J}, \quad M = P\left(L + \frac{y_{2}}{2}\right) \\ R = \sqrt{\frac{y_{2}^{2}}{4} + \left(\frac{y_{1} + y_{3}}{2}\right)^{2}} \\ J = 2\left\{\sqrt{2} y_{1} y_{2}\left[\frac{y_{2}^{2}}{4} + \left(\frac{y_{1} + y_{3}}{2}\right)^{2}\right]\right\} \\ \sigma(y_{1}, y_{2}, y_{3}, y_{4}) = \frac{6PL}{y_{4} y_{3}^{2}}, \quad \delta(y_{1}, y_{2}, y_{3}, y_{4}) = \frac{6PL^{3}}{E y_{3}^{2} y_{4}} \\ P_{c}(y_{1}, y_{2}, y_{3}, y_{4}) = \frac{4.013 E \sqrt{\frac{y_{3}^{2} y_{4}^{6}}{36}}}{L^{2}}\left(1 - \frac{y_{3}}{2L}\sqrt{\frac{E}{4G}}\right) \\ P = 6000\;\text{lb}, \quad L = 14\;\text{in}, \quad E = 30 \times 10^{6}\;\text{psi}, \quad G = 12 \times 10^{6}\;\text{psi} \\ \tau_{\max} = 13{,}600\;\text{psi}, \quad \sigma_{\max} = 30{,}000\;\text{psi}, \quad \delta_{\max} = 0.25\;\text{in} \end{gathered} $$
(18)
Fig. 7

Welded beam [49]

The experimental results for the welded beam design problem are shown in Table 15. The proposed algorithm finds the lowest-cost design; thus, it is reasonable to conclude that the proposed algorithm is effective at solving such problems.

Table 15 Experimental results of welded beam design

Design of speed reducer

This structural optimization problem involves seven variables: the face width (y1), the module of teeth (y2), the number of teeth on the pinion (y3), the lengths of the shafts between bearings (y4, y5), and the diameters of the shafts (y6, y7). Figure 8 shows the speed reducer. Subject to nine constraints, the aim is to minimize the weight of the speed reducer. The mathematical model is designed as follows:

$$ \min F(y_{1} ,y_{2} ,y_{3} ,y_{4} ,y_{5} ,y_{6} ,y_{7} ) = 0.7854y_{1} y_{2}^{2} (3.3333y_{3}^{2} + 14.9334y_{3} - 43.0934) - 1.508y_{1} (y_{6}^{2} + y_{7}^{2} ) + 7.4770(y_{6}^{3} + y_{7}^{3} ) + 0.7854(y_{4} y_{6}^{2} + y_{5} y_{7}^{2} ) $$
(19)
$$ \begin{gathered} s.t. \, \frac{27}{{y_{1} y_{2}^{2} y_{3} }} - 1 \le 0, \, \frac{397.5}{{y_{1} y_{2}^{2} y_{3}^{2} }} - 1 \le 0 \hfill \\ \, \frac{1.93}{{y_{2} y_{3} y_{4}^{3} y_{6}^{4} }} - 1 \le 0, \, \frac{1.93}{{y_{2} y_{3} y_{4}^{3} y_{7}^{4} }} - 1 \le 0 \hfill \\ \, \frac{{\sqrt {1.69 \times 10^{6} + \left( {\frac{{745y_{4} }}{{y_{2} y_{3} }}} \right)^{2} } }}{{110y_{6}^{3} }} - 1 \le 0 \hfill \\ \, \frac{{\sqrt {157.5 \times 10^{6} + \left( {\frac{{745y_{4} }}{{y_{2} y_{3} }}} \right)^{2} } }}{{87y_{7}^{3} }} - 1 \le 0 \hfill \\ \, \frac{{y_{2} y_{3} }}{40} - 1 \le 0, \, \frac{{y_{1} }}{40} - 1 \le 0, \, \frac{{5y_{2} }}{{y_{1} - 1}} - 1 \le 0 \hfill \\ \end{gathered} $$
(20)

where \(y_{1} ,y_{2} ,y_{3} ,y_{4} ,y_{5} ,y_{6} ,y_{7} \in R\).

Fig. 8

Speed reducer [49]

The results for the variables and fitness are reported in Table 16. As shown, EChOA has a superior capability to minimize the weight of the speed reducer compared to the other algorithms. This example further highlights the applicability of the proposed algorithm.

Table 16 Experimental results of speed reducer design

Training multilayer perceptron experiments

In this section, the enhanced ChOA algorithm is used to train a multilayer perceptron (MLP) for the first time. The MLP is a feedforward artificial neural network model that can be used to solve classification and regression problems. In these perceptrons, there is at least one hidden layer besides the input layer and the output layer, and data information is transmitted in one direction only. The training process is as follows.

The weighted (W) sum of inputs (x) is calculated using Eq. (21):

$$ s_{j} = \sum\limits_{i = 1}^{n} W_{ij} x_{i} - \theta_{j}, \quad j = 1, 2, \ldots, h $$
(21)

where n is the number of the input nodes, h is the number of the hidden nodes, and θ represents the bias. The output of hidden nodes is calculated using Eq. (22):

$$ S_{j} = \frac{1}{1 + e^{-s_{j}}}, \quad j = 1, 2, \ldots, h $$
(22)

The final outputs are calculated based on outputs of each hidden node using Eqs. (23) and (24):

$$ o_{p} = \sum\limits_{j = 1}^{h} W_{j,p} S_{j} - \theta_{p}', \quad p = 1, 2, \ldots, m $$
(23)
$$ O_{p} = \frac{1}{1 + e^{-o_{p}}}, \quad p = 1, 2, \ldots, m $$
(24)

where m is the number of output nodes. From the mathematical model, it can be seen that the weights W and biases θ are the most important parameters in defining the final outputs of the perceptron. The average mean square error (\(\overline{{MSE}}\)) is selected to evaluate the training model. The formula is as follows:

$$ \overline{MSE} = \frac{\sum\nolimits_{k = 1}^{s} \sum\nolimits_{i = 1}^{m} \left(o_{i}^{k} - d_{i}^{k}\right)^{2}}{s} $$
(25)

where s is the number of training samples, m is the number of outputs, d represents the desired output, and o represents the actual output. Training the MLP can thus be transformed into minimizing the objective function \(\overline{{MSE}}\) by optimizing the two parameter sets W and θ. In this paper, the Balloon dataset and the Breast cancer dataset from the UCI repository [50] are used to evaluate the performance of EChOA for training the MLP. Figure 9 shows the flowchart of the training process, and descriptions of the datasets are given in Table 17. The range of variables is assumed to be [− 10, 10]. Each algorithm runs 10 times independently, with a population size of 30 and 250 iterations. The number of hidden nodes is 2N + 1, where N is the number of features of the dataset [51].
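A sketch of the resulting objective: a flat parameter vector is unpacked into weights and biases, the forward pass of Eqs. (21)–(24) is run on the whole dataset, and Eq. (25) is returned. The flat vector layout is our assumption for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_mse(params, X, D, n, h, m):
    """X: (s, n) inputs; D: (s, m) desired outputs; h = 2n + 1 hidden nodes."""
    i = 0
    W1 = params[i:i + n * h].reshape(n, h); i += n * h   # input-to-hidden weights
    th1 = params[i:i + h]; i += h                        # hidden biases
    W2 = params[i:i + h * m].reshape(h, m); i += h * m   # hidden-to-output weights
    th2 = params[i:i + m]                                # output biases
    S = sigmoid(X @ W1 - th1)                            # Eqs. (21)-(22)
    O = sigmoid(S @ W2 - th2)                            # Eqs. (23)-(24)
    return np.sum((O - D) ** 2) / len(X)                 # Eq. (25)
```

EChOA then minimizes mlp_mse over a vector of length n·h + h + h·m + m, with each component in [− 10, 10].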

Fig. 9

EChOA for training MLP

Table 17 Descriptions of datasets

Results and discussions

Table 18 gives the \(\overline{{MSE}}\), standard deviation (Std), and classification rate obtained by each algorithm on the two datasets. For the Balloon dataset, all algorithms achieve 100% classification accuracy due to its simplicity, but EChOA provides the best \(\overline{{MSE}}\) and Std; the second-best results belong to I-GWO. For the Breast cancer dataset, EChOA is superior to the other comparison algorithms in terms of "\(\overline{{MSE}}\) ± Std" and classification accuracy. Each dataset has a different search space, which demands a higher level of local optimum avoidance from the algorithm. These results also prove the merit of the proposed algorithm: EChOA can find more appropriate weights and biases for the multilayer perceptron, improving the performance of the neural network.

Table 18 Experimental results of different datasets

Conclusions and future research

In this paper, an enhanced chimp optimization algorithm named EChOA is proposed based on three strategies. HDPM enhances the diversity of the population; the Spearman's rank correlation coefficient is calculated to determine the individuals that need to be improved; and the beetle antennae operator then helps the less fit individuals to jump out of local optima. To explore its best performance, the strategy combinations, qualitative indicators, scalability, and control parameters of EChOA are tested. Besides, EChOA is compared with different algorithms from 2017 to 2021 on 27 unimodal, multimodal, hybrid, and composite benchmark functions, and the exploitation, exploration, and local optimum avoidance capabilities of EChOA are analyzed comprehensively. Statistical tests at the 5% significance level evaluate the significance of the improvements. Finally, three engineering design problems and the training of a multilayer perceptron are selected as real-world applications of EChOA. The overall experimental results show that EChOA is significantly competitive with state-of-the-art algorithms.

However, like other optimization algorithms, EChOA also has some limitations that need to be addressed. As the analysis of computational complexity and runtime in the experiment section shows, high computational consumption is still the main limitation. We believe that this problem can be alleviated by introducing parallel strategies such as the island model or a co-evolutionary mechanism.

For future research, attention will be given to reducing the computational cost without loss of solution accuracy. In this paper, EChOA resolved the parameter tuning problem of the multilayer perceptron. As a next step, the development of a binary version to evaluate potential features from the feature pool of a given machine learning/deep learning problem is an exciting topic. Other applications, including image segmentation, the Internet of Things, and multi-objective optimization, are also worth investigating.