Introduction

A multimodal optimization problem (MMOP) is a problem with multiple global optima (sometimes also including local optima that the decision maker deems acceptable). Generally, multimodal optimization (MMO) has two goals: (1) to learn the problem landscape and the distribution of the optima, and (2) to provide the decision maker with more alternatives to choose from when the current solution is unsatisfactory. Finding multiple solutions in advance allows the decision maker to switch from the existing solution to another immediately [26]. Under these circumstances, an efficient MMO algorithm is of great importance [7].

Evolutionary algorithms (EAs) and swarm intelligence (SI) are popular approaches for solving MMOPs. However, both typical EAs and SI algorithms tend to converge to a single optimum, which is problematic for MMOPs [51]. Therefore, many widely used niching methods, such as crowding [8, 39], speciation [22, 47, 70], and fitness sharing [14], have been proposed to maintain the diversity of solutions. EAs and SI algorithms embedded with niching methods are common and efficient algorithms for solving MMOPs. For example, CrowdingDE [59] improves the performance of differential evolution (DE) using a crowding scheme. Species-based DE [23] adopts an efficient niching method for maintaining diversity, where the population is divided into multiple species and DE is performed within each species; however, the species radius is a sensitive parameter that must be set carefully. NSDE [54] adopts a neighborhood-based mutation strategy, where the neighborhood of an individual consists of its m closest individuals under the Euclidean distance and m is a manually fixed parameter. In [29], Lin et al. proposed a variant of DE (FBK-DE) that adopts three strategies, namely species formulation, species balance, and keypoint-based mutation, to meet the convergence and diversity requirements. In [71], a DE based on an adaptive mutation strategy, an archive, and Gaussian-distribution elitist set selection was used to solve MMOPs. Wang et al. [63] proposed a niching method with adaptive distribution estimation, where each individual adaptively determines its niche radius; niches are co-evolved using a master-slave mechanism, and a probability-based local search is adopted to enhance the accuracy of solutions. In [24], Li proposed a ring-topology particle swarm optimization (PSO), in which neighbors are determined according to the index distance.
Fieldsend [11] proposed a multi-swarm PSO algorithm in which the number of swarms changes dynamically during the search. In addition to niching methods, multi-objective approaches provide another way to solve MMOPs by transforming them into multi-objective optimization problems that capture the objectives of diversity and convergence of the population [3, 6, 9].

In real-world applications, many optimization problems require adequate knowledge of the fitness landscape to locate all global and local optima. For example, urban search and rescue (USAR) [31] is concerned with searching for victims under extreme conditions, such as a flooded city, within a very short period. That is, USAR teams must search every area and rescue victims as soon as possible. Many evolutionary algorithms and swarm intelligence methods have been studied to find paths that reach the victims as fast as possible. In [12], Firthous and Kumar tested the performance of different optimization algorithms on USAR problems. In [13], Geng et al. designed a novel version of particle swarm optimization to solve task allocation in USAR problems. In addition to minimizing the time, the number of rescued victims should be maximized [2, 16]. In the case of a major natural disaster such as an earthquake or a tsunami, a large number of areas are involved in which there may be many survivors to be rescued. Obviously, USAR teams cannot blindly traverse the entire affected area, which would greatly waste manpower and material resources. Therefore, designing algorithms that efficiently and accurately find every area where survivors may be present is the main challenge of USAR problems. A USAR problem that searches for many victims can be modeled as a many-modal optimization problem.

Compared with multimodal optimization, many-modal optimization problems are a class of problems with a large number of global optima (or acceptable local optima) and pose a great challenge to niching methods. Although MMO has been studied for many years, there has been no work on many-modal optimization except our previous conference paper [36], in which we defined and studied the many-modal optimization problem and designed an optimization algorithm called difficulty-based cooperative co-evolution (DBCC) to solve it. However, that work is preliminary and does not consider complex environments. Hence, as an extension of our previous work [36], more complex many-modal optimization problems are considered here. The contributions of this work are as follows.

  1. We propose a new benchmark for many-modal optimization problems. Some of the test functions are nonseparable, and the others are additively separable; each has many optima. In addition, the test functions are more complicated than those in the previous work [36] owing to their higher dimensions.

  2. We propose an improved version of difficulty-based cooperative co-evolution. Difficulty-based cooperative co-evolution (DBCC), proposed in our previous conference work [36], has four steps: problem separation, resource allocation, optimization, and solution reconstruction. In DBCC, difficulty estimation is performed only once and used as a guideline for allocating the computational resources. In this paper, we propose an improved version of DBCC, named DBCC2, which adopts a dynamic resource allocation strategy according to the estimated difficulty.

The remainder of this paper is organized as follows: the next section introduces related work, including MMOPs, cooperative co-evolution (CC), and typical optimizers. The third section presents a novel benchmark of many-modal problems. DBCC2 is proposed in the fourth section. Experimental results and analysis are presented in the fifth section. Finally, the last section presents conclusions and future work.

Related work

Multimodal optimization problems (MMOPs)

MMOPs have received much attention in recent years, and several benchmark test suites have been proposed to evaluate the performance of MMO algorithms. In [25], Li et al. provided 12 test functions with dimensions ranging from 1 to 20. The function with the largest number of global optima is F7, i.e., Vincent (3D) [25], which has 216 global optima. In [52], Qu et al. proposed a novel MMO benchmark containing eight extended simple functions and seven composition functions.

Although some problems in the above-mentioned benchmarks have many global peaks, they are not specifically designed for evaluating many-modal optimization algorithms. In this study, we design a new many-modal optimization benchmark in which each problem has more than 100 global optima. Importantly, we consider the variable interactions of the problems: some of the designed problems are additively partially separable, and the others are additively nonseparable.

Cooperative co-evolution

The CC framework [48] provides an effective divide-and-conquer strategy for complex problems and has been widely used in large-scale optimization [43, 62, 68]. When a problem can be decomposed into several subproblems, solutions of the complex problem can be obtained from the subproblems using co-evolving methods. The optimizer is important for searching for solutions of the subproblems, and many existing optimizers have been used in the CC framework, such as DE [44, 66], PSO [4, 28], genetic algorithms [48], evolutionary programming [32], and clonal selection algorithms (CSAs) [36]. In addition, two key issues must be addressed in the CC framework, namely the decomposition method and the computational resource allocation, which are explained in the following subsections.

Decomposition methods

In [67], variables are randomly divided into multiple groups, and the variables in one group change dynamically across cycles to increase the probability of placing interacting variables in the same group. Although this grouping method is simple, it neglects full detection of the intrinsic interactions of the variables. Differential grouping (DG) [43] adopts a differential method to detect variable interactions, and interacting variables are placed in the same group. The method for detecting the interaction between two variables is shown in Formulas (1) and (2), where \(\Delta _{1}\) and \(\Delta _{2}\) are differential values, \(v_1\) and \(v_2\) are different values of variable \(x_j\), and \(\delta \) is a perturbation. Two variables \(x_i\) and \(x_j\) are considered to interact if Formula (2) holds, where \(\tau \) is a preset threshold:

$$\begin{aligned}&\Delta _{1} = f(\dots ,x_{i}+\delta ,\dots ,v_{1},\dots ) - f(\dots ,x_{i},\dots ,v_{1},\dots )\nonumber \\&\Delta _{2} = f(\dots ,x_{i}+\delta ,\dots ,v_{2},\dots ) - f(\dots ,x_{i},\dots ,v_{2},\dots ). \end{aligned}$$
(1)
$$\begin{aligned}&\left| \Delta _{1} - \Delta _{2} \right| > \tau . \end{aligned}$$
(2)
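The pairwise test of Formulas (1) and (2) can be sketched as follows. This is a minimal illustration: the objective `f`, the base point, and the values of `delta`, `tau`, `v1`, and `v2` are illustrative choices, not the settings used in [43].

```python
import numpy as np

def interacts(f, x, i, j, delta=1.0, tau=1e-3, v1=-50.0, v2=50.0):
    # DG-style test of Formulas (1)-(2): do variables i and j interact?
    a = x.copy(); a[j] = v1            # x_j fixed to v_1
    b = a.copy(); b[i] += delta        # perturb x_i
    delta1 = f(b) - f(a)               # Delta_1 in Formula (1)
    c = x.copy(); c[j] = v2            # x_j fixed to v_2
    d = c.copy(); d[i] += delta
    delta2 = f(d) - f(c)               # Delta_2 in Formula (1)
    return abs(delta1 - delta2) > tau  # Formula (2)

# Example: x0*x1 is a nonseparable pair, x0 and x2 are separable
f = lambda x: x[0] * x[1] + x[2] ** 2
x = np.zeros(3)
print(interacts(f, x, 0, 1))  # True
print(interacts(f, x, 0, 2))  # False
```

For a separable pair, the fitness change caused by perturbing \(x_i\) is the same regardless of the value of \(x_j\), so the two differences cancel.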

One issue is that DG cannot detect indirect interactions. Global differential grouping (GDG) [40] uses an interaction matrix to identify overlapping groups. Another issue of DG is that the threshold \(\tau \) must be predefined and affects the sensitivity of the detection. Thus, DG2 provides a method to estimate an appropriate \(\tau \) based on the lower and upper bounds of the roundoff errors [46].

An improved version, recursive differential grouping (RDG) [58], is considered an efficient DG method, in which the interactions between variables are identified recursively. RDG uses \(\mathcal {O}(n\log n)\) fitness evaluations to decompose the problem, which is better than the aforementioned methods. In RDG, interaction detection is performed between a test set T of variables and an unprocessed set U. If U is detected to interact with the variables in T, U is divided into two equal parts, \(U_1\) and \(U_2\); the detection is performed recursively in \(U_1\) and \(U_2\), and the interacting variables are added to T. Otherwise, if no variable in U interacts with T, T is classified as a separable variable if it contains only one variable, or is identified as a new independent group. When RDG starts, the first variable in U is moved into the empty T, and detection rounds continue until no variables remain in U. Other grouping strategies based on identifying variable interactions have also been proposed, such as [15, 57].
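The recursive detection described above can be sketched as follows. This is a simplified illustration: the set-level interaction test generalizes Formulas (1) and (2) by perturbing all variables in T at once, and the fixed threshold `tau` stands in for the adaptive thresholds of DG2/RDG.

```python
import numpy as np

def set_interacts(f, lb, ub, T, U, tau=1e-6):
    # Does variable set T interact with set U? Perturb all of T and compare
    # the fitness change for two settings of the variables in U.
    x1 = lb.copy()
    x2 = lb.copy()
    x2[T] = (lb[T] + ub[T]) / 2.0
    d1 = f(x2) - f(x1)                 # change from perturbing T, U at lb
    y1, y2 = x1.copy(), x2.copy()
    y1[U], y2[U] = ub[U], ub[U]
    d2 = f(y2) - f(y1)                 # same perturbation, U at ub
    return abs(d1 - d2) > tau

def find_interacting(f, lb, ub, T, U, tau=1e-6):
    # Recursively extract the variables of U that interact with T
    # by halving U, as in RDG.
    if not U or not set_interacts(f, lb, ub, T, U, tau):
        return []
    if len(U) == 1:
        return list(U)
    mid = len(U) // 2
    return (find_interacting(f, lb, ub, T, U[:mid], tau)
            + find_interacting(f, lb, ub, T, U[mid:], tau))

def rdg(f, lb, ub, tau=1e-6):
    # Returns (nonseparable groups, separable singletons) of variable indices.
    U = list(range(len(lb)))
    groups, seps = [], []
    while U:
        T = [U.pop(0)]                 # move the first unprocessed variable into T
        while True:
            W = find_interacting(f, lb, ub, T, U, tau)
            if not W:
                break
            T += W
            U = [u for u in U if u not in W]
        (groups if len(T) > 1 else seps).append(T)
    return groups, seps

# Example: x0*x1 and x3*x4 are interacting pairs, x2 is separable
f = lambda x: x[0] * x[1] + x[2] ** 2 + x[3] * x[4]
lb, ub = np.full(5, -100.0), np.full(5, 100.0)
print(rdg(f, lb, ub))  # ([[0, 1], [3, 4]], [[2]])
```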

Computational resource allocation

Classical CC methods use a round-robin strategy (RBCC) to allocate computational resources to each component [33, 49]. RBCC does not consider the imbalanced nature of components and allocates the same computational resources to each component. Contribution-based cooperative co-evolution (CBCC) [42, 45] prefers to allocate more resources to the components that contribute more to the fitness value, and there are three versions, i.e., CBCC1, CBCC2, and CBCC3. In CBCC1, after components are sorted by their contributions, the component with the maximum contribution is optimized for only one time slice. In contrast, CBCC2 optimizes the component with the maximum contribution until no improvement is observed. Hence, CBCC1 and CBCC2 suffer from over-exploration and over-exploitation, respectively [42]. CBCC3 considers both exploration and exploitation and achieves a more reasonable allocation. In addition to CBCC, Ren et al. [55] proposed a fine-grained contribution-aware CC framework, which allocates computational resources to components based on their contribution expectations. In [20], a bandit-based CC framework was proposed, which allocates computational resources to components from the perspective of the dynamic multi-armed bandit. Moreover, Jia et al. [17] proposed DCCA, which aims to allocate more processors to the components with larger contributions.

CC frameworks have also been developed for other types of complex problems. For example, Xu et al. [65] proposed a constraint-objective CC framework for large-scale constrained optimization, which allocates computational resources to components with larger contributions or larger corrections based on the constraint violation of the optimal solution. Jia et al. [18] proposed a CC framework for overlapping problems, which includes a contribution-based decomposition method and a contribution-based resource allocation method. Liu et al. [30] proposed a variable importance-based resource allocation mechanism for large-scale multi-objective optimization problems, which prioritizes allocating more computational resources to the groups of variables with higher importance.

Nevertheless, the above-mentioned resource allocation strategies do not consider that components may have different search difficulties. In general, more difficult components need more resources to find solutions than easier ones.

Optimizer

This part introduces the three optimizers used in our experiments.

Differential evolution

DE and its variants are widely used techniques for solving MMOPs. Here, the DE variant for MMO in [10], which is used in our experiments, is introduced. In [10], neighborhood information is considered in the mutation strategy. As shown in Formulas (3) and (4), two methods are adopted to generate a mutant \(z_i\) of \(x_i\), both of which use the nearest individual according to the Euclidean distance:

$$\begin{aligned}&z_i = x_{i}^{nn} + F_1(x_{r1}-x_{r2}), \end{aligned}$$
(3)
$$\begin{aligned}&z_i = x_{i}^{nn} + F_1(x_{r1}-x_{r2}) + F_2(x_{r3}-x_{r4}), \end{aligned}$$
(4)

where \(z_i\) denotes the mutant individual of \(x_i\), \(x_{i}^{nn}\) is the individual nearest to \(x_i\), and \(F_1\) and \(F_2\) are the scaling factors that control the mutation step. \(x_{r1},x_{r2},x_{r3}\), and \(x_{r4}\) are randomly selected distinct individuals, which are also different from the i-th individual \(x_i\). After the mutant is generated, the crossover operation is performed according to the following formula:

$$\begin{aligned} u_{i}^{j} = \left\{ \begin{aligned} z_{i}^{j} \qquad&\text {if}\quad rand_{i}^{j} < CR \quad \text {or} \quad j = j_{rand}\\ x_{i}^{j} \qquad&\text {otherwise}, \end{aligned} \right. \end{aligned}$$
(5)

where CR controls the probability of inheriting a component from the mutant, and \(rand_{i}^{j} \in [0,1]\) is a random value drawn from the uniform distribution. In addition, \(j_{rand}\) is a randomly chosen dimension index, which ensures that at least one dimension comes from the mutant \(z_{i}\) even if all random numbers are greater than CR.

Algorithm 1 shows the working mechanism at one generation. In line 3, the function rand() generates a random number in the range of [0, 1] which follows the uniform distribution.

figure a
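One generation of this DE variant might be sketched as follows. The 50/50 choice between Formulas (3) and (4), the parameter values, and the greedy parent-offspring selection are illustrative simplifications, not necessarily the exact procedure of Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def de_mmo_generation(pop, fit, f, F1=0.5, F2=0.5, CR=0.9):
    # One generation of neighborhood-based DE (Formulas (3)-(5)).
    # pop: (NP, D) array, fit: (NP,) fitness values (maximization).
    NP, D = pop.shape
    new_pop, new_fit = pop.copy(), fit.copy()
    for i in range(NP):
        # nearest neighbor of x_i by Euclidean distance (excluding itself)
        dist = np.linalg.norm(pop - pop[i], axis=1)
        dist[i] = np.inf
        x_nn = pop[np.argmin(dist)]
        # four distinct random indices, all different from i
        r = rng.choice([k for k in range(NP) if k != i], size=4, replace=False)
        if rng.random() < 0.5:                     # Formula (3)
            z = x_nn + F1 * (pop[r[0]] - pop[r[1]])
        else:                                      # Formula (4)
            z = (x_nn + F1 * (pop[r[0]] - pop[r[1]])
                      + F2 * (pop[r[2]] - pop[r[3]]))
        # binomial crossover, Formula (5)
        j_rand = rng.integers(D)
        mask = rng.random(D) < CR
        mask[j_rand] = True                        # at least one dim from z
        u = np.where(mask, z, pop[i])
        # greedy parent-offspring selection (a simplification)
        fu = f(u)
        if fu >= fit[i]:
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit
```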

Particle swarm optimization

PSO is a representative SI algorithm that has received much attention in MMO [24, 53]. PSO variants usually fall into two categories, lbestPSO and gbestPSO, according to the source of social information: particles in gbestPSO take the global best individual as their source of social cognition, whereas lbestPSO uses the information of the local best individual. In this study, we choose lbestPSO as our optimizer because of its better diversity.

In lbestPSO with a constriction factor [56], the j-th dimension of the velocity \(v_i\) of the i-th particle is updated as follows:

$$\begin{aligned} v_i^j(t+1)= & {} \mathcal {K} * (v_i^j(t)+c_{1}*rand_i^j*(pbest_i^j(t)-x_i^j(t))\nonumber \\&+c_{2}*rand_i^j*(lbest_i^j(t)-x_i^j(t))), \end{aligned}$$
(6)

where \(\mathcal {K}\) is a constriction factor, \(c_1,c_2\) are learning factors, \(pbest_i\) denotes the historical best position of the i-th particle, and \(lbest_i\) is the historical best position of the particles in the neighborhood. The j-th dimension of the solution \(x_i\) in the \((t+1)\)-th iteration is updated according to the following formula:

$$\begin{aligned} x_i^j(t+1) = x_i^j(t)+v_i^j(t+1). \end{aligned}$$
(7)

Algorithm 2 shows the iteration of individuals in the same species, where f is the fitness of x. pbest and pbestCost are historical best locations and their fitness, respectively.

figure b
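One iteration of lbestPSO with a constriction factor (Formulas (6) and (7)) can be sketched as follows. The index-based neighborhood of size `nb` on each side and the use of independent random numbers for the cognitive and social terms are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def lbest_pso_step(x, v, pbest, pbest_cost, f, K=0.729, c1=2.05, c2=2.05, nb=2):
    # One lbestPSO iteration with constriction factor (Formulas (6)-(7)).
    # Neighborhood: nb particles on each side by population index.
    NP, D = x.shape
    for i in range(NP):
        neigh = [(i + k) % NP for k in range(-nb, nb + 1)]
        lbest = pbest[max(neigh, key=lambda j: pbest_cost[j])]
        r1, r2 = rng.random(D), rng.random(D)
        v[i] = K * (v[i] + c1 * r1 * (pbest[i] - x[i])    # cognitive term
                         + c2 * r2 * (lbest - x[i]))      # social term
        x[i] = x[i] + v[i]                                # Formula (7)
        cost = f(x[i])
        if cost > pbest_cost[i]:                          # maximization
            pbest[i], pbest_cost[i] = x[i].copy(), cost
    return x, v, pbest, pbest_cost
```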

Clonal selection algorithm

CSA simulates the cloning and selection behaviors of the immune system [34, 61]: it clones and mutates antibodies and performs selection among the current individual and its offspring.

In [35], two methods of mutation are adopted in CSA: DE-like mutation and conventional Gaussian mutation. The step size of Gaussian mutation is set according to the distance between the two farthest individuals in the species. DE-like mutation produces a mutant \(a_i\) of \(x_i\), as shown in the following formula:

$$\begin{aligned} a_i = 0.5(best_i - x_i) + x_i, \end{aligned}$$
(8)

where \(best_i\) denotes the best individual in the same species. Algorithm 3 presents the process of reproducing the next generation antibodies in the same species. In line 2, the set \(Pool_i\) stores copies of the i-th individual. In line 7, the first copy \(x_{c1}\) adopts DE-like mutation with a probability of 0.5. In lines 13 and 14, each antibody is evaluated, and the best individual is chosen from \(x_i\) and its variants including the Gaussian variants and the DE-like variant. Note that the fitness is called affinity in CSA.

figure c
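The reproduction step can be sketched as follows. The number of clones, the scaling of the Gaussian step size from the species diameter, and the 0.5 probability of applying the DE-like mutation of Formula (8) to the first copy follow the description above, with illustrative parameter values rather than the exact settings of [35].

```python
import numpy as np

rng = np.random.default_rng(2)

def csa_reproduce(species, fit, f, n_clones=3):
    # One reproduction step for the antibodies of one species (maximization).
    M, D = species.shape
    best = species[np.argmax(fit)]
    # Gaussian step size from the distance between the two farthest individuals
    diam = max(np.linalg.norm(a - b) for a in species for b in species)
    sigma = max(diam / 10.0, 1e-12)      # illustrative scaling of the diameter
    new_sp, new_fit = species.copy(), fit.copy()
    for i in range(M):
        pool = [species[i] + rng.normal(0.0, sigma, D) for _ in range(n_clones)]
        if rng.random() < 0.5:           # DE-like mutation on the first copy
            pool[0] = 0.5 * (best - species[i]) + species[i]   # Formula (8)
        cand = pool + [species[i]]
        affinities = [f(a) for a in cand]
        k = int(np.argmax(affinities))   # keep the best of x_i and its variants
        new_sp[i], new_fit[i] = cand[k], affinities[k]
    return new_sp, new_fit
```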

Many-modal optimization problems

Many-modal optimization problems are a type of multimodal problem with many global optima (sometimes including acceptable local optima). Herein, we design a benchmark that can be used to test the performance of many-modal optimization algorithms. Similar to the problem construction method in [27], this paper adopts different subcomponents to construct many-modal problems. In particular, the problems designed in this paper take the difficulty differences between subcomponents into consideration. All test problems are maximization problems and have more than 100 global optima.

Structure of the proposed functions

The many-modal optimization problems designed can be divided into the following two categories.

  1. Partially additively separable problems: such many-modal optimization problems are constructed by the following formula:

    $$\begin{aligned} F(X)=\sum _{i=1}^{N}{g_i(X_i)}, \ \bigcup _{i=1}^N X_i = X, \ X_i \cap X_j = \emptyset \ (i \ne j), \nonumber \\ \end{aligned}$$
    (9)

    where N is the number of subproblems and \(g_i\) is a nonseparable function.

  2. Additively nonseparable problems: decision variables are transformed according to the following formula:

    $$\begin{aligned} Y = XM, \end{aligned}$$
    (10)

    where M is the rotation matrix. Specifically, suppose the i-th dimension of X is \(x_i\); then the i-th dimension \(y_i\) of Y is obtained by Formula (11), where D denotes the dimension of the problem. F(Y) is then constructed using Formula (9), with the variables \(y_i\) replacing the \(x_i\) in the same dimensions. Here, \(X = \{x_1, x_2, ..., x_D\} = \bigcup _{i=1}^N X_i \).

    $$\begin{aligned} y_i =m_{i1}x_1 + m_{i2}x_2 + \dots +m_{iD}x_{D}. \end{aligned}$$
    (11)

The search range of all test functions is set to \([-100,100]^D\), and the total number of fitness evaluations is set to \(10^5 \cdot D\). The test suite is shown in Table 1, where MF1 and MF2 are nonseparable functions and the others are partially additively separable functions. In Table 1, \(r_{mm}\) is used to eliminate redundant solutions close to the same peak. The construction of the subproblems in Formula (9) is introduced in the following part.

Table 1 Details of the benchmark of many-modal optimization problems

Composition functions

Table 2 Composite functions (\(CF_1\)-\(CF_{10}\)) used in the benchmark

The subproblems in Formula (9) are all designed using the composition functions with at least two global optima. We used a method similar to [25] to formulate these composition functions. The following formula shows the details of the construction of a composite function:

$$\begin{aligned} CF(\mathbf {x}) = \sum _{i=1}^{n}{\omega _i \hat{f}_{i}\left( \frac{\mathbf {x}-\mathbf {o}}{\lambda _i}M_i + bias_i \right) } + f_{bias}, \end{aligned}$$
(12)

where \(\mathbf {o}\) denotes the optimal solution, n is the number of basic functions and \(n\ge 2\), \(M_i\) represents the rotation matrix of the i-th component, \(bias_i\) is the bias value of the i-th component, which is set to 0, and \(f_{bias}\) controls the global peak height of the composite function. Without loss of generality, \(f_{bias}\) is set to 0. \(\lambda _i\) is used to compress or stretch the i-th component. Similar to [25], \(\hat{f}_i\) is the normalized value of \(f_i\), calculated by multiplying the constant C by the ratio between the fitness at the current location and the absolute value of the fitness at a fixed position. Here, the fixed position is \(\mathbf {x} = [100,100, \dots , 100]\) and \(C=1000\). Herein, \(f_i\) is the fitness obtained by the i-th basic function.

The parameter \(\omega _i\) determines the weight of the i-th basic function according to the following formulas:

$$\begin{aligned}&\omega _i = \exp \left( -\frac{\sum _{j = 1}^{D}{\left( x_j - o_{ij}\right) ^2}}{2D\sigma _i^2}\right) , \end{aligned}$$
(13)
$$\begin{aligned}&\omega _i = \left\{ \begin{array}{l@{\quad }l} \omega _i &{}\omega _i = \max (\omega _i)\\ \omega _i(1-max(\omega _i)^{10}) &{} otherwise. \end{array}\right. \end{aligned}$$
(14)

Here, \(o_{ij}\) is the j-th dimension of the optimum of the i-th component, and D is the dimension of the composition function being constructed. It is worth noting that \(\sigma _i\) controls the coverage range of the i-th basic function. The weights \(\omega _i\) are normalized after being obtained according to Formulas (13) and (14).

In addition, all composite functions are transformed into maximization problems. The basic functions for each composition function are given in Table 2, and all composition functions are scalable.
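The weight computation of Formulas (13) and (14), followed by normalization, can be sketched as below; the function name and arguments are illustrative.

```python
import numpy as np

def composition_weights(x, optima, sigmas):
    # Weights of the basic functions in Formula (12): raw weights from
    # Formula (13), suppression of non-maximal weights from Formula (14),
    # then normalization.
    x = np.asarray(x, float)
    D = len(x)
    w = np.array([np.exp(-np.sum((x - np.asarray(o, float)) ** 2)
                         / (2.0 * D * s ** 2))
                  for o, s in zip(optima, sigmas)])          # Formula (13)
    wmax = w.max()
    w = np.where(w == wmax, w, w * (1.0 - wmax ** 10))       # Formula (14)
    return w / w.sum()
```

Evaluated at the optimum of one component, that component's weight dominates, so the composite landscape locally resembles that basic function.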

Improved difficulty-based cooperative co-evolution

It has been shown that CC works well for complex optimization problems by decomposing them into several subproblems [38]. CC provides the basic guideline for implementing divide-and-conquer methods for many-modal optimization problems. The subproblems can have different search difficulties, but this fact has received little attention in existing CC frameworks. Hence, we proposed the DBCC framework in [36] and propose its improved version (DBCC2) in this section. The framework includes four steps: problem separation, resource allocation, optimization, and solution reconstruction.

Framework process of DBCC2

Algorithm 4 shows the framework process of DBCC2, into which different optimizers can be embedded. In line 2, the problem is separated into several subproblems according to the grouping strategy, and the set \(X_i\) contains the variable indices of subproblem \(SP_i\). The details are described in the section Problem separation.

figure d

The population initialization steps of the subproblems are in lines 3–7. The problems are black-box optimization problems, so the structure of the subproblems cannot be obtained directly. Hence, the variables whose indices are not in \(X_i\) are set to fixed values: in line 5, these variables are set to the lower bound of the search space, and the initial fitness of the population in \(SP_i\) is obtained accordingly. Other supplementary information for the optimizer is updated in line 6; for example, in PSO, the velocity, historical best position, and historical best fitness of the particles are updated in this line.

DBCC2 includes two steps to optimize the subproblems. First, the subproblems are optimized for base generations in line 8, which ensures that basic optimization is performed for each subproblem. Next, the difficulty of each subproblem is estimated, and the computational resources are allocated according to the estimated difficulty. In lines 13–25, the remaining resources are dynamically allocated. At the end of each cycle, the current difficulties of the subproblems are re-estimated, which guides the next allocation of computational resources. The number of cycles c is updated in line 23. The resource allocation is described in the section Resource allocation.

When the computational resources are exhausted, each subproblem has a solution set \(subP_i\), and the selection strategy described in the section Solution reconstruction is applied to choose a suitable subset \(B_i\). The final solutions are formed from the Cartesian product of these subsets, as shown in line 27.

In the following contents, the components adopted in DBCC2 are illustrated in detail, including the problem separation, resource allocation, optimizers and solution reconstruction.

Problem separation

The first step of DBCC2 is to detect the structure of the problem and decompose it into subproblems. Among existing problem separation methods, RDG [58] performs well in detecting interacting variables. In this study, we adopt RDG with minor modifications: each separable variable is placed in an independent group. In addition, the choice of detection points is the same as in DG2 [46]. The estimation of the upper bound of the roundoff error is also similar to DG2, using the absolute fitness values of the detection points.

Resource allocation

Resource allocation is critical for CC, and various strategies have been proposed. Different from existing work, we suggest that the difficulty of the subproblems can be used to guide resource allocation, and that this difficulty can be estimated by the fitness-distance correlation.

The fitness-distance correlation [19] can obtain prior knowledge about the landscape using statistical correlation of the sample points. Specifically, the correlation is calculated according to the following formula:

$$\begin{aligned} r = \frac{\sum _{t\in P}{(f_t-\overline{f_P})(d_t-\overline{d_P})}}{\sqrt{\sum _{t\in P}{(f_t-\overline{f_P})^2}}\sqrt{\sum _{t\in P}{(d_t-\overline{d_P})^2}}}, \end{aligned}$$
(15)

where P denotes the collection of sampling points, \(f_t\) is the fitness of point t, \(d_t\) is the distance between point t and the best point, and \(\overline{f_P}\) and \(\overline{d_P}\) are the corresponding mean values over all points in P. Ideally, there is a strong correlation between fitness and distance for a linear function [41].
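Formula (15) is the Pearson correlation between the fitness of each sample point and its distance to the best sampled point, and can be computed directly:

```python
import numpy as np

def fitness_distance_correlation(points, fits):
    # Formula (15): Pearson correlation between fitness values and
    # distances to the best sampled point (maximization assumed).
    points = np.asarray(points, float)
    fits = np.asarray(fits, float)
    best = points[np.argmax(fits)]
    d = np.linalg.norm(points - best, axis=1)
    fc, dc = fits - fits.mean(), d - d.mean()
    denom = np.sqrt(np.sum(fc ** 2)) * np.sqrt(np.sum(dc ** 2))
    return float(np.sum(fc * dc) / denom)
```

For samples from a unimodal quadratic, fitness decreases monotonically with distance to the best point, so |r| is close to 1; spreading the samples over several peaks weakens the correlation, as discussed below.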

However, in a landscape with multiple peaks, the correlation can be weakened dramatically owing to the distribution of points over the other peaks. To handle such cases, we propose a novel method to estimate the difficulty of subproblems, as follows.

Estimated difficulty of subproblems

In our previous conference work [36], we proposed a resource allocation scheme based on the estimated difficulty. However, the difficulty of the subproblems was estimated only once at the beginning of the algorithm, and the change of difficulty for the optimizer during the evolutionary search was not considered. In fact, the difficulties of the subproblems during the search differ from the initial difficulties. Furthermore, species with poor performance are less likely to locate high-quality peaks and should be ignored when estimating the status of the entire population; especially in the later search process, the species with better performance should receive more attention.

Hence, based on the considerations above, an improved version that dynamically estimates the difficulty is proposed here.

First, in each estimation cycle, for subproblem \(f^i\), each species \(S^i\) selected for difficulty estimation is required to satisfy the following two conditions.

  1. \(S^i\) should have at least two individuals.

  2. The fitness of the seed of \(S^i\), i.e., \(f(S^i.seed)\), should satisfy the following formula:

$$\begin{aligned} f(S^i.seed) \ge f^i_{min} + (f^i_{max} - f^i_{min})(1-e^{-c}), \end{aligned}$$
(16)

where \(f^i_{min}\) and \(f^i_{max}\) are the minimum and maximum fitness values, respectively, of all the individuals in the population for the i-th subproblem, and c denotes the current evolutionary cycle.

Second, without loss of generality, suppose the species \(S^i_1,\cdots , S^i_K\) are selected and \(r^i_1,\cdots , r^i_K\) are calculated according to Formula (15). The weakest fitness-distance correlation among the selected species is then obtained by the following formula:

$$\begin{aligned} \mu _i = min\{\left| r^i_1\right| ,\left| r^i_2\right| ,\dots ,\left| r^i_K\right| \}. \end{aligned}$$
(17)

Third, the difficulty of the i-th subproblem, \(d_i\), does not depend linearly on \(\mu _i\). For example, \(\mu _i = 1\) always holds when every selected species has exactly two individuals. As another example, a value of \(\mu _i\) close to 0 does not always indicate that the problem is very difficult, especially when the individuals of a species gather around a peak. Therefore, we use the following formula as a nonlinear mapping between \(\mu _i\) and \(d_i\), where \(\rho \) is a manually set parameter:

$$\begin{aligned} d_i = \rho (1-\mu _i)e^{-\rho (1-\mu _i)}. \end{aligned}$$
(18)

Computational resource allocation

Computational resources are divided into two parts, i.e., basic resources and flexible resources. In the previous resource allocation scheme in [36], the basic resources are allocated evenly to the subproblems, and the remaining flexible computational resources are allocated all at once according to the difficulties of the subproblems.

Herein, the basic resources are also allocated evenly to the subproblems, but the remaining flexible resources are allocated dynamically in each evolutionary cycle. In each cycle, the difficulty of each subproblem is estimated from the current status of the species. Suppose subproblems \(SP_1,SP_2,\dots ,SP_n\) have estimated difficulties \(d_1,d_2,\dots ,d_n\) according to Formula (18). The relative difficulty of the i-th subproblem is normalized according to the following formula:

$$\begin{aligned} p_i = \frac{d_i}{\sum _{i =1}^{n}{d_i}}, \qquad i = 1,2,\dots ,n. \end{aligned}$$
(19)

In addition, p is fixed at 1 if the problem cannot be decomposed by the decomposition method (i.e., there is only one subproblem).

Finally, the number of generations (i.e., computational resources) allocated to each subproblem in this cycle is calculated by the following formula:

$$\begin{aligned} gen_i = \lfloor p_i*(min\{\lfloor maxGen * \alpha \rfloor , remainGen\})\rfloor , \end{aligned}$$
(20)

where \(gen_i\) is the number of generations allocated to the i-th subproblem in the current cycle, maxGen is the maximum number of generations available for dynamic allocation (i.e., the flexible resources), and remainGen denotes the remaining generations, which is updated at the end of each cycle. The parameter \(\alpha \) controls the number of allocation rounds in the dynamic resource allocation step.
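One allocation cycle (Formulas (18)–(20)) can be sketched as follows. The values of `rho` and `alpha`, and the uniform fallback for the degenerate case where all \(\mu_i\) equal 1, are illustrative assumptions.

```python
import math

def allocate_generations(mus, remain_gen, max_gen, rho=4.0, alpha=0.1):
    # mus: weakest |FDC| per subproblem (Formula (17)).
    # Formula (18): nonlinear mapping from mu_i to difficulty d_i
    d = [rho * (1.0 - m) * math.exp(-rho * (1.0 - m)) for m in mus]
    total = sum(d)
    # Formula (19): normalized relative difficulty; p = 1 when n = 1
    if len(d) == 1:
        p = [1.0]
    elif total > 0.0:
        p = [di / total for di in d]
    else:
        p = [1.0 / len(d)] * len(d)   # fallback when every mu_i = 1 (assumption)
    # Formula (20): generations granted to each subproblem in this cycle
    budget = min(math.floor(max_gen * alpha), remain_gen)
    return [math.floor(pi * budget) for pi in p]
```

Note that the mapping of Formula (18) is not monotone: with `rho=4.0`, a subproblem with \(\mu_i = 0.9\) is rated harder than one with \(\mu_i = 0.2\), reflecting the remark above that a near-zero \(\mu_i\) does not always indicate high difficulty.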

Optimizer using BI-NBC

Nearest-better clustering (NBC) [50, 51] is an efficient clustering method that has attracted much attention [5, 29, 35]. Its core idea is to construct a spanning tree in which each edge links an individual to its nearest better individual. The edges of the tree are then cut if their weights are larger than a distance threshold, creating multiple sub-trees, and the individuals in each sub-tree form a population. NBC performs well because it makes reasonable use of the information about fitness values and distances. However, the accuracy of the clustering is easily affected by outliers [37].

In this study, an NBC variant based on the better individuals, called BI-NBC [36], is adopted. In BI-NBC, a control strategy ensures that only the better individuals are divided into species with NBC. Specifically, individuals of higher quality are chosen before clustering. After clustering, each outlier is assigned to its nearest cluster.

Here, we use a simple fitness-based method to identify the better individuals. The i-th individual is chosen if its fitness, \(f_i\), is greater than the cutoff value \(f_c\), which is calculated according to the following formula:

$$\begin{aligned} f_{c} = \frac{\sum _{i=1}^{NP}{f_{i}}}{NP} - \epsilon , \end{aligned}$$
(21)

where \(\epsilon \) is a small value, set to \(10^{-10}\). Since floating-point operations have rounding errors, \(\epsilon \) ensures that the set of better individuals is not empty. The pseudo-code of BI-NBC is shown in Algorithm 5. In line 3, the temporary set TP contains all individuals except the outliers. In line 6, the Euclidean distance is used.

figure e
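The two main ingredients above, the Formula (21) cutoff and the nearest-better tree, can be sketched as follows. This is a minimal illustration with hypothetical names, assuming maximization; the edge-cutting threshold (\(\varphi \) times the mean edge length) follows common NBC practice and may differ in detail from Algorithm 5:

```python
import numpy as np

def better_individuals(fitness, eps=1e-10):
    # Formula (21): keep individuals whose fitness exceeds the
    # population mean minus a small epsilon.
    cutoff = sum(fitness) / len(fitness) - eps
    return [i for i, f in enumerate(fitness) if f > cutoff]

def nbc_species(pop, fitness, phi=2.0):
    """Nearest-better clustering sketch: link each individual to its
    nearest better individual, cut long edges, return species labels."""
    n = len(pop)
    dist = np.linalg.norm(pop[:, None, :] - pop[None, :, :], axis=2)
    parent = np.full(n, -1)
    edge = np.full(n, np.inf)
    for i in range(n):
        better = [j for j in range(n) if fitness[j] > fitness[i]]
        if better:
            j_star = min(better, key=lambda j: dist[i, j])
            parent[i], edge[i] = j_star, dist[i, j_star]
    # Cut edges longer than phi times the mean finite edge length.
    mean_edge = edge[np.isfinite(edge)].mean()
    parent[edge > phi * mean_edge] = -1
    # After cutting, each sub-tree forms one species; label each
    # individual by the root of its sub-tree.
    def root(i):
        while parent[i] != -1:
            i = parent[i]
        return int(i)
    return [root(i) for i in range(n)]
```

In full BI-NBC, `nbc_species` would be run only on the output of `better_individuals`, and the remaining outliers would then be attached to their nearest resulting cluster.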

After the population is divided into species, the optimizer is performed in each species. The pseudo-code of the optimizer with BI-NBC is shown in Algorithm 6, which is similar to our previous conference work in [36]. In Algorithm 6, \(S_{j}^{i}\) is the j-th species of the i-th subproblem.

figure f

Solution reconstruction

The CC framework generates a set of solutions \(subP_i\) for the i-th subproblem, where \(|subP_i|\) denotes the size of \(subP_i\). However, if all solutions participate in constructing the final solutions, their number will be \(|subP_1|*|subP_2|*\dots *|subP_n|\), which grows exponentially with n (the number of subproblems). On the other hand, diversity is lost if only the best solution of each subproblem is chosen to construct the final solution, which is unsuitable for many-modal optimization problems. Therefore, it is necessary to reduce the number of solutions used in constructing the final solutions while maintaining diversity, i.e., to find a suitable subset \(B_i\) of the set \(subP_i\). The number of final solutions is then reduced to \(|B_1|*|B_2|*\dots *|B_n|\).

The way to select \(B_i\) is the same as in [36]. First, all solutions in \(subP_i\) are arranged in descending order of fitness. The individual with the best quality, \(b_i\), is removed from \(subP_i\) and added to \(B_i\). Next, we determine whether each remaining solution of \(subP_i\) should be included in the set \(B_i\). A solution \(x_k\) is added to \(B_i\) if the following two conditions are satisfied.

  1. Condition 1:

    \(|f(b_i) - f(x_k)| < \xi \), where \(\xi \) is set to 0.1.

  2. Condition 2:

    the distance between \(x_k\) and every existing individual in \(B_i\) is greater than the distance threshold \(\gamma \), which is set to 0.1.

The selection loop continues until all solutions in \(subP_i\) have been examined.
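The selection procedure above can be sketched as follows (illustrative names, maximization assumed; \(\xi \) and \(\gamma \) default to the values stated above):

```python
import numpy as np

def select_subset(sub_pop, fitness, xi=0.1, gamma=0.1):
    """Sketch of selecting B_i from subP_i under Conditions 1 and 2."""
    # Arrange solutions in descending order of fitness.
    order = sorted(range(len(sub_pop)), key=lambda i: fitness[i], reverse=True)
    best = order[0]
    B = [best]  # the best-quality solution b_i always enters B_i
    for k in order[1:]:
        # Condition 1: fitness close enough to the best of the subproblem.
        close_fitness = abs(fitness[best] - fitness[k]) < xi
        # Condition 2: farther than gamma from every solution already in B_i.
        far_enough = all(
            np.linalg.norm(np.asarray(sub_pop[k]) - np.asarray(sub_pop[b])) > gamma
            for b in B
        )
        if close_fitness and far_enough:
            B.append(k)
    return [sub_pop[i] for i in B]

# e.g., [0.05] is rejected by Condition 2 (too close to [0.0]),
# and [2.0] by Condition 1 (fitness gap 0.5 >= xi):
subset = select_subset([[0.0], [0.05], [1.0], [2.0]], [1.0, 0.99, 0.95, 0.5])
# -> [[0.0], [1.0]]
```

Condition 1 keeps only near-optimal solutions, while Condition 2 enforces spatial diversity among the retained ones.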

Experiments

Metrics

We use the same method as in [25] to count the found peaks: a peak is considered found if the error between its fitness value and that of the real global optimum is not greater than a manually set threshold, fixed here at \(\text {1.0E-04}\). Two found global peaks are considered distinct if the distance between them is greater than the radius \(r_{mm}\), which is shown in Table 1 and set to 0.5. The peak ratio (PR) measures the overall performance of an algorithm on a problem and is defined by the following formula:

$$\begin{aligned} PR = \frac{\sum _{Run = 1}^{TR}{FP_{Run}}}{nopt * TR}, \end{aligned}$$
(22)

where TR is the total number of runs, \(FP_{Run}\) is the number of peaks found in the Run-th run, and nopt, shown in Table 2, is the number of global optima of the test function.
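Formula (22) reduces to a simple average, as the following sketch shows (hypothetical function name):

```python
def peak_ratio(found_peaks_per_run, nopt):
    """Sketch of Formula (22): PR over TR independent runs.

    found_peaks_per_run[r] is FP for run r; nopt is the number of
    global optima of the test function.
    """
    tr = len(found_peaks_per_run)
    return sum(found_peaks_per_run) / (nopt * tr)

# e.g., 3 runs finding 4, 5, and 3 of nopt = 5 global peaks:
# PR = (4 + 5 + 3) / (5 * 3) = 0.8
```

A PR of 1.0 thus means every global peak was found in every run.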

In addition, we use the standard deviation (Std) of PR over all runs as another metric.

Experimental setting

In the experiments, four CC frameworks are compared: DBCC2, DBCC, CBCC3, and RBCC. DBCC is our previous framework. RBCC allocates the same computational resources to each subproblem using the round-robin method, and CBCC3 allocates resources based on contributions.

In each CC framework, there are three choices of the optimizers given in section Optimizer, i.e., DE, PSO, and CSA. The size of the population (NP) in all algorithms is fixed to 500. Each algorithm is run independently 50 times.

Parameter settings are shown in Table 3. In DBCC2 and DBCC, the basic generations for each subproblem are set to 100. In the dynamic allocation step of DBCC2, \(\alpha \) is set to 0.1 and \(\rho \) is set to 5.0. The default parameter values are used in CBCC3. The exploration operation is performed with a probability \(P_t\), which is set to 0.05. The contribution of a subproblem is calculated every \(iter_{gap}\) generations, where \(iter_{gap}\) is set to 100.

In DE, two mutation strategies are used, each with a probability of 0.5. The crossover rate (CR) of recombination is set to 0.9. The constriction factor \(\mathcal {K}\) and the coefficients \(c_1,c_2\) in PSO are all set to the default values. Each antibody is cloned twice in the CSA. The scaling factor \(\varphi \) in BI-NBC is fixed at 2.0.

Table 3 Parameter setting
Table 4 Results of DE at an accuracy level of 1.0E-04
Table 5 Results of PSO at an accuracy level of 1.0E-04
Table 6 Results of CSA at an accuracy level of 1.0E-04

Experimental results

The results of DE, PSO, and CSA in the different frameworks are shown in Tables 4, 5, and 6. In these tables, BPR denotes the number of test functions on which an algorithm obtains the best PR, which measures the comprehensive performance of the algorithms on the test suite. It can be seen that DBCC2 has higher BPR values than the other CC frameworks under comparison.

As for the nonseparable problems (MF1 and MF2), all the algorithms perform poorly. Specifically, DE in the different CC frameworks can find a few solutions, whereas PSO and CSA cannot find any.

On the separable problems (MF3-MF15), DBCC2 performs better than the other three frameworks. Table 4 shows that DBCC2-DE and DBCC-DE have significant advantages on the following separable problems: MF3, MF4, MF9, MF10, and MF14. Moreover, DBCC2-DE performs better than DBCC-DE on 8 of the 13 separable problems. In Table 5, DBCC2-PSO performs better than the other algorithms on 7 separable problems. Compared to the other CC frameworks using the CSA, DBCC2-CSA obtains 8 better results among all separable functions.

In addition, the choice of optimizer for the CC framework has a great influence on the number of global optima found. It can be seen that CSA performs worse than DE and PSO on most test functions.

Comparisons with other MMO algorithms

Several state-of-the-art MMO algorithms are chosen for comparison, including FBK-DE [29], EMO-MMO [6], and MOMMOP [69]. In FBK-DE, the population size is set to the suggested value. In EMO-MMO, the population size is fixed at 500 and the cutting ratio in peak detection is set to 0.1. In MOMMOP, the population size is set to 500, and the other settings take their default values.

Table 7 shows the detailed results of PR and Std for all test functions. On the separable problems, DBCC2-DE and DBCC2-PSO have obvious advantages over the other algorithms. In addition, Table 8 shows the average ranks based on PR of the above algorithms as well as DBCC-DE, CBCC3-DE, and RBCC-DE in the Friedman test, obtained with the software tool KEEL 3.0 [1, 60]. It can be seen that DBCC2-DE has the best performance and that MOMMOP is inefficient on the test suite.

Furthermore, boxplots of the found global optima on selected problems (MF3, MF7, MF10, and MF13) are shown in Fig. 1. DBCC2-DE, DBCC2-PSO, and DBCC2-CSA have higher median values on all four problems. Among FBK-DE, EMO-MMO, and MOMMOP, EMO-MMO has the highest median value, while FBK-DE performs more robustly than EMO-MMO.

Table 7 Results of the algorithm comparison at an accuracy level of 1.0E-04
Table 8 Average rankings of the algorithms (Friedman) according to the values of PR

Convergence analysis

In this section, we compare the behavior of DBCC2-DE, DBCC2-PSO, and DBCC2-CSA by convergence analysis. In global optimization, the radius of the population is generally used to analyze its convergence trend [21, 64]. However, since the population is divided into multiple species by the niching strategy in multimodal optimization, the mean radius of the species is used here to analyze the convergence trend of the population.

Figure 2 shows the trend of the mean species radius of DBCC2-DE, DBCC2-PSO, and DBCC2-CSA on functions MF3, MF7, MF10, and MF15. It can be seen that PSO converges the fastest and CSA the slowest. Also, the experimental results in Table 7 show that DBCC2-PSO is the best among all compared algorithms, while DBCC2-CSA performs worse than DBCC2-PSO and DBCC2-DE. These results suggest that the convergence speed of the optimizer and the algorithm performance are correlated. However, overly fast convergence tends to deteriorate performance by trapping species in local optima. As shown in Fig. 2c, DBCC2-PSO converges rapidly after the 2nd cycle, such that it is unable to explore new regions in the subsequent evolutionary process. Therefore, on MF10, DBCC2-PSO performs worse than DBCC2-DE.
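The convergence measure used above can be sketched as follows; the radius of a species is taken here as the maximum distance from the species centroid to its members, which is one common definition and may differ slightly from the paper's:

```python
import numpy as np

def mean_species_radius(species_list):
    """Mean, over all species, of each species' radius, where the
    radius is the largest member-to-centroid distance."""
    radii = []
    for members in species_list:
        members = np.asarray(members, dtype=float)
        centroid = members.mean(axis=0)
        radii.append(np.linalg.norm(members - centroid, axis=1).max())
    return float(np.mean(radii))
```

A shrinking value over cycles indicates that species are contracting around their optima, i.e., converging.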

Analysis of mechanism

In this part, the strategies adopted in DBCC2 and the parameter settings are discussed in detail. That is, problem separation, the resource allocation strategy, and the two parameters \(\alpha \) and \(\rho \) are tested to illustrate their influence on performance.

Fig. 1
figure 1

Results of the found global optima on functions MF3, MF7, MF10, and MF13

Fig. 2
figure 2

Convergence trend of DBCC2 combining DE, PSO, and CSA on functions MF3, MF7, MF10, and MF13

Table 9 Results of DBCC2 with and without problem separation

Influence of problem separation

Problem separation is first executed in DBCC2 to decompose a many-modal optimization problem into subproblems, which reduces the difficulty of finding a huge number of optimal solutions. Here, the problem separation mechanism is tested. The algorithm without problem separation is denoted DBCC2-xx-NPS, where xx is the optimizer adopted in DBCC2. For example, DBCC2-DE-NPS represents the proposed DBCC2 with the DE optimizer and without the problem separation mechanism.

Table 9 shows the mean and standard deviation of the PR results of DBCC2 and its version without problem separation. The results show that problem separation is highly effective on the many-modal optimization problems, especially on the partially separable benchmarks. In detail, with the DE optimizer, DBCC2-DE performs much better than DBCC2-DE-NPS. On MF4, DBCC2-DE finds almost all the optima (PR = 0.905), whereas DBCC2-DE-NPS finds less than 10% of the optimal solutions. This large gap also appears in the comparisons with the PSO and CSA optimizers. Notably, without problem separation, DBCC2-CSA-NPS hardly finds any optimal peaks.

Overall, the problem separation mechanism is useful for solving many-modal optimization problems with the separable property. With the DE optimizer, DBCC2-DE obtains the best performance on all benchmarks, and the version without problem separation performs best only on MF1 and MF2. With the PSO and CSA optimizers, DBCC2 performs better on all separable benchmarks in the corresponding comparisons.

Table 10 Results of DBCC2 with and without resource allocation strategy

Influence of resource allocation

The dynamic resource allocation strategy proposed in DBCC2 dynamically allocates computational resources to subproblems of different difficulties. Here, the performance of the resource allocation strategy is tested. We denote the proposed algorithms without the resource allocation strategy as DBCC2-xx-NRA, where xx represents the selected optimizer (DE, PSO, or CSA).

Table 10 shows the mean and standard deviation of the PR results of DBCC2 with and without the resource allocation strategy. It can be seen that the resource allocation strategy performs better in the algorithms with the DE and CSA optimizers, but performs poorly with the PSO optimizer. For example, on MF10, DBCC2-DE achieves a PR of 0.675, while DBCC2-DE-NRA finds only a 0.549 ratio of the optimal peaks. On MF14, DBCC2-CSA achieves a PR of 0.160, performing better than DBCC2-CSA-NRA, which obtains 0.144.

Table 11 Results of DBCC2-DE taking different values of \(\alpha \)

In summary, the resource allocation strategy is useful in the DBCC2 framework with the DE and CSA optimizers. In the current experiment, DBCC2-DE obtains the best results on 12 benchmark problems, while the version without the resource allocation strategy obtains the best results on 5 problems. As for the CSA optimizer, DBCC2-CSA obtains the best results on 10 benchmarks, whereas DBCC2-CSA-NRA does so on only 4 problems. However, the reason why DBCC2-PSO-NRA outperforms DBCC2-PSO is worth further study.

Table 12 Results of DBCC2-DE taking different values of \(\rho \)

Influence of \(\alpha \)

In DBCC2, \(\alpha \) controls the number of cycles in the dynamic resource allocation step. The values of \(\alpha \) are chosen from the set \(\{0.05, 0.1, 0.2, 0.3, 0.5\}\); the smaller the value of \(\alpha \), the more cycles DBCC2 performs. Here, DE is used as the optimizer of DBCC2 to illustrate the influence. Table 11 shows the results for different values of \(\alpha \). Overall, DBCC2-DE performs best when \(\alpha \) is equal to 0.1.

The algorithm requires more rounds to allocate the resources when \(\alpha \) is set to 0.05. However, it can be seen that the performance is worse than that of DBCC2-DE with \(\alpha = 0.1\). The experimental result indicates that more rounds of difficulty estimation do not always produce better performance.

The performance of DBCC2-DE with \(\alpha = 0.5\) is also worse than that with \(\alpha = 0.1\): the former obtains only one better result on the separable problems, which shows that too few difficulty-estimation rounds cannot reflect the population status in time.

Influence of \(\rho \)

The parameter \(\rho \) has an impact on the difficulty estimation. Here, the value of \(\rho \) is taken from the set \(\{1.0, 2.0, 3.0, 5.0, 10.0\}\), and the results are shown in Table 12. The parameter \(\rho \) plays a key role in the mapping from the fitness-distance correlation to the estimated difficulty. As discussed in Section Improved difficulty-based cooperative co-evolution, the fitness-distance correlation does not reflect the difficulty linearly. In Formula (18), \(\rho \) intrinsically determines which value of \(\mu \) will lead to the maximum value of the estimated difficulty. For example, \(\rho = 2.0\) indicates that \(\mu = 1/2\) in Formula (18) results in the maximum estimated difficulty.

The experimental results show that there are significant advantages when \(\rho \) is set to 3.0, where the maximum value of the estimated difficulty is achieved with \(\mu = 2/3\). There are 10 better results among the 13 separable problems.

Furthermore, on the majority of the problems, DBCC2-DE tends to achieve better performance as \(\rho \) is increased from 1.0 to 3.0 and worse performance as \(\rho \) is increased further from 3.0 to 10.0. For example, on MF5, the value of PR increases from 0.474 to 0.496 when \(\rho \) is increased from 1.0 to 3.0, and decreases to 0.387 when \(\rho \) is set to 10.0.

Finally, on all separable problems except MF15, DBCC2-DE achieves better performance when \(\rho \) is set to 3.0 or 5.0. As for MF1 and MF2, the algorithms obtain the same results for different values of \(\rho \) because these problems are nonseparable.

Conclusion

In this paper, many-modal optimization problems are investigated. A new benchmark and an improved DBCC (DBCC2), which dynamically estimates the difficulty of subproblems during the search process, are proposed. DBCC2 shows better performance than other typical MMO algorithms. However, much remains to be done: optimizers, resource allocation strategies, and CC frameworks for many-modal optimization problems all deserve further study.