Introduction

Metaheuristics inspired by nature, such as evolutionary algorithms (EAs), form a major class of modern optimization methods [1]. One of the reasons underlying their success is that they provide the flexibility needed to solve diverse engineering problems [2]. However, this flexibility comes at the cost of defining and setting multiple internal parameters, which represents a difficult task. This is because:

  1. the parameters are problem dependent [3];

  2. they often change as the process evolves [4];

  3. the interactions among them can be complex [5].

The problem dependency is captured by the no free lunch theorem, since each parameter configuration effectively defines a different optimizer following its own search path. The evolution dependency and interaction dependency issues are mostly acknowledged through empirical studies.

Fig. 1 General framework describing the adaptive process. Steps in bold type refer to fundamental components

Parameter setting approaches first appeared on the scene with the emergence of the EA field, with the aim of providing guidelines to practitioners [6–8]. In the early 1990s, as it became clear that general recommendations were hardly useful [9], the emphasis shifted from guidelines to control systems and finely tuned methodologies. With a widely recognized and satisfactory solution yet to be found even after more than 35 years, parameter setting continues to be an important and active field of research [10]. Not only do parameter setting approaches help alleviate practitioners’ efforts, but they also play a major role in the achievement of enhanced EA performance. In fact, EA parameters are responsible for providing a specific exploration/exploitation balance (EEB) over a given problem [11]. This balance dictates the search path to be followed, and as a result, parameter setting constitutes a stepping stone providing an optimization framework to practitioners that is both robust (high-quality solutions for a wide range of problems) and efficient (high-quality solutions delivered as quickly as possible).

The most comprehensive taxonomy of parameter setting approaches to date has been provided by Eiben, Hinterding, and Michalewicz [9]. They define four approaches: parameter tuning, deterministic control, adaptive control, and self-adaptive control. Parameter tuning methods fix parameters prior to the optimization process; parameter control methods, by contrast, change parameters as the process evolves. Deterministic control methods are iteration-based formulations that take no feedback from the process into account, while adaptive control methods use feedback gathered from the search to guide the selection of parameter states. Finally, self-adaptive control refers to techniques in which parameters are directly encoded in individuals, with the aim of allowing evolution itself to identify the most promising parameter states. Each of these categories has its own advantages and drawbacks [9]. From the preceding description, it follows that parameter tuning and deterministic control cannot address the three parameter setting dependency issues listed above. Self-adaptive control, in turn, bases its parameter setting entirely on the individuals’ fitness; these raw values are often insufficient to characterize the search behaviour, since, for instance, population diversity remains crucial in guiding the search process [12]. Adaptive control, the only remaining option, therefore represents the most attractive class for guiding the search towards an optimal path.

Over the years, numerous adaptive parameter control approaches have been proposed. However, these have been restricted to a limited number of EA parameters. In fact, a review of more than 100 papers covering parameter adaptation reveals that more than 88 % of the approaches examine the adaptation of at most two parameters, while only 3 % involve four parameters. To the authors’ knowledge, no study has proposed the adaptation of more than four parameters. Furthermore, only a few of these approaches have the potential of handling parameter interactions, while most of them consider fixed interactions. This limits their adaptation effectiveness in terms of achieving an optimal search path, as parameter interactions may change when faced with different problems or during process evolution. This indicates that EA parameter adaptation is a rather complex task, and one that is yet to be fully resolved. This study proposes an adaptive control framework that enables the management of all three parameter setting dependencies.

The paper is organized as follows. Sect. 2 reviews the adaptive frameworks available in the literature; Sect. 3 presents the proposed adaptive control system; Sect. 4 describes the methodology introduced to compare the examined parameter setting approaches; Sect. 5 presents the results of this comparative study, and, finally, Sect. 6 provides a concluding discussion.

Review of adaptive parameter control strategies

Many aspects of an adaptive control strategy, listed below, need to be defined; Fig. 1 illustrates the adaptive control process. Since each of these elements plays a specific role, the following subsections review them separately. Note that the present research focuses mainly on genetic algorithms (GA) owing to their extensive use and the many parameters required by their formulations [13].

  • The type and states of the parameters involved.

  • The feedback indicator used to evaluate the impact of the current state (j) of parameter i.

  • The window interval (W) over which the adaptation is conducted.

  • The credit assignment scheme required to convert feedback information into a suitable reward.

  • The parameter selection rule used to update parameter states.

Parameters involved

In general, six parameters must be set to conduct a GA search: (1) the population size (N), (2) the selection mode, the genetic operators through (3) the crossover and (4) mutation types, as well as their respective activation probabilities, (5) \(p_{c}\) and (6) \(p_{m}\). This ensemble becomes even larger if we consider multi-parent selection (\(\mu\)) and/or the creation of multiple offspring per couple (\(\lambda\)). In addition, supporting the steady-state evolution model (SSGA) adds replacement and update plans to the parameter selection. Throughout this paper, parameter setting is understood in the broadest sense of the term, covering both parameters and operators.
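To make the ensemble concrete, the six basic parameters can be gathered into a single configuration object. This is a minimal sketch; the field names and default values are ours, not taken from any specific GA library:

```python
from dataclasses import dataclass

@dataclass
class GAConfig:
    """Hypothetical container for the six basic GA parameters."""
    population_size: int = 50           # N
    selection_mode: str = "tournament"  # selection mode
    crossover_type: str = "blx-alpha"   # crossover operator
    mutation_type: str = "gaussian"     # mutation operator
    p_c: float = 0.9                    # crossover activation probability
    p_m: float = 0.1                    # mutation activation probability

cfg = GAConfig()
assert 0.0 <= cfg.p_c <= 1.0 and 0.0 <= cfg.p_m <= 1.0
```

Extending the sketch to \(\mu\), \(\lambda\), or the SSGA replacement and update plans would simply add fields, which illustrates how quickly the parameter ensemble grows.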

One of the strongest motivations for conducting parameter adaptation is to control the EEB throughout the search. Each parameter has its own impact on this fundamental search characteristic. For instance, increasing N favours the exploration of the landscape, whereas increasing selection pressure directs the search towards the exploitation of the most promising solutions. Depending on their formulation, any of the genetic operators may be seen as having an exploration or an exploitation function [11]. Adapting all the parameters therefore becomes important.

Feedback indicators

A feedback indicator may be defined as the evidence on which adaptation is based [14]. In other words, by assessing the productivity of the parameter states, it acts as the driving force for adaptation. Sending the wrong advice to the update mechanism may adversely affect the adaptation process [15], which makes feedback indicator selection truly fundamental for any adaptive framework.

Since most adaptive studies cover genetic operators and their activation probabilities, we must point out that many feedback formulations are based on the improvement achieved by the offspring (\(f_\mathrm{offspring})\) relative to a reference fitness value. This reference could be the best individual in the population (\(f_\mathrm{{best}})\) [16], the parents (\(f_\mathrm{parent})\) [15], or any population-based statistics [17, 18]. In almost all applications, no penalty is transferred to the associated parameter state when no improvement is registered from \(f_\mathrm{offspring}\).

Another widely applied approach consists in considering diversity as a feedback indicator. Indeed, diversity is a measure commonly used to express the EEB of a search process, and here, it can be formulated based on the location of the individuals in the search space (i.e. genotypic formulation) or based on their fitness distribution (i.e. phenotypic formulation). Various proposals have been developed regarding genotypic diversity measures (GDMs) [12] and phenotypic diversity measures (PDMs) [19]. It is important to note, however, that PDMs are inherently limited when they are used alone [20]. Consequently, feedback indicators using both GDMs and PDMs have also been proposed, with the aim of simultaneously describing both the quantity and the quality of the population diversity [19]. Reference [21] considered up to three aspects: a GDM, a quality measure defined as the average fitness (\(f_\mathrm{avg})\) variation, and the mean execution time of the selected parameter state.

Credit assignment scheme

Credit assignment is used to convert a feedback indicator into a form supported by the adaptation mechanism (i.e. selection rule) and/or for aggregating multiple feedback indicators. The normalization of a feedback indicator can be regarded as a simple credit assignment scheme. In fact, normalization is helpful in reducing dependency related to feedback indicators built from raw values [22]. The concept of ranking has also been proposed as a way of eliminating the concern regarding raw values [22]. Reference [23] focuses on rare but substantial improvements based on the probability of producing exceptionally good solutions.

Regardless of the credit assignment scheme promoted, the frequency rate of the adaptation schedule must be defined. This is reflected by the window size (W), a hyper-parameter investigated in [15]. Although it was found that W is relatively robust overall, the published results suggest that W may be sensitive and problem dependent. Three streams are employed for conferring a reward to a selection rule: (1) instantaneously conferring the reward \((W=1)\), (2) averaging the rewards over W, or (3) conferring the extreme reward encountered along W.
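The three reward-conferring streams can be sketched as a small helper; the function name and the scheme labels are ours, chosen for illustration:

```python
def confer_reward(feedback, scheme="average"):
    """Aggregate a window of feedback values into a single reward.

    feedback: list of per-application feedback values over window W.
    scheme:   'instant' -> latest value only (W = 1),
              'average' -> mean over the window,
              'extreme' -> best (maximum) value in the window.
    Illustrative sketch; the text only names the three streams.
    """
    if scheme == "instant":
        return feedback[-1]
    if scheme == "average":
        return sum(feedback) / len(feedback)
    if scheme == "extreme":
        return max(feedback)
    raise ValueError(f"unknown scheme: {scheme}")
```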

Parameter selection rule

The parameter selection rule can be considered as the core of any adaptive system. It is responsible for automatically incorporating past performance into current parameter selection, and numerous proposals for this rule have been presented over the years. For the sake of clarity, they are grouped below into six families.

Heuristic rule

This family encompasses various kinds of selection rules, all specifically designed for a given parameter. Because of this limitation, only the most famous of them is presented here: Rechenberg’s “1/5 success rule”, which was proposed within the evolution strategy (ES) paradigm. This rule is defined as follows [24]: “The ratio of successful mutations to all mutations should be 1/5. If it is greater than 1/5, increase the standard deviation, if it is smaller, decrease the standard deviation.”
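As a minimal sketch, the 1/5 success rule can be written as a step-size update; the 0.85 update factor used here is a common textbook choice, not part of the rule itself:

```python
def one_fifth_rule(sigma, success_ratio, factor=0.85):
    """Rechenberg's 1/5 success rule for the ES mutation step size.

    If more than 1/5 of recent mutations were successful, the step
    size (standard deviation) is increased; if fewer, it is decreased.
    The update factor is an illustrative assumption.
    """
    if success_ratio > 1.0 / 5.0:
        return sigma / factor   # widen the search
    if success_ratio < 1.0 / 5.0:
        return sigma * factor   # narrow the search
    return sigma
```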

Fuzzy logic controller

A fuzzy logic controller (FLC) allows the definition of fuzzy IF–THEN rules. This approach is useful when knowledge is vague, incomplete, or ill-structured [19]. Well-defined fuzzy sets are, however, quite difficult to form, and membership functions can be challenging to quantify. Consequently, FLCs generally rely on the knowledge and experience of experts [19].

Probability matching

The probability matching (PM) method was proposed by Goldberg [25] as a reasonable adaptive strategy for an environment characterized by uncertainty. PM is based on translating reward values into a corresponding selection probability, and is formalized in the following mechanism:

$$\begin{aligned}&\hat{q}_j^{t+1} =( {1-\alpha })\hat{q}_j^t +\alpha \cdot r_j^t,\end{aligned}$$
(1)
$$\begin{aligned}&p_{i,j}^{t+1} =p_{i,\min } +( {1-J\cdot p_{i,\min } })\frac{\hat{q}_j^{t+1} }{\sum \nolimits _{j=1}^J {\hat{q}_j^{t+1} } }. \end{aligned}$$
(2)

At generation t+1, state j of parameter i is selected according to the probability stored in \(p_{i,j}^{t+1}\). This selection probability is built from the past empirical quality estimate of state j (\(\hat{q}_j^t\)) and the current reward (\(r_j^t\)) provided by the credit assignment scheme. These two components are connected through an additive relaxation mechanism described by (1) and controlled by an adaptation rate parameter (\(0<\alpha \le 1\)). This parameter allows the system memory to be reduced, as past knowledge is increasingly forgotten as \(\alpha\) rises. To ensure that no parameter state is lost in the process, a minimum probability (\(p_{i,\min}\)) is granted to all states (J) of parameter i. This last feature is essential to cope with the evolution dependencies. The PM selection rule has been used extensively [15–18, 23].
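A minimal sketch of the PM mechanism, implementing (1) and (2) directly (the function name and the illustrative \(\alpha\) value are ours):

```python
def pm_update(q, r, p_min, alpha=0.3):
    """Probability matching update, following Eqs. (1)-(2).

    q:     quality estimates, one per parameter state
    r:     current rewards (zero for unselected states)
    alpha: adaptation rate (0 < alpha <= 1); 0.3 is illustrative
    Returns (new quality estimates, selection probabilities).
    """
    J = len(q)
    # Eq. (1): additive relaxation of the quality estimates
    new_q = [(1 - alpha) * qj + alpha * rj for qj, rj in zip(q, r)]
    total = sum(new_q)
    # Eq. (2): proportional selection with a minimum probability floor
    p = [p_min + (1 - J * p_min) * qj / total for qj in new_q]
    return new_q, p
```

Note that the probabilities always sum to one and never drop below `p_min`, which is the feature the text identifies as essential for coping with evolution dependencies.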

Adaptive pursuit

Adaptive pursuit (AP) was introduced as an adaptive selection rule by Thierens [26]. It is based on a decision theory approach in which a winner-takes-all strategy is adopted, as the emphasis is placed on the parameter state with the highest reward. AP uses the same additive relaxation mechanism as PM (1). However, the updated probability mechanism is given by:

$$\begin{aligned} p_{i,j}^{t+1} =\left\{ {\begin{array}{ll} p_{i,j}^t +\beta ( {p_{i,\max } -p_{i,j}^t }) &{}\quad \mathrm{if}\ j=j^*\\ p_{i,j}^t +\beta ( {p_{i,\min } -p_{i,j}^t }) &{}\quad \forall j\ne j^*\\ \end{array}} \right. , \end{aligned}$$
(3)

where \(j^*=\arg \max _{j=1,\ldots ,J} \left\{ {\hat{q}_j^{t+1} } \right\} \) and \(p_{i,\max } =1-( {J-1})p_{i,\min } \). The learning rate (\(\beta \in [0,1]\)) is used to control the dominance of the best parameter state with respect to the concurrent states. It has been shown that AP outperforms PM [22, 26], at least over artificial scenarios.
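A minimal sketch of the AP update (3); the function name and the illustrative \(\beta\) value are ours:

```python
def ap_update(p, q, p_min, beta=0.8):
    """Adaptive pursuit probability update, following Eq. (3).

    The state with the highest quality estimate q is pursued towards
    p_max, while all other states decay towards p_min.  beta = 0.8 is
    an illustrative learning rate.
    """
    J = len(p)
    p_max = 1 - (J - 1) * p_min
    j_star = max(range(J), key=lambda j: q[j])  # winner-takes-all
    return [pj + beta * ((p_max if j == j_star else p_min) - pj)
            for j, pj in enumerate(p)]
```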

Multi-armed bandit

The multi-armed bandit (MAB) approach was introduced as an EA adaptive selection rule by DaCosta et al. [27]. It allows the management of the exploration/exploitation balance among parameter states. The approach does not allocate any probability to an individual parameter state; rather, the allocation decision is made by selecting the state that provides the highest expectation. The general form is given by:

$$\begin{aligned} p_i^{t+1} =\mathrm{arg max}_{j=1,\ldots ,J} \left( {\hat{q}_j^{t+1} +C\cdot \sqrt{\frac{2\cdot \log \left( {\sum \nolimits _{j=1}^J {n_j } }\right) }{n_j }} }\right) , \end{aligned}$$
(4)

where \(n_{j}\) represents the number of times state j was selected from the latest periods. Within this formulation, the first term describes the exploitation aspect, while the second term is intended to promote exploration. The balance between these two aspects is controlled through the scaling factor C.
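The selection in (4) can be sketched as follows; the function name and the illustrative C value are ours:

```python
import math

def mab_select(q, n, C=1.0):
    """Upper-confidence-bound state selection, following Eq. (4).

    q: quality estimate per parameter state (exploitation term)
    n: times each state has been selected (all must be > 0)
    C: scaling factor balancing exploration vs. exploitation
    Returns the index of the selected state.  Illustrative sketch.
    """
    total = sum(n)
    scores = [qj + C * math.sqrt(2 * math.log(total) / nj)
              for qj, nj in zip(q, n)]
    return max(range(len(q)), key=lambda j: scores[j])
```

With equal trial counts the highest-quality state wins, while a rarely tried state gains a large exploration bonus and is eventually revisited, which is exactly the balance the scaling factor C controls.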

The first MAB implementation was the dynamic MAB (DMAB). However, because of certain issues related to DMAB hyper-parameter setting, MAB variants such as the sliding MAB (SLMAB) and the rank-based MAB (RMAB) [22] were developed. The main difference between RMAB and SLMAB lies in the definition of \(\hat{q}_j^{t+1}\). Instead of using raw values from the credit assignment scheme, RMAB ranks them through the area under the curve (AUC) concept or through the sum of ranks (SR). Variants of these two ranking processes (FAUC and FSR) have also been introduced; compared to AUC and SR, they are invariant to monotonic fitness transformations. Furthermore, a decay factor (D) is integrated into RMAB to give greater influence to top-ranked parameter states.

In all the MAB forms, (4) is applied only when at least one trial is allocated to each parameter state. Otherwise, the unselected states are chosen randomly. FAUC–RMAB has been shown to be the best MAB variant for adapting genetic operators [22], and additionally, has also shown better performance than PM or AP. Finally, it is worth noting that none of the MAB variants has succeeded in solving most of the multimodal functions considered. To explain this, Fialho [22] concluded: “[...] the maintenance of some level of diversity in the population should also be accounted somehow for the rewarding of operator applications [...].”

Covariance matrix adaptation

The covariance matrix adaptation (CMA) approach was developed within the ES framework (CMA-ES) by Hansen and Ostermeier [28]. The purpose of this strategy is to maximize the search progress rate by reproducing the successful evolution path through the adaptation of mutation (i.e. step size and direction). The underlying assumption of CMA-ES is that the mutation step size (\(\sigma\)) must be adapted as quickly as possible, whereas the covariance matrix (C) of the mutation distribution should be modified more slowly. This is reflected in the use of an additive relaxation mechanism, as defined in (1), for C. \(\sigma\) is defined globally within the population, but applied locally, as it is weighted with respect to the principal axes identified by C. This characteristic gives CMA-ES invariance with respect to rotational and linear transformations of the search space. CMA-ES was introduced to improve the local search performance of ES, but it was shown in [29] that increasing N improves CMA-ES performance on multimodal problems. Based on this observation, G-CMA-ES was proposed in [30] with a restart feature that doubles the population size each time it is triggered. This can be seen as a second adaptive mechanism, since the restart decision is made based on five criteria characterizing search performance. To date, G-CMA-ES has been considered the state-of-the-art EA [31].

Discussion

Regarding the three parameter setting dependencies, it is clear that all the adaptive control proposals take into account the problem and evolution dependencies by default. However, not every method can manage a parameter interaction dependency. In fact, among the previously defined families, only the heuristic rule and FLC are able to consider this aspect, as the other families are parameter-independent approaches. Nevertheless, methods relying on user-defined adaptive formulations are prone to encountering generalization issues with parameter interaction owing to the problem dependency issue.

Furthermore, almost all the adaptive methods rely on some hyper-parameters. This would be justifiable if it could be demonstrated that these hyper-parameters are less sensitive than the controlled parameters of EAs. However, if the ratio of hyper-parameters to controlled parameters (here referred to as the H/C ratio) is greater than 1, it is easy to lose track of the intended adaptive goal. In fact, the H/C ratio is 3 for SLMAB, 4 for PM and RMAB, and 5 for AP and DMAB, when hyper-parameters related to the adaptation schedule (window size and type of reward conferred) are included.

Parameter adaptation through Bayesian network

We propose the use of a Bayesian network (BN) as a GA parameter adaptation scheme (referred to as BNGA) to tackle the parameter setting dependencies more effectively. The BN is a coherent probabilistic framework drawn from the machine learning field; it is in fact a causal network based on Bayes’ rule:

$$\begin{aligned} P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)}. \end{aligned}$$
(5)

Within this formulation, P(A) represents the prior probability of event A, whereas \(P(A\mid B)\) describes the posterior probability of event A, knowing event B. \(P(B\mid A)\) represents a conditional probability, or the likelihood of A, given B. The marginal probability of B, P(B), is taken into account for normalization purposes; it ensures that the final probabilities sum to one (\(\sum \nolimits _j P(a_j \mid B)=1.0\)). The consistency of Bayes’ rule is easily demonstrated for independent events A and B, as the joint distribution representing the numerator in (5) becomes \(P(A,B)=P(B\mid A)P(A)=P(A)P(B)\). Thus, a BN allows the belief on a parameter state to be updated based on new evidence from the search process. However, as the number of parameters increases, it becomes difficult to perform inference directly from Bayes’ rule. To address this problem, a BN uses a graphical model, which can efficiently represent the relationships among the parameters; inference is then conducted with algorithms developed to take advantage of this graphical representation.
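A single belief update following (5) reduces to multiplying priors by likelihoods and normalizing; this minimal sketch (names are ours) is the operation BNGA performs on each parameter:

```python
def bayes_update(prior, likelihood):
    """Single-step belief update over parameter states via Bayes' rule (5).

    prior:      P(state j) for each state
    likelihood: P(evidence | state j) for each state
    The marginal P(evidence) is the normalizing denominator, so the
    posterior sums to one.  Minimal sketch of the belief update.
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    marginal = sum(joint)  # P(evidence), for normalization
    return [j / marginal for j in joint]
```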

Bayesian networks have been applied in various domains, particularly in EA, where a BN is used in the estimation of distribution algorithm (EDA) search paradigm to carry out probability distribution updates of the landscape variables [32]. A BN has also been used to tune GA parameters based on the number of evaluations required to achieve suitable solutions [33]. It has further been extended, with case-based reasoning (CBR), to cope with the inherent limitations of tuning [34]. To the authors’ knowledge, a BN has never been used as an adaptive parameter control system.

Fig. 2 Graphical model of BNGA

The construction of a BN for a particular application usually follows two steps. First, learning is conducted to define the best structure for describing the relationship among variables or parameters in the explored context. Then, network data or conditional probability tables (CPT) are learned from that BN structure, with the aim of quantifying the strength of those relationships. Thus, the BN framework tackles parameter interaction directly through the definition of the graphical model. The problem and evolution dependencies are settled with the help of the CPTs.

Graphical model of BNGA

The BN structure for a specific application can be learned using scoring metrics or search algorithms. However, to reduce computational costs, a graphical model is defined a priori in this study. Consequently, BNGA is based on the simplest BN category, where the structure is known and the data are all observable. Figure 2 illustrates this graphical model. The structure is developed for a real-coded steady-state GA (SSGA) and is decoupled into two parameter sets to allow the parameters to act on two different canvases. In fact, as observed in Sect. 2.2, two kinds of evidence are used in the literature to assess parameter state productivity: (1) solution improvement and (2) population diversity. As they have a direct impact on search performance, both of these indicators should be used to assist the adaptation. Therefore, in BNGA, parameters related to the creation of new solutions (\(p_i\), \(i \in \{1,\ldots ,4\}\)) are judged by their ability to improve solutions. In contrast, parameters involved in population governance (\(p_i\), \(i \in \{5,\ldots ,9\}\)) are judged by their ability to manage the EEB through population diversity. Obviously, parameters from the former set also have an impact on the EEB. However, the benefit of the steady-state model is that the replacement and update plans constitute a gateway for deciding who will take part in the population.

Regarding the parameters involved, no activation probabilities (\(p_\mathrm{c}\) or \(p_\mathrm{m}\)) are considered here, as offspring are automatically created through a genetic operator. Furthermore, \(p_{8}\) and \(p_{9}\) are added to control the dynamics of the population size.

In addition to BNGA being able to encompass all the SSGA parameters, the simplicity of its structures provides three advantages [35]. First, the V converging structures, corresponding to singly connected polytrees, allow closed-form inferences to be performed through a local updating scheme (the message passing algorithm [35]). The complexity of this model is linear with respect to the number of variable nodes [36, 37], whereas in general, BN inference is NP-hard [38]. This is an essential characteristic of BNGA, as it allows the minimization of the run-time impact of the adaptive system on the GA process. Second, it is possible to gather data incrementally from this two-level structure and refine our knowledge on parameter states. This characteristic constitutes the core of the present proposal for building CPTs. Section 3.3 below completes the description. Finally, the decoupling of the parameters into two sets enables the use of belief propagation without any approximation technique, such as noisy OR and AND gates. This is because of the small number of parent nodes in each structure. Thus, BNGA is aligned with the statement by Druzdzel and van der Gaag [39]: “Building a probabilistic network requires a careful trade-off between the desire for a large and rich model on the one hand and the costs of construction, maintenance, and inference on the other hand.”

The process of BNGA may be summarized in the three phases illustrated in Fig. 3. The first phase covers the initialization, through the definition of the CPTs and the prior probabilities of each parameter. Since no initial knowledge of the problem at hand is available, both the CPTs and the prior probabilities are assumed to be uniform distributions (\(p_{i,j}=1/J\)). In the second phase, the parameter states for the next generation are selected, based on the established prior probabilities. Once the new generation is settled by the GA process, the reward computations and CPT updates for the parameter sets associated with new solution creation (\(r_\mathrm{offspring}\)) and with population governance (\(r_\mathrm{population}\)) are carried out in the third phase of BNGA. Thereafter, the probabilities of the parameters (BEL(\(p_i^t\))) are computed. The last step of this third phase is to update the prior probabilities for the upcoming generation (\(p_{i,j}^{t+1}\)); these are set equal to the last probabilities computed at the current generation. Phases 2 and 3 are repeated until the GA stop criteria are met.

To sum up, BNGA allows control of all the SSGA parameters, while taking into account the three dependencies:

  1. Problem dependency—initial CPTs and prior probabilities are defined as uniform distributions.

  2. Evolution dependency—BEL(\(p_i^t\)) is used as the prior probabilities for the next generation.

  3. Parameter interaction dependency—parameters \(p_m\) to \(p_n\) are conditionally dependent on r.

Furthermore, no additive relaxation or fading mechanism is needed by BNGA, as the memory of past knowledge is automatically managed through the modification of the prior probabilities at each generation. Finally, BNGA does not require any adaptation schedule, since it takes advantage of new evidence as soon as it appears.
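The two repeated phases can be sketched as a single-generation loop. This is a schematic reading of the process, not the authors' implementation; all names are ours, and the way the CPT row is formed (uniform 0.5 with the selected state replaced by its reward) follows the CPT scheme described at the end of this section:

```python
import random

def bnga_generation(priors, ga_step, compute_rewards):
    """One BNGA iteration (phases 2 and 3 of Fig. 3), schematically.

    priors:          dict mapping each parameter to its list of state
                     probabilities P(state j)
    ga_step:         callable running one SSGA generation with the
                     sampled parameter states
    compute_rewards: callable returning, per parameter, the selected
                     state index and its reward
    """
    # Phase 2: sample one state per parameter from its prior probabilities
    states = {p: random.choices(range(len(probs)), weights=probs)[0]
              for p, probs in priors.items()}
    ga_step(states)

    # Phase 3: build the per-generation CPT row (uniform 0.5, selected
    # state replaced by its reward), update beliefs via Bayes' rule, and
    # promote the posteriors to next-generation priors
    rewards = compute_rewards(states)
    new_priors = {}
    for p, probs in priors.items():
        j_sel, r = rewards[p]
        cpt_row = [r if j == j_sel else 0.5 for j in range(len(probs))]
        joint = [pr * l for pr, l in zip(probs, cpt_row)]
        z = sum(joint)
        new_priors[p] = [x / z for x in joint]
    return new_priors
```

Because the posteriors become the next priors, past knowledge is carried forward without any explicit relaxation or fading mechanism, which is the point made above.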

Fig. 3 Flowchart of the BNGA process

Credit assignment schemes

Rewards for \(p_{1}\) to \(p_{4}\) (\(r_\mathrm{offspring})\) are based on the fitness improvement of the offspring, while rewards for \(p_{5}\) to \(p_{9}\) (\(r_\mathrm{population})\) reflect the search EEB by means of population diversity measures.

Regarding \(r_\mathrm{offspring}\), preliminary experiments not included in this article showed that comparing average fitness of the offspring (\(\bar{f}_\mathrm{offspring} )\) to the average fitness of the parents (\(\bar{f}_\mathrm{parent} )\) provides the most suitable results. Therefore, for minimization problems, the reward over the former parameter set is given by:

$$\begin{aligned} r_\mathrm{offspring} =0.5+0.5\cdot \frac{\bar{f}_\mathrm{parent} -\bar{f}_\mathrm{offspring} }{\left| {\bar{f}_\mathrm{parent} } \right| } \end{aligned}$$
(6)
$$\begin{aligned}&P(r_\mathrm{offspring} \left| {p_1 ,p_2 ,p_3 ,p_4 } \right. )= \nonumber \\&\left\{ {\begin{array}{ll} r_\mathrm{offspring} &{}\quad \mathrm{if}\quad p_{\min } \le r_\mathrm{offspring} \le 1-p_{\min } ; \\ p_{\min } &{}\quad \mathrm{elseif }\quad r_\mathrm{offspring} <p_{\min } ; \\ 1-p_{\min } &{}\quad \mathrm{otherwise}. \\ \end{array}} \right. \end{aligned}$$
(7)

The 0.5 multiplication in (6) scales the reward response to lie between \(-0.5\) and 0.5 around the central offset, so that parameter states unable to improve parent solutions are penalized with rewards below 0.5. In other words, a bad parameter choice also provides relevant knowledge and must not be ranked at the same level as an unused parameter state; this contradicts the mainstream view presented in Sect. 2.2. Finally, conditions are proposed to ensure that \(r_\mathrm{offspring}\) remains in the [\(p_\mathrm{min}\), \(1-p_\mathrm{min}\)] range.
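A minimal sketch of (6) with the clamping conditions of (7); the function name and the `p_min` default are illustrative:

```python
def reward_offspring(f_parent_avg, f_offspring_avg, p_min=0.01):
    """Offspring reward, following Eqs. (6)-(7), for minimization.

    The relative improvement over the parents is centred at 0.5 and
    clamped to [p_min, 1 - p_min] so it stays a valid probability.
    """
    r = 0.5 + 0.5 * (f_parent_avg - f_offspring_avg) / abs(f_parent_avg)
    return min(max(r, p_min), 1 - p_min)
```

For example, offspring that reduce the average fitness from 10 to 8 receive a reward of 0.6, while offspring that worsen it are pushed below 0.5 and, in the extreme, clamped at `p_min`.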

The \(r_\mathrm{population}\) factor relies both on a genotypic and on a phenotypic measure to describe the quantity and quality of the population diversity simultaneously. It has been shown that the EEB orthogonal framework is a generalization of the exploration/exploitation opposing force concept [12]. As such, the GDM is responsible for characterizing the exploration axis, while the exploitation axis is represented by a phenotypic convergence measure (PCM). Note that the PCM is the inverse of the normalized PDM. A PCM is used instead of a PDM to comply with the useful diversity concept proposed in [40], which favours the maintenance of appropriate diversity, which potentially brings in good individuals. From this standpoint, both GDM and PCM must be maximized. Equations (8) and (9) formulate the measures employed in this paper for GDM and PCM evaluation. Even though they do not guarantee a perfect evaluation in all cases, the descriptors were compared to other available formulations and showed significant performance increases globally.

$$\begin{aligned}&{\text {GDM}}=\frac{\sum \nolimits _{i=1}^{N-1} {\ln \left( {1+\min \nolimits _{j\in \left[ {i+1,N} \right] } \frac{1}{n}\sqrt{\sum \nolimits _{k=1}^n {\left( {x_{i,k} -x_{j,k} }\right) ^2} } }\right) } }{\text {NMDF}},\end{aligned}$$
(8)
$$\begin{aligned}&{\text {PCM}}=1-\frac{\sum \nolimits _{i=1}^{N-1} {\ln ( {1+\left| {f_i -f_{i+1} } \right| })} }{\text {VMD}}. \end{aligned}$$
(9)

The subscripts used in these formulations refer to the individuals in the population; thus, \(x_{i,k}\) and \(x_{j,k}\) refer to gene k of individuals i and j. The GDM formulation is developed for a real-coded representation, where n corresponds to the dimensionality of the landscape. In the PCM formulation, \(f_i\) and \(f_{i+1}\) correspond to neighbouring fitness values in the sorted fitness distribution. Finally, NMDF and VMD are normalization terms. The former normalizes by the maximum diversity achieved so far, i.e. the maximum value attained by the numerator since the beginning of the search process. In contrast, the latter normalizes by a virtual maximum diversity: VMD is computed through the diversity part (numerator) of the PCM formula using a virtual distribution of N samples uniformly spread between \(f_\mathrm{worst}\) and \(f_\mathrm{best}\). Therefore, VMD needs to be updated whenever the population size and/or the absolute fitness range changes.
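The two measures (8) and (9) can be sketched for a real-coded population as follows; the function names are ours, and NMDF and VMD are assumed to be tracked elsewhere and passed in:

```python
import math

def gdm(pop, nmdf):
    """Genotypic diversity measure, following Eq. (8): each individual's
    nearest-neighbour Euclidean distance (scaled by dimensionality n)
    is log-compressed, summed, and normalized by NMDF (the maximum
    numerator seen so far).  Sketch for a real-coded population."""
    n = len(pop[0])
    total = 0.0
    for i in range(len(pop) - 1):
        d_min = min(
            math.sqrt(sum((pop[i][k] - pop[j][k]) ** 2 for k in range(n))) / n
            for j in range(i + 1, len(pop)))
        total += math.log(1 + d_min)
    return total / nmdf

def pcm(fitnesses, vmd):
    """Phenotypic convergence measure, following Eq. (9): gaps between
    sorted neighbouring fitnesses are log-compressed, summed,
    normalized by the virtual maximum diversity VMD, and subtracted
    from 1, so a fully converged population yields PCM = 1."""
    f = sorted(fitnesses)
    div = sum(math.log(1 + abs(f[i] - f[i + 1])) for i in range(len(f) - 1))
    return 1 - div / vmd
```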

From there, we need to determine how these two measures can be combined to produce \(r_\mathrm{population}\). A similar framework was applied in [21], where it was proposed that genetic operators be rewarded based on their ability to bring the EEB closer to 45\(^{\circ }\) (equal exploration and exploitation levels). A commonly adopted EEB management scheme involves promoting full exploration at the beginning of the evolution process (EEB \(\rightarrow \) 90\(^{\circ }\)) and moving this balance towards full exploitation as the process evolves (EEB \(\rightarrow \) 0\(^{\circ }\)) [9]. However, these EEB management schemes are biased and have not been proven to provide an optimal search path over any problem. This issue was summarized in [41] as follows: “The type and amount of diversity required at different evolutionary times remains rather unclear.” Consequently, to make EEB management as flexible as possible, the proposed strategy encourages parameter states that maximize both exploration and exploitation (i.e. GDM\(\rightarrow \)1 and PCM\(\rightarrow \)1). Equation 10 formulates the operation:

$$\begin{aligned} r_\mathrm{population} =\frac{\sqrt{\mathrm{GDM}^2+\mathrm{PCM}^2} }{\sqrt{2} } \end{aligned}$$
(10)
$$\begin{aligned}&P( {r_\mathrm{population} \mid p_5 ,p_6 ,p_7 ,p_8 ,p_9 })=\nonumber \\&\left\{ {\begin{array}{ll} r_\mathrm{population} &{}\quad \mathrm{if}\,( {p_{\min } \le r_\mathrm{population} \le 1-p_{\min } })\wedge ( {\mathrm{GDM}>p_{\min } \wedge \mathrm{PCM}<1-p_{\min } }); \\ p_{\min } &{}\quad \mathrm{elseif}\,( {r_\mathrm{population} <p_{\min } })\vee ( {\mathrm{GDM}<p_{\min } \wedge \mathrm{PCM}>1-p_{\min } }); \\ 1-p_{\min } &{}\quad \mathrm{otherwise}. \\ \end{array}} \right. \end{aligned}$$
(11)

Equation 11 adds a condition by penalizing parameter states that produce convergence (i.e. GDM \(< p_\mathrm{min}\) and PCM \(> 1-p_\mathrm{min})\). This condition allows moving away from a premature convergence state without relying on a restart feature.
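As an illustration, Eqs. 10 and 11 can be combined into a single reward function. This is a minimal sketch; the function name and the default \(p_\mathrm{min}\) value are ours:

```python
import math

def population_reward(gdm, pcm, p_min=0.01):
    """Combine the diversity measures into a bounded reward (Eqs. 10-11).

    gdm, pcm -- genotypic and phenotypic diversity measures, each in [0, 1]
    p_min    -- minimum probability constraining the reward
    """
    # Eq. 10: distance to the origin in the (GDM, PCM) plane,
    # normalized so that the maximum reachable value is 1
    r = math.sqrt(gdm**2 + pcm**2) / math.sqrt(2)

    # Eq. 11: bound the reward and penalize convergent parameter states
    if p_min <= r <= 1 - p_min and (gdm > p_min and pcm < 1 - p_min):
        return r
    elif r < p_min or (gdm < p_min and pcm > 1 - p_min):
        return p_min      # convergence state (GDM -> 0, PCM -> 1) is penalized
    else:
        return 1 - p_min
```

Note that a converged population (low GDM, high PCM) receives \(p_\mathrm{min}\) even though its raw Eq. 10 value may be large, which is precisely what allows moving away from premature convergence.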

Finally, as observed in (7) and (11), a minimum probability (\(p_\mathrm{min}\)) constrains the rewards. The meaning of \(p_\mathrm{min}\) is similar here to that in the PM and AP adaptive systems: \(p_\mathrm{min}\) ensures that no parameter state (\(p_\mathrm{i,j}\)) is lost during the search process. The role of \(p_\mathrm{min}\) is nevertheless slightly different in BNGA, in that certainty states (i.e. \(p_\mathrm{i,j}=0.0\) or \(p_\mathrm{i,j}=1.0\)) are prohibited by avoiding certainty in the conditional probabilities, although \(p_\mathrm{i,j}\) can approach these values without restriction. In addition, this hyper-parameter serves a different function for each parameter set: for \(r_\mathrm{offspring}\), \(p_\mathrm{min}\) bounds the reward value within a probabilistic range, whereas for \(r_\mathrm{population}\), \(p_\mathrm{min}\) defines a hazardous area and penalizes parameter states that navigate into it.

Conditional probability table (CPT)

In BNGA, CPTs are built from a Bayesian-type estimation. At each generation, a uniform probability is assigned to all the parameter configurations in the CPTs \((P(r\vert p_\mathrm{i}) = 0.5)\), and the reward \((r_\mathrm{offspring}\) or \(r_\mathrm{population})\) is subsequently used to update the selected parameter configuration \((j = s)\) in the CPT \((P(r\vert p_\mathrm{i,s}) = r)\). This CPT building scheme ensures that knowledge gathered through the reward is considered only once, as its impact is transferred to the parameter state probabilities by setting the prior probabilities of the next generation \((t+1)\) equal to the posterior probabilities of the previous generation (t). This approach is valid because the CPTs are built from scratch at each generation.

Finally, the size of the CPTs increases exponentially with the number of parameters involved. Consequently, a mechanism is added to increase the impact of the single configuration reward \((P(r\vert p_\mathrm{i,s}))\) in the CPTs. This is achieved by allocating a probability equal to \(1-P(r\vert p_\mathrm{i,s})\) to all configurations that do not involve parameter states included in the selected parameter configuration (i.e. \(\forall P(r\vert p_\mathrm{i,j}) = 1-P(r\vert p_\mathrm{i,s})\), where \(j \ne s)\). The assumption underlying this reinforcement mechanism is that if a parameter state performs well over a given generation, other choices become undesirable, and vice-versa. In fact, this reinforcement mechanism increases the rate of adaptation.
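The CPT building scheme and its reinforcement mechanism can be sketched as follows. The data layout and names are ours; in particular, leaving configurations that share only some states with the selected one at the uninformative 0.5 prior follows our reading of the text:

```python
from itertools import product

def build_cpt(param_states, selected, reward):
    """Rebuild the CPT from scratch for one generation (illustrative sketch).

    param_states -- number of states per parameter, e.g. [5, 5, 4, 3]
    selected     -- tuple with the selected state index of each parameter
    reward       -- r_offspring or r_population for this generation
    """
    cpt = {}
    for config in product(*(range(k) for k in param_states)):
        if config == selected:
            cpt[config] = reward                # P(r | p_{i,s}) = r
        elif not any(a == b for a, b in zip(config, selected)):
            # shares no state with the selected configuration:
            # reinforcement mechanism, P(r | p_{i,j}) = 1 - P(r | p_{i,s})
            cpt[config] = 1.0 - reward
        else:
            cpt[config] = 0.5                   # uninformative prior
    return cpt
```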

Posterior probabilities (BEL)

To ensure the tractability of the adaptive system, the posterior probabilities are computed at each generation according to a closed-form inference strategy, which is the message passing algorithm [35] given by:

$$\begin{aligned} \lambda _r ( {p_i })=\alpha \sum \limits _{p_k :k\ne i} {P( {r\mid p_m ,\ldots ,p_n })\prod \limits _{k\ne i} {\pi _r ( {p_k })} }, \end{aligned}$$
(12)
$$\begin{aligned} \mathrm{BEL}( {p_i })=\alpha \cdot \lambda _r ( {p_i })\cdot \pi _r ( {p_i }), \end{aligned}$$
(13)

where \(\lambda _\mathrm{r}(p_\mathrm{i})\) is the message passed by the child node \((r_\mathrm{offspring}\) or \(r_\mathrm{population})\) to the parent nodes, and \(\pi _\mathrm{r}(p_\mathrm{i})\) is the message passed by the parent nodes \(p_\mathrm{i}\) to the child node. Since nodes \(p_{i}\) do not have any parent, \(\pi _\mathrm{r}(p_\mathrm{i})\) is exactly equal to their prior probabilities. \(\alpha \) is a normalization factor based on the summation of all \(p_\mathrm{i}\) states. In (12), \(m=1\) and \(n=4\) for the structure converging towards \(r_\mathrm{offspring}\), and \(m=5\) and \(n=9\) for the structure involving \(r_\mathrm{population}\). Figure 4 presents the pseudocode summarizing the inference algorithm.
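Since the parent nodes \(p_i\) are roots converging on a single child r, Eqs. 12 and 13 reduce to exact enumeration over the joint configurations. A minimal sketch, with names and data layout of our choosing:

```python
from itertools import product

def posterior(priors, cpt):
    """Exact inference for the converging structure (Eqs. 12-13).

    priors -- list of prior distributions pi_r(p_i), one list per parameter
    cpt    -- dict mapping full configurations (tuples) to P(r | config)
    """
    beliefs = []
    for i, prior_i in enumerate(priors):
        # Eq. 12: message from the child node, lambda_r(p_i)
        lam = []
        for state in range(len(prior_i)):
            total = 0.0
            for config in product(*(range(len(p)) for p in priors)):
                if config[i] != state:
                    continue
                weight = 1.0
                for k, prior_k in enumerate(priors):
                    if k != i:
                        weight *= prior_k[config[k]]
                total += cpt[config] * weight
            lam.append(total)
        # Eq. 13: BEL(p_i) = alpha * lambda_r(p_i) * pi_r(p_i)
        bel = [l * p for l, p in zip(lam, prior_i)]
        z = sum(bel)  # normalization factor alpha
        beliefs.append([b / z for b in bel])
    return beliefs
```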

Fig. 4
figure 4

Pseudocode of the inference algorithm used by BNGA

Comparative study

We conducted a two-step comparative study to assess the performance of BNGA. First, we evaluated BNGA within an SSGA framework against a static parameter setting, a naïve adaptive method, PM, AP, and FAUC–RMAB. For the static parameter setting, the parameters were defined from the most promising states identified by BNGA for each problem considered. For the naïve approach, the parameter states were applied with uniform probabilities throughout the course of the evolution, while for the other approaches, a hyper-parameter study was performed prior to the comparison to determine the impact of these parameters and to select their best configuration.

We then compared the best parameter setting approach found in the first step to CMA-ES and G-CMA-ES in their default settings.Footnote 1 We propose this two-step approach because CMA-ES and G-CMA-ES follow different EA paradigms with different genetic operators and may thus overshadow the objectives of the analysis over the SSGA framework.

Methodology

The comparison included the 25 functions (F1–F25) defined in the CEC’05 benchmark [42]. However, since this comparison represented a first validation phase aimed mainly at demonstrating the inherent ability of BNGA, while rigorously adopting the methodology provided by the CEC’05 benchmark, the procedure was limited to ten-dimensional experiments (\(n=10\)). Although this restriction already offers a high level of complexity, a complete characterization of the BNGA potential will require a second experimentation phase with n greater than 10. Twenty-five runs were conducted for each problem. The search process stopped if the optimizer located the global optimum within the prescribed tolerance of 10\(^{-8}\) or if a 100,000-function-evaluation limit was reached.
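The stopping rule used in these experiments amounts to the following check (the function name is ours):

```python
def should_stop(best_fitness_error, n_evals, tol=1e-8, max_evals=100_000):
    """CEC'05 stopping rule: success within the prescribed tolerance,
    or the function-evaluation budget is exhausted."""
    return best_fitness_error <= tol or n_evals >= max_evals
```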

Parameter states involved

The following summarizes the SSGA process: \(\lambda \) offspring are generated at each generation using a specific crossover operator over the set of \(\upmu \) selected parents; following a chosen replacement plan, r individuals are removed from the population to make room for the new individuals; subsequently, an update plan is used to reinsert individuals from the temporary pool into the next generation.
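The process above can be sketched as one SSGA generation. The concrete operators below (random parent selection, a naive blend crossover, worst-removal replacement, best-first update) are simple placeholders of ours for a minimization problem; any plan from Table 1 could be substituted:

```python
import random

def ssga_generation(population, fitness_fn, mu, lam, r):
    """One steady-state GA generation (illustrative sketch, minimization)."""
    # selection plan: mu parents picked at random from the population
    parents = random.sample(population, mu)
    # genetic operator: lambda offspring from a naive blend of two parents
    offspring = []
    for _ in range(lam):
        a, b = random.sample(parents, 2)
        w = random.random()
        offspring.append([w * x + (1 - w) * y for x, y in zip(a, b)])
    # replacement plan: remove the r worst individuals into a temporary pool
    ranked = sorted(population, key=fitness_fn)
    survivors, pool = ranked[:-r], ranked[-r:] + offspring
    # update plan: reinsert the r best individuals of the temporary pool
    pool.sort(key=fitness_fn)
    return survivors + pool[:r]
```

The population size is preserved: r individuals leave the population, and the update plan reinserts exactly r individuals from the temporary pool of removed individuals and offspring.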

Table 1 Parameters involved and their potential states

Table 1 summarizes the parameters considered in this study. In terms of the selection plan, the analysis includes five alternative states: (1) the best individual is selected as the parent, and the remaining individuals are chosen randomly; (2) one individual is selected randomly from among the best individuals in ten groups in the fitness range, and the remaining parents (\(\upmu -1)\) are picked at random from the population [43]; (3) parents are randomly selected from the population; (4) parents are selected from a binary tournament; and (5) parents are selected using the fitness uniform selection scheme (FUSS) [44]. For the genetic operator, only crossovers are considered here, as they all have the ability to introduce new genetic material. Again, five potential states are taken into account: (1) a parent-centric crossover (PCX) with \(\sigma _{\eta }=\sigma _{\zeta }= 0.1\) [45]; (2) a unimodal normal distribution crossover (UNDX) with \(\sigma _{\xi }=1/\sqrt{\mu -2} \) and \(\sigma _{\eta }= 0.35/\sqrt{n-\mu -2} \) [46]; (3) a uniform crossover (UX) with the addition of normally distributed noise N(0, 0.01); (4) a blended crossover (BLX-0.5); and (5) a parent-centric normal crossover (PNX) with \(\eta =1.0\) [47]. The analysis also provides four choices for the replacement plan: (1) parents are added to the temporary pool, and (2)–(4) \(x \in \) {1, 2, 5} individuals are randomly selected from the population. For the update plan, three strategies are proposed: (1) the best individuals are reinserted into the population; (2) the best individual is selected and the remaining individuals are picked from binary tournaments (BT1); and (3) the same as BT1, except that a diversity control mechanism is considered, since the selected individuals replace the nearest genotypic individual already in the temporary pool (BT2). Finally, addition and removal schemes are required to manage the population size dynamics. 
Two addition methods are considered when N increases: (1) randomly created individuals are added to the population; and (2) all the non-inserted offspring of the current generation are added to the population, and the remaining required individuals are randomly created. Four removal methods are considered when N decreases: (1) the worst individuals are removed from the population; (2)–(4) a tournament is made up of \(x \in \) {2, 5, 10} competitors, and the worst is removed from the population.
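For instance, the tournament-based removal methods (2)–(4) can be sketched as follows (the function name is ours; minimization is assumed):

```python
import random

def remove_by_tournament(population, fitness_fn, x):
    """Removal methods (2)-(4): a tournament of x competitors is drawn
    and the worst competitor is removed from the population."""
    competitors = random.sample(population, x)
    worst = max(competitors, key=fitness_fn)  # highest error = worst
    population.remove(worst)
    return population
```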

All the SSGA adaptive control systems (naïve, PM, AP, FAUC–RMAB, and BNGA) integrate these parameter states. At the beginning of each run, the probability of application of each parameter state (\(p_\mathrm{i,j}\)) is uniformly set, as presented in the last row of Table 1. The adaptive controls of these systems, except for the naïve approach, are based on the credit assignment schemes described in Sect. 3.2. For PM, AP, and FAUC–RMAB, the parameters are controlled independently, since these systems ignore the parameter interactions. In contrast, as depicted in Fig. 2, BNGA fully integrates the parameter interactions.

Table 2 Hyper-parameter values considered for the sensitivity analysis

Hyper-parameter sensitivity analysis

To fairly compare the adaptive systems, a sensitivity analysis was conducted over the hyper-parameter settings as a prerequisite. Table 2 summarizes the hyper-parameter values considered for each adaptive system. For the decoupled parameter control systems (PM, AP, FAUC–RMAB), the same hyper-parameters are used for each parameter, even though different optimal settings are likely. This choice maintains a suitable H/C ratio. Regarding \(p_\mathrm{min}\), the 1/2J option comes from Thierens' proposal, which seeks to have the best parameter state applied at most half the time [26].

The sensitivity analysis includes five repetitions, from F1 to F10, by considering all the hyper-parameter combinations. Since the goal is to find robust hyper-parameters over various problems, as well as configurations that provide high-quality solutions, we suggest a multi-objective approach. Both objectives are evaluated by the following equation:

$$\begin{aligned} \mathrm{obj} =\sum \limits _{j=1}^{10} {1-\frac{\min \limits _{k\in [1, TC]} f_{k,j} }{f_{i,j} }}. \end{aligned}$$
(14)

For the first objective, f represents the mean best fitness error (MBFE), whereas for the second, f corresponds to the best fitness error (BFE). Both are defined from all the repetitions over the evaluated problem j. Subscript i stands for the studied hyper-parameter configuration, which ranges from 1 to the total number of combinations (TC). Using this normalized formulation, responses can be aggregated, regardless of the complexity of the test functions. Consequently, each hyper-parameter configuration is assigned a result ranging from 0 to 10.
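Equation 14 can be applied to a matrix of responses with one row per hyper-parameter configuration and one column per problem (the names and layout are ours; fitness errors are assumed strictly positive to avoid division by zero):

```python
def objective(f):
    """Eq. 14 for one response type (MBFE or BFE).

    f -- matrix f[i][j]: response of hyper-parameter configuration i
         on problem j (here for problems F1-F10, so 10 columns)
    """
    tc = len(f)                       # total number of combinations (TC)
    scores = []
    for i in range(tc):
        obj = sum(1.0 - min(f[k][j] for k in range(tc)) / f[i][j]
                  for j in range(len(f[i])))
        scores.append(obj)            # 0 (best on every problem) .. #problems
    return scores
```

A configuration that achieves the best response on every problem scores 0, and each problem contributes at most 1, so with ten problems the result lies between 0 and 10.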

Fig. 5
figure 5

Results of the hyper-parameter sensitivity study: a PM. b AP

Figures 5 and 6 present the results for the four adaptive systems. For PM, the configurations with instantaneous reward application and \(p_\mathrm{min }= 0.01\) stand out from all the other hyper-parameter choices. Among them, three are non-dominated configurations (\(\alpha = {\{}0.5, 0.7, 0.9{\}}\)). For the comparative study, PM with \(p_\mathrm{min }= 0.01\), \(\alpha = 0.9\), and RWD = instantaneous (MBFE \(=\) 4.4, BFE \(=\) 4.6) is selected. For AP, no hyper-parameter family emerges; however, seven configurations present non-dominated behaviours. Among them, AP with \(p_\mathrm{min }= 0.01\), \(\alpha = 0.3\), \(\beta = 0.7\), and RWD = instantaneous (MBFE \(=\) 5.8, BFE \(=\) 6.7) is favoured. For FAUC–RMAB, the favoured configurations suggest a large window size, but only four configurations are found to be non-dominated. Among them, FAUC–RMAB with \(C = 0.5\), \(D = 0.5\), and \(W = 250\) (MBFE \(=\) 5.3, BFE \(=\) 5.9) is favoured. Finally, the performance of BNGA improves with a decreasing \(p_\mathrm{min}\) value. Based on this result, because it is one of the non-dominated configurations, and because it better reintroduces configurations that had almost been forgotten, BNGA with \(p_\mathrm{min }= 0.01\) (MBFE \(=\) 1.9, BFE \(=\) 3.5) is adopted.

Fig. 6
figure 6

Results of the hyper-parameter sensitivity study: a FAUC–RMAB. b BNGA

Results

To analyze the results of the parameter setting approaches, the statistical procedure proposed by García et al. [48] is followed, according to which Friedman’s non-parametric test is used to reveal any significant performance differences from at least one of the approaches. Then, a post hoc test, following Hochberg’s procedure, is applied to identify any difference between each parameter setting approach and a control algorithm. This control algorithm is represented by the best ranked approach over the sample considered. For each approach, the sample is built from the MBFE characteristics over a predefined group of functions. As proposed in [48], three groups are examined: F1–F14, F15–F25, and F1–F25. This statistical procedure is also independently applied to each test function, where BFEs obtained from all the repetitions serve as samples.
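The post hoc step can be illustrated with a plain implementation of Hochberg's step-up procedure over the p values of the comparisons against the control algorithm (a sketch; in practice a statistics package would perform both tests):

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg's step-up procedure: with p-values sorted ascending,
    reject H_(i) for every i up to the largest i satisfying
    p_(i) <= alpha / (k - i + 1)."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    cutoff = 0
    for rank in range(k, 0, -1):        # scan from the largest p-value down
        i = order[rank - 1]
        if p_values[i] <= alpha / (k - rank + 1):
            cutoff = rank
            break
    reject = [False] * k
    for rank in range(cutoff):
        reject[order[rank]] = True
    return reject
```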

Table 3 presents the format used to display the results of each parameter setting approach over each individual test function. The MBFE with respect to the global optimum is recorded with its standard deviation (STD) and the best fitness error (BFE) achieved over the 25 repetitions. Furthermore, the success rate (SR) and the success performance (SP) are also registered. The former is defined as the percentage of runs performing better than the prescribed tolerance, whereas the latter is the mean number of function evaluations required by the successful runs divided by SR. SP is intended to compare, on the same basis, algorithms with low probabilities of success but high convergence rates, and vice versa. The last entry corresponds to the statistical test p values. This information is used for both individual test functions and groups of functions. Shaded p values indicate Friedman’s test results and are located in the control algorithm column. The other p values represent Hochberg’s post hoc test. An underlined p value indicates that the performance of a given approach is worse than that of the control algorithm, at a 0.05 significance level. Finally, bold entries designate the best values achieved for a given test function.
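SR and SP can be computed as follows (the names are ours; a set of runs with no success yields SR = 0 and an undefined, here infinite, SP):

```python
def success_metrics(evals_per_run, successes):
    """Success rate (SR) and success performance (SP) as defined above.

    evals_per_run -- function evaluations used by each run
    successes     -- whether each run met the prescribed tolerance
    """
    n = len(successes)
    sr = sum(successes) / n
    if sr == 0:
        return 0.0, float('inf')
    mean_success_evals = (sum(e for e, s in zip(evals_per_run, successes) if s)
                          / sum(successes))
    return sr, mean_success_evals / sr
```

Dividing by SR penalizes algorithms that succeed rarely, which is what allows fast-but-unreliable and slow-but-reliable optimizers to be compared on the same basis.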

Results of the SSGA parameter setting approaches

Table 4 compares the results of the six parameter setting approaches within the SSGA framework. Overall, the statistical results show that the static parameter setting approach is outperformed by other approaches over 44 % of the test functions (number of underlined p values/number of functions). This value increases to 48, 64, and 68 % for the naïve approach, PM, and AP, respectively. For FAUC–RMAB, poorer performances are observed over 40 % of the test functions. In this comparison, BNGA emerges as the most robust approach, since, compared to the other parameter setting approaches, it shows weaker performances over only 16 % of the test functions. This result confirms the importance of considering parameter interaction dependency.

Regarding the unimodal or low multimodal test functions (F1–F6), the results indicate that the static parameter setting approach performs the best. In reality, BNGA shows comparable results over those problems. However, it can be observed from the SP measures that BNGA converges more slowly to the optimum. This behaviour was expected, since, like any other adaptive parameter control method, BNGA needs to identify the most promising parameter states, whereas the static approach takes advantage of “off-line tuning”. It is interesting to note, however, that the high conditioning characteristic of F3 poses great difficulties to all adaptive parameter control systems.

Table 3 Results format

Comparing the general behaviour trends among the approaches is more difficult with the multimodal problems (F7–F25). In fact, FAUC–RMAB and BNGA turned out to be the best approaches over most of these test functions. Nevertheless, except for F12, none of the approaches were able to locate the global optimum within the prescribed tolerance. For the most complex multimodal subsets, which are the hybrid composition functions (F15–F25), it is interesting to note that the naïve approach appears to rank among the best. This suggests that over highly multimodal problems, SSGA adaptive systems are no better than a pure exploration strategy in selecting parameter states. However, an investigation of the population EEB throughout the simulations indicates that none of the approaches were able to allocate resources following a useful diversity pattern (GDM \(\rightarrow \) 1.0 and PCM \(\rightarrow \) 1.0). This lack of adequate EEB states suggests a missing complementarity and synergy among the parameter states involved. This introduces the exploration/exploitation dilemma, but this time, at the parameter level. On the one hand, too many parameter states could reduce their participation in the search and dilute their performances. On the other hand, with too few parameter states, the probability of finding robust and efficient parameter configurations diminishes. In other words, balance will yet again have to be sought. However, further investigating this aspect would be beyond the scope of the present paper.

Table 4 Results of the SSGA parameter setting approaches over the CEC’05 benchmark
Table 5 Statistical comparison by function group

Table 5 presents the performance of the SSGA parameter setting approaches from the perspective of the function groups. The outcomes of these statistical tests are similar to those obtained from the independent function analyses sampled. The results confirm that the static parameter setting approach is as good as the best adaptive control systems over the most simple test functions (F1–F14), whereas the naïve approach is competitive over complex test functions (F15–F25). However, this group perspective does not reveal the differences between the FAUC–RMAB and BNGA performances. What it clearly demonstrates is that over all the considered problems (F1–F25), these two methods stand out as the best SSGA parameter setting approaches.

Table 6 Results of the EA parameter setting approaches over the CEC’05 benchmark

Results of the EA parameter setting approaches

Since FAUC–RMAB and BNGA both demonstrated the highest global performance over F1 to F25 within the SSGA comparison framework, their results are now compared to the state-of-the-art EAs (CMA-ES and G-CMA-ES). Table 6 presents the comparison results.

Overall, Table 6 shows that FAUC–RMAB is statistically outperformed by other approaches over 60 % of the test functions, compared to 36 % for BNGA, 40 % for CMA-ES, and 20 % for G-CMA-ES. This confirms the robustness of G-CMA-ES and ranks BNGA second.

The recorded performance of G-CMA-ES is different from that indicated for functions F9 to F11 by Auger and Hansen [30]. This difference may be explained by different initialization seed numbers and the stochastic nature of EAs.

As for the SSGA parameter setting approaches, CMA-ES and G-CMA-ES are not able to allocate resources in accordance with the useful diversity concept. This may partially explain their difficulty in reaching the global optimum over highly multimodal test functions.

Table 7 presents the statistical comparison from the function-group point of view. FAUC–RMAB is the only adaptive control system dominated by the other approaches on F1–F14. It is interesting to note that, this time, G-CMA-ES is not statistically better than CMA-ES or BNGA, regardless of the group of functions considered. This is somewhat surprising, as G-CMA-ES was introduced specifically to improve the CMA-ES performance over multimodal problems. However, the lack of performance enhancement may be related to the strong asymmetry of the hybrid composition functions [29] and to the maximum number of prescribed function evaluations, which prevents the identification of an effective \(\lambda \) value [49]. Finally, the fact that BNGA is statistically equivalent to G-CMA-ES clearly suggests that, with an appropriate parameter setting approach, SSGA represents a competitive EA strategy.

Influence of diversity on search method responses

The previous results indicated that none of the evaluated approaches successfully found the global optimum of the complex multimodal problems. We can conjecture that this may be related to a lack of useful diversity in the search process. To support this hypothesis, Figs. 7, 8, and 9 compare the EEB spectra achieved by all the parameter setting approaches over the test functions F6, F8, and F21, respectively. These test functions provide a general, but concise picture of the EEB management offered by each parameter setting strategy. In these charts, each color corresponds to a specific run. Accordingly, the points represent the EEB footprints of the parameter setting strategies, rather than their path history. The EEB exploration and exploitation axes were computed from the GDM and PCM formulations given by Eqs. 8 and 9, respectively.

Fig. 7
figure 7

EEB history over F6 (shifted Rosenbrock’s function) with: a static parameter setting, b naïve approach, c PM, d AP, e FAUC–RMAB, f BNGA, g CMA-ES, and h G-CMA-ES

Fig. 8
figure 8

EEB history over F8 (shifted rotated Ackley’s function with global optimum on bound) with: a static parameter setting, b naïve approach, c PM, d AP, e FAUC–RMAB, f BNGA, g CMA-ES, and h G-CMA-ES

Fig. 9
figure 9

EEB history over F21 (rotated hybrid composition function) with: a static parameter setting, b naïve approach, c PM, d AP, e FAUC–RMAB, f BNGA, g CMA-ES, and h G-CMA-ES

Overall, the EEB of the static parameter setting approach (Figs. 7a, 8a, 9a) aptly describes the conventional EA evolution path; the search process is automatically directed towards the best individual found so far. Interestingly, this makes the EEB footprint compatible with its path history. Only F8 (Fig. 8) presents a different response. The results obtained for this test function will be discussed separately below. Surprisingly, the naïve adaptive approach (Figs. 7b, 8b, 9b) often settles in three specific EEB regions that are well balanced with respect to the exploration axis. On the other hand, PM, AP, FAUC–RMAB, and BNGA (Figs. 7c–f, 8c–f, 9c–f) demonstrate fairly similar EEB footprints.

Table 7 Statistical comparison by function group

CMA-ES and G-CMA-ES show very similar EEB footprints: few generations are allocated to extensive exploration, and most of their search process is rapidly dedicated to the exploitation of a narrow landscape region. However, G-CMA-ES demonstrates greater exploration capability over highly multimodal problems (F21) than CMA-ES. This behaviour results from the restart feature embedded within G-CMA-ES. Nevertheless, as shown in Table 6, the higher exploration capability of G-CMA-ES remains insufficient to prevent the premature convergence observed over the more multimodal problems (i.e. GDM \(\rightarrow \) 0 and PCM \(\rightarrow \) 1).

Test function F8 (Fig. 8) offers interesting information, as each parameter setting strategy presents a very different behaviour from the other test functions considered. In fact, F8 describes an “egg box” landscape, with the global optimum basin located on its bound. Except for rare FAUC–RMAB generations, where individuals seem to be spread throughout one or more local basins of attraction (Fig. 8e), most adaptive approaches continuously spread their resources across the landscape (GDM \(\rightarrow \) 1). In contrast, the CMA-ES and G-CMA-ES EEB footprints suggest that they concentrate the vast majority of their resources in a very few local basins of attraction. Interestingly, a differentiating feature of these two strategies is highlighted in this test function; they cover up to approximately 80 % of the EEB space. Nevertheless, practically all of them remain stuck in a local optimum basin (Table 6).

The preceding observations on the population EEB tend to verify the opening hypothesis on useful diversity, and, in other words, suggest an apparent lack of complementarity and synergy among parameter states, especially over multimodal landscapes, where premature convergence is generally observed (i.e. GDM \(\rightarrow \) 0 and PCM \(\rightarrow \) 1). Determining the best combination of parameter states involved may result in marked performance gain.

Concluding discussion

Parameter setting is a longstanding issue in the EA field: specific parameters are responsible for the search path followed and, therefore, drive the performance of the optimizer. On the other hand, these parameters are very difficult to determine, since optimal parameter states are problem and evolution dependent.

The present paper proposes an adaptive system capable of coping with all the parameter setting dependencies. This adaptive approach is based on a Bayesian network and is applied to genetic algorithms (BNGA). The proposed strategy addresses all three dependencies: (1) problem dependency, by ensuring uniform parameter state probabilities at the start of the search; (2) evolution dependency, by updating parameter state probabilities as new evidence appears; and (3) parameter interaction, by supporting relationships through the graphical model. Compared to other available adaptive parameter setting approaches, BNGA appears to be the only one capable of integrating parameter interactions while autonomously managing interaction strength as the search evolves. Furthermore, BNGA requires only one hyper-parameter, compared to as many as five for comparable systems. These characteristics seem clearly promising in terms of enhancing search robustness and efficiency.

The presented study initiated a comparative evaluation following the CEC’05 benchmark. In the first step, BNGA is compared to a static parameter setting, a naïve approach, and three recognized adaptive methods (PM, AP, and FAUC–RMAB). This comparison was performed within an SSGA framework involving the control of nine parameters. All the adaptive control approaches were based on the same credit assignment scheme. This means that four parameters related to offspring creation were judged on their ability to improve the fitness, and five parameters related to population governance were evaluated on their ability to manage the EEB.

In summary, the statistical results obtained led to three main conclusions. First, in terms of managing parameter interactions, BNGA shows the highest level of robustness. Second, BNGA and FAUC–RMAB stand out as the best SSGA adaptive approaches. Their performances were similar to those of static parameter setting over unimodal test functions, and statistically equivalent to those of the naïve approach over complex multimodal problems. Over the complete set of test functions, they were superior to all the other parameter setting approaches included in the comparison. These observations clearly attest to the importance of an adaptive control system, which must indeed be proficient in alternating between exploitative and explorative strategies. Finally, the hyper-parameter sensitivity study confirmed the implementation advantage of BNGA over other adaptive systems, such as FAUC–RMAB. In fact, the minimal number of hyper-parameters involved gives BNGA the greatest generalization power of all the adaptive systems.

The study further compared BNGA and FAUC–RMAB to state-of-the-art EAs: CMA-ES and G-CMA-ES. The statistical results demonstrated that BNGA performs as well as CMA-ES and G-CMA-ES. From a practitioner standpoint, this general conclusion is attractive, since it indicates that with the appropriate parameter setting, SSGA is highly competitive, while offering a more flexible framework than CMA-ES for customizing search components. This last observation certainly describes the central contribution of the present investigation, but more importantly suggests that SSGA remains a rich research avenue that is yet to be exploited. It is also important to mention that while associated with GA in the present investigation, the proposed adaptive approach is not limited to GA applications, but rather describes a generic control system that can be effective for any population-based search algorithm.

This analysis also revealed that none of the studied approaches found the global optimum over complex multimodal problems. Additional information on the EEB evolution gathered during the search progression and presented for all studied parameter setting approaches working on specific test functions indicated that a lack of complementarity and synergy among the parameter states may plausibly be responsible for premature convergence.

The intrinsic BNGA attributes, combined with the presented results, suggest that BNGA should also be efficient over large-scale global optimization (LSGO) problems. Actually, over the last few years, several algorithms have been developed to deal with LSGO problems [50]. These algorithms may be divided into two general categories: on the one hand, there are algorithms using landscape decomposition strategies, commonly known as cooperative coevolution (CC) methods, aimed at reducing the search complexity of LSGO problems. The main challenge associated with these algorithms resides in the optimal decomposition scheme, markedly over non-separable problems. On the other hand, the second group encompasses algorithms focusing on the improvement of conventional metaheuristics. This algorithm group faces the challenge of preventing premature convergence, which is exacerbated over LSGO problems, by providing enough exploration, while keeping runtimes at practical levels. Experiments over controlled benchmarks demonstrated that the algorithms’ performances are problem dependent [50, 51]. Clearly, these experiments indicated that LSGO problems are best solved using adaptive systems capable of leveraging population diversity to steer the search, which is a key BNGA characteristic. Therefore, since BNGA also assures efficient runtimes and requires only one hyper-parameter, future works will evaluate the scalability of the BNGA and EEB schemes to LSGO problems. The procedure developed in this study will be tested over the latest LSGO benchmark (CEC’2013) [52].