1 Introduction

Evolutionary algorithms are a widely used class of stochastic optimization algorithms that mimic biological evolution in nature. Like any other metaheuristic optimization algorithm (Brown et al. 2005; Conti et al. 2018), they need to balance the exploration/exploitation trade-off in their search process: high exploration risks not optimizing the intermediate solutions to the fullest; high exploitation risks missing the global optimum and getting stuck in a sub-optimal part of the search space. Analogous to biological evolution, diversity within the population of solution candidates has been identified as a central lever for adjusting the exploration/exploitation trade-off. Many means to maintain the diversity of the population throughout the process of evolution have been developed in the literature; comprehensive overviews are provided by Squillero and Tonda (2016) and Gabor et al. (2018), for example.

For problems with complex fitness landscapes, it is well known that increased exploration (via increased diversity) yields better overall results in the optimization, even when disregarding any diversity goal in the final evaluation (Ursem 2002; Toffolo and Benini 2003). However, this gives rise to a curious phenomenon: by augmenting the fitness function, and thus making it match the original objective function less, we actually obtain results that optimize the original objective function more. This implies that an evolutionary algorithm does not directly optimize the fitness function it uses (but instead optimizes for a slightly different implicit goal). Furthermore, to really optimize for a given objective function, one should ideally use a (slightly) different fitness function for evolution. In this paper, we introduce final productive fitness as a theoretical approach to derive the ideal fitness function from a given objective function.

We see that final productive fitness cannot feasibly be computed in advance. However, we show how to approximate it a posteriori, i.e., once the optimization process has finished. We show that the notion of final productive fitness is sound by applying it to the special case of diversity-aware evolutionary algorithms, which (for our purposes) are algorithms that directly encode a drive for increased diversity by altering the fitness of the individuals. By running these on various benchmark problems, we empirically show that diversity-aware evolutionary processes may approximate final productive fitness more accurately than an evolutionary process using just the original objective. We show that the fitness alteration performed by these algorithms, when it improves overall performance, does so while (perhaps because) it better approximates final productive fitness. We thus argue that the notion of final productive fitness for the first time provides a model of how diversity is beneficial to evolutionary optimization, which has been called for by various works in the literature:

  • “One of the urgent steps for future research work is to better understand the influence of diversity for achieving good balance between exploration and exploitation.” (Črepinšek et al. 2013),

  • “This tendency to discover both quality and diversity at the same time differs from many of the conventional algorithms of machine learning, and also thereby suggests a different foundation for inferring the approach of greatest potential for evolutionary algorithms.” (Pugh et al. 2016),

  • “However, the fragmentation of the field and the difference in terminology led to a general dispersion of this important corpus of knowledge in many small, hard-to-track research lines” and, “[w]hile diversity preservation is essential, the main challenge for scholars is devising general methodologies that could be applied seamlessly [...]” (Squillero and Tonda 2016).

It should be noted that the approach presented in this paper merely provides a new perspective on exploration/exploitation in evolutionary algorithms and a new method of analyzing the effects of diversity. It is up to future works to derive new means to actively promote diversity from this analysis.

In this paper, we provide a short mathematical description of evolutionary processes in Sect. 2 and build our notion of (final) productive fitness on top of that in Sect. 3. Section 4 describes the empirical results and Sect. 5 discusses related work before Sect. 6 concludes.

2 Foundations

For this paper, we assume an evolutionary process (EP) to be defined as follows: Given a fitness function \(f : \mathcal {X} \rightarrow [0; 1] \subset \mathbb {R}\) for an arbitrary set \(\mathcal {X}\) called the search space, we want to find an individual \(x \in \mathcal {X}\) with the best fitness f(x). For a maximization problem, the best fitness is that of an individual x so that \(f(x) \ge f(x') \;\forall x' \in \mathcal {X}\). For a minimization problem, the best fitness is that of an individual x so that \(f(x) \le f(x') \;\forall x' \in \mathcal {X}\). Note that we normalize our fitness space on \([0; 1] \subset \mathbb {R}\) for all problems for ease of comparison. Whenever the maximum and minimum fitness are bounded, this can be done without loss of generality.

Usually, the search space \(\mathcal {X}\) is too large or too complicated to guarantee that we can find the exact best individual(s) using standard computing models (and physically realistic time). Thus, we take discrete subsets of the search space \(\mathcal {X}\) via sampling and iteratively improve their fitness. An evolutionary process \(\mathcal {E}\) over g generations, \(g \in \mathbb {N}\), is defined as \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\). \(\mathcal {X}\) is the search space. \(e : 2^\mathcal {X} \rightarrow 2^\mathcal {X}\) is the evolutionary step function so that \(X_{i+1} = e(X_i) \;\forall i \ge 0\). As defined above, \(f: \mathcal {X} \rightarrow [0; 1] \subset \mathbb {R}\) is the fitness function. \((X_i)_{i < g}\) is a series of populations so that \(X_i \subseteq \mathcal {X} \;\forall i\) and \(X_0\) is the initial population. Note that as the evolutionary step function e is usually non-deterministic, we define \(E(X) = \{X' | X' = e(X)\}\) to be the set of all possible next populations.

We use the following evolutionary operators:

  • The recombination operator \(\textit{rec} : \mathcal {X} \times \mathcal {X} \rightarrow \mathcal {X}\) generates a new individual from two individuals.

  • The mutation operator \(\textit{mut} : \mathcal {X} \rightarrow \mathcal {X}\) alters a given individual slightly to return a new one.

  • The migration operator \(\textit{mig}\) takes no arguments and generates a random individual \(\textit{mig}() \in \mathcal {X}\).

  • The (survivors) selection operator \(\textit{sel} : 2^\mathcal {X} \times \mathbb {N} \rightarrow 2^\mathcal {X}\) returns a new population \(X' = \textit{sel}(X, n)\) given a population \(X \subseteq \mathcal {X}\), so that \(|X'| \le n\).

The operators \(\textit{rec}, \textit{mut}, \textit{mig}\) can be applied to a population X by choosing individuals from X to fill their parameters (if any) according to some selection scheme \(\sigma : 2^\mathcal {X} \rightarrow 2^\mathcal {X}\) and adding their results to the population. For example, we write \(\textit{mut}_\sigma (X) = X \cup \{ \; \textit{mut}(x') \; | \; x'\in \sigma (X) \; \}\). Note that all children are added to the population and do not replace their parents in this formulation.

For any evolutionary process \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\) and selection schemes \(\sigma _1, \sigma _2, \sigma _3\) we assume that

$$\begin{aligned} X_{i+1} = e(X_i) = \textit{sel}\,(\textit{mig}_{\sigma _3}(\textit{mut}_{\sigma _2}(\textit{rec}_{\sigma _1}(X_i))), |X_i|). \end{aligned}$$
(1)
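To make Eq. 1 concrete, the following is a minimal Python sketch of one evolutionary step; the function and parameter names (step, sigma1, etc.) are our illustration and are not taken from the paper's evolib code.

```python
import random

def step(population, fitness, rec, mut, mig, sigma1, sigma2, sigma3, maximize=True):
    """One application of the evolutionary step function e from Eq. (1)."""
    n = len(population)
    X = list(population)
    # rec_sigma1: one child per selected individual, with a randomly chosen mate
    X = X + [rec(x, random.choice(X)) for x in sigma1(X)]
    # mut_sigma2: add mutated copies of the selected individuals
    X = X + [mut(x) for x in sigma2(X)]
    # mig_sigma3: add freshly generated random individuals
    X = X + [mig() for _ in sigma3(X)]
    # sel: rank-based cut-off back to the original population size
    X.sort(key=fitness, reverse=maximize)
    return X[:n]
```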

Usually, we assume that an evolutionary process fulfills its purpose if the best fitness of the population tends to improve over time, i.e., given a sufficiently large number of generations \(k \in \mathbb {N}\), it holds for maximization problems that \(\max _{x \in X_i} f(x) < \max _{x \in X_{i+k}} f(x)\). We define the overall result of an evolutionary process \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\) with respect to a fitness function \(\phi\) (which may or may not differ from the fitness f used during evolution) to be the best value found and kept in evolution, i.e., for a maximizing objective \(\phi\) we define

$$\begin{aligned} |\mathcal {E}|_\phi = \max _{x \in X_g} \phi (x).\end{aligned}$$
(2)

Note that there are evolutionary processes which include a hall-of-fame mechanism, i.e., are able to return the result fitness

$$\begin{aligned} ||\mathcal {E}||_\phi = \max _{i = 1, ..., g} \;\; \max _{x \in X_i} \;\;\phi (x).\end{aligned}$$
(3)

However, we can derive the equality \(|\mathcal {E}|_\phi = ||\mathcal {E}||_\phi\) when we assume elitism with respect to \(\phi\), i.e., \({{\,\mathrm{arg\,max}\,}}_{x \in X_i} \phi (x) \in X_{i+1}\) for all \(i=1,...,g-1\). Since it makes reasoning easier and hardly comes with any drawback for sufficiently large populations, we use elitist evolutionary processes (with respect to f) from here on.
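As a minimal sketch (names are ours), the two result notions differ only in which populations they scan; under elitism with respect to \(\phi\) both coincide:

```python
def result(final_population, phi):
    """|E|_phi, Eq. (2): best phi-value in the final population."""
    return max(phi(x) for x in final_population)

def result_hall_of_fame(populations, phi):
    """||E||_phi, Eq. (3): best phi-value over all generations.
    (For maximization; a minimizing process would use min.)"""
    return max(phi(x) for X in populations for x in X)
```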

3 Approach

The central observation we build our analysis on is that in many cases the results of optimizing for a given objective function (called \({\textsc {of}}\)) can be improved by not using \({\textsc {of}}\) directly as the fitness function f of the evolutionary process. In other words, changing the fitness function f away from the true objective \({\textsc {of}}\) in some cases leads to better results with respect to the original objective function \({\textsc {of}}\). Note that this phenomenon extends beyond heuristic optimization and is known as reward shaping in reinforcement learning, for example (Ng et al. 1999).

In evolutionary algorithms, a property called diversity is often considered in addition to the objective function \({\textsc {of}}\) to improve the progress of the evolutionary process (Gabor et al. 2018; Squillero and Tonda 2016; Ursem 2002). In one way or another, diversity-enhancing evolutionary algorithms reward individuals of the population for being different from other individuals in the population. While there are many ways to implement this behavior, like topology-based methods (Tomassini 2006), fitness sharing (Sareni and Krahenbuhl 1998), ensembling (Hart and Sim 2018), etc., we consider an instance of diversity-enhancing evolutionary algorithms that is simpler to analyze: by quantifying the distance of a single individual to the population, we can define a secondary fitness \({\textsc {sf}}\) that rewards high diversity in the individual. This approach was shown by Wineberg and Oppacher (2003) to be an adequate general representation of most well-known means of measuring diversity in a population.

In order to avoid the difficulties of multi-objective evolution, we can then define the augmented fitness function \({\textsc {af}}\) that incorporates both the objective fitness \({\textsc {of}}\) and the secondary fitness \({\textsc {sf}}\) into one fitness function to be used for the evolutionary process.

Definition 1

(Augmented Fitness) Given the objective fitness \({\textsc {of}}\), a diversity-aware secondary fitness \({\textsc {sf}}\), and a diversity weight \(\lambda \in [0; 1] \subset \mathbb {R}\), we define the augmented fitness \({\textsc {af}}\) as

$$\begin{aligned} {\textsc {af}}(x) = (1 - \lambda ) \cdot {\textsc {of}}(x) + \lambda \cdot {\textsc {sf}}(x). \end{aligned}$$
(4)

As is shown in Gabor et al. (2018) and Wineberg and Oppacher (2003), such a definition of the augmented fitness suffices to show the benefits of employing diversity.
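As a sketch, Definition 1 is a one-line convex combination, assuming \({\textsc {of}}\) and \({\textsc {sf}}\) are both normalized to \([0; 1]\) as described in Sect. 2 (the function name is ours):

```python
def augmented_fitness(of, sf, lam):
    """Eq. (4): convex combination of objective and secondary fitness,
    returned as a new fitness function over individuals."""
    return lambda x: (1.0 - lam) * of(x) + lam * sf(x)
```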

We can then define two evolutionary processes \(\mathcal {E}_{{\textsc {of}}} = \langle \mathcal {X}, e, {\textsc {of}}, (X_i)_{i < g} \rangle\) and \(\mathcal {E}_{{\textsc {af}}} = \langle \mathcal {X}, e, {\textsc {af}}, (X'_i)_{i < g} \rangle\). We observe the curious phenomenon that in many cases the augmented fitness \({\textsc {af}}\) optimizes for \({\textsc {of}}\) better than using \({\textsc {of}}\) itself does, formally

$$\begin{aligned} |\mathcal {E}_{{\textsc {of}}}|_{{\textsc {of}}} < |\mathcal {E}_{{\textsc {af}}}|_{{\textsc {of}}},\end{aligned}$$
(5)

which raises the following question: If \({\textsc {of}}\) is not the ideal fitness function to optimize for the objective \({\textsc {of}}\), what is?

Given a sequence of populations \((X_i)_{i < g}\) spanning multiple generations \(i=1, ..., g\), we can write down inductively, starting from the last generation g, what we actually want our population to be like: the net benefit of \(X_g\) to our (maximizing) optimization process is exactly

$$\begin{aligned} |\mathcal {E}|_{{\textsc {of}}} = \max _{x \in X_g} {\textsc {of}}(x)\end{aligned}$$
(6)

as this population will not evolve any further and thus the best individual within \(X_g\) is what we are going to be stuck with as the result of the optimization process.

Note that the individuals of \(X_{g-1}\) already contribute differently to the result of the optimization process: From the perspective of generation \(g-1\) the overall optimization result is

$$\begin{aligned} \max _{x \in X_{g-1}} \;\; \max _{x' \in X_g(x)} {\textsc {of}}(x')\end{aligned}$$
(7)

where the follow-up generation \(X_g(x)\) is any population from \(\{ X_g \; | \; X_g \in E(X_{g-1}) \wedge x \in X_g \}\), i.e., one of the possible next populations in which x survived.

Intuitively, the contribution of the second-to-last generation \(X_{g-1}\) to the result of the optimization process stems from the objective fitness \({\textsc {of}}\) that this generation’s individuals can still achieve in the final generation \(X_g\). Generally, this does not fully coincide with the application of the objective function \({\textsc {of}}\) in said generation:

$$\begin{aligned} \max _{x \in X_{g-1}} \; \; \max _{x' \in X_g(x)} {\textsc {of}}(x') \;\;\; \ne \;\;\; \max _{x \in X_{g-1}} {\textsc {of}}(x) \end{aligned}$$
(8)

That means: While rating individuals according to their objective fitness \({\textsc {of}}\) in the last generation of the evolutionary process is adequate, the actual benefit of the individual x to the optimization result and the value of \({\textsc {of}}(x)\) may diverge more the earlier we are in the evolutionary process. Accordingly, at the beginning of an evolutionary process, the objective fitness \({\textsc {of}}\) might not be a good estimate of how much the individuals will contribute to the process’s return with respect to \({\textsc {of}}\) at the end of the optimization process. Still, standard optimization techniques often use the objective fitness \({\textsc {of}}\) as a (sole) guideline for the optimization process.

Instead, we ideally want to make every decision (mutation, recombination, survival, ...) at every generation \(X_i\) with the ideal result for the following generations \(X_{i+1}, X_{i+2}, ...\) and ultimately the final generation \(X_g\) in mind. We call this the optimal evolutionary process. Obviously, to make the optimal decision early on, we would need to simulate all the way to the end of the evolution, including all the follow-up decisions. This renders optimal evolution infeasible as an algorithm. However, we can use it for a posteriori analysis of what has happened within a different evolutionary process. In order to do so, we need to give a fitness function for the optimal process (as it obviously should not be \({\textsc {of}}\)).

Instead, we formalize the benefit to the optimization process discussed above and thus introduce the notion of productive fitness. But first, we need a simple definition of the inter-generational relationships between individuals.

Definition 2

(Descendants) Given an individual x in the population of generation i, \(x \in X_i,\) of an evolutionary process. All individuals \(x' \in X_{i+1}\) so that \(x'\) resulted from x via a mutation operator, i.e., \(x' = \text {mut}(x)\), or a recombination operator with any other parent, i.e., there exists \(y \in X_i\) so that \(x'= \text {rec}(x, y)\), are called direct descendants of x. Further, given a series of populations \((X_i)_{0< i < g}\), we define the set of all descendants \(D_x\) as the transitive hull of the direct-descendant relation starting from x, i.e., the direct descendants, their direct descendants, and so on.
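Computed a posteriori from logged genealogy data, \(D_x\) is a simple graph traversal. The sketch below assumes the process recorded, for every individual, its direct descendants; the data layout is our own choice, not the evolib one.

```python
def descendants(x, children):
    """D_x from Definition 2: transitive hull of the direct-descendant
    relation. `children` maps an individual (id) to its direct descendants."""
    seen, stack = set(), [x]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen  # all descendants of x (x itself excluded)
```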

We can now use this relationship to assign, a posteriori, the benefit that a single individual has had for the evolution. For this, we simply average the fitness of all its surviving descendants.

Definition 3

(Productive Fitness) Given an individual x in the population of generation i, \(x \in X_i\), of an evolutionary process. Let \(D_x \subseteq \mathcal {X}\) be the set of all descendants from x. The productive fitness after n generations or n-productive fitness is the average objective fitness of x’s descendants, written

$$\begin{aligned} {\textsc {pf}}_n(x) = {\left\{ \begin{array}{ll} {{\,\mathrm{avg}\,}}_{x' \in D_x \cap X_{i+n}} {\textsc {of}}(x') &{} \text { if } D_x \cap X_{i+n} \ne \emptyset \\ w &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(9)

Note that in case the individual x has no descendants in n generations, we set its productive fitness \({\textsc {pf}}_n(x)\) to a worst case value w, which in our case of bounded fitness values is 0 for maximizing optimization processes and 1 for minimizing optimization processes.
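Under the same assumptions as above (logged populations, genealogy, and objective values), n-productive fitness can be sketched as follows; all names are ours.

```python
def productive_fitness(x, i, n, populations, children, of, w=0.0):
    """pf_n(x) from Eq. (9) for an individual x from generation i.
    `populations[j]` is the set of individuals (ids) alive in generation j;
    `of` maps ids to objective fitness values; w is the worst-case value
    (0 when maximizing, 1 when minimizing)."""
    survivors = descendants(x, children) & populations[i + n]
    if not survivors:
        return w  # the line of x died out within n generations
    return sum(of[y] for y in survivors) / len(survivors)
```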

We argue that the productive fitness \({\textsc {pf}}\) is better able to describe the actual benefit the individual brings to the optimization process, as represented by what parts of the individual still remain inside the population in a few generations. Note that our notion of productive fitness is rather harsh in two points:

  • We only take the average of all descendants’ fitness. One could argue for a more optimistic approach that rewards the individual for the best offspring it has given rise to. However, we argue that every bad individual binds additional resources for eliminating it down the road, and thus producing low-quality descendants should actively be discouraged.

  • When the line of an individual dies out completely, we assign the worst possible fitness. Arguments could be made that even dead lines contribute to the search process by ruling out unpromising areas while, e.g., increasing the diversity scores of individuals in more promising areas of the search space. Still, we count all descendants, however distant, so even small contributions to the final population avoid the penalty w.

We leave the analysis of the effects of the discussed parameters to future work. Note that for now, our notion of productive fitness only covers a fixed horizon into the future. We can trivially extend this definition to respect the final generation no matter what generation the current individual is from:

Definition 4

(Final Productive Fitness) Given an individual x in the population of generation i, \(x \in X_i\), of an evolutionary process of g generations in total. The final productive fitness of x is the fitness of its descendants in the final generation, i.e., \({\textsc {fpf}}(x) = {\textsc {pf}}_{g-i}(x)\).

We argue that final productive fitness is able to describe what the fitness function of an optimal evolutionary process looks like: Every evaluation is done in regard to the contribution to the final generation, i.e., the ultimate solution returned by the search process.
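In code, Definition 4 simply fixes the horizon of the previous sketch to the final generation:

```python
def final_productive_fitness(x, i, populations, children, of, w=0.0):
    """fpf(x) = pf_{g-i}(x) for x from generation i of a g-generation run."""
    g = len(populations) - 1
    return productive_fitness(x, i, g - i, populations, children, of, w)
```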

Thesis 1

Assuming ideal outcomes of all randomized evolutionary operators, final productive fitness \({\textsc {fpf}}\) is the optimal fitness function for evolutionary processes, i.e., an evolutionary process yields the best results when it optimizes for \({\textsc {fpf}}\) at every generation.

We sketch a short argument in favor of Thesis 1. For a more in-depth discussion, see Gabor and Linnhoff-Popien (2020). Let \(\mathcal {E}_{\textsc {fpf}}= \langle \mathcal {X}, e, {\textsc {fpf}}, (X_i^{{\textsc {fpf}}})_{i < g} \rangle\) be an evolutionary process using final productive fitness \({\textsc {fpf}}\). Let \(\mathcal {E}_{\textsc {idf}}= \langle \mathcal {X}, e, {\textsc {idf}}, (X_i^{\textsc {idf}})_{i < g} \rangle\) be an evolutionary process using a different (possibly more ideal) fitness \({\textsc {idf}}\). Let \(X_0^{{\textsc {fpf}}} = X_0^{\textsc {idf}}\). We assume that

$$\begin{aligned} \max _{x \in X_g^{{\textsc {fpf}}}} {\textsc {of}}(x) < \max _{x \in X_g^{{\textsc {idf}}}} {\textsc {of}}(x).\end{aligned}$$
(10)

Since Eq. 10 implies that at least \(X_g^{{\textsc {fpf}}} \ne X_g^{{\textsc {idf}}}\), there is an individual \(x \in X_g^{{\textsc {idf}}}\) so that \(x \notin X_g^{{\textsc {fpf}}}\) and \({\textsc {of}}(x) > \max _{y \in X_g^{{\textsc {fpf}}}} {\textsc {of}}(y)\). Since both \(\mathcal {E}_{\textsc {fpf}}\) and \(\mathcal {E}_{\textsc {idf}}\) use the same evolutionary step function e except for the used fitness, their difference regarding x needs to stem from the fact that there exists an individual \(x'\) that is an ancestor of x, i.e., \(x \in D_{x'}\), so that \(x'\) was selected for survival in \(\mathcal {E}_{\textsc {idf}}\) but not in \(\mathcal {E}_{\textsc {fpf}}\), which implies that \({\textsc {fpf}}(x') < {\textsc {idf}}(x')\). However, since x is a possible descendant of \(x'\), the computation of \({\textsc {fpf}}(x')\) should have taken \({\textsc {of}}(x)\) into account, meaning that \(x'\) should have survived in \(\mathcal {E}_{\textsc {fpf}}\) after all, which contradicts the previous assumption. \(\square\)

Of course, Thesis 1 is a purely theoretical argument, as we cannot guarantee optimal choices in the usually randomized evolutionary operators; productive fitness in general thus comes with the considerable disadvantage that it cannot be fully computed in advance. But for a given, completed run of an evolutionary process, we can compute a posteriori the factual \({\textsc {fpf}}\) that single individuals had. There, we still do not make optimal random choices but simply take the ones made as given.

Still, we take Thesis 1 as a hint that final productive fitness might be the right target to strive for. We argue that augmenting the objective fitness \({\textsc {of}}\) (even with easily computable secondary fitness functions) may result in a fitness function that better approximates final productive fitness \({\textsc {fpf}}\). In the following Sect. 4, we show empirically that (in the instances where it helps) a diversity-based secondary fitness \({\textsc {sf}}\) resembles the final productive fitness \({\textsc {fpf}}\) of individuals much better than the raw objective function \({\textsc {of}}\) does.

Thesis 2

When a diversity-aware augmented fitness function \({\textsc {af}}\) is aiding the evolutionary optimization process with respect to an objective fitness \({\textsc {of}}\), it is doing so by approximating the final productive fitness \({\textsc {fpf}}\) of a converged evolutionary process in a more stable way (i.e., more closely when disregarding the respective scaling of the fitness functions) throughout the generations of the evolutionary process.

This connection not only explains why diversity-aware fitness functions fare better than the pure objective fitness but also poses a first step towards a description of how to deliberately construct diversity-aware fitness functions, knowing that their ideal purpose is to approximate the not fully computable final productive fitness. Again, we refer to Gabor and Linnhoff-Popien (2020) for more elaborate theoretical arguments.

Since we cannot estimate all possible futures for an evolutionary process, we provide empirical evidence in favor of Thesis 2 using an a posteriori approximation: given an already finished evolutionary process, we compute the \({\textsc {fpf}}\) values using only those individuals that actually came into being during that single evolutionary process (instead of using all possible descendants). We argue that this approximation is valid because if the evolutionary process was somewhat successful, then most likely all individuals’ descendants are somewhat close to their ideal descendants. Note that the reverse property is not true (i.e., even in a bad run, individuals still aim to generate better descendants, not worse), which is why our approximation does not permit any statements about augmented fitness that does not aid the evolutionary process.
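The quantity we track empirically in Sect. 4 (e.g., Fig. 2e) can then be sketched as the per-generation average gap between the fitness actually used for selection and the a posteriori \({\textsc {fpf}}\); function and argument names are again ours.

```python
def af_fpf_gap(populations, af_values, fpf_values):
    """Average |af(x) - fpf(x)| per generation; `af_values` and
    `fpf_values` map individuals (ids) to the respective fitness values."""
    return [sum(abs(af_values[x] - fpf_values[x]) for x in X) / len(X)
            for X in populations]
```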

4 Experiments

For all experiments, we run an evolutionary process as defined in Sect. 2 with a mutation operator \(\textit{mut}\) that adds a (possibly negative) random value to one dimension of an individual, applied with rate 0.1 to all individuals at random. For \(\textit{rec}\) we apply random crossover with rate 0.3 for a single individual and a randomly chosen mate. We apply \(\textit{mig}\) with a rate of 0.1 (Gabor et al. 2018). Following Wineberg and Oppacher (2003) and the results in Gabor et al. (2018), we focus on a Manhattan distance function for the secondary fitness; we also plot evolutionary processes using fitness sharing with parameter \(\alpha = 2.0\) and dissimilarity threshold \(\sigma = n\), where n is the dimensionality of the problem (Sareni and Krahenbuhl 1998), or inherited fitness with inheritance weight \(\kappa = 0.5\) (Chen et al. 2002; Gabor et al. 2018) for comparison, since both approaches also use an adapted fitness function to promote diversity. The selection operator \(\textit{sel}\) is a simple rank-based cut-off in the shown evolutionary processes. Cut-off with protection for new individuals as well as roulette wheel selection were also tested without yielding noticeably different results.

All code that produced the results of this paper is available at github.com/thomasgabor/naco-evolib.

4.1 Pathfinding

We start with the pathfinding problem, which was shown to greatly benefit from employing diversity in the optimization process (Gabor et al. 2018): Given a room of dimensions \(1 \times 1\), we imagine a robot standing at position (0.5, 0.1). It needs to reach a target area at the opposite side of the room. See Fig. 1 for an illustration. The room also features a huge obstacle in the middle and thus the robot needs to decide on a way around it. The agent can move by performing an action \(a \in \{(\delta {x}, \delta {y}) | -0.33< \delta {x}< 0.33, -0.33< \delta {y} < 0.33\}\). A single solution candidate consists of \(n=5\) actions \(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le n}\). It achieves a reward of \(\frac{1}{n} = 0.2\) every time it stays within the target area between steps, i.e., its fitness is given via \({\textsc {of}}(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}) = r(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}, (0.5, 0.1))\) where

$$\begin{aligned}&r(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}, (x, y)) = \nonumber \\&r(\langle (\delta {x}_i, \delta {y}_i)\rangle _{2 \le i \le m}, (x + \delta {x}_1, y + \delta {y}_1)) + t(x,y) \end{aligned}$$
(11)
$$\begin{aligned}&\text { with } r(\langle \rangle , (x, y)) = t(x,y) \end{aligned}$$
(12)
$$\begin{aligned}&\quad \text { and } t(x,y) = {\left\{ \begin{array}{ll} \frac{1}{n} &{} \textit{ if } 0.4 \le x \le 0.6 \wedge 0.8 \le y \le 1.0\\ 0 &{} \textit{ otherwise.}\end{array}\right. } \end{aligned}$$
(13)
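A minimal sketch of this objective, written iteratively instead of recursively; collision handling for the obstacle is part of the domain but not of Eqs. (11)–(13), so we omit it here, and the function names are ours.

```python
N_STEPS = 5

def target_reward(x, y, n=N_STEPS):
    """t(x, y) from Eq. (13): reward 1/n inside the target area."""
    return 1.0 / n if 0.4 <= x <= 0.6 and 0.8 <= y <= 1.0 else 0.0

def of_pathfinding(actions, start=(0.5, 0.1)):
    """r from Eqs. (11)-(12): sum t over the start position and every
    position reached after each action (dx, dy)."""
    x, y = start
    reward = target_reward(x, y)
    for dx, dy in actions:
        x, y = x + dx, y + dy
        reward += target_reward(x, y)
    return reward
```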

The pathfinding problem lends itself to the application of diversity, as the optimization process in most cases first strikes a local optimum where it reaches the target area sometime by accident (and most probably towards the end of its steps). It then needs to switch to the global optimum where the first three steps are as goal-directed as possible and the last two steps are very small in order to stay within the target area.

Fig. 1: The Pathfinding problem. A robot (red) needs to find a fixed path to reach the target area (green)

Fig. 2: Evolution for the pathfinding problem. Standard evolutionary process using \({\textsc {of}}\) shown in black, diversity-aware evolutionary process using \({\textsc {af}}\) with Manhattan distance shown in blue. Inherited fitness (purple) and fitness sharing (orange) shown for comparison. All results averaged over 25 independent runs, the standard deviation is shown in transparent lines

We now compare a standard evolutionary algorithm given only the objective function \({\textsc {of}}(x) = f(x)\) to a diversity-aware evolutionary algorithm using the Manhattan distance on the solution candidate structure as a secondary fitness function, i.e., \({\textsc {af}}(x) = (1 - \lambda ) \cdot {\textsc {of}}(x) + \lambda \cdot {\textsc {sf}}(x)\) as given in Definition 1

$$\begin{aligned}&\text {where } {\textsc {sf}}(x) = \frac{1}{2n} \cdot {{\,\mathrm{avg}\,}}_{x' \in \sigma _4(X)} \textit{manhattan}(x, x') \end{aligned}$$
(14)
$$\begin{aligned}&\text { and } \textit{manhattan}(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}, \nonumber \\&\quad \langle (\delta {x}_i', \delta {y}_i')\rangle _{1 \le i \le m}) =\nonumber \\&\quad \sum _{i = 1}^m |\delta {x}_i - \delta {x}_i'| + |\delta {y}_i - \delta {y}_i'|. \end{aligned}$$
(15)

Note that \(\sigma _4\) is a selection function that randomly selects 10 individuals from the population X. We use it to reduce the computational cost of computing the pairwise distance for all individuals in the population. Its admissibility for approximating the full pairwise distance was shown in Gabor and Belzner (2017). Just as we normalized the fitness function f to \([0; 1] \subset \mathbb {R}\) we also normalize the secondary fitness \({\textsc {sf}}\) to the same range via division by the maximum Manhattan distance between two individuals, i.e., 2n, to make the combination easier to understand. For now, we set \(\lambda =0.4\), which we discuss later.
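A sketch of Eqs. (14)–(15) including the \(\sigma _4\) sampling and the \(\frac{1}{2n}\) normalization; `sample_size=10` follows the text, everything else is our naming.

```python
import random

def sf_manhattan(x, population, n_actions=5, sample_size=10):
    """Eqs. (14)-(15): average Manhattan distance to a sigma_4 sample of
    the population, normalized by 2n."""
    def manhattan(a, b):
        return sum(abs(dx1 - dx2) + abs(dy1 - dy2)
                   for (dx1, dy1), (dx2, dy2) in zip(a, b))
    sample = random.sample(population, min(sample_size, len(population)))
    avg = sum(manhattan(x, y) for y in sample) / len(sample)
    return avg / (2 * n_actions)
```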

Each evolution was run 30 times for 1500 generations each, using a population size of 50. Figure 2a shows the best fitness achieved per generation for all tested approaches. We see that (especially distance-based) diversity-aware evolution produces much better objective results. Figure 2b shows the separate diversity score \({\textsc {sf}}\) maintained by the best individual, which can only be computed in a meaningful way for Manhattan diversity. In Fig. 2c the standard approach shows the same plot as before since its fitness is not augmented. For all other approaches we plot the augmented fitness \({\textsc {af}}\) that is actually used for selection. We see that due to the combination of distance and objective fitness, Manhattan-diverse evolution starts higher but climbs slower than the respective objective fitness. Fitness sharing results in very small absolute values for fitness but climbs up nonetheless.

From the already run evolutionary processes, we can compute the final productive fitness as given in Definitions 3 and 4 a posteriori. Figure 2d shows the maximum \({\textsc {fpf}}\) per generation. We see that Manhattan-diverse and inherited fitness maintain a rather continuous lineage from the initial population to the best solution in the final generation as the final fitness propagates to the final productive fitness of very early generations. This behavior is rather unsurprising but illustrates the notion of the final productive fitness that measures the individuals’ impact in the final result.

For Fig. 2e we compute the perhaps most interesting measurement: This plot shows for each population X in a given generation the result of \({{\,\mathrm{avg}\,}}_{x \in X} |{\textsc {af}}(x) - {\textsc {fpf}}(x)|\), i.e., the average difference between the augmented fitness and the final productive fitness per individual. Thus, we get to assess how well the augmented fitness approximates the final productive fitness. There are a few observations to be made:

  1. Towards the last few generations, we notice a rapid spike in the \({\textsc {fpf}}\) as the number of descendants in the final generation to be considered for the \({\textsc {fpf}}\) decreases fast.

  2. The actual value of the distance (i.e., the height of the line) is irrelevant to the analysis of Thesis 2 and largely determined by the setting of \(\lambda\).

  3. Throughout most of the plot, the Manhattan-diverse evolution maintains a relatively stable level, i.e., the augmented fitness \({\textsc {af}}\) approximates the final productive fitness \({\textsc {fpf}}\) throughout the evolution. The less stable evolutions also show a worse overall result.

To further elaborate on that last point, we consider Fig. 2f: it shows the average absolute value of change over 150-generation-wide windows of the \(|{\textsc {af}}(x) - {\textsc {fpf}}(x)|\) metric used in Fig. 2e. The plot was smoothed using a convolution kernel \(\langle 1,\ldots , 1 \rangle\) of size 25. Roughly speaking, we see here the slope of the plots in Fig. 2e. In this plot, good evolutionary processes should maintain rather low values according to Thesis 2. We can observe that Manhattan-diverse evolution maintains the lowest values almost throughout the entire evolution. While fitness sharing shows increases and decreases in matching the \({\textsc {fpf}}\) at a higher level than Manhattan diversity, inherited fitness shows a huge spike in the beginning (as does the standard approach), thus making for a much less stable match with the \({\textsc {fpf}}\). As proposed by Thesis 2, the match between \({\textsc {af}}\) and \({\textsc {fpf}}\) roughly corresponds to the quality of the overall result of the evolutionary process.
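For reference, this windowed stability measure can be sketched as follows; the exact windowing in the evolib code may differ, so this is only our reading of the text.

```python
import numpy as np

def stability(gap, window=150, kernel=25):
    """Average absolute change of the per-generation |af - fpf| series over
    `window`-wide windows, then smoothed with a box kernel <1, ..., 1>."""
    change = np.abs(np.diff(np.asarray(gap)))
    windowed = np.convolve(change, np.ones(window) / window, mode="valid")
    return np.convolve(windowed, np.ones(kernel), mode="valid")
```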

As mentioned earlier, we also analyzed the importance of the setting of \(\lambda\) for the evolution. Figure 3 shows the impact of \(\lambda\) on the best results generated by the evolution; \(\lambda = 0\) equals the standard evolution in all previous plots. Unsurprisingly, we see that some amount of diversity-awareness improves the results of evolution, but setting \(\lambda\) too high favors diversity over the actual objective fitness and thus yields very bad results. We want to add that more intricate versions of Manhattan-based augmented fitness \({\textsc {af}}\) might aim to adjust the \(\lambda\) parameter during evolution, just as inherited fitness and fitness sharing might want to adjust their parameters. For these experiments, we chose a static parameter setting for simplicity.

Fig. 3: Parameter analysis for the diversity weight \(\lambda\). We show best fitness among all generations for 20 different settings of \(\lambda = 0.0, 0.05,\ldots , 1.0\). All results averaged over 25 independent runs each, the standard deviation shown in transparent lines

4.2 The route planning problem

The route planning problem is a discrete optimization problem with a similar motivation as the pathfinding problem. Again, we adapt the problem and its description from Gabor et al. (2018).

Fig. 4: Illustration of the route planning problem for \(n=3\) tasks and \(o=5\) workstations per task

A robot needs to perform \(n=12\) different tasks in a fixed order by visiting relevant workstations. Each workstation can perform exactly one of the tasks and for each task, there are \(o=5\) identical workstations to choose from. Accordingly, a solution candidate is a vector \(\langle w_1,\ldots , w_n \rangle\) with \(w_i \in \{1, ..., o\}\) for all \(1 \le i \le n\). See Fig. 4 for an illustration using a smaller setting. A single workstation W can be identified by a tuple of its task type and its number, i.e., \(W = (i, k)\) for some \(1 \le i \le n\) and \(1 \le k \le o\). To mimic various means of transport, the distance \(D(W, W')\) between any two workstations \(W = (i, k)\) and \(W'= (j, l)\) is randomized individually within a range of \([0, \frac{1}{n}] \subset \mathbb {R}\). Note that this (most likely) gives rise to a non-Euclidean space the robot is navigating. The objective fitness for this minimization problem is given via

$$\begin{aligned}&{\textsc {of}}(\langle w_1, ..., w_n\rangle ) \nonumber \\&\quad = \sum _{i=1}^{n-1} D((i, w_i), (i+1,w_{i+1})). \end{aligned}$$
(16)
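A sketch of the domain setup and Eq. 16; the distance table is drawn once per problem instance, as described above, and all names are ours.

```python
import random

N_TASKS, N_OPTIONS = 12, 5   # n tasks, o workstations per task

# D((i, k), (j, l)): individually randomized distance in [0, 1/n]
D = {((i, k), (j, l)): random.uniform(0.0, 1.0 / N_TASKS)
     for i in range(N_TASKS) for k in range(N_OPTIONS)
     for j in range(N_TASKS) for l in range(N_OPTIONS)}

def of_route(w):
    """Eq. (16): total travel distance along the chosen workstations
    w = <w_1, ..., w_n> (0-indexed here)."""
    return sum(D[((i, w[i]), (i + 1, w[i + 1]))] for i in range(N_TASKS - 1))
```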

From here, we again construct an augmented fitness \({\textsc {af}}(x) = (1 - \lambda ) \cdot {\textsc {of}}(x) + \lambda \cdot {\textsc {sf}}(x)\) (cf. Definition 1), but now use the Hamming distance as a secondary fitness so that

$$\begin{aligned}&{\textsc {sf}}(x) = \frac{1}{2n} \cdot {{\,\mathrm{avg}\,}}_{x' \in \sigma _4(X)} \textit{hamming}(x, x') \end{aligned}$$
(17)
$$\begin{aligned}&\text { where } \textit{hamming}(\langle w_1,\ldots , w_n \rangle , \langle w_1',\ldots , w_n' \rangle ) = \sum _{i = 1}^{n} h(w_i, w_i') \end{aligned}$$
(18)
$$\begin{aligned}&\text { and } h(w, w') = {\left\{ \begin{array}{ll} 0 &{} \text { if } w = w'\\ 1 &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(19)
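The Hamming-based secondary fitness of Eqs. (17)–(19) can be sketched with the same \(\sigma _4\) sampling as in Sect. 4.1 (names are ours):

```python
import random

def sf_hamming(x, population, sample_size=10):
    """Eqs. (17)-(19): average Hamming distance to a sigma_4 sample of
    the population, normalized by 2n."""
    sample = random.sample(population, min(sample_size, len(population)))
    avg = sum(sum(1 for a, b in zip(x, y) if a != b)
              for y in sample) / len(sample)
    return avg / (2 * len(x))
```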
Fig. 5: Parameter analysis for the diversity weight \(\lambda\) for the route planning problem. We show best fitness among all generations for 20 different settings of \(\lambda = 0.0, 0.05,\ldots , 1.0\). All results averaged over 20 independent runs each, the standard deviation shown in transparent lines

Besides that, we apply the same evolutionary processes as in Sect. 4.1, but the parameter search shown in Fig. 5 now recommended \(\lambda = 0.25\) for weighting the now Hamming-based diversity. We evolve 20 independent populations of size 50 for 400 generations and plot the same data we have seen before: Fig. 6a shows the best fitness achieved in evolution. Inherited fitness takes a lot more time but eventually almost reaches the level of Hamming diversity. However, both methods yield results only about as solid as fitness sharing or the naïve algorithm. This is mirrored by all methods showing quite stable behavior in Figs. 6e and 6f, with the standard approach showing the highest fall within the first few generations as it matches \({\textsc {fpf}}\) the least. However, the results for all evolutions are very close together for the route planning problem.

Fig. 6: Evolution for the route planning problem. Standard evolutionary process using \({\textsc {of}}\) shown in black, diversity-aware evolutionary process using \({\textsc {af}}\) with Hamming distance shown in blue. Inherited fitness (purple) and fitness sharing (orange) shown for comparison. Results averaged over 20 independent runs, the standard deviation is shown in transparent lines

4.3 Schwefel

Finally, we consider one of the canonical benchmark problems for evolutionary algorithms. The implementation of the Schwefel problem is taken from Rainville et al. (2012), while our study on the impact of diversity again follows the experiments performed in Gabor et al. (2018).

The original fitness function is given as

$$\begin{aligned} \textit{schwefel}(\langle x_1, ..., x_n\rangle ) =418.9828872724339 \cdot n - \sum _{i=1}^n x_i \cdot \sin (\sqrt{|x_i|}) \end{aligned}$$
(20)

with \(x_1, ..., x_n \in [-500; 500] \subset \mathbb {R}\), where \(n=8\) is the dimensionality we use in our experiments (Rainville et al. 2012). The resulting function is illustrated in Fig. 7. The Schwefel problem is a minimization problem looking for an x so that \(\textit{schwefel}(x) = 0\). Again, we normalize the result values by defining \({\textsc {of}}(x) = \frac{1}{4000} \cdot \textit{schwefel}(x)\). The Manhattan distance uses diversity weight \(\lambda = 0.3\) as suggested by Fig. 8.
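A sketch of the normalized objective for this section, with the constants as given in Eq. 20 and the text (the function name is ours):

```python
import math

def of_schwefel(x):
    """Normalized Schwefel function, of(x) = schwefel(x) / 4000, n = len(x)."""
    raw = 418.9828872724339 * len(x) - sum(
        xi * math.sin(math.sqrt(abs(xi))) for xi in x)
    return raw / 4000.0
```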

Fig. 7: Illustration of the Schwefel function for \(n=2\) dimensions. Image taken from (Benchmarks 2020)

Fig. 8: Parameter analysis for the diversity weight \(\lambda\) for the Schwefel problem. We show best fitness among all generations for 10 different settings of \(\lambda = 0.0, 0.1,\ldots , 1.0\). All results averaged over 20 independent runs each, the standard deviation shown in transparent lines

We run the same kind of analysis as for the previous problems and plot the same data in Fig. 9. These experiments were performed on populations of size 50 evolving for 400 generations. Figure 9a shows a mixed picture: the Manhattan-diverse evolution again outperforms the standard approach, but inherited fitness hardly yields any benefit while fitness sharing performs worse than the standard approach. In Fig. 9d we see a different picture than before: the final productive fitness values are not as stable anymore but vary throughout the evolution, suggesting that solutions to the Schwefel problem are more influenced by migration (and thus show no continuous lineage) than for the other problems considered. Figure 9e shows the \(|{\textsc {af}}- {\textsc {fpf}}|\) metric, which measures the individual difference between augmented fitness and final productive fitness. We observe very clearly that, after a short starting phase, this distance remains much more stable for the Manhattan-diverse evolution than for any other approach, indicating that the augmented fitness is approximating the final productive fitness. Note again that in the last few generations, computing the final productive fitness has limited meaning. In Fig. 9f this behavior shows clearly, as Manhattan-diverse evolution forms a line at the very bottom of the plot with almost no change in how \({\textsc {af}}\) matches \({\textsc {fpf}}\). All other approaches, which perform noticeably worse, also show a much more erratic pattern in how well their augmented fitness matches the final productive fitness.

Fig. 9: Evolution for the Schwefel problem. Standard evolutionary process using \({\textsc {of}}\) shown in black, diversity-aware evolutionary process using \({\textsc {af}}\) with Manhattan distance shown in blue. Inherited fitness (purple) and fitness sharing (yellow) shown for comparison. Results averaged over 20 independent runs, the standard deviation is shown in transparent lines

5 Related work

Diversity has been a central topic of research in evolutionary algorithms (den Heijer and Eiben 2012; Morrison and De Jong 2001; Toffolo and Benini 2003; Ursem 2002). Its positive effect on the evolutionary process has often been observed there, but rarely been interpreted beyond a biological metaphor, i.e., “diversity is a key element of the biological theory of natural selection and maintaining high diversity is supposed to be generally beneficial” (Corno et al. 2005).

Without much concept of what to look for in a mechanism for diversity-awareness, many variants have spawned in research. Instead of repeating them, we would like to point out a few resources for a comprehensive overview: Burke et al. (2004), among others like Brameier and Banzhaf (2002) or McPhee and Hopper (1999), discuss various means to measure and promote diversity in genetic programming, which for the most part should apply to all evolutionary algorithms. They also provide an extensive analysis of the connection between diversity and achieved fitness, but do not define productive fitness or a similar notion. A more recent comprehensive overview of means to describe and enable diversity has been put together by Squillero and Tonda (2016), also providing a taxonomy of various classes of approaches to diversity. Gabor et al. (2018) provide a quantitative analysis of various means of maintaining inheritance-based diversity on standard domains like the ones we used in this paper. Regarding the multitude of diversity mechanisms present in research, however, it is most important to also point out the results of Wineberg and Oppacher (2003), who most drastically show that “all [notions of diversity] are restatements or slight variants of the basic sum of the distances between all possible pairs of the elements in a system” and suggest that “experiments need not be done to distinguish between the various measures”, a point we already built upon in our evaluation.

Note that a variety of “meta-measurements” for the analysis of evolutionary processes exist: effective fitness measures the minimum fitness required for an individual to increase in dominance at a given generation (when in competition with the other individuals) (Stephens 1999). It is related to reproductive fitness, which is the probability of an individual to successfully produce offspring (Hu and Banzhaf 2010). Both ideas lie at the foundation of productive fitness, but do not include the (computationally overly expensive) diachronic analysis of the overall effect on the end result. Our approach is also comparable to entropy-based diversity preservation (Squillero and Tonda 2008), where the positive effect of certain individuals on the population’s entropy is measured and preserved in order to deliberately maintain higher entropy levels. By contrast, our approach is based on the fitness values only (without the need to look into the individuals beyond their genealogical relationships) and thus also cannot be used directly as a secondary goal in evolution but purely as a tool for a posteriori analysis of the effectiveness of other secondary goal definitions.

When we construct the “optimal evolutionary process”, we construct a dynamic optimization problem from a traditionally static one. It is interesting that specifically dynamic or on-line (Bredeche et al. 2009) evolutionary algorithms have been shown to benefit from increased diversity especially when facing changes in their fitness functions (Gabor et al. 2018; Grefenstette 1992). While this is obviously intuitive as more options in the population allow for higher coverage of possible changes, the reverse connection (pointed to by this work) is not stated there, i.e., that diversity in static domains may work because even for static domains the optimization process is inherently dynamic to some degree.

6 Conclusion

We have introduced the novel notion of final productive fitness \({\textsc {fpf}}\) (and all the definitions it is built upon). We make a theoretical argument that \({\textsc {fpf}}\) is the goal an optimal evolutionary process should strive for to achieve the best overall results. However, \({\textsc {fpf}}\) cannot be computed efficiently in advance, producing the need for an approximation. We argue that the well-known technique of augmenting the objective fitness function with an additional diversity goal (when it helps) happens to effectively approximate the theoretically derived \({\textsc {fpf}}\) (at least better than just the objective fitness on its own). We have shown this connection empirically on benchmark domains. We argue that this provides first insight into why and how diversity terms are beneficial to evolutionary processes.

Immediate future work would consist of answering the when and which: we have tested several domains for evolutionary algorithms and many are too simple to further benefit from explicit diversity-awareness. Maybe \({\textsc {fpf}}\) can be used to derive a criterion to estimate the usefulness of diversity in advance. Similarly, many mechanisms to cater explicitly to diversity exist. While many can be subsumed by the pairwise distance used here (Wineberg and Oppacher 2003), others may still show different behavior. Their relation to \({\textsc {fpf}}\) requires further work.

There are means of maintaining diversity without altering the fitness function, most prominently structural techniques like island models (Tomassini 2006; Whitley et al. 1999) or hypermutation phases (Morrison and De Jong 2000; Simões and Costa 2002). As no match for \(|{\textsc {af}}- {\textsc {fpf}}|\) can be computed for them, we omitted them in this first analysis. Final productive fitness may be a useful tool to translate these structural means into an effect of the fitness function.

Eventually, there may be even more direct or outright better (compared to using diversity or similarly augmented fitness) approximations of \({\textsc {fpf}}\) to be found now that we know what we are looking for. The ultimate goal of the research into \({\textsc {fpf}}\) might be to utilize it directly or indirectly in actually constructing new types of evolutionary algorithms (instead of it “only” helping to explain how well-known types work). We can imagine:

  • We could compute \({\textsc {fpf}}\) for a simplified version of the problem (and the algorithm) for various parametrizations (Eiben and Smith 2003; Mitchell 1998). The \({\textsc {fpf}}\)’s values could then help evaluate the respective parametrization’s success. However, for most complex problems it is not entirely clear how to derive simpler instances that still preserve the interesting or challenging aspects of their larger counterparts. Furthermore, it is unclear how \({\textsc {fpf}}\) might provide more information than just using the respective runs’ \({\textsc {of}}\).

  • In this paper, we checked how well various established models for fitness functions approximate \({\textsc {fpf}}\). Instead, we might now construct new models with the goal to approximate \({\textsc {fpf}}\) better. Surrogate models have been used in evolutionary algorithms to approximate computationally expensive fitness functions (Gabor and Altmann 2019; Jin 2005; Jin and Sendhoff 2004). In our case, a surrogate model would have to approximate our a posteriori approximation of \({\textsc {fpf}}\) and could then maybe save time for future evaluations. It should be noted that our results make it seem plausible that no general model for \({\textsc {fpf}}\) on all domains should exist, and we cannot train the surrogate as the algorithm goes along since our target metric is only computed a posteriori. However, we might still be able to learn an n-\({\textsc {pf}}\) surrogate for small n or a similar target metric. In Gabor and Linnhoff-Popien (2020) we already suspect (based on the same a posteriori approximation as in this paper) that \({\textsc {fpf}}\) constructs a simpler fitness landscape compared to \({\textsc {of}}\), quite like surrogates do, suggesting that surrogates may be trained to achieve a similar fitness landscape. That dynamic should be explored in future work.

  • In Gabor and Belzner (2017) and Gabor et al. (2018) we introduce genealogical distance as a diversity metric for evolutionary algorithms. We show that it provides similar, although at times inferior, results to distance-based diversity metrics (i.e., the Manhattan and Hamming diversity we use in this paper). However, genealogical diversity may provide means to approximate genealogical relations between individuals, making our a posteriori approximation much easier to compute (probably at the expense of accuracy). Future work should evaluate whether that approach brings any relevant benefits.

Some of these may be applicable to other methods of optimization as well: We suspect that the notion of final productive fitness translates directly to all optimization methods (ranging from simulated annealing or particle swarm optimization to Monte-Carlo tree search and backpropagation) that may or may not already implement means to approximate final productive fitness rather than just objective fitness.