Productive fitness in diversity-aware evolutionary algorithms

In evolutionary algorithms, the notion of diversity has been adopted from biology and is used to describe the distribution of a population of solution candidates. While it has been known that maintaining a reasonable amount of diversity often benefits the overall result of the evolutionary optimization process by adjusting the exploration/exploitation trade-off, little has been known about what diversity is optimal. We introduce the notion of productive fitness based on the effect that a specific solution candidate has some generations down the evolutionary path. We derive the notion of final productive fitness, which is the ideal target fitness for any evolutionary process. Although it is inefficient to compute, we show empirically that it allows for an a posteriori analysis of how well a given evolutionary optimization process hit the ideal exploration/exploitation trade-off, providing insight into why diversity-aware evolutionary optimization often performs better.


Introduction
Evolutionary algorithms are a widely used type of stochastic optimization that mimics biological evolution in nature. Like any other metaheuristic optimization algorithm (Brown et al. 2005; Conti et al. 2018), they need to balance exploration against exploitation in their search process: High exploration risks failing to optimize the intermediate solutions to the fullest; high exploitation risks missing the global optimum and getting stuck in a sub-optimal part of the search space. Analogous to biological evolution, diversity within the population of solution candidates has been identified as a central feature for adjusting the exploration/exploitation trade-off. Many means of maintaining the diversity of the population throughout the process of evolution have been developed in the literature; a comprehensive overview is provided by Squillero and Tonda (2016), for example.
For problems with complex fitness landscapes, it is well known that increased exploration (via increased diversity) yields better overall results in the optimization, even when disregarding any diversity goal in the final evaluation (Ursem 2002; Toffolo and Benini 2003). However, this gives rise to a curious phenomenon: By augmenting the fitness function and thus making it match the original objective function less, we actually get results that optimize the original objective function more. This implies that an evolutionary algorithm does not directly optimize for the fitness function it uses (but instead optimizes for a slightly different implicit goal). Furthermore, to really optimize for a given objective function, one should ideally use a (slightly) different fitness function for evolution. In this paper, we introduce final productive fitness as a theoretical approach to derive the ideal fitness function from a given objective function.
We see that final productive fitness cannot feasibly be computed in advance. However, we show how to approximate it a posteriori, i.e., when the optimization process is already finished. We show that the notion of final productive fitness is sound by applying it to the special case of diversity-aware evolutionary algorithms, which (for our purposes) are algorithms that directly encode a striving for increased diversity by altering the fitness of the individuals. By running these on various benchmark problems, we empirically show that diversity-aware evolutionary processes may approximate final productive fitness more accurately than an evolutionary process using just the original objective. We show that the fitness alteration performed by these algorithms, when it improves overall performance, does so while (perhaps because) it better approximates final productive fitness. We thus argue that the notion of final productive fitness for the first time provides a model of how diversity is beneficial to evolutionary optimization, which has been called for by various works in the literature:
• "One of the urgent steps for future research work is to better understand the influence of diversity for achieving good balance between exploration and exploitation." (Črepinšek et al. 2013),
• "This tendency to discover both quality and diversity at the same time differs from many of the conventional algorithms of machine learning, and also thereby suggests a different foundation for inferring the approach of greatest potential for evolutionary algorithms." (Pugh et al. 2016),
• "However, the fragmentation of the field and the difference in terminology led to a general dispersion of this important corpus of knowledge in many small, hard-to-track research lines" and, "[w]hile diversity preservation is essential, the main challenge for scholars is devising general methodologies that could be applied seamlessly [...]" (Squillero and Tonda 2016).
It should be noted that the approach presented in this paper merely provides a new perspective on exploration/exploitation in evolutionary algorithms and a new method of analyzing the effects of diversity. It is up to future works to derive new means to actively promote diversity from this analysis.
In this paper, we provide a short mathematical description of evolutionary processes in Sect. 2 and build our notion of (final) productive fitness on top of that in Sect. 3. Section 4 describes the empirical results and Sect. 5 discusses related work before Sect. 6 concludes.

Foundations
For this paper, we assume an evolutionary process (EP) to be defined as follows: Given a fitness function $f : \mathcal{X} \to [0; 1] \subset \mathbb{R}$ for an arbitrary set $\mathcal{X}$ called the search space, we want to find an individual $x \in \mathcal{X}$ with the best fitness $f(x)$. For a maximization problem, the best fitness is that of an individual $x$ so that $f(x) \geq f(x') \; \forall x' \in \mathcal{X}$. For a minimization problem, the best fitness is that of an individual $x$ so that $f(x) \leq f(x') \; \forall x' \in \mathcal{X}$. Note that we normalize our fitness space to $[0; 1] \subset \mathbb{R}$ for all problems for ease of comparison. Whenever the maximum and minimum fitness are bounded, this can be done without loss of generality.
Usually, the search space $\mathcal{X}$ is too large or too complicated to guarantee that we can find the exact best individual(s) using standard computing models (and physically realistic time). Thus, we take discrete subsets of the search space $\mathcal{X}$ via sampling and iteratively improve their fitness. An evolutionary process $\mathcal{E}$ over $g$ generations, $g \in \mathbb{N}$, is defined as $\mathcal{E} = \langle \mathcal{X}, e, f, (X_i)_{i<g} \rangle$. $\mathcal{X}$ is the search space. $e : 2^{\mathcal{X}} \to 2^{\mathcal{X}}$ is the evolutionary step function so that $X_{i+1} = e(X_i) \; \forall i \geq 0$. As defined above, $f : \mathcal{X} \to [0; 1] \subset \mathbb{R}$ is the fitness function. $(X_i)_{i<g}$ is a series of populations so that $X_i \subseteq \mathcal{X} \; \forall i$ and $X_0$ is the initial population. Note that as the evolutionary step function $e$ is usually non-deterministic, we define $E(X) = \{ X' \mid X' = e(X) \}$ to be the set of all possible next populations.
We use the following evolutionary operators:
• The recombination operator $rec : \mathcal{X} \times \mathcal{X} \to \mathcal{X}$ generates a new individual from two individuals.
• The mutation operator $mut : \mathcal{X} \to \mathcal{X}$ alters a given individual slightly to return a new one.
• The migration operator $mig : \mathcal{X}$ generates a random individual $mig() \in \mathcal{X}$.
• The (survivors) selection operator $sel : 2^{\mathcal{X}} \times \mathbb{N} \to 2^{\mathcal{X}}$ returns a new population $X' = sel(X, n)$ given a population $X \subseteq \mathcal{X}$, so that $|X'| \leq n$.
The operators $rec$, $mut$, $mig$ can be applied to a population $X$ by choosing individuals from $X$ to fill their parameters (if any) according to some selection scheme $\sigma : 2^{\mathcal{X}} \to 2^{\mathcal{X}}$ and adding their return to the population. For example, we allow writing $mut_\sigma(X) = X \cup \{ mut(x') \mid x' \in \sigma(X) \}$. Note that all children are added to the population and do not replace their parents in this formulation. For any evolutionary process $\mathcal{E} = \langle \mathcal{X}, e, f, (X_i)_{i<g} \rangle$ and selection schemes $\sigma_1, \sigma_2, \sigma_3$ we assume that the evolutionary step $e$ applies $rec$, $mut$, and $mig$ under these selection schemes and then uses $sel$ to cut the enlarged population back down.
Usually, we assume that an evolutionary process fulfills its purpose if the best fitness of the population tends to become better over time, i.e., given a sufficiently large number of generations $k \in \mathbb{N}$, it holds for maximization problems that $\max_{x \in X_i} f(x) < \max_{x \in X_{i+k}} f(x)$. We define the overall result of an evolutionary process $\mathcal{E} = \langle \mathcal{X}, e, f, (X_i)_{i<g} \rangle$ with respect to a fitness function $\phi$ (which may or may not be different from the fitness $f$ used during evolution) to be the best value found and kept in evolution, i.e., for a maximizing objective $\phi$ we define $|\mathcal{E}|_\phi = \max_{x \in X_g} \phi(x)$. Note that there are evolutionary processes which include a hall-of-fame mechanism, i.e., are able to return the result fitness $||\mathcal{E}||_\phi = \max_{i = 1, ..., g} \max_{x \in X_i} \phi(x)$. However, we can derive the equality $|\mathcal{E}|_\phi = ||\mathcal{E}||_\phi$ when we assume elitism with respect to $\phi$, i.e., $\arg\max_{x \in X_i} \phi(x) \in X_{i+1}$ for all $i = 1, ..., g$. Since it makes reasoning easier and hardly comes with any drawback for sufficiently large populations, we use elitist evolutionary processes (with respect to $f$) from here on.
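The following is a minimal sketch of one evolutionary step in the spirit of the operators above, not the authors' implementation. The function name `evolutionary_step`, the order in which the operators are applied, and the use of per-individual rates as selection schemes are assumptions made for illustration; the default rates mirror the values reported in the experiments. Children are added next to their parents, and a rank-based cut-off restores the population size, which makes the step elitist with respect to the fitness used.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Individual = Tuple[float, ...]


def evolutionary_step(
    population: Sequence[Individual],
    fitness: Callable[[Individual], float],
    rec: Callable[[Individual, Individual], Individual],
    mut: Callable[[Individual], Individual],
    mig: Callable[[], Individual],
    rates: Tuple[float, float, float] = (0.3, 0.1, 0.1),
    maximize: bool = True,
    rng: Optional[random.Random] = None,
) -> List[Individual]:
    """Apply rec, mut, mig (children are added, parents stay), then cut back."""
    rng = rng or random.Random()
    rec_rate, mut_rate, mig_rate = rates
    size = len(population)
    extended = list(population)

    # Recombination: each selected individual mates with a random partner.
    for x in population:
        if rng.random() < rec_rate:
            extended.append(rec(x, rng.choice(list(population))))

    # Mutation: mutated copies are added to the population (assumed order).
    for x in list(extended):
        if rng.random() < mut_rate:
            extended.append(mut(x))

    # Migration: freshly generated random individuals join the population.
    for _ in range(size):
        if rng.random() < mig_rate:
            extended.append(mig())

    # Survivor selection: rank-based cut-off back to the original size.
    extended.sort(key=fitness, reverse=maximize)
    return extended[:size]
```

Because parents remain in the extended population and selection keeps the best-ranked individuals, the best fitness in the population cannot get worse from one generation to the next in this sketch.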

Approach
The central observation we build our analysis on is that in many cases the results of optimizing for a given objective function (called $of$) can be improved by not using $of$ directly as the fitness function $f$ of the evolutionary process. Consequently, changing the fitness function $f$ away from the true objective $of$ in some cases leads to better results with respect to the original objective function $of$. Note that this phenomenon extends beyond just heuristic optimization and is known as reward shaping in reinforcement learning, for example (Ng et al. 1999).
In evolutionary algorithms, a property called diversity is often considered in addition to the objective function $of$ to improve the progress of the evolutionary process (Squillero and Tonda 2016; Ursem 2002). In one way or another, diversity-enhancing evolutionary algorithms reward individuals of the population for being different from other individuals in the population. While there are many ways to implement this behavior, like topology-based methods (Tomassini 2006), fitness sharing (Sareni and Krahenbuhl 1998), ensembling (Hart and Sim 2018), etc., we consider an instance of diversity-enhancing evolutionary algorithms that is simpler to analyze: By quantifying the distance of a single individual to the population, we can define a secondary fitness $sf$ that rewards high diversity in the individual. This approach was shown by Wineberg and Oppacher (2003) to be an adequate general representation of most well-known means of measuring diversity in a population.
In order to avoid the difficulties of multi-objective evolution, we can then define the augmented fitness function af that incorporates both the objective fitness of and the secondary fitness sf into one fitness function to be used for the evolutionary process.
Definition 1 (Augmented Fitness) Given the objective fitness $of$, a diversity-aware secondary fitness $sf$, and a diversity weight $\lambda \in [0; 1] \subset \mathbb{R}$, we define the augmented fitness $af$ as $af(x) = (1 - \lambda) \cdot of(x) + \lambda \cdot sf(x)$.
As shown by Wineberg and Oppacher (2003), such a definition of the augmented fitness suffices to show the benefits of employing diversity.
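Definition 1 is a convex combination of the two fitness values, which can be sketched in a few lines. The function and parameter names below are chosen for illustration only; `weight` plays the role of the diversity weight.

```python
def augmented_fitness(of_value: float, sf_value: float, weight: float) -> float:
    """Combine objective and secondary fitness as in Definition 1.

    weight = 0 reduces to the pure objective fitness,
    weight = 1 reduces to the pure secondary (diversity) fitness.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("diversity weight must lie in [0, 1]")
    return (1.0 - weight) * of_value + weight * sf_value
```

Since both inputs are normalized to the unit interval, the result stays in the unit interval as well, so differently weighted variants remain comparable on the same scale.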
We can then define two evolutionary processes $\mathcal{E}_{of} = \langle \mathcal{X}, e, of, (X_i)_{i<g} \rangle$ and $\mathcal{E}_{af} = \langle \mathcal{X}, e, af, (X'_i)_{i<g} \rangle$. We observe the curious phenomenon that in many cases the augmented fitness $af$ better optimizes for $of$ than using $of$ itself, formally $|\mathcal{E}_{af}|_{of} > |\mathcal{E}_{of}|_{of}$, which raises the following question: If $of$ is not the ideal fitness function to optimize for the objective $of$, what is?
Given a sequence of populations $(X_i)_{i<g}$ spanning multiple generations $i = 1, ..., g$, we can write down what we actually want our population to be like inductively, starting from the last generation $g$: The net benefit of $X_g$ to our (maximizing) optimization process is exactly $\max_{x \in X_g} of(x)$, as this population will not evolve any further and thus the best individual within $X_g$ is what we are going to be stuck with as the result of the optimization process. Note that the individuals of $X_{g-1}$ already contribute differently to the result of the optimization process: From the perspective of generation $g - 1$ the overall optimization result is $\max_{x \in X_{g-1}} \max_{x' \in X_g(x)} of(x')$, where the follow-up generation $X_g(x)$ is any population (footnote 1) from $\{ X_g \mid X_g \in E(X_{g-1}) \wedge x \in X_g \}$, i.e., the possible next populations where $x$ survived. Intuitively, the contribution of the second-to-last generation $X_{g-1}$ to the result of the optimization process stems from the objective fitness $of$ that this generation's individuals can still achieve in the final generation $X_g$. Generally, this does not fully coincide with the application of the objective function $of$ in said generation: $\max_{x \in X_{g-1}} of(x) \neq \max_{x \in X_{g-1}} \max_{x' \in X_g(x)} of(x')$. That means: While rating individuals according to their objective fitness $of$ in the last generation of the evolutionary process is adequate, the actual benefit of the individual $x$ to the optimization result and the value of $of(x)$ may diverge more the earlier we are in the evolutionary process.
Accordingly, at the beginning of an evolutionary process, the objective fitness of might not be a good estimate of how much the individuals will contribute to the process's return with respect to of at the end of the optimization process. Still, standard optimization techniques often use the objective fitness of as a (sole) guideline for the optimization process. Instead, we ideally want to make every decision (mutation, recombination, survival, ...) at every generation X i with the ideal result for the following generations X iþ1 ; X iþ2 ; ::: and ultimately the final generation X g in mind. We call this the optimal evolutionary process. Obviously, to make the optimal decision early on, we would need to simulate all the way to the end of the evolution, including all the follow-up decisions. This renders optimal evolution infeasible as an algorithm. However, we can use it for a posteriori analysis of what has happened within a different evolutionary process. In order to do so, we need to give a fitness function for the optimal process (as it obviously should not be of).
Instead, we formalize the benefit to the optimization process discussed above and thus introduce the notion of productive fitness. But first, we need a simple definition on the inter-generational relationships between individuals.
Definition 2 (Descendants) Let $x$ be an individual in the population of generation $i$, $x \in X_i$, of an evolutionary process. All individuals $x' \in X_{i+1}$ so that $x'$ resulted from $x$ via a mutation operator, i.e., $x' = mut(x)$, or a recombination operator with any other parent, i.e., there exists $y \in X_i$ so that $x' = rec(x, y)$, are called direct descendants of $x$. Further, given a series of populations $(X_i)_{0<i<g}$, we define the set of all descendants $D_x$ as the transitive hull on all direct descendants of $x$.
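The transitive hull in Definition 2 can be computed by a simple graph traversal once the direct-descendant relation has been recorded during a run. The sketch below assumes, purely for illustration, that individuals are identified by hashable keys and that a dictionary `direct` maps each individual to the set of its direct descendants.

```python
from typing import Dict, Hashable, Set


def all_descendants(direct: Dict[Hashable, Set[Hashable]], x: Hashable) -> Set[Hashable]:
    """Return the transitive hull D_x over the direct-descendant relation."""
    seen: Set[Hashable] = set()
    stack = list(direct.get(x, ()))
    while stack:
        child = stack.pop()
        if child in seen:
            continue  # already reached via another lineage
        seen.add(child)
        stack.extend(direct.get(child, ()))
    return seen
```

Note that, following the definition literally, an individual is not counted among its own descendants here.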
We can now use this relationship to assign the benefit that a single individual has had to the evolution a posteriori. For this, we simply average the fitness of all its surviving descendants.
Definition 3 (Productive Fitness) Let $x$ be an individual in the population of generation $i$, $x \in X_i$, of an evolutionary process. Let $D_x \subseteq \mathcal{X}$ be the set of all descendants of $x$. The productive fitness after $n$ generations, or $n$-productive fitness, is the average objective fitness of $x$'s descendants living in generation $i + n$, written $pf_n(x) = \operatorname{avg}_{x' \in D_x \cap X_{i+n}} of(x')$. Note that in case the individual $x$ has no descendants in $n$ generations, we set its productive fitness $pf_n(x)$ to a worst-case value $w$, which in our case of bounded fitness values is 0 for maximizing optimization processes and 1 for minimizing optimization processes.
We argue that the productive fitness $pf$ is better able to describe the actual benefit the individual brings to the optimization process, as represented by what parts of the individual still remain inside the population in a few generations. Note that our notion of productive fitness is rather harsh in two points:
• We only take the average of all descendants' fitness. One could argue that we may want a more optimistic approach where we might reward the individual for the best offspring it has given rise to. However, we argue that every bad individual binds additional resources for eliminating it down the road and thus a low target accuracy should actively be discouraged.
• When the line of an individual dies out completely, we assign the worst possible fitness. Arguments could be made that even dead lines contribute to the search process by ruling out unpromising areas while, e.g., increasing the diversity scores of individuals in more promising areas of the search space. Still, we do count any descendants, however distant, so even small contributions to the final population avoid the penalty $w$.
We leave the analysis of the effects of the discussed parameters to future work. Note that for now, our notion of productive fitness only covers a fixed horizon into the future. We can trivially extend this definition to respect the final generation no matter what generation the current individual is from:
Definition 4 (Final Productive Fitness) Let $x$ be an individual in the population of generation $i$, $x \in X_i$, of an evolutionary process of $g$ generations in total. The final productive fitness of $x$ is the fitness of its descendants in the final generation, i.e., $fpf(x) = pf_{g-i}(x)$.
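As a hedged illustration of Definitions 3 and 4, the following sketch computes productive fitness a posteriori from a recorded run. The data layout is an assumption: `populations` is a list of generations (the last entry being the final generation), `direct` maps each individual to its direct descendants, and `of` returns the objective fitness. Treating the case of zero remaining generations as the individual's own objective fitness is likewise an assumption of this sketch.

```python
from typing import Callable, Dict, Hashable, Sequence, Set


def all_descendants(direct: Dict[Hashable, Set[Hashable]], x: Hashable) -> Set[Hashable]:
    """Transitive hull over the direct-descendant relation (Definition 2)."""
    seen: Set[Hashable] = set()
    stack = list(direct.get(x, ()))
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(direct.get(child, ()))
    return seen


def productive_fitness(
    x: Hashable,
    i: int,
    n: int,
    populations: Sequence[Sequence[Hashable]],
    direct: Dict[Hashable, Set[Hashable]],
    of: Callable[[Hashable], float],
    worst: float = 0.0,
) -> float:
    """Average objective fitness of x's descendants alive in generation i + n."""
    if n == 0:
        return of(x)  # assumption: no horizon left, fall back to of(x)
    alive = all_descendants(direct, x) & set(populations[i + n])
    if not alive:
        return worst  # the line of x died out: worst-case value w
    return sum(of(d) for d in alive) / len(alive)


def final_productive_fitness(x, i, populations, direct, of, worst=0.0) -> float:
    """fpf(x) = pf_{g-i}(x), with g the index of the final generation."""
    g = len(populations) - 1
    return productive_fitness(x, i, g - i, populations, direct, of, worst)
```

For a minimization problem, `worst` would be set to 1 instead of 0, matching the worst-case value described in Definition 3.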
We argue that final productive fitness is able to describe what the fitness function of an optimal evolutionary process looks like: Every evaluation is done in regard to the contribution to the final generation, i.e., the ultimate solution returned by the search process.
Thesis 1 When rolling the ideal choices in all randomized evolutionary operators, final productive fitness fpf is the optimal fitness function for evolutionary processes, i.e., an evolutionary process yields the best results when it optimizes for fpf at every generation.
We sketch a short argument in favor of Thesis 1. For a more in-depth discussion, see Gabor and Linnhoff-Popien (2020). Let $\mathcal{E}_{fpf} = \langle \mathcal{X}, e, fpf, (X^{fpf}_i)_{i<g} \rangle$ be an evolutionary process using final productive fitness $fpf$. Let $\mathcal{E}_{idf} = \langle \mathcal{X}, e, idf, (X^{idf}_i)_{i<g} \rangle$ be an evolutionary process using a different (possibly more ideal) fitness $idf$. Let $X^{fpf}_0 = X^{idf}_0$ and assume that $|\mathcal{E}_{idf}|_{of} > |\mathcal{E}_{fpf}|_{of}$ (Eq. 10). Since Eq. 10 implies that at least $X^{fpf}_g \neq X^{idf}_g$, there is an individual $x \in X^{idf}_g$ so that $x \notin X^{fpf}_g$ and $of(x) > \max_{y \in X^{fpf}_g} of(y)$. Since both $\mathcal{E}_{fpf}$ and $\mathcal{E}_{idf}$ use the same evolutionary step function $e$ except for the fitness used, their difference regarding $x$ needs to stem from the fact that there exists an individual $x'$ that is an ancestor of $x$, i.e., $x \in D_{x'}$, so that $x'$ was selected for survival in $\mathcal{E}_{idf}$ and not in $\mathcal{E}_{fpf}$, which implies that $fpf(x') < idf(x')$. However, since $x$ is a possible descendant of $x'$, the computation of $fpf(x')$ should have taken $of(x)$ into account (footnote 2), meaning that $x'$ should have survived in $\mathcal{E}_{fpf}$ after all, which contradicts the previous assumption. □
Of course, Thesis 1 is a purely theoretical argument, as we cannot guarantee optimal choices in the usually randomized evolutionary operators; productive fitness in general thus comes with the disadvantage that it cannot be fully computed in advance. But for a given, completed run of an evolutionary process, we can compute a posteriori the factual $fpf$ that single individuals had. There, we still do not make optimal random choices but just take the ones made as given.
Still, we take Thesis 1 as a hint that final productive fitness might be the right target to strive for. We argue that augmenting the objective fitness $of$ (even with easily computable secondary fitness functions) may result in a fitness function which better approximates final productive fitness $fpf$. In Sect. 4, we show empirically that (in the instances where it helps; footnote 3) diversity-based secondary fitness $sf$ resembles the final productive fitness $fpf$ of individuals much better than the raw objective function $of$ does.
Thesis 2 When a diversity-aware augmented fitness function af is aiding the evolutionary optimization process with respect to an objective fitness of, it is doing so by approximating the final productive fitness fpf of a converged evolutionary process in a more stable way (i.e., more closely when disregarding the respective scaling of the fitness functions) throughout the generations of the evolutionary process.
This connection not only explains why diversity-aware fitness functions fare better than the pure objective fitness but also is a first step towards a description of how to deliberately construct diversity-aware fitness functions, knowing that their ideal purpose is to approximate the not fully computable final productive fitness. Again, we refer to Gabor and Linnhoff-Popien (2020) for more elaborate theoretical arguments.
Since we cannot estimate all possible futures for an evolutionary process, we provide empirical evidence in favor of Thesis 2 using an a posteriori approximation: Given an already finished evolutionary process, we compute the $fpf$ values given only those individuals that actually came into being during that single evolutionary process (instead of using all possible descendants). We argue that this approximation is valid because if the evolutionary process was somewhat successful, then all individuals' descendants should most likely be somewhat close to their ideal descendants (footnote 4). Note that the reverse property is not true (i.e., even in a bad run, individuals still aim to generate better descendants, not worse), which is why our approximation does not permit any statements about augmented fitness that does not aid the evolutionary process.

Experiments
For all experiments, we run an evolutionary process as defined in Sect. 2 with a mutation operator $mut$ that adds a (possibly negative) random value to one dimension of an individual, applied with rate 0.1 to all individuals at random. For $rec$ we apply random crossover with rate 0.3 for a single individual and a randomly chosen mate. We apply $mig$ with a rate of 0.1. Following Wineberg and Oppacher (2003), we focus on a Manhattan distance function for the secondary fitness; we also plot evolutionary processes using fitness sharing with parameter $\alpha = 2.0$ and dissimilarity threshold $\sigma = n$, where $n$ is the dimensionality of the problem (Sareni and Krahenbuhl 1998), or inherited fitness with inheritance weight $\kappa = 0.5$ (Chen et al. 2002) for comparison, since both approaches also use an adapted fitness function to promote diversity (footnote 5). The selection operator $sel$ is a simple rank-based cut-off in the shown evolutionary processes. Cut-off with protection for new individuals as well as roulette wheel selection were also tested without yielding noticeably different results.
Footnote 2: Note that $of(x)$ cannot be compensated by other descendants of $x'$ with possibly bad objective fitness, even though we average the results, because all offspring are created by a random choice, which we assume to be ideal. This also shows how strong that assumption is.
Footnote 3: Note the gravity of that restriction: We do not consider failed runs of evolutionary algorithms since we have no assumptions on how $fpf$ should behave, i.e., relate to $af$, in that case. Future work may fill that void.
Footnote 4: Note that we could construct a terrible evolutionary process that just happens to find the global optimum in the last generation out of the blue via random migration. That process would have a poor stability between $af$ and $fpf$ but a very successful result. However, since evolution at every step tries not to be terrible, we consider that scenario to be quite unlikely so that it should not play a role when we analyze the augmented fitness on multiple runs, parametrizations, and domains.
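To make the operator descriptions above concrete, here is a minimal sketch for individuals encoded as tuples of real numbers. It is not taken from the published code; the function names, the mutation step size `sigma`, the value bounds, and the choice of uniform crossover as the "random crossover" are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def mutate(x: Vector, rng: random.Random, sigma: float = 0.1,
           low: float = -1.0, high: float = 1.0) -> Vector:
    """Add a (possibly negative) random value to exactly one dimension."""
    i = rng.randrange(len(x))
    y = list(x)
    y[i] = min(high, max(low, y[i] + rng.uniform(-sigma, sigma)))
    return tuple(y)


def crossover(x: Vector, mate: Vector, rng: random.Random) -> Vector:
    """Uniform crossover: every gene is copied from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(x, mate))


def migrate(dim: int, rng: random.Random, low: float = -1.0, high: float = 1.0) -> Vector:
    """Generate a completely random individual."""
    return tuple(rng.uniform(low, high) for _ in range(dim))


def rank_cutoff(population: Sequence[Vector], fitness: Callable[[Vector], float],
                n: int, maximize: bool = True) -> List[Vector]:
    """Rank-based cut-off: keep the n best-ranked individuals."""
    return sorted(population, key=fitness, reverse=maximize)[:n]
```

In a full run, these operators would be applied with the rates given above (0.1 for mutation, 0.3 for recombination, 0.1 for migration) before the cut-off restores the population size.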
All code that produced the results of this paper is available at github.com/thomasgabor/naco-evolib.

Pathfinding
We start with the pathfinding problem, which was shown to greatly benefit from employing diversity in the optimization process: Given a room of dimensions $1 \times 1$, we imagine a robot standing at position (0.5, 0.1). It needs to reach a target area at the opposite side of the room. See Fig. 1 for an illustration. The room also features a huge obstacle in the middle and thus the robot needs to decide on a way around it. The agent can move by performing an action $a \in \{ (dx, dy) \mid -0.33 < dx < 0.33, \; -0.33 < dy < 0.33 \}$. A single solution candidate consists of $n = 5$ actions $\langle (dx_i, dy_i) \rangle_{1 \leq i \leq n}$. It achieves a reward of $\frac{1}{n} = 0.2$ every time it stays within the target area between steps; its fitness $f(x)$ is the sum of these rewards.
The pathfinding problem lends itself to the application of diversity, as the optimization process in most cases first strikes a local optimum where it reaches the target area at some point by accident (and most probably towards the end of its steps). It then needs to switch to the global optimum where the first three steps are as goal-directed as possible and the last two steps are very small in order to stay within the target area.
We now compare a standard evolutionary algorithm given only the objective function $of(x) = f(x)$ to a diversity-aware evolutionary algorithm using the Manhattan distance on the solution candidate structure as a secondary fitness function, i.e., $af(x) = (1 - \lambda) \cdot of(x) + \lambda \cdot sf(x)$ as given in Definition 1 where $sf(x) = \frac{1}{2n} \cdot \operatorname{avg}_{x' \in \sigma_4(X)} manhattan(x, x')$ (14) and $manhattan(\langle (dx_i, dy_i) \rangle_{1 \leq i \leq m}, \langle (dx'_i, dy'_i) \rangle_{1 \leq i \leq m}) = \sum_{i=1}^{m} |dx_i - dx'_i| + |dy_i - dy'_i|$. Note that $\sigma_4$ is a selection function that randomly selects 10 individuals from the population $X$. We use it to reduce the computational cost of computing the pairwise distance for all individuals in the population. Its admissibility for approximating the full pairwise distance was shown in Gabor and Belzner (2017).
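The Manhattan-based secondary fitness with sampling can be sketched as follows. This is an illustrative reading of the description above rather than the original implementation: the function names are hypothetical, a solution candidate is represented as a sequence of (dx, dy) pairs, and the sample of 10 individuals is drawn without replacement.

```python
import random
from typing import Sequence, Tuple

Action = Tuple[float, float]
Candidate = Sequence[Action]


def manhattan(x: Candidate, y: Candidate) -> float:
    """Sum of absolute coordinate differences over all actions."""
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(x, y))


def secondary_fitness(x: Candidate, population: Sequence[Candidate],
                      rng: random.Random, sample_size: int = 10) -> float:
    """Average Manhattan distance of x to a random sample, divided by 2n."""
    sample = rng.sample(list(population), min(sample_size, len(population)))
    if not sample:
        return 0.0  # assumption: no diversity reward without peers
    n = len(x)
    return sum(manhattan(x, other) for other in sample) / (len(sample) * 2 * n)
```

Sampling keeps the cost per individual constant instead of growing with the population size, which is the motivation given for the sampled selection above.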
Just as we normalized the fitness function $f$ to $[0; 1] \subset \mathbb{R}$, we also normalize the secondary fitness $sf$ to the same range via division by the maximum Manhattan distance between two individuals, i.e., $2n$, to make the combination easier to understand. For now, we set $\lambda = 0.4$, which we discuss later. Each evolution was run 30 times for 1500 generations each, using a population size of 50. Figure 2a shows the best fitness achieved per generation for all tested approaches. We see that (especially distance-based) diversity-aware evolution produces much better objective results. Figure 2b shows the separate diversity score $sf$ maintained by the best individual, which can only be computed in a meaningful way for Manhattan diversity. In Fig. 2c the standard approach shows the same plot as before since its fitness is not augmented. For all other approaches we plot the augmented fitness $af$ that is actually used for selection. We see that due to the combination of distance and objective fitness, Manhattan-diverse evolution starts higher but climbs slower than the respective objective fitness. Fitness sharing results in very small absolute values for fitness but climbs up nonetheless. From the already run evolutionary processes, we can compute the final productive fitness as given in Definitions 3 and 4 a posteriori. Figure 2d shows the maximum $fpf$ per generation. We see that Manhattan-diverse and inherited fitness maintain a rather continuous lineage from the initial population to the best solution in the final generation as the final fitness propagates to the final productive fitness of very early generations. This behavior is rather unsurprising but illustrates the notion of the final productive fitness that measures the individuals' impact on the final result.
For Fig. 2e we compute the perhaps most interesting measurement: This plot shows for each population $X$ in a given generation the result of $\operatorname{avg}_{x \in X} |af(x) - fpf(x)|$, i.e., the average difference between the augmented fitness and the final productive fitness per individual. Thus, we get to assess how well the augmented fitness approximates the final productive fitness. There are a few observations to be made:
1. Towards the last few generations, we notice a rapid spike in the $fpf$ as the number of descendants in the final generation to be considered for the $fpf$ decreases fast.
2. The actual value of the distance (i.e., the height of the line) is irrelevant to the analysis of Thesis 2 and largely determined by the setting of $\lambda$.
3. Throughout most of the plot, the Manhattan-diverse evolution maintains a relatively stable level, i.e., the augmented fitness $af$ approximates the final productive fitness $fpf$ throughout the evolution. The less stable evolutions also show a worse overall result.
To further elaborate on that last point, we consider Fig. 2f: It shows the average absolute value of change over 150-generations-wide windows of the $|af(x) - fpf(x)|$ metric used in Fig. 2e. The plot was smoothed using a convolution kernel $\langle 1, ..., 1 \rangle$ of size 25. Roughly speaking, we can see the slope of the plots in Fig. 2e here. In this plot, good evolutionary processes should maintain rather low values according to Thesis 2. We can observe that Manhattan-diverse evolution maintains the lowest values almost throughout the entire evolution. While fitness sharing shows increases and decreases in matching the $fpf$ at a higher level than Manhattan diversity, inherited fitness shows a huge spike in the beginning (as does the standard approach), thus making a much less stable match for the $fpf$. As proposed by Thesis 2, the match between $af$ and $fpf$ roughly corresponds to the quality of the overall result of the evolutionary process.
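The two measurements discussed above (the per-generation gap between augmented and final productive fitness, and its windowed change) could be computed along the following lines. This is one possible reading, not the original analysis code: the function names are hypothetical, the windowed change is interpreted as a moving average of absolute generation-to-generation differences, and the all-ones smoothing kernel is normalized by its size so that the scale is preserved.

```python
from typing import List, Sequence


def af_fpf_gap(af_values: Sequence[float], fpf_values: Sequence[float]) -> float:
    """Average |af(x) - fpf(x)| over one population (values aligned by index)."""
    return sum(abs(a - f) for a, f in zip(af_values, fpf_values)) / len(af_values)


def moving_average(series: Sequence[float], size: int) -> List[float]:
    """Convolution with a normalized all-ones kernel, 'valid' positions only."""
    return [sum(series[i:i + size]) / size for i in range(len(series) - size + 1)]


def stability_curve(gaps: Sequence[float], window: int = 150,
                    smoothing: int = 25) -> List[float]:
    """Windowed mean absolute change of the gap, followed by smoothing."""
    changes = [abs(later - earlier) for earlier, later in zip(gaps, gaps[1:])]
    return moving_average(moving_average(changes, window), smoothing)
```

Given one gap value per generation, low values of the resulting curve indicate that the relationship between the two fitness notions stays stable, which is the property Thesis 2 is concerned with.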
As mentioned earlier, we also further analyzed the importance of the setting of $\lambda$ for the evolution. Figure 3 shows the impact of $\lambda$ on the best results generated by the evolution. $\lambda = 0$ equals the standard evolution in all previous plots. Unsurprisingly, we see that some amount of diversity-awareness improves the results of evolution, but setting $\lambda$ too high favors diversity over the actual objective fitness and thus yields very bad results. We want to add that a more intricate version of Manhattan-based augmented fitness $af$ might aim to adjust the $\lambda$ parameter during evolution, just as inherited fitness and fitness sharing might want to adjust their parameters. For these experiments, we chose a static parameter setting for simplicity.

The route planning problem
The route planning problem is a discrete optimization problem with a similar motivation as the pathfinding problem. Again, we adapt the problem and its description from prior work.
A robot needs to perform $n = 12$ different tasks in a fixed order by visiting relevant workstations. Each workstation can perform exactly one of the tasks and for each task, there are $o = 5$ identical workstations to choose from. Accordingly, a solution candidate is a vector $\langle w_1, ..., w_n \rangle$ with $w_i \in \{1, ..., o\}$ for all $1 \leq i \leq n$. See Fig. 4 for an illustration using a smaller setting. A single workstation $W$ can be identified by a tuple of its task type and its number, i.e., $W = (i, k)$ for some $1 \leq i \leq n$ and $1 \leq k \leq o$. To mimic various means of transport, the distance $D(W, W')$ between every two workstations $W = (i, k)$ and $W' = (j, l)$ is randomized individually within a range of $[0; \frac{1}{n}] \subset \mathbb{R}$. Note that this (most likely) gives rise to a non-Euclidean space the robot is navigating. The objective fitness for this minimization problem is given via $of(\langle w_1, ..., w_n \rangle) = \sum_{i=1}^{n-1} D((i, w_i), (i+1, w_{i+1}))$.
Again, we construct an augmented fitness $af(x) = (1 - \lambda) \cdot of(x) + \lambda \cdot sf(x)$ (cf. Definition 1) but now base the secondary fitness $sf$ on the Hamming distance, where $hamming(\langle w_1, ..., w_n \rangle, \langle w'_1, ..., w'_n \rangle) = \sum_{i=1}^{n} h(w_i, w'_i)$ and $h(w, w') = 0$ if $w = w'$ and $h(w, w') = 1$ otherwise.
Besides, we apply the same evolutionary processes as in Sect. 4.1, but the parameter search shown in Fig. 5 now recommended $\lambda = 0.25$ for weighting the Hamming-based diversity (footnote 6). We evolve 20 independent populations of size 50 for 400 generations and plot the same data as before: Fig. 6a shows the best fitness achieved in evolution. Inherited fitness takes a lot more time but eventually almost reaches the level of Manhattan-diversity. However, both methods yield similarly solid results as fitness sharing or the naïve algorithm. This is mirrored by all methods showing quite stable behavior in Figs. 6e and 6f, with the standard approach showing the largest drop within the first few generations as it matches $fpf$ the least.
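A small sketch may help to picture the route planning setup. The following code is illustrative only: the names are hypothetical, the distance table is restricted to workstations of consecutive tasks (the only distances needed when tasks are visited in fixed order), and the route length is computed as the sum of distances between consecutively visited workstations.

```python
import random
from typing import Dict, Sequence, Tuple

Station = Tuple[int, int]  # (task type i, workstation number k)


def random_distances(n: int, o: int, rng: random.Random) -> Dict[Tuple[Station, Station], float]:
    """Draw every needed distance individually from [0, 1/n]."""
    return {
        ((i, k), (i + 1, l)): rng.uniform(0.0, 1.0 / n)
        for i in range(1, n)
        for k in range(1, o + 1)
        for l in range(1, o + 1)
    }


def route_length(route: Sequence[int], dist: Dict[Tuple[Station, Station], float]) -> float:
    """Total distance when task i is handled at workstation route[i - 1]."""
    return sum(
        dist[((i, route[i - 1]), (i + 1, route[i]))]
        for i in range(1, len(route))
    )


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of tasks for which two routes choose different workstations."""
    return sum(1 for u, v in zip(a, b) if u != v)
```

Because each of the n - 1 legs is at most 1/n long, the route length stays within the unit interval, in line with the normalization used throughout the paper.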
(e) (f) Fig. 2 Evolution for the pathfinding problem. Standard evolutionary process using of shown in black, diversity-aware evolutionary process using af with Manhattan distance shown in blue. Inherited fitness (purple) and fitness sharing (orange) shown for comparison. All results averaged over 25 independent runs, the standard deviation is shown in transparent lines However, the results for all evolutions are very close together for the route planning problem.
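The route planning setup can be illustrated with a minimal Python sketch. It rests on several assumptions that are not fixed by the text: workstations and tasks are indexed from 0, distances are drawn independently for every ordered pair of workstations, the objective is taken to be the summed distance between consecutively visited workstations, and the secondary fitness is taken to be one minus the mean normalized Hamming distance to a set of comparison candidates. All function names are hypothetical.

```python
import random


def make_distances(n, o, rng):
    """Draw an independent random distance in [0, 1/n] for every ordered pair of workstations."""
    stations = [(i, k) for i in range(n) for k in range(o)]
    return {(a, b): rng.uniform(0.0, 1.0 / n)
            for a in stations for b in stations if a != b}


def objective_fitness(route, dist):
    """Assumed objective: summed distance between consecutively visited workstations."""
    return sum(dist[(i, route[i]), (i + 1, route[i + 1])]
               for i in range(len(route) - 1))


def hamming(a, b):
    """Number of positions at which two routes choose different workstations."""
    return sum(1 for x, y in zip(a, b) if x != y)


def secondary_fitness(route, others):
    """Assumed diversity term: 1 minus the mean normalized Hamming distance to `others`.

    Lower is better (minimization), so candidates far from the rest score lower.
    """
    if not others:
        return 1.0  # no comparison partners, so no diversity bonus
    mean_distance = sum(hamming(route, other) for other in others) / len(others)
    return 1.0 - mean_distance / len(route)


def augmented_fitness(route, others, dist, lam=0.25):
    """af(x) = (1 - lam) * of(x) + lam * sf(x), with lam as the diversity weight."""
    return ((1.0 - lam) * objective_fitness(route, dist)
            + lam * secondary_fitness(route, others))


rng = random.Random(0)
n, o = 12, 5
dist = make_distances(n, o, rng)
population = [[rng.randrange(o) for _ in range(n)] for _ in range(50)]
candidate = population[0]
print(objective_fitness(candidate, dist),
      augmented_fitness(candidate, population[1:], dist))
```

Because every single distance lies in $[0, \frac{1}{n}]$, the assumed objective stays within $[0, 1]$, which keeps it on the same scale as the diversity term when both are mixed with the weight `lam`.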

Schwefel
Finally, we consider one of the canonical benchmark problems for evolutionary algorithms. The implementation of the Schwefel problem is taken from Rainville et al. (2012) while our study on the impact of diversity again follows experiments performed in . The original fitness function is given as

$$schwefel(\langle x_1, \ldots, x_n \rangle) = 418.9828872724339 \cdot n - \sum_{i=1}^{n} x_i \cdot \sin\left(\sqrt{|x_i|}\right)$$

with $x_1, \ldots, x_n \in [-500, 500] \subset \mathbb{R}$ where $n = 8$ is the dimensionality we use in our experiments (Rainville et al. 2012). The resulting function is illustrated in Fig. 7. The Schwefel problem is a minimization problem looking for an $x$ so that $schwefel(x) = 0$. Again, we normalize the result values by defining $of(x) = \frac{1}{4000} \cdot schwefel(x)$. The Manhattan-diverse evolution uses diversity weight $\lambda = 0.3$ as suggested by Fig. 8.
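As a minimal sketch, the Schwefel objective and its normalization can be written directly in Python. The constant 418.9828872724339 is the one commonly used for this benchmark and is an assumption about the exact value used in the experiments; the function names are chosen for illustration only.

```python
import math


def schwefel(x):
    """Schwefel benchmark in its common form; the global minimum is close to 0."""
    n = len(x)
    return 418.9828872724339 * n - sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x)


def objective_fitness(x):
    """Normalized objective of(x) = schwefel(x) / 4000 as described in the text."""
    return schwefel(x) / 4000.0


# The origin is a poor solution, while x_i = 420.968746 is close to the optimum.
print(objective_fitness([0.0] * 8))
print(objective_fitness([420.968746] * 8))
```

The many local optima of the sine term lie far apart from the global optimum near the border of the domain, which is what makes the function a typical test case for exploration.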
We run the same kind of analysis as for the previous problems and plot the same data in Fig. 9. These experiments were performed on populations of size 50 evolving for 400 generations. Figure 9a shows a mixed picture: The Manhattan-diverse evolution again outperforms the standard approach, but inherited fitness hardly yields any benefit while fitness sharing performs worse than the standard approach. In Fig. 9d we see a different picture than before: All final productive fitness values are not as stable anymore but vary throughout the evolution, suggesting that solutions to the Schwefel problem are more influenced by migration (and thus show no continuous lineage) than for the other problems considered. Fig. 9e shows the $|af - fpf|$ metric, which measures the individual difference between augmented fitness and final productive fitness. We observe very clearly that, after a short starting phase, this distance remains much more stable for the Manhattan-diverse evolution than any other approach, indicating that the augmented fitness is approximating the final productive fitness. Note again that in the last few generations, computing the final productive fitness has limited meaning. In Fig. 9f this behavior shows clearly as Manhattan-diverse evolution forms a line at the very bottom of the plot with almost no change in how af matches fpf. All other approaches, which perform noticeably worse, also show a much more erratic pattern in how well their augmented fitness matches the final productive fitness.

Fig. 4 Illustration of the route planning problem for $n = 3$ tasks and $o = 5$ workstations per task
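The a posteriori comparison of af and fpf can be sketched as follows. This is only an illustration under stated assumptions: fpf is taken here to be the mean objective fitness of an individual's descendants that are alive in the final generation, with a hypothetical default value `worst` for lineages that died out; the genealogy is assumed to be available as a mapping `children` from an individual's id to the ids of its direct offspring. None of these names come from the paper.

```python
from statistics import mean


def final_productive_fitness(individual, children, final_generation, of, worst=1.0):
    """Assumed fpf: mean objective fitness of the descendants alive in the final generation."""
    seen, stack, survivors = set(), [individual], []
    while stack:  # depth-first walk through the genealogy
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in final_generation:
            survivors.append(current)
        stack.extend(children.get(current, ()))
    return mean(of[s] for s in survivors) if survivors else worst


def mean_gap(generation, af, fpf):
    """Average |af - fpf| over all individuals of one generation."""
    return mean(abs(af[x] - fpf[x]) for x in generation)


# Toy genealogy: "a" has surviving descendants "d" and "e", "b" has none.
children = {"a": ["c", "d"], "b": [], "c": ["e"]}
final_generation = {"d", "e"}
of = {"a": 0.5, "b": 0.375, "c": 0.5, "d": 0.25, "e": 0.75}
af = {"a": 0.625, "b": 0.5}
fpf = {x: final_productive_fitness(x, children, final_generation, of) for x in ("a", "b")}
print(mean_gap(["a", "b"], af, fpf))
```

Since the walk needs the complete genealogy up to the final generation, such a computation is only possible after the run has finished, which matches the a posteriori character of the analysis.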

Related work
Diversity has been a central topic of research in evolutionary algorithms (den Heijer and Eiben 2012; Morrison and De Jong 2001; Toffolo and Benini 2003; Ursem 2002). Its positive effect on the evolutionary process has often been observed there, but has rarely been interpreted beyond a biological metaphor, i.e., "diversity is a key element of the biological theory of natural selection and maintaining high diversity is supposed to be generally beneficial" (Corno et al. 2005).
Fig. 6 Evolution for the route planning problem. Standard evolutionary process using of shown in black, diversity-aware evolutionary process using af with Hamming distance shown in blue. Inherited fitness (purple) and fitness sharing (orange) shown for comparison. Results averaged over 20 independent runs, the standard deviation is shown in transparent lines

Without much concept of what to look for in a mechanism for diversity-awareness, lots of variants have spawned in research. Instead of repeating them, we would like to point out a few resources for a comprehensive overview: Burke et al. (2004), among others like Brameier and Banzhaf (2002) or McPhee and Hopper (1999), discuss various means to measure and promote diversity in genetic programming, which for the most part should apply to all evolutionary algorithms. They also provide an extensive analysis of the connection between diversity and achieved fitness, but do not define productive fitness or a similar notion. A more recent comprehensive overview of means to describe and enable diversity has been put together by Squillero and Tonda (2016), also providing a taxonomy on various classes of approaches to diversity.  provide a quantitative analysis of various means of maintaining inheritance-based diversity on standard domains like the ones we used in this paper. Regarding the multitude of diversity mechanisms present in research, however, it is most important to also point out the results of Wineberg and Oppacher (2003), who most drastically show that "all [notions of diversity] are restatements or slight variants of the basic sum of the distances between all possible pairs of the elements in a system" and suggest that "experiments need not be done to distinguish between the various measures", a point which we already built upon in our evaluation.
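The "basic sum of the distances between all possible pairs" quoted above can be made concrete with a short sketch. The function `pairwise_diversity` is a hypothetical name, and treating it as a population-level diversity score is an illustration of the quoted idea, not the measure used in the experiments.

```python
from itertools import combinations


def manhattan(a, b):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_diversity(population, distance):
    """Sum of the distances between all unordered pairs of individuals."""
    return sum(distance(a, b) for a, b in combinations(population, 2))


population = [[0, 0], [1, 1], [2, 0]]
print(pairwise_diversity(population, manhattan))
print(pairwise_diversity(population, hamming))
```

Only the distance function changes between the continuous and the discrete case, which is why the same diversity scheme can be reused across the domains considered here.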
Note that a variety of "meta-measurements" for the analysis of evolutionary processes exist: Effective fitness measures the minimum fitness required for an individual to increase in dominance at a given generation (when in competition with the other individuals) (Stephens 1999). It is related to reproductive fitness, which is the probability that an individual successfully produces offspring (Hu and Banzhaf 2010). Both occur at the foundation of productive fitness, but do not include the (computationally overly expensive) diachronic analysis of the overall effect on the end result. Our approach is also comparable to entropy-based diversity preservation (Squillero and Tonda 2008), where the positive effect of certain individuals on the population's entropy is measured and preserved in order to deliberately maintain higher entropy levels. By contrast, our approach is based on the fitness values only (without the need to look into the individuals beyond their genealogical relationships) and thus cannot be used directly as a secondary goal in evolution but purely as a tool of a posteriori analysis of the effectiveness of other secondary goal definitions.
When we construct the "optimal evolutionary process", we construct a dynamic optimization problem from a traditionally static one. It is interesting that specifically dynamic or on-line (Bredeche et al. 2009) evolutionary algorithms have been shown to benefit from increased diversity, especially when facing changes in their fitness functions (Grefenstette 1992). While this is obviously intuitive as more options in the population allow for higher coverage of possible changes, the reverse connection (pointed to by this work) is not stated there, i.e., that diversity in static domains may work because even for static domains the optimization process is inherently dynamic to some degree.

Fig. 7 Illustration of the Schwefel function for $n = 2$ dimensions. Image taken from (Benchmarks 2020)

Fig. 8 Parameter analysis for the diversity weight $\lambda$ for the Schwefel problem. We show best fitness among all generations for 10 different settings of $\lambda = 0.0, 0.1, \ldots, 1.0$. All results averaged over 20 independent runs each, the standard deviation shown in transparent lines

Conclusion
We have introduced the novel notion of final productive fitness fpf (and all the definitions it is built upon). We make a theoretical argument that fpf is the goal an optimal evolutionary process should strive for to achieve the best overall results. However, fpf cannot be computed efficiently in advance, producing the need for an approximation. We argue that the well-known technique of augmenting the objective fitness function with an additional diversity goal (when it helps) happens to effectively approximate the theoretically derived fpf (at least better than just the objective fitness on its own). We have shown this connection empirically on benchmark domains. We argue that this provides first insight into why and how diversity terms are beneficial to evolutionary processes.

Immediate future work would consist of answering the when and which: We have tested several domains for evolutionary algorithms and many are too simple to further benefit from explicit diversity-awareness. Maybe fpf can be used to derive a criterion to estimate the usefulness of diversity in advance. Similarly, many mechanisms to cater explicitly to diversity exist. While many can be subsumed by the pairwise distance used here (Wineberg and Oppacher 2003), others may still show different behavior. Their relation to fpf requires further work. There are means of maintaining diversity without altering the fitness function, most prominently structural techniques like island models (Tomassini 2006; Whitley et al. 1999) or hypermutation phases (Morrison and De Jong 2000; Simões and Costa 2002). As no match for $|af - fpf|$ can be computed for them, we omitted them in this first analysis. Final productive fitness may be a useful tool to translate these structural means into an effect of the fitness function.

Fig. 9 Evolution for the Schwefel problem. Standard evolutionary process using of shown in black, diversity-aware evolutionary process using af with Manhattan distance shown in blue. Inherited fitness (purple) and fitness sharing (yellow) shown for comparison. Results averaged over 20 independent runs, the standard deviation is shown in transparent lines
Eventually, there may be even more direct or outright better (compared to using diversity or similarly augmented fitness) approximations of fpf to be found now that we know what we are looking for. The ultimate goal of the research into fpf might be to utilize it directly or indirectly in actually constructing new types of evolutionary algorithms (instead of it "only" helping to explain how well-known types work). We can imagine:
• We could compute fpf for a simplified version of the problem (and the algorithm) for various parametrizations (Eiben and Smith 2003; Mitchell 1998). The fpf's values could then help evaluate the respective parametrization's success. However, for most complex problems it is not entirely clear how to derive simpler instances that still preserve the interesting or challenging aspects of their larger counterparts. Furthermore, it is unclear how fpf might provide more information than just using the respective runs' of.
• In this paper, we checked how well various established models for fitness functions approximate fpf. Instead, we might now construct new models with the goal to approximate fpf better. Surrogate models have been used in evolutionary algorithms to approximate computationally expensive fitness functions (Gabor and Altmann 2019; Jin 2005; Jin and Sendhoff 2004). In our case, a surrogate model would have to approximate our a-posteriori-approximation of fpf and could then maybe save time for future evaluations. It should be noted that our results make it seem plausible that no general model for fpf on all domains exists, and we cannot train the surrogate as the algorithm goes along since our target metric is only computed a posteriori. However, we might still be able to learn an n-pf surrogate for small n or a similar target metric. In Gabor and Linnhoff-Popien (2020) we already suspect (based on the a posteriori approximation as we do in this paper) that fpf constructs a simpler fitness landscape, quite like surrogates do, suggesting that surrogates may be trained to achieve a similar fitness landscape. That dynamic should be explored in future work.
• In Gabor and Belzner (2017) and  we introduce genealogical distance as a diversity metric for evolutionary algorithms. We show that it provides similar although at times inferior results to distance-based diversity metrics (i.e., the Manhattan and Hamming diversity we use in this paper). However, genealogical diversity may provide means to approximate genealogical relations between individuals, making our a-posteriori-approximation much easier to compute (probably at the expense of accuracy). Future work should evaluate if that approach brings any relevant benefits.
Some of these may be applicable to other methods of optimization as well: We suspect that the notion of final productive fitness translates directly to all optimization methods (ranging from simulated annealing or particle swarm optimization to Monte-Carlo tree search and backpropagation) that may or may not already implement means to approximate final productive fitness rather than just objective fitness.