Abstract
The heavy-tailed mutation operator proposed by Doerr et al. (GECCO 2017), called fast mutation in line with previously used terminology, has so far been proven advantageous only in mutation-based algorithms. There, it can relieve the algorithm designer of the task of finding the optimal mutation rate while still yielding a performance close to the one obtained with the optimal mutation rate. In this first runtime analysis of a crossover-based algorithm using a heavy-tailed choice of the mutation rate, we show an even stronger impact. For the \((1+(\lambda ,\lambda ))\) genetic algorithm optimizing the OneMax benchmark function, we show that with a heavy-tailed mutation rate a linear runtime can be achieved. This is asymptotically faster than what can be obtained with any static mutation rate, and is asymptotically equivalent to the runtime of the \((1+(\lambda ,\lambda ))\) genetic algorithm with self-adjusting parameter choice. This result is complemented by an empirical study which shows the effectiveness of fast mutation also on random satisfiable MAX-3SAT instances.
Notes
As a reviewer of [1] pointed out, an upper bound for the runtime of the \((1 + 1)\) EA with a general mutation rate on the hurdle problem with hurdle widths 2 and 3 was shown in [28]. This upper bound is minimized by the mutation rates \(\frac{2}{n}\) and \(\frac{3}{n}\), respectively, which could have been seen earlier as a hint that larger mutation rates can be useful. Since the central research question of [28] was whether crossover is beneficial or not, this detail apparently went unnoticed by the broader scientific audience.
We note that the work [5] conducted in parallel to ours suggests that a different choice is necessary when large fitness valleys need to be crossed.
This mutation can be interpreted as a standard bit mutation with rate \(\frac{\alpha }{n}\), conditioned on all individuals having the same number of flipped bits.
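As an illustration of the two mechanisms discussed in these notes, the following Python sketch (our own illustration, not code from the paper) samples \(\lambda \) from a power-law distribution with exponent \(\beta \) on \(\{1, \dots , u\}\) and then applies the mutation just described: a number \(\ell \) of bits to flip is drawn once from \({{\,\mathrm{Bin}\,}}(n, \lambda /n)\), and every offspring flips exactly \(\ell \) uniformly chosen bits. All function names and parameter values here are illustrative.

```python
import random

def sample_power_law(beta, u, rng):
    """Sample lambda in {1, ..., u} with Pr[lambda = i] proportional to i**(-beta)."""
    weights = [i ** (-beta) for i in range(1, u + 1)]
    return rng.choices(range(1, u + 1), weights=weights, k=1)[0]

def mutate_offspring(parent, lam, rng):
    """Create lam offspring; all of them flip the same number ell of bits,
    where ell ~ Bin(n, lam / n), as in the mutation phase sketched above."""
    n = len(parent)
    ell = sum(rng.random() < lam / n for _ in range(n))  # ell ~ Bin(n, lam/n)
    offspring = []
    for _ in range(lam):
        child = parent[:]
        for pos in rng.sample(range(n), ell):  # flip exactly ell distinct bits
            child[pos] = 1 - child[pos]
        offspring.append(child)
    return offspring, ell

rng = random.Random(42)
parent = [0] * 50
lam = sample_power_law(beta=1.5, u=25, rng=rng)
offspring, ell = mutate_offspring(parent, lam, rng)
# every offspring differs from the parent in exactly ell positions
```

Note that conditioning on a common \(\ell \) is realized here simply by sampling \(\ell \) once and reusing it for all offspring.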
References
Antipov, D., Buzdalov, M., Doerr, B.: Fast mutation in crossover-based algorithms. In: Genetic and Evolutionary Computation Conference, GECCO 2020, pp. 1268–1276. ACM (2020)
Auger, A., Doerr, B. (eds.): Theory of Randomized Search Heuristics. World Scientific Publishing (2011)
Antipov, D., Doerr, B.: Runtime analysis of a heavy-tailed \((1+(\lambda , \lambda ))\) genetic algorithm on jump functions. In: Parallel Problem Solving From Nature, PPSN 2020, Part II, pp. 545–559. Springer (2020)
Antipov, D., Doerr, B., Karavaev, V.: A tight runtime analysis for the \({(1 + (\lambda ,\lambda ))}\) GA on LeadingOnes. In: Foundations of Genetic Algorithms, FOGA 2019, pp. 169–182. ACM (2019)
Antipov, D., Doerr, B., Karavaev, V.: The \((1 + (\lambda ,\lambda ))\) GA is even faster on multimodal problems. In: Genetic and Evolutionary Computation Conference, GECCO 2020, pp. 1259–1267. ACM (2020)
Bäck, T.: Optimal mutation rates in genetic search. In: International Conference on Genetic Algorithms, ICGA 1993, pp. 2–8. Morgan Kaufmann (1993)
Buzdalov, M., Doerr, B.: Runtime analysis of the \({(1+(\lambda ,\lambda ))}\) genetic algorithm on random satisfiable 3-CNF formulas. In: Genetic and Evolutionary Computation Conference, GECCO 2017, pp. 1343–1350. ACM (2017)
Doerr, B., Doerr, C.: Optimal static and self-adjusting parameter choices for the \({(1+(\lambda ,\lambda ))}\) genetic algorithm. Algorithmica 80, 1658–1709 (2018)
Doerr, B., Doerr, C., Ebel, F.: From black-box complexity to designing new genetic algorithms. Theoret. Comput. Sci. 567, 87–104 (2015)
Doerr, B., Jansen, T., Sudholt, D., Winzen, C., Zarges, C.: Mutation rate matters even when optimizing monotone functions. Evol. Comput. 21, 1–21 (2013)
Doerr, B., Künnemann, M.: Optimizing linear functions with the \((1+\lambda )\) evolutionary algorithm–different asymptotic runtimes for different instances. Theoret. Comput. Sci. 561, 3–23 (2015)
Doerr, B., Le, H.P., Makhmara, R., Nguyen, T.D.: Fast genetic algorithms. In: Genetic and Evolutionary Computation Conference, GECCO 2017, pp. 777–784. ACM (2017)
Doerr, B., Neumann, F. (eds.): Theory of Evolutionary Computation—Recent Developments in Discrete Optimization. Springer. https://cs.adelaide.edu.au/~frank/papers/TheoryBook2019-selfarchived.pdf (2020)
Doerr, B.: Does comma selection help to cope with local optima? In: Genetic and Evolutionary Computation Conference, GECCO 2020, pp. 1304–1313. ACM (2020)
Doerr, B.: Probabilistic tools for the analysis of randomized optimization heuristics. In: Doerr, B., Neumann, F. (eds) Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pp. 1–87. Springer, https://arxiv.org/abs/1801.06733 (2020)
Garnier, J., Kallel, L., Schoenauer, M.: Rigorous hitting times for binary mutations. Evol. Comput. 7, 173–203 (1999)
Goldman, B.W., Punch, W.F.: Parameter-less population pyramid. In: Genetic and Evolutionary Computation Conference, GECCO 2014, pp. 785–792. ACM (2014)
Gießen, C., Witt, C.: The interplay of population size and mutation probability in the \({(1 + \lambda )}\) EA on OneMax. Algorithmica 78, 587–609 (2017)
He, J., Yao, X.: Drift analysis and average time complexity of evolutionary algorithms. Artif. Intell. 127, 51–81 (2001)
Jansen, T.: Analyzing Evolutionary Algorithms: The Computer Science Perspective. Springer, Berlin (2013)
Jansen, T., De Jong, K.A., Wegener, I.: On the choice of the offspring population size in evolutionary algorithms. Evol. Comput. 13, 413–440 (2005)
Lehre, P.K.: Negative drift in populations. In: Parallel Problem Solving from Nature, PPSN 2010, pp. 244–253. Springer (2010)
Lehre, P.K.: Fitness-levels for non-elitist populations. In: Genetic and Evolutionary Computation Conference, GECCO 2011, pp. 2075–2082. ACM (2011)
Lengler, J.: A general dichotomy of evolutionary algorithms on monotone functions. In: Parallel Problem Solving from Nature, PPSN 2018, Part II, pp. 3–15. Springer (2018)
Mühlenbein, H.: How genetic algorithms really work: mutation and hillclimbing. In: Parallel Problem Solving from Nature, PPSN 1992, pp. 15–26. Elsevier (1992)
Neumann, F., Witt, C.: Bioinspired Computation in Combinatorial Optimization: Algorithms and Their Computational Complexity. Springer, Berlin (2010)
Pinto, E.C., Doerr, C.: Towards a more practice-aware runtime analysis of evolutionary algorithms. CoRR abs/1812.00493 (2018). arXiv:1812.00493
Prügel-Bennett, A.: When a genetic algorithm outperforms hill-climbing. Theoret. Comput. Sci. 320, 135–153 (2004)
Rowe, J.E., Sudholt, D.: The choice of the offspring population size in the (1, \(\lambda \)) evolutionary algorithm. Theoret. Comput. Sci. 545, 20–38 (2014)
Szu, H.H., Hartley, R.L.: Fast simulated annealing. Phys. Lett. A 122, 157–162 (1987)
Teytaud, O., Gelly, S.: General lower bounds for evolutionary algorithms. In: Parallel Problem Solving from Nature, PPSN 2006, pp. 21–31. Springer (2006)
Wald, A.: Some generalizations of the theory of cumulative sums of random variables. Ann. Math. Stat. 16, 287–293 (1945)
Witt, C.: Runtime analysis of the (\(\mu \) + 1) EA on simple pseudo-Boolean functions. Evol. Comput. 14, 65–86 (2006)
Witt, C.: Tight bounds on the optimization time of a randomized search heuristic on linear functions. Comb. Probab. Comput. 22, 294–318 (2013)
Yao, X., Liu, Y.: Fast evolution strategies. In: Evolutionary Programming, volume 1213 of Lecture Notes in Computer Science, pp. 151–162. Springer (1997)
Yao, X., Liu, Y., Lin, G.: Evolutionary programming made faster. IEEE Trans. Evol. Comput. 3, 82–102 (1999)
Acknowledgements
This work was supported by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH and by RFBR and CNRS, Project number 20-51-15009.
Extended version of the paper [1] in the proceedings of GECCO. This version contains all proofs and other details that had to be omitted in the conference version for reasons of space. Also, we have greatly expanded the experimental section.
Appendices
Appendix: Computation of Table 1
In this appendix we compute all estimates of the true progress probability \(p_{d(x)}\) shown in Table 1. We use the same expression for estimating \(p_{d(x)}\) as in Lemma 8, which, by Lemma 7, is
where C is some constant. Recall that by Lemma 4 we have
- If \(\beta < 0\), then \(C_{\beta , u} \ge u^{\beta - 1} \frac{1 - \beta }{2 - \beta }\),
- if \(\beta \in [0, 1)\), then \(C_{\beta , u} \ge u^{\beta - 1} (1 - \beta )\),
- if \(\beta = 1\), then \(C_{\beta , u} \ge \frac{1}{\ln (u) + 1}\), and
- if \(\beta > 1\), then \(C_{\beta , u} \ge \frac{\beta - 1}{\beta }\).
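These four lower bounds can be checked numerically. The sketch below (our own check) assumes that \(C_{\beta , u}\) is the normalization constant of the power-law distribution on \(\{1, \dots , u\}\), that is, \(C_{\beta , u} = (\sum _{i=1}^{u} i^{-\beta })^{-1}\); under this assumption all four bounds hold for the sampled parameter pairs.

```python
import math

def c_beta_u(beta, u):
    """Normalization constant of the power-law distribution on {1, ..., u}
    (assumed definition: C = 1 / sum_{i=1}^{u} i**(-beta))."""
    return 1.0 / sum(i ** (-beta) for i in range(1, u + 1))

def lower_bound(beta, u):
    """The four lower bounds on C_{beta,u} stated above (Lemma 4)."""
    if beta < 0:
        return u ** (beta - 1) * (1 - beta) / (2 - beta)
    if beta < 1:
        return u ** (beta - 1) * (1 - beta)
    if beta == 1:
        return 1.0 / (math.log(u) + 1)
    return (beta - 1) / beta

# illustrative (beta, u) pairs covering all four regimes
checks = [(-2.0, 100), (0.5, 100), (1.0, 100), (1.5, 1000), (3.5, 1000)]
results = [c_beta_u(b, u) >= lower_bound(b, u) for b, u in checks]
```

This is only a sanity check on sampled values, not a proof; the proof is the integral comparison used in Lemma 4.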
Now we consider 11 cases depending on \(\beta \) and u. We start with the cases when \(u \le \sqrt{\frac{n}{d(x)}}\) and therefore estimate \(p_{d(x)}\) as
Case 1 \(\beta < 0\), \(u \le \sqrt{\frac{n}{d(x)}}\).
By Lemma 3 we have
Case 2 \(\beta \in [0, 1)\), \(u \le \sqrt{\frac{n}{d(x)}}\).
By Lemma 3 we have
which is the same as in Case 1.
Case 3 \(\beta = 1\), \(u \le \sqrt{\frac{n}{d(x)}}\).
In this case we have
Case 4 \(\beta \in (1, 3)\), \(u \le \sqrt{\frac{n}{d(x)}}\).
By Lemma 3 we have
Case 5 \(\beta = 3\), \(u \le \sqrt{\frac{n}{d(x)}}\).
By Lemma 3 we have
Case 6 \(\beta > 3\), \(u \le \sqrt{\frac{n}{d(x)}}\).
We have
In the following cases we consider \(u > \sqrt{\frac{n}{d(x)}}\), hence we estimate \(p_{d(x)}\) as
In all cases we first estimate the sums in the brackets and then substitute the result into the inequality.

Case 7 \(\beta < 1\), \(u > \sqrt{\frac{n}{d(x)}}\).
We consider three sub-cases.
1. When \(u \le 2\sqrt{\frac{n}{d(x)}} + 2\) and \(\sqrt{\frac{n}{d(x)}} \le 4\). In this case we also have \(u \le 2 \cdot 4 + 2 = 10\). Hence,
$$\begin{aligned} \frac{d(x)}{n} \sum _{\lambda = 1}^{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor } \lambda ^{2-\beta } \ge \frac{d(x)}{n} \ge \frac{1}{16} \ge \frac{u^{1 - \beta }}{16 \cdot 10^{1 - \beta }}. \end{aligned}$$

2. When \(u \le 2\sqrt{\frac{n}{d(x)}} + 2\) and \(\sqrt{\frac{n}{d(x)}} > 4\). In this case we have \(\sqrt{\frac{n}{d(x)}} \ge \frac{u}{2} - 1\). We also have \(\sqrt{\frac{n}{d(x)}}^{3 - \beta } \ge 4^{3 - \beta } > 2^{4 - \beta }\) (therefore, \((\sqrt{\frac{n}{d(x)}} / 2)^{3 - \beta } > 2\)). Hence, by Lemma 3 we have
$$\begin{aligned} \frac{d(x)}{n} \sum _{\lambda = 1}^{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor } \lambda ^{2-\beta }&\ge \frac{d(x)}{n} \sum _{\lambda = 1}^{\lceil \sqrt{\frac{n}{d(x)}} - 1 \rceil } \lambda ^{2-\beta } \ge \frac{d(x)}{n} \cdot \frac{\left( \sqrt{\frac{n}{d(x)}} - 1\right) ^{3 - \beta } - 1}{3 - \beta } \\&\ge \frac{d(x)}{n} \cdot \frac{\left( \sqrt{\frac{n}{d(x)}} / 2\right) ^{3 - \beta } - 1}{3 - \beta } \\&\ge \frac{d(x)}{n} \cdot \frac{\left( \sqrt{\frac{n}{d(x)}} / 2\right) ^{3 - \beta }}{2(3 - \beta )} \\&\ge \sqrt{\frac{n}{d(x)}}^{1 - \beta } \frac{1}{2^{4 - \beta }(3 - \beta )} \\&\ge \left( \frac{u}{2} - 1\right) ^{1 - \beta } \frac{1}{2^{4 - \beta }(3 - \beta )} \\&\ge \frac{u^{1 - \beta }}{2^{(6 - 3 \beta )}(3 - \beta )}. \end{aligned}$$

3. When \(u > 2\sqrt{\frac{n}{d(x)}} + 2\). As in Lemma 3, we estimate the sum via the corresponding integral:
$$\begin{aligned} \sum _{\lambda = \lfloor \sqrt{\frac{n}{d(x)}} \rfloor + 1}^u \lambda ^{-\beta }&\ge \int _{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor + 1}^u x^{-\beta } dx \ge \int _{u/2}^u x^{-\beta } dx = u^{1 - \beta } \cdot \frac{1 - 2^{\beta - 1}}{1 - \beta }. \end{aligned}$$
Combining the three sub-cases, we see that for each \(\beta < 1\) there exists a constant \(\gamma _1(\beta ) = \min \{\frac{1}{16 \cdot 10^{1 - \beta }}, \frac{1}{2^{(6 - 3 \beta )}(3 - \beta )}, \frac{1 - 2^{\beta - 1}}{1 - \beta }\}\) such that
If \(\beta < 0\), we have
If \(\beta \in [0, 1)\), we have
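The integral estimate in the third sub-case can also be sanity-checked numerically: for \(\beta < 1\) and \(u > 2\sqrt{\frac{n}{d(x)}} + 2\), the tail sum \(\sum _{\lambda = \lfloor \sqrt{n/d(x)} \rfloor + 1}^{u} \lambda ^{-\beta }\) is at least \(u^{1 - \beta } \frac{1 - 2^{\beta - 1}}{1 - \beta }\). In the sketch below (our own check), m plays the role of \(\lfloor \sqrt{n/d(x)} \rfloor \) and the parameter values are illustrative.

```python
def tail_sum(beta, m, u):
    """sum_{lam = m+1}^{u} lam**(-beta)."""
    return sum(lam ** (-beta) for lam in range(m + 1, u + 1))

def integral_bound(beta, u):
    """u**(1-beta) * (1 - 2**(beta-1)) / (1 - beta), valid for beta < 1."""
    return u ** (1 - beta) * (1 - 2 ** (beta - 1)) / (1 - beta)

# each case satisfies the precondition u > 2*m + 2 of the third sub-case
cases = [(0.5, 19, 100), (-1.0, 19, 100), (0.9, 5, 50)]
ok = [tail_sum(b, m, u) >= integral_bound(b, u) for b, m, u in cases]
```

The check only samples a few parameter settings; the general claim follows from the monotone sum-versus-integral comparison of Lemma 3.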
Case 8 \(\beta = 1\), \(u > \sqrt{\frac{n}{d(x)}}\). We aim at showing that
Note that in this case we do not use asymptotic notation for estimating \(p_{d(x)}\), since the bound above contains terms of different signs (and thus the leading constants of these terms matter). Note, however, that as long as u exceeds \(\sqrt{\frac{n}{d(x)}}\) by at least a constant factor, the first term dominates, so this bound is \(\Omega (\frac{1}{\log (u)})\). If u is at least \(\phi \cdot \sqrt{\frac{n}{d(x)}}\) for some super-constant \(\phi \), then this bound is \(\Omega (\frac{\log (\phi )}{\log (u)})\).
In this case we have \(u > \sqrt{\frac{n}{d(x)}} \ge 1\), hence \(u \ge 2\). Therefore, by Lemma 4 we have
By the formula for the sum of an arithmetic progression, and estimating the second sum via the corresponding integral in the same way as in Lemma 3, we have
Since for all \(x \ge 1\) we have \(\frac{\lfloor x \rfloor }{x} \ge \frac{1}{2}\) and \(\frac{\lfloor x \rfloor + 1}{x} \ge 1\), we also have
Now we consider two sub-cases. First, let \(u \le e^2 \sqrt{\frac{n}{d(x)}}\). Then we have
Otherwise, if \(u > e^2 \sqrt{\frac{n}{d(x)}}\), then we estimate the integral by
Hence, we conclude
We combine the two sub-cases with the following lower bound, which holds both for \(u \le e^2 \sqrt{\frac{n}{d(x)}}\) and for \(u > e^2 \sqrt{\frac{n}{d(x)}}\).
Case 9 \(\beta \in (1, 3)\), \(u > \sqrt{\frac{n}{d(x)}}\).
We consider three sub-cases.

1. When \(\beta \le 2\) and \(\sqrt{\frac{n}{d(x)}} \le 2\).
$$\begin{aligned} \frac{d(x)}{n} \sum _{\lambda = 1}^{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor } \lambda ^{2-\beta }&\ge \frac{d(x)}{n} = \sqrt{\frac{n}{d(x)}}^{1 - \beta } \cdot \sqrt{\frac{n}{d(x)}}^{\beta - 3} \ge \sqrt{\frac{n}{d(x)}}^{1 - \beta } \cdot \left( \frac{1}{2}\right) ^{\beta - 3} \\&\ge \sqrt{\frac{n}{d(x)}}^{1 - \beta } \cdot \left( \frac{1}{2}\right) ^2 = \frac{1}{4}\sqrt{\frac{n}{d(x)}}^{1 - \beta }. \end{aligned}$$

2. When \(\beta > 2\) and \(\lfloor \sqrt{\frac{n}{d(x)}} \rfloor \le 2^{\frac{1}{3 - \beta }}\). In this case we also have \(\sqrt{\frac{n}{d(x)}} \le 2^{\frac{1}{3 - \beta }} + 1\). Hence, we have
$$\begin{aligned} \frac{d(x)}{n} \sum _{\lambda = 1}^{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor } \lambda ^{2-\beta }&\ge \frac{d(x)}{n} = \sqrt{\frac{n}{d(x)}}^{1 - \beta } \cdot \sqrt{\frac{n}{d(x)}}^{\beta - 3} \\&\ge \sqrt{\frac{n}{d(x)}}^{1 - \beta } \cdot \left( 2^{\frac{1}{3 - \beta }} + 1\right) ^{\beta - 3} \\&\ge \sqrt{\frac{n}{d(x)}}^{1 - \beta } \cdot \left( 2^{\left( \frac{1}{3 - \beta } + 1\right) }\right) ^{\beta - 3} \\&= 2^{\beta - 4}\sqrt{\frac{n}{d(x)}}^{1 - \beta } \ge \frac{1}{4}\sqrt{\frac{n}{d(x)}}^{1 - \beta }. \end{aligned}$$

3. When \(\beta > 2\) and \(\lfloor \sqrt{\frac{n}{d(x)}} \rfloor \ge 2^{\frac{1}{3 - \beta }}\), or when \(\beta \le 2\) and \(\sqrt{\frac{n}{d(x)}} > 2\). In this case we have both \(\lfloor \sqrt{\frac{n}{d(x)}} \rfloor ^{3 - \beta } \ge 2\) and \(\sqrt{\frac{n}{d(x)}} \ge 2\). Hence, by Lemma 3 we have
$$\begin{aligned} \frac{d(x)}{n} \sum _{\lambda = 1}^{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor } \lambda ^{2-\beta }&\ge \frac{d(x)}{n} \cdot \frac{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor ^{3 - \beta } - 1}{3 - \beta } \ge \frac{d(x)}{n} \cdot \frac{\lfloor \sqrt{\frac{n}{d(x)}} \rfloor ^{3 - \beta }}{2(3 - \beta )}\\&\ge \frac{d(x)}{n} \cdot \frac{\left( \sqrt{\frac{n}{d(x)}} - 1 \right) ^{3 - \beta }}{2(3 - \beta )} \ge \frac{d(x)}{n} \cdot \frac{\left( \frac{1}{2}\sqrt{\frac{n}{d(x)}}\right) ^{3 - \beta }}{2(3 - \beta )} \\&\ge \sqrt{\frac{n}{d(x)}}^{1 - \beta } \frac{1}{2^{4 - \beta }(3 - \beta )}. \end{aligned}$$
Combining the three sub-cases, we see that for each \(\beta \in (1, 3)\) there exists a constant \(\gamma _2(\beta ) = \min \{\frac{1}{4}, \frac{1}{2^{4 - \beta }(3 - \beta )}\}\) such that
Taking into account that \(C_{\beta , u} \ge \frac{\beta - 1}{\beta }\), we obtain
Case 10 \(\beta = 3\), \(u > \sqrt{\frac{n}{d(x)}}\).
If \(\sqrt{\frac{n}{d(x)}} \ge 2\), we compute
Otherwise,
Therefore,
Case 11 \(\beta > 3\), \(u > \sqrt{\frac{n}{d(x)}}\).
In this case we have
Appendix: Computation of Table 2
In this appendix we compute the values of the expected runtime shown in Table 2. We start by computing the expected runtime in terms of iterations for each value of the algorithm’s meta-parameter \(\beta \). Recall that \(p_d\) is the probability of creating a better offspring in one iteration, as shown in Table 1. Hence, using the fitness level argument, we can estimate the expected number of iterations before we find the optimum as follows.
Note that in the first sum we have \(u \le \sqrt{\frac{n}{d}}\) (thus we use the values of \(p_d\) from the left column of Table 1), and in the second sum we have \(u > \sqrt{\frac{n}{d}}\) (thus we use the estimates from the right column). Recall that \(p_d = \Omega (f(n, d, u))\) in Table 1 means that for each \(\beta \) there exists a constant \(\gamma (\beta )\), independent of n, d, and u, such that \(p_d \ge \gamma (\beta ) \cdot f(n, d, u)\). We use this constant in the computations below.
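As a concrete illustration of the fitness level argument, the expected number of iterations is bounded by \(\sum _{d} \frac{1}{p_d}\), summed over the fitness levels the algorithm must leave. The sketch below (our own toy example, not the \(p_d\) from Table 1) uses an RLS-like success probability \(p_d = \frac{d}{n}\) on OneMax, for which the bound evaluates exactly to \(n H_{d_0}\), where \(H_{d_0}\) is the harmonic number of the starting distance \(d_0\).

```python
from fractions import Fraction

def fitness_level_bound(success_probs):
    """Fitness level argument: E[iterations] <= sum over levels of 1/p_d,
    where p_d is the probability of leaving the level at distance d."""
    return sum(Fraction(1) / p for p in success_probs)

# Toy example: an RLS-like process on OneMax with n = 100, started at
# distance d0 = 30; from distance d a better offspring is found w.p. d/n.
n, d0 = 100, 30
bound = fitness_level_bound(Fraction(d, n) for d in range(1, d0 + 1))
harmonic = sum(Fraction(1, d) for d in range(1, d0 + 1))
# the bound equals n * H_{d0} exactly
```

Exact rational arithmetic via `Fraction` makes the identity \(\sum _{d=1}^{d_0} \frac{n}{d} = n H_{d_0}\) verifiable without floating-point error.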
To estimate the expected runtime we consider five cases.
Case 1 \(\beta < 1\).
In this case we have
where we used the estimates for the sums from Lemma 4. Note that when \(u \ge \sqrt{\ln (n)}\), we have
Otherwise, we have
Therefore, we conclude
Case 2 \(\beta = 1\). In this case we have
Note that \(\frac{n\ln (u)\ln (\frac{n}{u^2})}{u^2}\) is a decreasing function of u for all \(u \ge 1\), which can be shown by considering its derivative (we omit this tedious computation). Hence, if \(u < \sqrt{\ln (n)\ln \ln (n)}\), then we have
For such u we also have
If \(u \ge \sqrt{\ln (n)\ln \ln (n)}\), we have
and we have
Hence, we conclude
Case 3 \(\beta \in (1, 3)\).
In this case we have
where we used Lemma 4 to estimate the sums. When \(u < (\ln (n))^{1/(3 - \beta )}\), we have
Otherwise, we have
Therefore, we have
Case 4 \(\beta = 3\). We compute
where we used the fact that \(f(x) = \frac{1}{x(\ln (n) - \ln (x) + 1)}\) is a decreasing function on the interval [1, n] to estimate the sum via the corresponding integral. We estimate the integral as follows.
Therefore,
Note that the first term is decreasing in u, while the second one is increasing. We show that they are asymptotically the same when \(u = n^{1/\ln \ln (n)}\).
Therefore, when \(u \le n^{1/\ln \ln (n)}\), the first term is dominant, otherwise the second term is dominant. Hence, we conclude
Case 5 \(\beta > 3\).
In this case we have
We complete the computation of the right column of Table 2 by using Wald’s equation (Lemma 1) and estimates of the expected cost of each iteration shown in Lemma 9.
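Wald's equation states that for i.i.d. costs \(X_1, X_2, \dots \) and a stopping time T with finite expectation, \(E[\sum _{t=1}^{T} X_t] = E[T] \cdot E[X_1]\); here T is the number of iterations and \(X_t\) the cost (fitness evaluations) of iteration t. A quick Monte Carlo illustration of the identity (our own, with an artificial cost distribution, not the costs from Lemma 9):

```python
import random

def total_cost(p, rng):
    """Run until the first success (so T ~ Geometric(p)); each iteration
    independently costs X ~ Uniform{1, 2, 3}. Returns the total cost."""
    cost = 0
    while True:
        cost += rng.choice((1, 2, 3))  # E[X] = 2
        if rng.random() < p:           # success ends the run; E[T] = 1/p
            return cost

rng = random.Random(0)
p = 0.25
trials = 200_000
estimate = sum(total_cost(p, rng) for _ in range(trials)) / trials
# Wald's equation predicts E[total cost] = E[T] * E[X] = (1/p) * 2 = 8
```

With 200,000 trials the empirical mean concentrates tightly around the predicted value 8.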
Case 1 \(\beta < 1\).
If \(u \ge \sqrt{\ln (n)}\), then
If \(u < \sqrt{\ln (n)}\), then
Case 2 \(\beta = 1\).
If \(u \ge \sqrt{\ln (n)\ln \ln (n)}\), then
If \(u < \sqrt{\ln (n)\ln \ln (n)}\), then
Case 3 \(\beta \in (1, 2)\).
If \(u \ge (\ln (n))^{1/(3 - \beta )}\), then
If \(u < (\ln (n))^{1/(3 - \beta )}\), then
Case 4 \(\beta = 2\).
If \(u \ge \ln (n)\), then
If \(u < \ln (n)\), then
Case 5 \(\beta \in (2, 3)\).
If \(u \ge (\ln (n))^{1/(3 - \beta )}\), then
If \(u < (\ln (n))^{1/(3 - \beta )}\), then
Case 6 \(\beta = 3\).
If \(u \ge n^{1/\ln \ln (n)}\), then
If \(u < n^{1/\ln \ln (n)}\), then
Case 7 \(\beta > 3\).
For all u we have
Antipov, D., Buzdalov, M. & Doerr, B. Fast Mutation in Crossover-Based Algorithms. Algorithmica 84, 1724–1761 (2022). https://doi.org/10.1007/s00453-022-00957-5