1 Introduction

The runtime analysis of randomized search heuristics like evolutionary algorithms (EAs), simulated annealing, ant colony optimization and estimation-of-distribution algorithms is a young and active subfield of algorithm research that has produced remarkable results in the last 20 years [2, 11, 15, 23]. Its main goal is to understand the working principles of these algorithms in different scenarios by deriving runtime bounds that depend on problem characteristics and on the choice of algorithm and parameters. This line of research started with simple evolutionary algorithms using mutation only. Still today, the role of crossover, also called recombination, is less well understood than that of mutation. In fact, explaining when genetic algorithms (GAs) based on recombination and mutation perform better than more traditional general-purpose search heuristics that use mutation alone is regarded as one of the fundamental problems in evolution-inspired computation.

Traditionally, proofs showing that crossover is a useful operator relied either on excessively low crossover rates [16, 18] or on some diversity-enforcing mechanism that makes recombination effective by increasing the probability that members of the population differ [8, 10, 20, 22, 27]. However, it was never shown whether this enforced diversity was necessary or merely an additional requirement for the proofs to hold. Recently, some results have appeared proving the superiority of standard steady-state GAs over mutation-only algorithms, without the need for any additional diversity-enforcing mechanisms. Dang et al. [7] proved that for sufficiently large population sizes, the (\(\mu \)+1) GA is at least a linear factor faster than the best algorithm using only standard bit mutation on the Jump benchmark function. Hence, they showed that crossover may help algorithms escape more quickly from local optima. Sutton [28] even proved that for the NP-hard Closest String problem from computational biology, the (\(\mu \)+1) GA with sufficiently large population size and restarts is a fixed-parameter tractable (FPT) algorithm, while the (\(\mu \)+1) EA, which uses only standard bit mutation, is not.

Strikingly, recombination has also been proven to be useful on unimodal functions. Lengler [21] has shown that there exist monotone functions for which the (\(\mu \)+1) EA with a not too low standard bit mutation rate c/n (i.e., \(c>2.13\)) requires exponential runtime with high probability, while the (\(\mu \)+1) GA with sufficiently large population sizes can solve them in \(O(n \log n)\) expected runtime for arbitrary mutation rates \(\varTheta (1)/n\). Analyses have revealed that the (\(\mu \)+1) GA is faster than the (\(\mu \)+1) EA using any standard bit mutation rate and population size, even on unimodal functions where the latter is particularly efficient, i.e., OneMax [5, 6]. Furthermore, if the fitness of offspring that are identical to their parents is not unnecessarily re-evaluated, then the algorithm is faster than any unary unbiased black-box algorithm for the problem [19], albeit slower than when diversity is enforced [3, 27]. To prove these results, precise analyses up to the leading constants are required, since for OneMax the algorithms have the same asymptotic expected runtime \(O(n \log n)\) for moderate population sizes.

An important insight from these analyses is that if diversity is enforced as in Sudholt’s work [27], then there is no advantage in using population sizes greater than \(\mu =2\) for OneMax. On the other hand, the analysis of Corus and Oliveto [6] provides upper bounds that decrease with the population size (up to some sub-logarithmic limit). For large enough population sizes the best derived upper bound is roughly \(1.64 n \ln n\), while for \(\mu =2\), Corus and Oliveto only provide a larger upper bound of \(\frac{4e^{c} n \ln n}{c(c+4)} + O(n)\) [5]. Due to a mistake in one probability calculation, this bound actually turns out to be \(\frac{9e^{c} n \ln n}{c(2c+9)} + O(n)\).

Indeed, all the positive results summarised above regarding the plain (\(\mu \)+1) GA required sufficiently large population sizes. While the comparative statements with the mutation-based algorithms were possible because lower bounds on their expected runtime are available, rigorously showing whether the suggested population sizes are actually necessary requires lower bounds on the expected runtime of the (\(\mu \)+1) GA. Proving lower bounds for GAs with crossover is a notoriously hard task. The only available analysis concerning a standard GA is the proof that the simple genetic algorithm (SGA [12]) cannot solve OneMax in polynomial time, with overwhelming probability, due to the ineffectiveness of the fitness-proportional selection operator [25, 26]. There have been recent attempts to generalize proof methods like the family tree technique to crossover-based algorithms; however, these only apply in a specific setting without mutation.

Providing lower bounds on the expected runtime of the (\(\mu \)+1) GA for OneMax has turned out to be surprisingly difficult. Sudholt simplified the analysis by considering a “greedy” (2+1) GA that always selects amongst the fittest individuals in the population and is sped up by automatically achieving the best possible crossover operation between different parents [27]. A less greedy (2+1) GA was considered by Corus and Oliveto, where individuals are only immediately crossed over optimally if the Hamming distance between the parents is larger than 2 [5]. These simplified algorithms allow the analysis to ignore the improvements which may occur in standard GAs when one parent is crossed over with another one of different fitness. However, it was never proven that these algorithms are indeed faster than the standard (2+1) GA, hence that the bounds are also valid for the latter algorithm. In this paper we provide a lower bound for the (2+1) GA with no simplifications that matches its upper bound up to the leading constant, hence providing a rigorous proof that larger populations are beneficial to the GA for OneMax. The precision of the results also allows us to derive that the value of \(c \in (0, 1.422]\) that yields the optimal mutation rate c/n is approximately \(c=1.21221445\).

A major difficulty in proving rigorous lower bounds for populations with crossover is to find a way to aggregate the state of the algorithm such that it accurately captures both the current distance from the optimum and the potential improvements of the crossover operator. These improvements can be large if the parents have a large Hamming distance, and our aim is to show that this rarely happens. We solve the aggregation problem for the (2+1) GA by defining a potential function that captures the current fitness and the opportunities for easy improvements through crossover. By showing bounds on the expected increase in the potential, we are able to quantify how the distance to the optimum decreases in one generation. The challenge lies in proving this for every possible population, from those with identical individuals to those with a good amount of diversity. Once the potential is appropriately bounded, we can use standard drift analysis arguments to bound the expected time from below.

1.1 Main Contributions

The expected optimisation time of the (2+1) GA is bounded from above as follows.

Theorem 1

The expected optimisation time of the (2+1) GA with mutation rate \(c/n\), \(c>0\) a constant, on OneMax is at most

$$\begin{aligned} \frac{9e^{c}}{c(2c+9)} \cdot n \ln (n) + O(n). \end{aligned}$$

For \(c=1\) this is \( \frac{9}{11} \cdot e n \ln (n) + O(n) \approx 2.224 n \ln (n) + O(n) \). The upper bound follows from applying the analytical framework in [5] with a corrected transition probability for \(p_r\), using the value 1/(4e) instead of 5/(24e) (we shall give more details in Sect. 3). It can also be proven with mild adaptations of the proof of [27, Theorem 4]. We provide a self-contained proof in Sect. 3.

Our main contribution is the following lower bound that matches the upper bound proven in Theorem 1 up to small-order terms.

Theorem 2

The expected optimisation time of the (2+1) GA with mutation rate c/n, and \(0<c \le 1.422\) a constant, on OneMax is at least

$$\begin{aligned} \frac{9e^c}{c(2c+9)} \cdot n \ln (n) - O(n \log \log n). \end{aligned}$$

Since the bounds from Theorems 1 and 2 have the same leading constant \(\frac{9e^c}{c(2c+9)}\), which is minimised for

$$\begin{aligned} c = \frac{\sqrt{97}-5}{4} \approx 1.21221445, \end{aligned}$$

we identify this as the optimal mutation rate for the (2+1) GA (up to small-order terms) within the range of rates covered by Theorem 2.

Theorem 3

Amongst all mutation rates c/n with \(c \in (0, 1.422]\), the choice \(c = \frac{\sqrt{97}-5}{4}\) is the optimal mutation rate of the (2+1) GA on OneMax, up to small-order terms. With this choice, the expected optimisation time is \(\approx 2.18417 n \ln (n) + O(n)\).
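The leading constant and its minimiser are easy to check numerically. The following small Python sketch (our own; it only evaluates the closed-form constant \(\frac{9e^c}{c(2c+9)}\) from Theorems 1 and 2) reproduces the values stated above:

```python
import math

def leading_constant(c):
    # Leading constant 9*e^c / (c*(2c+9)) from Theorems 1 and 2.
    return 9 * math.exp(c) / (c * (2 * c + 9))

# Setting the derivative of the constant to zero yields 2c^2 + 5c - 9 = 0,
# whose positive root is (sqrt(97) - 5) / 4.
c_opt = (math.sqrt(97) - 5) / 4
print(c_opt)                    # 1.21221445...
print(leading_constant(c_opt))  # 2.18417...
print(leading_constant(1.0))    # 9e/11 = 2.2240...
```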

The best mutation rate identified for the (2+1) GA is lower than the rates minimising the upper bounds for larger population sizes \(\mu \ge 5\): those rates are at least 1.425/n and increase with \(\mu \), and they yield upper bounds below \(1.7 n \ln n\) that decrease with \(\mu \) [6]. This implies that the (2+1) GA with mutation rate \(\frac{\sqrt{97}-5}{4n}\) is at least \(28\%\) slower than any (\(\mu \)+1) GA with \(\mu \ge 5\) and an appropriate mutation rate.

Structure of the paper. Section 2 formally defines the (2+1) GA and lists important tools for the analysis, including the drift theorem used for our main result. Section 3 presents the above-mentioned, corrected upper bound from Theorem 1 and its self-contained proof. In Sect. 4, we introduce the potential function that captures the state of the (2+1) GA and is crucial for the drift analysis proving the lower bound in Theorem 2. As determining the drift requires a careful case analysis, we give a roadmap of this analysis in Sect. 5, followed by the technical Sects. 6–8 that analyze the drift in different scenarios. In Sect. 9 we then put all pieces of our analysis together to complete the proof of Theorem 2. In addition, Sect. 10 gives empirical results and results of regression analyses. The latter confirm that our theoretical results are remarkably precise and provide further insight into small-order terms. We finish with some conclusions. To streamline the presentation of the most important cases in the drift analysis, some less insightful proofs have been moved to an appendix.

This paper extends a previous conference paper [24] where most proofs had to be omitted because of space constraints. The experimental results have not been published before either.

2 Preliminaries

The (2+1) GA is defined in Algorithm 1. The algorithm initialises the population with two randomly chosen individuals. At each generation it selects two random parents with replacement to be mated via uniform crossover. This operator assigns each bit of the offspring by selecting the corresponding bit from one parent with probability 1/2 and from the other with the same probability. Standard bit mutation is then applied to the offspring by flipping each of its bits independently with probability c/n. Finally, the worst individual amongst the two parents and the offspring is removed to form the new population. Ties are broken uniformly at random.

[Algorithm 1: The (2+1) GA]
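Since the pseudocode is not reproduced here, the following minimal Python sketch (our own; the function names and the max_evals safeguard are not from the paper) implements the algorithm exactly as described above:

```python
import random

def onemax(x):
    # OneMax: the number of 1-bits in the bitstring.
    return sum(x)

def two_plus_one_ga(n, c, max_evals=10**8):
    """Run the (2+1) GA with mutation rate c/n on OneMax; return the number
    of fitness evaluations until an optimum enters the population."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(2)]
    evals = 2
    while max(onemax(x) for x in pop) < n and evals < max_evals:
        # Select two random parents with replacement.
        p1, p2 = random.choice(pop), random.choice(pop)
        # Uniform crossover: each offspring bit comes from either parent
        # with probability 1/2.
        child = [random.choice(bits) for bits in zip(p1, p2)]
        # Standard bit mutation: flip each bit independently with prob. c/n.
        child = [b ^ (random.random() < c / n) for b in child]
        evals += 1
        # Remove a worst individual among the two parents and the offspring,
        # breaking ties uniformly at random.
        pool = pop + [child]
        worst = min(onemax(x) for x in pool)
        pool.remove(random.choice([x for x in pool if onemax(x) == worst]))
        pop = pool
    return evals
```

For c=1 and moderate n, averaging two_plus_one_ga(n, 1.0) over repeated runs should give values around \(\frac{9}{11} e \, n \ln n\), in line with Theorems 1 and 2.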

We will analyse the expected runtime of the algorithm to optimise the function \(f(x)=\mathrm {OneMax}(x)=\sum _{i=1}^n x_i\), which counts the number of 1-bits in a bitstring. Formally, the runtime is defined as the number of fitness function evaluations, which equals the smallest t such that the current population contains an optimum, up to an additive term of at most 2 stemming from evaluating the initial population.

Throughout this article, we will use state-of-the-art analysis techniques for randomized search heuristics, including drift analysis and concentration inequalities. In the following, we list three important tools, starting with the drift theorem that is crucial for the lower bound on the runtime.

Theorem 4

(Multiplicative drift, lower bound, e. g., [29])

Let \(X_t\), \(t\ge 0\), be a stochastic process, adapted to a filtration \({\mathcal {F}}_t\), over a state space \(S\subseteq \mathbb {R}^{\ge x_{\min }}\), where \(x_{\min }>0\). Assume that \(X_t\) is non-increasing, i. e., \(X_{t+1}\le X_t\) for all \(t\ge 0\). Let T be the smallest \(t\ge 0\) such that \(X_t\le x_{\min }\). If there exist positive real numbers \(\beta ,\delta >0\) such that for all \(t<T\) it holds that

  1. \(\mathord {{\mathrm {E}}}\mathord {\left( X_{t}-X_{t+1}\mid {\mathcal {F}}_t\right) } \le \delta X_t\)

  2. \(\mathord {{\mathrm {P}}}\mathord {\left( X_t-X_{t+1}\ge \beta X_t\right) } \le \beta \delta /\ln (X_t)\)

then

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( T\mid {\mathcal {F}}_0\right) } \ge \frac{1-\beta }{1+\beta } \frac{\ln (X_0/x_{\min })}{\delta } \end{aligned}$$

The following Chernoff bound goes back to [4] and is formulated in the style of [9, Theorem 1.10.1].

Theorem 5

Let \(X_1,\dots ,X_n\) be independent random variables taking values in [0, 1]. Let \(X:=\sum _{i=1}^n X_i\). Let \(\delta \ge 0\). Then

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( X\ge (1+\delta )\mathord {{\mathrm {E}}}\mathord {\left( X\right) }\right) }\le e^{-\min \{\delta ^2,\delta \}\mathord {{\mathrm {E}}}\mathord {\left( X\right) }/3}. \end{aligned}$$

The following lemma bounds a sum by an integral.

Lemma 6

For any integrable function \(f :\mathbb {R}\rightarrow \mathbb {R}\), the following statements hold.

  1. If f is non-increasing in [a, b], then \( \sum _{i=a}^b f(i) \le f(a) + \int _{a}^{b} f(i) \ \mathrm {d}i\).

  2. If f is non-decreasing in [a, b], then \( \sum _{i=a}^b f(i) \le f(b) + \int _{a}^{b} f(i) \ \mathrm {d}i\).

  3. If there is an \(\alpha \in [a, b]\) such that f is non-increasing in \([a, \alpha ]\) and non-decreasing in \([\alpha , b]\), then \( \sum _{i=a}^b f(i) \le f(a) + f(b) + \int _{a}^b f(i) \ \mathrm {d}i\).

Proof

The first statement follows from splitting off the term f(a) and using \(\int _{j-1}^j f(i) \ \mathrm {d}i \ge f(j)\) for all \(j \in [a+1, b]\), which follows from f being non-increasing. This implies \(\sum _{i=a+1}^b f(i) \le \int _{a}^b f(i) \ \mathrm {d}i\). The second statement is proved similarly. For the last statement, we note \(\sum _{i=a}^b f(i) \le \sum _{i=a}^{\lfloor \alpha \rfloor } f(i) + \sum _{i=\lceil \alpha \rceil }^b f(i)\). Then we apply the first statement to \(\sum _{i=a}^{\lfloor \alpha \rfloor } f(i)\) and the second statement to \(\sum _{i=\lceil \alpha \rceil }^b f(i)\), yielding a bound of \(f(a) + f(b) + \int _{a}^{\lfloor \alpha \rfloor } f(i) \ \mathrm {d}i + \int _{\lceil \alpha \rceil }^b f(i) \ \mathrm {d}i \le f(a) + f(b) + \int _{a}^b f(i) \ \mathrm {d}i\). \(\square \)
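To illustrate statement 1, here is a quick numerical check (our own example) for the non-increasing function \(f(i)=1/i\) on [1, 100]:

```python
import math

# Statement 1 for f(i) = 1/i on [a, b] = [1, 100]: the harmonic sum
# is at most f(a) plus the integral of 1/i over [1, 100].
a, b = 1, 100
lhs = sum(1.0 / i for i in range(a, b + 1))  # 5.187...
rhs = 1.0 / a + math.log(b / a)              # 1 + ln(100) = 5.605...
assert lhs <= rhs
```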

Finally, we frequently need the following estimate of the largest binomial coefficient, which can be found in [9, Inequality 1.4.18].

Lemma 7

For all \(n\in \mathbb {N}\) and \(k\in \{1,\dots ,n\}\) it holds that

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) \le \left( {\begin{array}{c}n\\ \lceil n/2\rceil \end{array}}\right) \le 2^n \sqrt{\frac{2}{\pi n}}. \end{aligned}$$
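Both inequalities are easy to spot-check; a small script of our own:

```python
import math

# Spot-check Lemma 7 for a few values of n: every binomial coefficient is
# at most the central one, which in turn obeys the stated upper bound.
for n in (1, 2, 10, 101, 1000):
    central = math.comb(n, math.ceil(n / 2))
    assert all(math.comb(n, k) <= central for k in range(1, n + 1))
    assert central <= 2 ** n * math.sqrt(2 / (math.pi * n))
```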

3 Upper Bounds for the (2+1) GA

In this section we aim to convince the reader in two different ways that the upper bound for the (2+1) GA is as stated in Theorem 1, with a leading constant of \(\frac{9e^c}{c(2c+9)}\) instead of the leading constant \(\frac{4e^c}{c(c+4)}\) claimed in [5].

Sudholt [27] provided an upper bound for (\(\mu \)+\(\lambda \)) GAs with a diversity mechanism in the tie-breaking rule for the replacement selection. The analysis is based on a fitness-level argument and a simple Markov chain analysis made on each fitness level. Corus and Oliveto [5] analysed the standard (\(\mu \)+1) GA, for which the choice \(\mu =2\) yields the (2+1) GA as defined in Algorithm 1. They observed that the (\(\mu \)+1) GA can lose diversity after creating it, which required a more complex Markov chain analysis of each fitness level. We first explain their framework and argue why the leading constant is \(\frac{9e^c}{c(2c+9)}\). In addition, we give a self-contained bound based on the analysis in [27] with an ad-hoc Markov chain framework similar to the one used in [5], simplified for the fixed value of \(\mu =2\) (and \(\lambda =1\)). This shows that both previous works [5, 27], with appropriate modifications, yield the same upper bound for the (2+1) GA.

3.1 The Framework by Corus and Oliveto

Corus and Oliveto coupled the standard artificial fitness levels method [15] with a Markov chain framework to bound the expected runtime of the (\(\mu \)+1) GA for OneMax and any population size \(\mu \) [5]. More precisely, they divided the search space into the canonical \(n+1\) fitness levels, each containing all search points with i 1-bits, and assumed that the algorithm is in level \(L_i\) if all the individuals of the population have exactly i 1-bits. Since crossover may only speed up the optimisation process if diversity is present in the population, a Markov chain was used on each fitness level to distinguish between populations with and without diversity. The Markov chain at fitness level i consists of two transient states \(S_{1,i}\) and \(S_{2,i}\) (i.e., without and with diversity, respectively) and an absorbing state \(S_{3,i}\), which is reached when a solution with better fitness is identified. For each level, the analysis was performed by pessimistically assuming that the population initially has no diversity (i.e., all individuals are identical). For this assumption to hold, and for a valid upper bound on the expected runtime to be achieved, once the absorbing state is reached for level \(L_i\), the expected time for the improved individual to take over the population has to be taken into account before the absorption time of the Markov chain for level \(L_{i+1}\) may be analysed (i.e., for each level \(L_i\) the population initially is in state \(S_{1, i+1}\)). When the algorithm is on the current level \(L_i\) with no diversity (\(S_{1,i}\)), only two changes of state may occur, both due to mutation: either the absorbing state \(S_{3,i}\) is reached by increasing the number of 1-bits in an individual, or a state with some diversity is reached by swapping the positions of a 1-bit and a 0-bit in an offspring, hence reaching state \(S_{2,i}\). From state \(S_{2,i}\), the absorbing state \(S_{3,i}\) may be reached more quickly than from state \(S_{1,i}\): if the two diverse parents are selected for reproduction, crossover obtains at least an extra 1-bit in the offspring with constant probability, provided the diversity is not lost beforehand (e.g., by selecting two identical parents, creating a copy by not flipping any bits, and removing the diverse individual in the replacement step). Overall, the proof strategy requires summing up the bounds on the absorption times of the \(n+1\) Markov chains and the subsequent takeover times over the \(n+1\) levels according to the artificial fitness levels methodology.

The described technique makes it possible to calculate the transition probabilities of the Markov chain, and hence to provide upper bounds on the expected runtime of the (\(\mu \)+1) GA, as a function of any population size \(\mu \). In particular, for population sizes \(\mu =o(\log n/ \log \log n)\) it provides bounds on the expected runtime of the (\(\mu \)+1) GA, up to the leading constant, that are smaller than those of any steady-state EA using only standard bit mutation (i.e., no crossover); for larger population sizes the takeover time becomes asymptotically larger than the \(O(n \log n)\) expected runtime of the (1+1) EA. An important insight from the Markov chain analysis is that the probability of losing diversity in state \(S_{2,i}\) is higher for population size \({\mu =2}\) than for larger populations: in the former case the diversity may be completely lost in every generation by either of the two individuals taking over, which is not the case for larger populations. However, in the analysis of this transition probability for the special case \({\mu =2}\), the bound provided by the authors of [5] is smaller by a factor of 2 than the correct one, due to a mistake in the calculation of the probability that different parents are selected for reproduction and the variation operators produce an offspring identical to either parent (and the differing parent is removed by the replacement selection). This miscalculation led to an upper bound on the expected runtime of the (2+1) GA of \(\frac{4e^c n \ln n}{c(c+4)} + O(n)\) instead of the correct bound of \(\frac{9e^c n \ln n}{c (2c+9)} + O(n)\) provided in Theorem 1. In the following subsection we provide a self-contained proof of the result and point out exactly where the miscalculation occurred in [5].

3.2 A Self-Contained Upper Bound Proof for the (2+1) GA

Here we give a self-contained proof of the upper bound for the (2+1) GA, Theorem 1.

Proof of Theorem 1

We use straightforward adaptations of the proof of [27, Theorem 4], using the same notation as in [27]. As in said proof, we distinguish between the following cases that are labelled according to the current best fitness in the population, called i. There are cases i.1, i.2, and i.3 explained in the following. We estimate the expected time spent in all these cases, summed up over all possible values of i, to obtain an upper bound on the total expected optimisation time.

Case i.1: the population contains a search point with fitness i and one search point of lower fitness.

This case is left for good if another search point of fitness at least i is created as then both search points will have fitness at least i. A sufficient condition for this is that the parent with fitness i is selected twice as parent (probability 1/4) and no 1-bit is flipped during mutation (probability at least \((1-c/n)^{n-1} = \varOmega (1)\)). Hence the expected waiting time in case i.1 is O(1).

The other two cases are:

Case i.2: the population contains two copies of the same individual, with fitness i.

Case i.3: the population contains two different individuals, both having fitness i.

The algorithm can switch between the two cases as diversity can be gained or lost.

We first bound the expected time spent in Case i.3. This case can be left for good if the two different search points are chosen as parents (probability 1/2), crossover generates a surplus of 1-bits (probability estimated from below by 1/4 in [27, page 427]) and mutation does not flip any bits (probability at least \((1-c/n)^n\)). Together, we obtain a probability of at least \((1-c/n)^n/8\). Hence, the expected time spent in Case i.3 is at most \(8/(1-c/n)^n = O(1)\).

Let \(T_{i, 2}\) denote the expected time spent in Case i.2. We leave this case for good if mutation flips exactly one 0-bit and no 1-bit. This event has probability at least \(p^+ := (n-i) \cdot c/n \cdot (1-c/n)^{n-1}\). We further make a transition to Case i.3 if mutation flips exactly one 0-bit and one 1-bit, leading to a different search point with fitness i, and then one of the identical parents is chosen for removal (probability 2/3). The probability of said events is at least \(p^{2 \rightarrow 3} := i(n-i) \cdot (c/n)^2 \cdot (1-c/n)^{n-2} \cdot 2/3\).

From Case i.3 the algorithm can move back to Case i.2, so that we may have to consider multiple visits to Case i.2. A necessary condition for going back to Case i.2 is that the created offspring is identical to one of the search points in the population, and the other search point is selected for removal (probability 1/3). The probability of creating an identical offspring is maximised when the two search points have Hamming distance 2. Either the same parent is selected twice (probability 1/2) and mutation does not flip any bit (probability \((1-c/n)^n\)), or the same parent is selected twice and mutation creates the other parent (probability \(O(1/n^2)\)), or different parents are selected (probability 1/2), crossover and mutation set the two differing bits identical to one of the parents (probability 1/2; this is the probability that was wrongly estimated as 1/4 in [5], leading to a bound of 5/(24e) for going back to Case i.2 rather than the correct bound of 1/(4e), which we derive now) and mutation does not flip any of the common bits (probability \((1-c/n)^{n-2}\)). Together, the probability of going back to Case i.2 is at most

$$\begin{aligned} \frac{1}{3} \cdot \left( \frac{(1-c/n)^n}{2} + O\left( n^{-2}\right) + \frac{(1-c/n)^{n-2}}{4}\right) = \frac{(1-c/n)^n}{4} + O(n^{-1}). \end{aligned}$$

Hence the conditional probability that, when Case i.3 is left towards Case i.2 or higher fitness, it is left towards Case i.2 is at most

$$\begin{aligned} p^{3 \rightarrow 2} := \frac{(1-c/n)^n/4 + O(n^{-1})}{(1-c/n)^n/4 + O(n^{-1}) + (1-c/n)^n/8} = \frac{2}{3} + O(n^{-1}). \end{aligned}$$

With these pessimistic estimations for transition probabilities, we obtain the following recurrence:

$$\begin{aligned} T_{i, 2} \le 1 + (1-p^+-p^{2 \rightarrow 3}) T_{i,2} + p^{2 \rightarrow 3} p^{3 \rightarrow 2} T_{i, 2} \end{aligned}$$

which is equivalent to

$$\begin{aligned} T_{i,2} \le \frac{1}{p^+ +p^{2 \rightarrow 3} (1-p^{3 \rightarrow 2})} \end{aligned}$$

and plugging in the above values yields the following upper bound for \(T_{i, 2}\):

$$\begin{aligned}&\frac{1}{(n-i) \cdot c/n \cdot (1-c/n)^{n-1} +i(n-i) \cdot (c/n)^2 \cdot (1-c/n)^{n-2} \cdot 2/3 \cdot (1-\frac{2}{3} - O(n^{-1}))}\\&\quad \le \frac{1}{(n-i) \cdot c/n +i(n-i) \cdot (c/n)^2 \cdot 2/9} \cdot (e^c+O(n^{-1})). \end{aligned}$$

Summing up these upper bounds for all i yields a total time bound of

$$\begin{aligned}&\le \; (e^c+O(n^{-1})) \sum _{i=0}^{n-1} \frac{1}{(n-i) \cdot c/n +i(n-i) \cdot (c/n)^2 \cdot 2/9}\\&\quad =\; (e^c+O(n^{-1})) \frac{n}{c}\sum _{i=0}^{n-1} \frac{1}{(n-i)(1+i \cdot c/n \cdot 2/9)}. \end{aligned}$$

A closer inspection of the function \(f(i) :=\frac{1}{(n-i)(1+i \cdot c/n \cdot 2/9)}\) reveals that it is non-decreasing in \([0, n-1]\) if \(c \le 9/2\). For \(c > 9/2\) it is non-increasing in \([0, n(2c - 9)/(4c)]\) and non-decreasing in \([n(2c - 9)/(4c), n-1]\). In either case, we obtain an upper bound from Lemma 6:

$$\begin{aligned}&\sum _{i=0}^{n-1} \frac{1}{(n-i)+i(n-i) \cdot c/n \cdot 2/9}\\&\quad \le \; \frac{1}{n} + \frac{1}{1+(n-1) \cdot c/n \cdot 2/9} + \int _{i=0}^{n-1} \frac{1}{(n-i)(1+i \cdot c/n \cdot 2/9)} \ \mathrm {d}i. \end{aligned}$$

The first two summands are in O(1). The integral simplifies as follows, using e. g. equation 3.3.20 in [1]:

$$\begin{aligned} \int _{i=0}^{n-1} \frac{1}{(n-i)(1+i \cdot c/n \cdot 2/9)} \ \mathrm {d}i&=\; \left[ \frac{9}{2c+9} \cdot \ln \left( \frac{1+i \cdot c/n \cdot 2/9}{n-i}\right) \right] _0^{n-1}\\&=\; \frac{9}{2c+9} \left( \ln \left( \frac{1+(n-1) \cdot c/n \cdot 2/9}{1}\right) - \ln \left( \frac{1}{n}\right) \right) \\&=\; \frac{9}{2c+9} \cdot \ln \left( n+(n-1) \cdot c\cdot 2/9\right) . \end{aligned}$$
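As a sanity check, this closed form can be reproduced numerically; a short midpoint-rule sketch of our own, with arbitrary example values \(n=1000\), \(c=1\):

```python
import math

# Numerical sanity check of the integral above (example values n=1000, c=1).
n, c = 1000, 1.0
steps = 10 ** 5
h = (n - 1) / steps
# Midpoint rule over [0, n-1].
approx = h * sum(1.0 / ((n - i) * (1 + i * c / n * 2 / 9))
                 for i in ((k + 0.5) * h for k in range(steps)))
closed = 9 / (2 * c + 9) * math.log(n + (n - 1) * c * 2 / 9)
print(approx, closed)  # both approximately 5.816
```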

Plugging everything together, the total time bound is

$$\begin{aligned}&\; (e^c+O(n^{-1})) \frac{n}{c} \left( O(1) + \frac{9}{2c+9} \cdot \ln \left( n+(n-1) \cdot c\cdot 2/9\right) \right) \\&\quad =\; \frac{9e^{c}}{c(2c+9)} \cdot n \ln \left( n+(n-1) \cdot c\cdot 2/9\right) + O(n)\\&\quad \le \; \frac{9e^{c}}{c(2c+9)} \cdot n \ln (n) + O(n). \end{aligned}$$

Noting that the times in all Cases i.1 and i.3, summed up for all i, are O(n) proves the claim. \(\square \)

4 A Potential Function Approach

We now turn to the main contribution of this paper, the tight lower bound for the (2+1) GA. Before going into detail, this section describes the main idea behind our approach and clarifies some further notation and fundamental observations that will be used in the remainder.

We write a population \(\{x^1, x^2\}\) in order of monotonically decreasing fitness, that is, \(f(x^1) \ge f(x^2)\). Let \(n_{11}\) be the number of bit positions where both parents have ones, and likewise let \(n_{00}\) be the number of positions where both have zeros. Let \(n_{10}\) be the number of positions where \(x^1\) has a 1 and \(x^2\) has a 0, and likewise for \(n_{01}\). Then we have \(f(x^1) = n_{11}+n_{10}\) and \(f(x^2) = n_{11}+n_{01}\). Since by assumption \(f(x^1) \ge f(x^2)\), we have \(n_{10} \ge n_{01}\), and \(n_{10}=n_{01}\) is equivalent to the two individuals having equal fitness. In case \(n_{10}=0\), both individuals are identical. Such a population is called monomorphic in population genetics, and we use this term here.

Note that the (2+1) GA is an unbiased algorithm in the sense of Lehre and Witt [19]. In brief, this means that the algorithm treats all bit positions and all bit values symmetrically when generating new search points. Owing to this symmetry of bit positions and the fact that the fitness function is symmetrical itself, i. e., it only depends on the number but not the positions of the one-bits, it suffices to know \(n_{11}, n_{10}\) and \(n_{01}\) to fully characterise the state of the algorithm. Note that \(n_{00}\) can be derived as \(n - n_{11}-n_{10}-n_{01}\).
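In code, this state aggregation amounts to the following small helper (an illustrative sketch; the function name is ours):

```python
def count_blocks(x1, x2):
    """Return (n11, n10, n01, n00) for parents x1, x2 with f(x1) >= f(x2).

    n10 counts positions where x1 has a 1 and x2 a 0; n01 the converse.
    """
    n11 = sum(1 for a, b in zip(x1, x2) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x1, x2) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x1, x2) if a == 0 and b == 1)
    n00 = len(x1) - n11 - n10 - n01
    return n11, n10, n01, n00

# Example: x1 = 1101, x2 = 1011 gives n11 = 2, n10 = 1, n01 = 1, n00 = 0.
assert count_blocks([1, 1, 0, 1], [1, 0, 1, 1]) == (2, 1, 1, 0)
```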

The following lemma characterises probabilities of setting a bit to 1 in the offspring after a crossover of two different parents and a mutation of the result.

Lemma 8

Consider a crossover of two parents x, y followed by mutation with mutation rate \(p_m\), resulting in an offspring z. For all i,

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( z_i = 1\right) } = {\left\{ \begin{array}{ll} 1-p_m &{} \text {if }x_i=y_i=1\\ 1/2 &{} \text {if }x_i \ne y_i\\ p_m &{} \text {if }x_i=y_i=0. \end{array}\right. } \end{aligned}$$

Proof

If \(x_i=y_i\) then crossover will create an offspring with the same bit value. The statement for \(x_i\ne y_i\) holds because of symmetry, or using the following, alternative argument. The offspring has a 1 if crossover creates a 1 and mutation does not flip bit i, or if crossover creates a 0 and mutation does flip bit i. The probability of the former event is \(1/2 \cdot (1-p_m)\) and the probability of the latter event is \(1/2 \cdot p_m\). Together, this gives 1/2. \(\square \)

Note that differing bits \(x_i \ne y_i\) are set to 1 with probability 1/2, irrespective of the mutation rate. Hence, when two parents are selected, we only need to consider the effect of mutation on the bits where the parents agree. We frequently and tacitly use this fact.
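Lemma 8 is easy to validate empirically; a Monte Carlo sketch of our own for a single bit position, with an exaggerated mutation rate \(p_m=0.3\) so the three cases are clearly separated:

```python
import random

def crossover_mutate_bit(x_i, y_i, p_m):
    # Uniform crossover on one bit, followed by mutation with rate p_m.
    z_i = random.choice((x_i, y_i))
    return z_i ^ (random.random() < p_m)

trials, p_m = 10 ** 5, 0.3
for (x_i, y_i), expected in [((1, 1), 1 - p_m), ((1, 0), 0.5), ((0, 0), p_m)]:
    freq = sum(crossover_mutate_bit(x_i, y_i, p_m) for _ in range(trials)) / trials
    print((x_i, y_i), round(freq, 2), "expected", expected)
```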

Our lower bound applies when only considering populations where the number of zeros in the fitter parent is at most \(n/\mathrm {polylog}(n)\) and at least \(\mathrm {polylog}(n)\). This implies that all probabilities that involve flipping a 0 to 1 are of order at most \(1/\mathrm {polylog}(n)\).

The main tool for our lower bound is going to be drift analysis, applied to a potential function that captures the current state and potential easy fitness improvements.

Definition 1

For a population P with values \(n_{11}, n_{10}, n_{01}, n_{00}\) we define the potential of P as

$$\begin{aligned} \varphi (P) = n_{11}+n_{10}+ \frac{n_{01}}{3}. \end{aligned}$$

The intuition is that \(n_{11}+n_{10}\) describes the current best fitness in the population. The term \(n_{01}/3\) adds potential to the best fitness as the population has the potential to exploit the diversity given by the \(n_{01}\) 1-bits that only exist in the less fit individual during a successful crossover operation.
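Definition 1 translates directly into code (again an illustrative sketch, reusing the counts \(n_{11}, n_{10}, n_{01}\) computed above):

```python
def potential(n11, n10, n01):
    # Potential from Definition 1: the best fitness n11 + n10, plus a
    # credit of 1/3 for each 1-bit carried only by the less fit parent.
    return n11 + n10 + n01 / 3
```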

The choice of the factor 1/3 is motivated as follows. We know from previous work [5, 27] that the most helpful populations for improvements are those where two search points have the same number of ones and Hamming distance 2, that is, \(n_{10}=n_{01}=1\). (Larger Hamming distances have the potential for larger fitness improvements, but such populations are rarely reached when the number of zeros becomes reasonably small.)

Assume the current state has \(n_{10}=n_{01}=1\), corresponding to a potential of \(n_{11}+n_{10}+1/3\). The most likely transitions (and, when only \(O(n/\mathrm {polylog}(n))\) zeros are left, the only transitions with probability \(\varOmega (1)\)) are (1) collapsing the population to copies of one parent (and potential \(n_{11}+n_{10}\)) and (2) creating a surplus of one 1-bit by crossover and not flipping anything else (potential \(n_{11}+n_{10}+1\)). The probability of the former event is roughly \((1-p_m)^n/4\), which is the probability of selecting the same parent twice, not flipping any bits and then selecting the other population member for removal, plus the probability of selecting different parents, creating one parent by crossover and not flipping any bits. The probability of the latter event is roughly \((1-p_m)^n/8\), which is the probability of selecting different parents, setting both differing bits to 1 and not flipping any bits in the subsequent mutation.

Comparing these terms, the conditional probability of an improvement via crossover is roughly 1/3. If a monomorphic population is reached, the potential decreases by 1/3, and this happens with conditional probability \(1-1/3\). If instead the improvement occurs, the potential increases by \(1-1/3\), and this happens with conditional probability 1/3. The net effect of these transitions on the expected change of the potential is \((1-1/3) \cdot 1/3 - 1/3 \cdot (1-1/3) = 0\). So the potential balances out the effects of “volatile” states that are left quickly.

Obviously, our analysis still needs to account for other, less likely transitions. For populations with \(n_{10}=n_{01}> 1\) the conditional transition probabilities change as the probability of creating one of the parents by crossover depends on the Hamming distance \(n_{10}+n_{01}\) between parents. For \(n_{10}=n_{01}> 1\) the likely progress in a successful crossover may be smaller than \(n_{01}/3\). Hence the term \(+n_{01}/3\) in Definition 1 is a precise estimate for the likely progress when \(n_{01}=1\) and for larger \(n_{01}\) it is an overestimation.

It suffices to restrict our considerations to moderate values of \(n_{10}\) and \(n_{01}\). The reason is that the (2+1) GA always has a constant probability of creating a monomorphic population in one generation, regardless of the current population. This means that large values of \(n_{10}\) and \(n_{01}\) are very unlikely.

Lemma 9

Let \(t\ge \log ^2 n\) and \(t=n^{O(1)}\). With probability \(1-n^{-\varOmega (\log n)}\), all populations within the time interval \([\log ^2 n, t]\) have Hamming distance at most \(\log ^2 n\) between their two individuals.

Proof

We call a generation that creates a population of two identical individuals a monomorphic generation. The crucial idea is to show that monomorphic generations are very frequent so that large Hamming distances are unlikely to occur.

The probability of a monomorphic generation happening is at least \((1/4)(1-c/n)^{n}(1/3)=\varOmega (1)\) since it is sufficient to select a fittest parent twice, to clone it and to remove the other parent (which has probability at least 1/3). For a number \(t\ge 0\) of generations after a monomorphic one, let \(D_t\) denote the maximum number of bits in which the two parents ever have differed during these t generations. The crucial observation is that only mutations can increase this D-value. The total number of bits flipped in t generations is the sum of tn Poisson trials with success probability c/n each. Hence, within t generations following a monomorphic one, the D-value is bounded from above by 2ct with probability \(1-2^{-\varOmega (t)}\) according to Chernoff bounds (Theorem 5), and clearly the Hamming distance is no larger than the D-value. We set \(t:=(\log ^2 n)/(2c)\) to bound the D-value by \(\log ^2 n\).

The proof is completed by noting that the probability of not observing a monomorphic generation within \(\log ^2 n\) generations is \((1-\varOmega (1))^{\log ^2 n} = n^{-\varOmega (\log n)}\). Together with the failure bound \(2^{-\varOmega (t)}\), which is \(n^{-\varOmega (\log n)}\) for \(t=(\log ^2 n)/(2c)\), and a union bound, this means that in any polynomial number of generations following the first monomorphic one the Hamming distance never exceeds \(\log ^2 n\) with probability \(1-n^{O(1)}n^{-\varOmega (\log n)} = 1-n^{-\varOmega (\log n)} \). \(\square \)

5 Roadmap for the Analysis

We give a deliberately informal, high-level view of our analysis, where \(\varDelta := \varphi (P_{t+1}) - \varphi (P_t)\) denotes the change in potential in one generation. By the law of total probability, \(\varDelta \) can be split up according to the number of zeros flipped by mutation:

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \right) }&=\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid \text {no zeros flip}\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( \text {no zeros flip}\right) }\\&\quad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid \text {one zero flips}\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( \text {one zero flips}\right) }\\&\quad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid \text {at least 2 zeros flip}\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( \text {at least 2 zeros flip}\right) }. \end{aligned}$$

The case that no zeros flip does not increase the potential in expectation; hence we aim to bound the first line from above by 0. The second line captures the most important case: one zero flips and subsequent progress is made. The second line will be bounded by the dominant term in our claimed lower bound. The third line involves the probability of flipping at least two zeros. If the number of zeros is small, this is unlikely, and thus the third line only contributes a small-order term.

The above high-level view is not particularly accurate. Firstly, the above estimations need to account for error terms. Secondly, the notion of “i zeros flip” used above is not well-defined. This is because the number of zeros that can flip during mutation depends on the parent selection. The same parent may be selected twice, and then the number of zeros depends on the fitness of that parent. If two different parents are used, we only consider mutations of bits that agree in both parents, as per Lemma 8.

Hence, we need to distinguish between different events from the parent selection. To formalise this, let \(P_{11}, P_{22}\), and \(P_{12}\) denote the events that parent selection chooses the first parent twice, the second parent twice and both parents, respectively. We further denote by \(F_{00}\) the number of flipping bits amongst the \(n_{00}\) bits and likewise for \(F_{11}\) and \(n_{11}\) bits. We use asterisks to indicate the union of sets: \(F_{0*}\) is the number of flipping bits among \(n_{01}+n_{00}\) bits and \(F_{*0}\) is the number of flipping bits among \(n_{10}+n_{00}\) bits. Variables \(F_{1*}\) and \(F_{*1}\) are defined analogously. Armed with this notation, we express the third line rigorously with a combination of events.

Lemma 10

For all populations with \(n_{11} \ge n - n/\log ^3 n\) and \(n_{10}+n_{01} \le \log ^2 n\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} \ge 2\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{0*} \ge 2\right) }\\&\quad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} \ge 2\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{*0} \ge 2\right) }\\&\quad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} \ge 2\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{00} \ge 2\right) } = O\left( n_{0*}/(n \log n)\right) \end{aligned}$$

Proof

We give one common way of bounding the three lines from the statement. For all \(F \in \{F_{0*},F_{*0},F_{00}\}\), the potential can only increase by at most \(n_{01}+F\), as every flipping 0-bit can increase the potential by at most 1 and crossover can increase the contribution of each of the \(n_{01}\) bits by at most 1. Also note that \(\mathord {{\mathrm {P}}}\mathord {\left( F_{*0} \ge 2\right) }\) is the largest probability amongst all variables F (as the underlying number of zeros is maximal for \(F_{*0}\)). Thus, all three lines are bounded as

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid F \ge 2\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{*0} \ge 2\right) }\\&\quad =\; \sum _{i=2}^{\infty } \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid F = i\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{*0} = i\right) } \le \sum _{i=2}^{\infty } (n_{01}+i) \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*0} = i\right) }\\&\quad \le \; \sum _{i=2}^{\infty } (n_{01}+i) \left( \frac{cn_{0*}}{n}\right) ^i = \left( \frac{cn_{0*}}{n}\right) ^2 \sum _{i=0}^{\infty } (n_{01}+i+2) \left( \frac{cn_{0*}}{n}\right) ^{i} \end{aligned}$$

Since \(n_{0*} \le n - n_{11} \le n/\log ^3 n\), we get \(\left( \frac{cn_{0*}}{n}\right) ^2 = O(n_{0*}/(n\log ^3 n))\). The sum is bounded by \(n_{01}+2 + \sum _{i=0}^{\infty } i \left( \frac{cn_{0*}}{n}\right) ^{i}\), which is \(n_{01}+O(1) = O(\log ^2 n)\). Together, this implies the claim. \(\square \)

The cases of no zeros flipping and one zero flipping are more difficult to handle. Corresponding drift estimates will be derived in the following sections. More precisely, we derive closed formulas for the drift of the potential function depending on the choice of parents in the following Sect. 6 and then distinguish between generations that flip no zeros (Sect. 7) and one zero (Sect. 8). As mentioned in the introduction, the drift bounds obtained in these major technical sections are finally put together in Sect. 9.

6 Closed Formulas for the Potential Drift

Before proceeding to bound the drift of the potential, we derive closed formulas for the potential drift under the condition \(P_{12}\) that different parents are selected. These formulas hold for arbitrary given values of \(F_{00}\) and \(F_{11}\), the numbers of flipping common zeros and ones, respectively. They will then be used in the subsequent sections to bound the drift for the cases \(F_{00}=0\) and \(F_{00}=1\), respectively.

For the derivation of closed formulas it is important to distinguish between two cases: the parents having equal fitness, \(n_{10}=n_{01}\), or the parents having unequal fitness, \(n_{10}>n_{01}\). This is because, depending on the fitness of the offspring, different cases may occur and the probabilities used during replacement selection may differ. For instance, when both parents have equal fitness and the offspring happens to have the same fitness as well, there is a 3-way tie that is resolved uniformly at random. When the parents have different fitness, 3-way ties are not possible (there can only be ties between the offspring and one parent). However, the offspring might be strictly worse than the better parent and strictly better than the worse parent. Such a case cannot happen when \(n_{10}=n_{01}\). It therefore makes sense to split the analysis into these two cases and to derive separate closed bounds on the potential drift.

6.1 A Closed Formula for Selecting Different Parents of Equal Fitness (\(n_{10}=n_{01}\))

In the following lemma we shall first derive a closed formula for the potential drift when two different parents are chosen that have the same fitness, \(n_{10}=n_{01}\). The lemma defines a threshold \(\ell \) that reflects the number of bits crossover needs to set to 1 to achieve the same fitness as \(x^1\) and \(x^2\).

Lemma 11

Let \(S \sim \mathrm {Bin}(n_{10}+n_{01}, 1/2)\) and \(\ell := n_{10}-F_{00}+F_{11}\). Then for all populations with \(n_{10}=n_{01}\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }\\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } \cdot \left( \frac{5}{6} \cdot \mathord {{\mathrm {E}}}\mathord {\left( S \mid S \ge \ell \right) } + F_{00} - \frac{2F_{11}}{3} - n_{10} \right) + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } \cdot \frac{n_{10}-F_{00}-F_{11}}{18}. \end{aligned}$$

Proof

Consider a step where both parents are selected and \(F_{00}, F_{11}\) are known. The number of bits set to 1 by crossover is given by \(S \sim \mathrm {Bin}(n_{10}+n_{01}, 1/2)\). By the law of total probability,

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) } =\;&\sum _{s \in {\mathbb {Z}}} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}, S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }. \end{aligned}$$

Since \(n_{10}=n_{01}\), the fitness of both parents is \(n_{11}+n_{10}\) and the fitness of the offspring is \(n_{11} + s + F_{00} - F_{11}\). If \(s < \ell \), the latter is less than \(n_{11}+n_{10}\) and the offspring will be rejected. If \(s > \ell \), the offspring is fitter than both parents, and one of the parents will be chosen uniformly at random for removal. If \(s = \ell \), the offspring is as fit as both parents and is removed with probability 1/3. Thus,

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }&=\; \sum _{s \in {\mathbb {Z}}} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}, S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&=\; \sum _{s > \ell } \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) } + \frac{2}{3} \cdot \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = \ell \right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }\\&=\; \sum _{s \ge \ell } \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) } - \frac{1}{3} \cdot \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = \ell \right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }. \end{aligned}$$

Now we estimate \(\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s\right) } \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\) for \(s\ge \ell \). Let \(S_{10}\) be the number of bits among the \(n_{10}\) bits that are set to 1 in the offspring and define \(S_{01}\) analogously for the \(n_{01}\) bits. Note that \(S := S_{10} + S_{01}\) where \(S_{10} \sim \mathrm {Bin}(n_{10}, 1/2)\) and \(S_{01} \sim \mathrm {Bin}(n_{01}, 1/2)\) according to Lemma 8.

Given \(S_{10} = s_{10}\) and \(S_{01} = s_{01}\), the potential difference \(\varDelta \) is derived as follows. Among the \(n_{11}\) bits, \(F_{11}\) bits flip to 0, reducing their contribution from 1 to 1/3 each, leading to a contribution of \(-2F_{11}/3\) to the potential difference. All the \(n_{10}\) bits contribute 1 to the potential \(\varphi (P_t)\). In \(P_{t+1}\), \(s_{10}\) bits contribute 1 and the remaining \(n_{10}-s_{10}\) bits contribute 1/3 each. Hence the contribution to the potential difference is \(-2(n_{10}-s_{10})/3\). The \(n_{01}\) bits contribute 1/3 each in \(\varphi (P_t)\) and in \(P_{t+1}\) we have \(s_{01}\) bits contributing 1 each and the other \(n_{01}-s_{01}\) bits contributing 0. Hence the contribution to the potential difference is \(s_{01} - n_{01}/3\). Finally, the contribution of the \(n_{00}\) bits to the potential difference is \(F_{00}\).

In the following we denote by \((X \mid Y)\) the conditional variable X conditional on Y, where Y is a sequence of events and/or random variables. Omitting the implicit conditions \(P_{12}, F_{00}, F_{11}\) for brevity, we obtain a conditional drift \(\varDelta \) of

$$\begin{aligned} (\varDelta \mid S_{10} = s_{10}, S_{01} = s_{01})&=\; F_{00} - \frac{2F_{11}}{3} - \frac{2(n_{10}-s_{10})}{3} + s_{01} - \frac{n_{01}}{3}\nonumber \\&=\; F_{00} - \frac{2F_{11}}{3} - n_{10} + s - \frac{1}{3} \cdot s_{10}. \end{aligned}$$
(1)

By the law of total probability,

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s\right) }&=\; \sum _{s_{10}=0}^{s} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s, S_{10} = s_{10}\right) } \mathord {{\mathrm {P}}}\mathord {\left( S_{10} = s_{10} \mid S = s\right) }\\&=\; \sum _{s_{10}=0}^{s} \left( F_{00} - \frac{2F_{11}}{3} - n_{10} + s - \frac{1}{3} \cdot s_{10}\right) \mathord {{\mathrm {P}}}\mathord {\left( S_{10} = s_{10} \mid S = s\right) }\\&=\; F_{00} - \frac{2F_{11}}{3} - n_{10} + s - \frac{1}{3} \sum _{s_{10}=0}^{s} s_{10} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S_{10} = s_{10} \mid S = s\right) }\\&=\; F_{00} - \frac{2F_{11}}{3} - n_{10} + s - \frac{1}{3} \cdot \mathord {{\mathrm {E}}}\mathord {\left( S_{10} \mid S = s\right) }. \end{aligned}$$

Now, \((S_{10} \mid S = s)\) follows a hypergeometric distribution with parameters \(n_{10}\) (number of red balls), \(n_{10}+n_{01}\) (number of balls) and s (number of draws). The expectation is thus \(s \cdot n_{10}/(n_{10}+n_{01}) = s/2\). Plugging this in yields

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s\right) } =\;&F_{00} - \frac{2F_{11}}{3} - n_{10} + \frac{5}{6} \cdot s. \end{aligned}$$
(2)

Together, this gives

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }&=\; \sum _{s \ge \ell } \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) } - \frac{1}{3} \cdot \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = \ell \right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }\\&=\; \sum _{s \ge \ell } \left( F_{00} - \frac{2F_{11}}{3} - n_{10} + \frac{5}{6} \cdot s\right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\quad - \frac{1}{3} \cdot \left( F_{00} - \frac{2F_{11}}{3} - n_{10} + \frac{5}{6} \cdot \ell \right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }. \end{aligned}$$

Using

$$\begin{aligned}&\sum _{s \ge \ell } s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) } = \sum _{s \ge \ell } s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s \wedge S \ge \ell \right) }\\&\quad =\; \sum _{s \ge \ell } s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s \mid S \ge \ell \right) } \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } = \mathord {{\mathrm {E}}}\mathord {\left( S \mid S \ge \ell \right) } \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) }, \end{aligned}$$

the first terms simplify as

$$\begin{aligned}&\sum _{s \ge \ell } \left( F_{00} - \frac{2F_{11}}{3} - n_{10} + \frac{5}{6} \cdot s\right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\quad =\; \left( F_{00} - \frac{2F_{11}}{3} - n_{10}\right) \sum _{s \ge \ell } \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) } + \frac{5}{6}\sum _{s \ge \ell } s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\quad =\; \left( F_{00} - \frac{2F_{11}}{3} - n_{10}\right) \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } + \frac{5}{6} \cdot \mathord {{\mathrm {E}}}\mathord {\left( S \mid S \ge \ell \right) } \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) }. \end{aligned}$$

The last term simplifies as

$$\begin{aligned}&- \frac{1}{3} \cdot \left( F_{00} - \frac{2F_{11}}{3} - n_{10} + \frac{5}{6} \cdot \ell \right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }\\&\quad =\; - \frac{1}{3} \cdot \left( F_{00} - \frac{2F_{11}}{3} - n_{10} + \frac{5}{6} \cdot (n_{10}-F_{00}+F_{11})\right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }\\&\quad =\; - \frac{1}{3} \cdot \left( \frac{F_{00}}{6} +\frac{F_{11}}{6} - \frac{n_{10}}{6} \right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }\\&\quad =\; - \frac{F_{00}+F_{11}-n_{10}}{18} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } = \frac{n_{10}-F_{00}-F_{11}}{18} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) }. \end{aligned}$$

Together, this proves the claim. \(\square \)

The bound from Lemma 11 depends on the expected surplus \(\mathord {{\mathrm {E}}}\mathord {\left( S \mid S \ge \ell \right) }\) generated by a crossover on the bits that differ between the two parents, where \(\ell \) reflects the fitness threshold above which offspring are accepted. We use the following formula to simplify such expressions. The proof goes back to work by Gruder [13] that is highlighted in a paper by Johnson [17]. It is given in the appendix for the sake of completeness.

Lemma 12

Let \(S \sim \mathrm {Bin}(n, 1/2)\), then for all \(\ell \in {\mathbb {N}}\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( S \mid S \ge \ell \right) } \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } = \frac{\ell }{2} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } + \frac{n}{2} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) }. \end{aligned}$$
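The identity can be verified exactly for small n; a short check of our own using exact binomial probabilities:

```python
from math import comb

def check_lemma_12(n, l):
    # S ~ Bin(n, 1/2); check E(S | S >= l) * P(S >= l)
    #   = (l/2) * P(S = l) + (n/2) * P(S >= l) exactly.
    p = lambda s: comb(n, s) / 2 ** n
    lhs = sum(s * p(s) for s in range(l, n + 1))
    rhs = (l / 2) * p(l) + (n / 2) * sum(p(s) for s in range(l, n + 1))
    return abs(lhs - rhs) < 1e-12

assert all(check_lemma_12(n, l) for n in range(1, 13) for l in range(0, n + 1))
```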

Using Lemma 12, we obtain the following simplified formula.

Lemma 13

Let \(S \sim \mathrm {Bin}(n_{10}+n_{01}, 1/2)\). Then for all populations with \(n_{10}=n_{01}\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }&=\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10} - F_{00} + F_{11}\right) } \cdot \left( F_{00} - \frac{2F_{11}}{3} - \frac{n_{10}}{6} \right) \\&\quad + \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10} - F_{00} + F_{11}\right) } \cdot \frac{17n_{10}-17F_{00}+13F_{11}}{36}. \end{aligned}$$

Proof

Recall \(\ell := n_{10} - F_{00} + F_{11}\). By Lemma 11 and Lemma 12,

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }\\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } \cdot \left( \frac{5}{6} \cdot \mathord {{\mathrm {E}}}\mathord {\left( S \mid S \ge \ell \right) } + F_{00} - \frac{2F_{11}}{3} - n_{10} \right) + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } \cdot \frac{n_{10}-F_{00}-F_{11}}{18}\\&\quad =\; \frac{5}{6} \left( \frac{\ell }{2} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } + n_{10} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) }\right) + \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } \cdot \left( F_{00} - \frac{2F_{11}}{3} - n_{10} \right) \\&\qquad + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } \cdot \frac{n_{10}-F_{00}-F_{11}}{18}\\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } \cdot \left( F_{00} - \frac{2F_{11}}{3} - \frac{n_{10}}{6} \right) + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } \cdot \left( \frac{5}{12} \cdot \ell + \frac{n_{10}-F_{00}-F_{11}}{18}\right) \\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } \cdot \left( F_{00} - \frac{2F_{11}}{3} - \frac{n_{10}}{6} \right) + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } \cdot \frac{15\ell + 2n_{10}-2F_{00}-2F_{11}}{36}\\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell \right) } \cdot \left( F_{00} - \frac{2F_{11}}{3} - \frac{n_{10}}{6} \right) + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell \right) } \cdot \frac{17n_{10}-17F_{00}+13F_{11}}{36}. \end{aligned}$$

\(\square \)
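The formulas from Lemmas 11 and 13 can also be cross-checked against each other numerically; a small consistency test of our own over a grid of parameter values:

```python
from math import comb

def p_eq(n, s):  # P(S = s) for S ~ Bin(n, 1/2)
    return comb(n, s) / 2 ** n if 0 <= s <= n else 0.0

def p_ge(n, s):  # P(S >= s)
    return sum(p_eq(n, t) for t in range(max(s, 0), n + 1))

def drift_lemma_11(n10, f00, f11):
    n, l = 2 * n10, n10 - f00 + f11
    tail = sum(s * p_eq(n, s) for s in range(max(l, 0), n + 1))  # E(S; S >= l)
    return (5 / 6) * tail + p_ge(n, l) * (f00 - 2 * f11 / 3 - n10) \
        + p_eq(n, l) * (n10 - f00 - f11) / 18

def drift_lemma_13(n10, f00, f11):
    n, l = 2 * n10, n10 - f00 + f11
    return p_ge(n, l) * (f00 - 2 * f11 / 3 - n10 / 6) \
        + p_eq(n, l) * (17 * n10 - 17 * f00 + 13 * f11) / 36

for n10 in range(1, 6):
    for f00 in range(3):
        for f11 in range(3):
            assert abs(drift_lemma_11(n10, f00, f11)
                       - drift_lemma_13(n10, f00, f11)) < 1e-12
```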

6.2 A Closed Formula for Selecting Different Parents of Unequal Fitness (\(n_{10}>n_{01}\))

For the case of unequal fitness (\(n_{10}>n_{01}\)), we use similar arguments as before. This scenario requires more involved calculations, as we need to distinguish several cases: the offspring may be at least as good as the fitter parent, in which case the calculations are similar to the equal-fitness scenario. The offspring may also be worse than the fitter parent and better than the worse parent. In this case, the potential is derived from a different formula, as the values for \(n_{10}\) and \(n_{01}\) in the next generation are still determined according to the fitter parent. In case the offspring’s fitness is equal to that of the worse parent, there is a tie and the offspring is only accepted with probability 1/2. The lemma below defines two thresholds \(\ell _1\) and \(\ell _2\) that reflect the number of bits crossover needs to set to 1 to achieve the fitness of \(x^1\) and \(x^2\), respectively.

Lemma 14

Let \(S \sim \mathrm {Bin}(n_{10}+n_{01}, 1/2)\), \(\ell _1 :=n_{10}-F_{00}+F_{11}\) and \(\ell _2 :=n_{01}-F_{00}+F_{11}\). Then for all populations with \(n_{10}>n_{01}\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }&=\; \mathord {{\mathrm {P}}}\mathord {\left( S> \ell _1\right) } \cdot \frac{2F_{00}-2F_{11}-n_{10}+n_{01}}{3}\\&\quad + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _1\right) } \cdot \frac{F_{00}-F_{11}+n_{01}}{3}\\&\quad + \mathord {{\mathrm {P}}}\mathord {\left( S > \ell _2\right) } \cdot \frac{2F_{00} - n_{01}}{6}\\&\quad + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) } \cdot \frac{F_{00}}{6}. \end{aligned}$$

Proof

Consider a step in which two different parents are selected and \(F_{00}, F_{11}\) are known. The number of bits set to 1 by crossover among the differing positions is given by \(S \sim \mathrm {Bin}(n_{10}+n_{01}, 1/2)\). By the law of total probability,

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) } =\;&\sum _{s \in {\mathbb {Z}}} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}, S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }. \end{aligned}$$

Note that the fitness of \(x^1\) is \(n_{11}+n_{10}\), that of \(x^2\) is \(n_{11}+n_{01}\) and the fitness of the offspring is \(n_{11} + s + F_{00} - F_{11}\).

We distinguish the following cases.

  1. If \(s \ge \ell _1\), the offspring is at least as fit as the fitter parent, and \(x^2\) will be removed. The new values of \(n_{10}, n_{01}\) will be determined by the offspring.

  2. If \(\ell _2 < s < \ell _1\), the offspring is fitter than the worse parent \(x^2\), and \(x^2\) will be removed. Parent \(x^1\) will remain the fitter parent.

  3. If \(s = \ell _2\), the offspring is as fit as \(x^2\) and \(x^2\) will be removed with probability 1/2. Then the potential is computed as in the case \(\ell _2 < s < \ell _1\).

  4. If \(s < \ell _2\), the offspring is worse than \(x^2\) and will be rejected. Hence the potential does not change.

Let \(\varDelta \) be the change of potential from the current population to the new population \(P_{t+1}\) as described above. Then,

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }&=\; \sum _{s=\ell _1}^{n_{10}+n_{01}} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}, S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\quad + \sum _{s=\ell _2}^{\ell _1-1} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}, S = s\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\quad - \frac{1}{2} \cdot \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}, S = \ell _2\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) }, \end{aligned}$$

where the last line accounts for the fact that with probability 1/2 the offspring is removed if \(s=\ell _2\).

Now we estimate \(\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid S = s\right) } \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\), starting with the case \(s\ge \ell _1\). Let \(S_{10}\) be the number of bits among the \(n_{10}\) bits that are set to 1 in the offspring and define \(S_{01}\) analogously for the \(n_{01}\) bits. Note that \(S = S_{10} + S_{01}\), where \(S_{10} \sim \mathrm {Bin}(n_{10}, 1/2)\) and \(S_{01} \sim \mathrm {Bin}(n_{01}, 1/2)\) are independent.

Given \(S_{10} = s_{10}\) and \(S_{01} = s_{01}\), the potential difference \(\varDelta \) is derived as follows. Among the \(n_{11}\) bits, \(F_{11}\) bits flip to 0, reducing their contribution from 1 to 1/3 each, leading to a contribution of \(-2F_{11}/3\) to the potential difference. All the \(n_{10}\) bits contribute 1 to the potential \(\varphi (P_t)\). In \(P_{t+1}\), \(s_{10}\) bits contribute 1 and the remaining \(n_{10}-s_{10}\) bits contribute 1/3 each. Hence the contribution to the potential difference is \(-2(n_{10}-s_{10})/3\). The \(n_{01}\) bits contribute 1/3 each in \(\varphi (P_t)\) and in \(P_{t+1}\) we have \(s_{01}\) bits contributing 1 each and the other \(n_{01}-s_{01}\) bits contributing 0. Hence the contribution to the potential difference is \(s_{01} - n_{01}/3\). Finally, the contribution of the \(n_{00}\) bits to the potential difference is \(F_{00}\). Together,

$$\begin{aligned} (\varDelta \mid S_{10} = s_{10}, S_{01} = s_{01})&=\; F_{00} - \frac{2F_{11}}{3} - \frac{2(n_{10}-s_{10})}{3} + s_{01} - \frac{n_{01}}{3}\nonumber \\&=\; F_{00} - \frac{2F_{11}}{3} - \frac{2n_{10}}{3} - \frac{n_{01}}{3} + \frac{2}{3} \cdot s_{10} + s_{01}\nonumber \\&=\; F_{00} - \frac{2F_{11}}{3} - \frac{2n_{10}+n_{01}}{3} + s - \frac{1}{3} \cdot s_{10}. \end{aligned}$$
(3)

By the law of total probability, for all \(s \ge \ell _1\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12},F_{00},F_{11},S = s\right) }\\&\quad =\; \sum _{s_{10}=0}^{s} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12},F_{00},F_{11},S = s, S_{10} = s_{10}\right) } \mathord {{\mathrm {P}}}\mathord {\left( S_{10} = s_{10} \mid S = s\right) }\\&\quad =\; \sum _{s_{10}=0}^{s} \left( F_{00} - \frac{2F_{11}}{3} - \frac{2n_{10}+n_{01}}{3} + s - \frac{1}{3} \cdot s_{10}\right) \mathord {{\mathrm {P}}}\mathord {\left( S_{10} = s_{10} \mid S = s\right) }\\&\quad =\; F_{00} - \frac{2F_{11}}{3} - \frac{2n_{10}+n_{01}}{3} + s - \frac{1}{3} \sum _{s_{10}=0}^{s} s_{10} \cdot \mathord {{\mathrm {P}}}\mathord {\left( S_{10} = s_{10} \mid S = s\right) }\\&\quad =\; F_{00} - \frac{2F_{11}}{3} - \frac{2n_{10}+n_{01}}{3} + s - \frac{1}{3} \cdot \mathord {{\mathrm {E}}}\mathord {\left( S_{10} \mid S = s\right) }. \end{aligned}$$

Now, \((S_{10} \mid S = s)\) follows a hypergeometric distribution with parameters \(n_{10}\) (number of red balls), \(n_{10}+n_{01}\) (number of balls) and s (number of draws). The expectation is thus \(s \cdot n_{10}/(n_{10}+n_{01})\). Plugging this in yields

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12},F_{00},F_{11},S = s\right) } =\;&F_{00} - \frac{2F_{11}}{3} - \frac{2n_{10}+n_{01}}{3} + s \left( 1 - \frac{n_{10}}{3n_{10}+3n_{01}}\right) . \end{aligned}$$
(4)
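The hypergeometric expectation used in this step is easy to confirm exactly. The sketch below (hypothetical verification code, not from the paper) conditions two independent fair binomials on their sum and compares \(\mathord {{\mathrm {E}}}\mathord {\left( S_{10} \mid S = s\right) }\) with \(s \cdot n_{10}/(n_{10}+n_{01})\).

from fractions import Fraction
from math import comb

def cond_expect_s10(n10, n01, s):
    # E(S_10 | S_10 + S_01 = s) for independent S_10 ~ Bin(n10, 1/2) and
    # S_01 ~ Bin(n01, 1/2); all outcomes of the fair coins are equally
    # likely, so the conditional law is hypergeometric.
    num = sum(s10 * comb(n10, s10) * comb(n01, s - s10)
              for s10 in range(0, s + 1))
    den = sum(comb(n10, s10) * comb(n01, s - s10)
              for s10 in range(0, s + 1))
    return Fraction(num, den)

for n10 in range(1, 7):
    for n01 in range(0, 7):
        for s in range(0, n10 + n01 + 1):
            assert cond_expect_s10(n10, n01, s) == Fraction(s * n10, n10 + n01)
print("hypergeometric expectation verified")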

Now we consider the case \(\ell _2 \le S < \ell _1\). Here \(x^1\) remains the fitter parent and the potential difference \(\varDelta \) is derived as follows. Flipping bits among the \(n_{11}\) and \(n_{10}\) bits does not change the potential. We have \(S_{01}+F_{00}\) bits that each contribute 1/3 to the potential, compared to \(n_{01}\) such bits in the previous population. Hence the change in potential is

$$\begin{aligned} (\varDelta \mid P_{12},F_{00},F_{11},S_{01} = s_{01}) =\;&\frac{s_{01}-n_{01}+F_{00}}{3}. \end{aligned}$$
(5)

By the law of total probability, for all \(\ell _2 \le s < \ell _1\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12},F_{00},F_{11},S = s\right) }\\&\quad =\; \sum _{s_{01}=0}^{s} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12},F_{00},F_{11},S = s, S_{01} = s_{01}\right) } \mathord {{\mathrm {P}}}\mathord {\left( S_{01} = s_{01} \mid S = s\right) }\\&\quad =\; \frac{\mathord {{\mathrm {E}}}\mathord {\left( S_{01} \mid S = s\right) } - n_{01} + F_{00}}{3}\\&\quad =\; s \cdot \frac{n_{01}}{3n_{10}+3n_{01}} + \frac{- n_{01} + F_{00}}{3} \end{aligned}$$

using the same calculations as before.

Plugging everything together,

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }\\&\quad =\; \sum _{s=\ell _1}^{n_{10}+n_{01}} \left( F_{00} - \frac{2F_{11}}{3} - \frac{2n_{10}+n_{01}}{3} + s \cdot \frac{2n_{10}+3n_{01}}{3n_{10}+3n_{01}}\right) \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\qquad + \sum _{s=\ell _2}^{\ell _1-1} \left( s \cdot \frac{n_{01}}{3n_{10}+3n_{01}} + \frac{- n_{01} + F_{00}}{3}\right) \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\qquad - \frac{1}{2} \cdot \left( \ell _2 \cdot \frac{n_{01}}{3n_{10}+3n_{01}} + \frac{- n_{01} + F_{00}}{3}\right) \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) }.\ \end{aligned}$$

By adding terms for \(\ell _1 \le s \le n_{10}+n_{01}\) to the second line and subtracting them from the first line, we get

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }\\ =\;&\sum _{s=\ell _1}^{n_{10}+n_{01}} \left( \frac{2F_{00}}{3} - \frac{2F_{11}}{3} - \frac{2n_{10}}{3} + \frac{2s}{3}\right) \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\qquad + \sum _{s=\ell _2}^{n_{10}+n_{01}} \left( s \cdot \frac{n_{01}}{3n_{10}+3n_{01}} + \frac{- n_{01} + F_{00}}{3}\right) \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\qquad - \frac{1}{2} \cdot \left( \ell _2 \cdot \frac{n_{01}}{3n_{10}+3n_{01}} + \frac{- n_{01} + F_{00}}{3}\right) \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) }\\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell _1\right) } \cdot \frac{2F_{00}-2F_{11}-2n_{10}}{3} + \frac{2}{3}\sum _{s=\ell _1}^{n_{10}+n_{01}} s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\qquad + \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell _2\right) } \left( \frac{- n_{01} + F_{00}}{3}\right) + \frac{n_{01}}{3n_{10}+3n_{01}} \sum _{s=\ell _2}^{n_{10}+n_{01}} s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }\\&\qquad - \frac{1}{2} \cdot \left( \ell _2 \cdot \frac{n_{01}}{3n_{10}+3n_{01}} + \frac{- n_{01} + F_{00}}{3}\right) \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) }. \end{aligned}$$

Applying Lemma 12 twice, to the first and second line, yields the following two equations:

$$\begin{aligned} \sum _{s=\ell _1}^{n_{10}+n_{01}} s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }&=\; \frac{\ell _1}{2} \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _1\right) } + \frac{n_{10}+n_{01}}{2} \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell _1\right) }\\ \sum _{s=\ell _2}^{n_{10}+n_{01}} s \cdot \mathord {{\mathrm {P}}}\mathord {\left( S = s\right) }&=\; \frac{\ell _2}{2} \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) } + \frac{n_{10}+n_{01}}{2} \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell _2\right) }. \end{aligned}$$

Plugging this in yields

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11}\right) }&=\; \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell _1\right) } \cdot \frac{2F_{00}-2F_{11}-n_{10}+n_{01}}{3}\\&\quad + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _1\right) } \cdot \frac{\ell _1}{3}\\&\quad + \mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell _2\right) } \cdot \frac{2F_{00} - n_{01}}{6}\\&\quad + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) } \cdot \frac{n_{01} - F_{00}}{6}. \end{aligned}$$

Splitting \(\mathord {{\mathrm {P}}}\mathord {\left( S \ge \ell _i\right) } = \mathord {{\mathrm {P}}}\mathord {\left( S > \ell _i\right) } + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _i\right) }\) for \(i \in \{1, 2\}\) and substituting \(\ell _1 = n_{10}-F_{00}+F_{11}\) and \(\ell _2 = n_{01}-F_{00}+F_{11}\) into the resulting coefficients of \(\mathord {{\mathrm {P}}}\mathord {\left( S = \ell _1\right) }\) and \(\mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) }\) yields the formula from the statement. \(\square \)
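As a sanity check, the closed formula of Lemma 14 can be compared against a direct evaluation of the case analysis from the proof. The following Python sketch (hypothetical verification code) enumerates the outcomes of \((S_{10}, S_{01})\) with exact rational arithmetic, applies the potential changes from (3) and (5) together with the acceptance rules, and asserts agreement with the closed formula on small instances.

from fractions import Fraction
from math import comb

def pmf(n, s):  # P(Bin(n, 1/2) = s)
    return Fraction(comb(n, s), 2 ** n) if 0 <= s <= n else Fraction(0)

def tail(n, ell):  # P(Bin(n, 1/2) > ell)
    return sum(pmf(n, s) for s in range(max(ell + 1, 0), n + 1))

def drift_bruteforce(n10, n01, f00, f11):
    l1, l2 = n10 - f00 + f11, n01 - f00 + f11
    total = Fraction(0)
    for s10 in range(n10 + 1):
        for s01 in range(n01 + 1):
            p = pmf(n10, s10) * pmf(n01, s01)
            s = s10 + s01
            if s >= l1:      # offspring at least as fit as x^1, use (3)
                d = (Fraction(f00) - Fraction(2 * f11, 3)
                     - Fraction(2 * (n10 - s10), 3) + s01 - Fraction(n01, 3))
            elif s > l2:     # offspring beats x^2, x^1 stays fitter, use (5)
                d = Fraction(s01 - n01 + f00, 3)
            elif s == l2:    # tie with x^2, accepted with probability 1/2
                d = Fraction(s01 - n01 + f00, 3) / 2
            else:            # offspring rejected
                d = Fraction(0)
            total += p * d
    return total

def drift_closed(n10, n01, f00, f11):
    n, l1, l2 = n10 + n01, n10 - f00 + f11, n01 - f00 + f11
    return (tail(n, l1) * Fraction(2 * f00 - 2 * f11 - n10 + n01, 3)
            + pmf(n, l1) * Fraction(f00 - f11 + n01, 3)
            + tail(n, l2) * Fraction(2 * f00 - n01, 6)
            + pmf(n, l2) * Fraction(f00, 6))

for n10 in range(1, 6):
    for n01 in range(0, n10):           # the lemma assumes n10 > n01
        for f00 in range(0, 3):
            for f11 in range(0, 3):
                assert drift_bruteforce(n10, n01, f00, f11) == \
                       drift_closed(n10, n01, f00, f11)
print("Lemma 14 verified on small cases")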

7 Potential Drift When No Zeros Flip

We now consider the potential drift when no zeros flip. When the distance to the optimum is o(n), this is by far the most frequent case. This also means that our drift bounds have to be precise, as even a small error term may have a large impact and spoil the analysis.

7.1 Selecting the Same Parent Twice

We start by considering the drift conditional on selecting the same parent twice.

Lemma 15

For all populations with \(n_{10}=n_{01}\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 0\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{0*} = 0\right) } = -\frac{n_{01}}{9} \left( 1 - \frac{c}{n}\right) ^n\\&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 0\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{*0} = 0\right) } = -\frac{n_{01}}{9} \left( 1 - \frac{c}{n}\right) ^n. \end{aligned}$$

For all populations with \(n_{10} > n_{01}\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 0\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{0*} = 0\right) } = -\frac{n_{01}}{3} \left( 1 - \frac{c}{n}\right) ^n\\&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 0\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{*0} = 0\right) } = 0. \end{aligned}$$

Proof

First assume \(n_{10}=n_{01}\) and consider the event \(P_{11}\). Given \(F_{0*}=0\), that is, if no 0-bit is flipped, the offspring can only be accepted if no 1-bit is flipped, i.e., \(F_{1*}=0\). These events happen with probability \(\mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=0\right) }\mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) }=(1-c/n)^n\) and they lead to an offspring that is identical to \(x^1\). Since all search points have equal fitness, \(x^2\) is removed with probability 1/3. This leads to a monomorphic population and the potential decreases by \(n_{01}/3\). Multiplying the above terms proves the claimed equality. The case of \(P_{22}\) follows analogously, considering \(F_{*0}\) and \(F_{*1}\) instead.

For \(n_{10} > n_{01}\), if the fitter parent \(x^1\) is selected twice, a copy of it is created with probability \((1-c/n)^n\) and then \(x^2\) is removed. This decreases the potential by \(n_{01}/3\). Multiplying the above terms yields an expectation of \(-n_{01}/3 \cdot (1-c/n)^n\). Note that other operations cannot increase the potential since no 0-bit is being flipped and flipping 1-bits in \(x^1\) does not decrease the potential.

If \(x^2\) is selected twice as parent, the potential cannot increase since no 0-bits are flipped, and it cannot decrease as any 1-bit being flipped will lead to the offspring being rejected. \(\square \)
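The first equality of Lemma 15 can also be validated by simulating a single step. The sketch below (hypothetical Monte Carlo code; the parameter values are chosen for illustration only) estimates the left-hand side of the first equality and compares it with \(-\frac{n_{01}}{9} \left( 1 - \frac{c}{n}\right) ^n\).

import random

def one_step(n, n11, n10, c):
    # Population: x^1 and x^2 share n11 one-bits; each has n10 = n01
    # private one-bits. x^1 is selected twice and cloned; mutation then
    # flips every bit independently with probability c/n.
    n01, p = n10, c / n
    n00 = n - n11 - n10 - n01
    # flips among the zeros of x^1 (the n00 and n01 positions)
    if any(random.random() < p for _ in range(n00 + n01)):
        return 0.0  # we only track the event F_{0*} = 0
    # flips among the ones of x^1 (the n11 and n10 positions)
    if any(random.random() < p for _ in range(n11 + n10)):
        return 0.0  # offspring is worse than both parents and is rejected
    # offspring is an exact copy of x^1: all three members have equal
    # fitness and one is removed uniformly; only removing x^2 changes phi
    return -n01 / 3 if random.random() < 1 / 3 else 0.0

random.seed(1)
n, n11, n10, c = 100, 90, 4, 1.0
trials = 200_000
est = sum(one_step(n, n11, n10, c) for _ in range(trials)) / trials
print(est, "vs", -n10 / 9 * (1 - c / n) ** n)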

Now we consider the event that two different parents are selected. The remainder of this section is split into drift bounds when the parents have equal fitness, \(n_{10}=n_{01}\), and the case where parents have unequal fitness, \(n_{10} > n_{01}\).

7.2 Selecting Different Parents of Equal Fitness (\(n_{10}=n_{01}\))

We use the closed formula from Lemma 13 to show that, to get an upper bound on the drift when \(F_{00}=0\), we only need to consider \(F_{11} \in \{0, 1\}\), as larger values lead to a non-positive drift. This is not obvious, but follows from a lengthy yet routine calculation. The proof is placed in the appendix to keep the main part streamlined.

Lemma 16

For all \(n_{10}=n_{01}\) and all \(i \ge 2\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}=0, F_{11}=i\right) } \le 0. \end{aligned}$$

Now we are able to give an upper bound on the potential drift, assuming that different parents are selected (\(P_{12}\)) and no common zero-bits flip (\(F_{00}=0\)).

Lemma 17

For every population with \(n_{10}=n_{01}\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 0\right) } \le {\left\{ \begin{array}{ll} 0 &{} \text {if }n_{10} = 0\\ \frac{1}{9} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=0\right) } &{} \text {if }n_{10} = 1\\ \frac{1}{8} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=0\right) } + \frac{1}{64} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=1\right) } &{} \text {if }n_{10} \ge 2. \end{array}\right. } \end{aligned}$$

Proof

By the law of total probability and Lemma 13,

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}\right) }\\&\quad =\; \sum _{i=0}^{\infty } \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11} = i\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{11} = i\right) }\\&\quad =\; \sum _{i=0}^{\infty } \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) } \left( \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10} + i\right) } \cdot \frac{17n_{10}+13i}{36} - \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10} + i\right) } \cdot \frac{6n_{10} + 24i}{36}\right) \\&\quad =\; \sum _{i=0}^{n_{10}-1} \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) } \left( \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10} + i\right) } \cdot \frac{17n_{10}+13i}{36} - \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10} + i\right) } \cdot \frac{6n_{10} + 24i}{36}\right) \end{aligned}$$

as the term in brackets is 0 for all values \(i \ge n_{10}\).

For \(n_{10}=0\), the above is 0 and for \(n_{10}=1\), it is \(\mathord {{\mathrm {P}}}\mathord {\left( F_{11}=0\right) }/9\) as claimed. For \(n_{10} \ge 2\), by Lemma 16 all terms for \(i \ge 2\) are non-positive and can be dropped. Thus, we get

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}\right) }\\&\quad \le \; \sum _{i=0}^{1} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}, F_{11} = i\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{11} = i\right) }\nonumber \\&\quad =\; \sum _{i=0}^{1} \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) } \left( \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10} + i\right) } \cdot \frac{17n_{10}+13i}{36} - \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10} + i\right) } \cdot \frac{6n_{10} + 24i}{36}\right) \\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=0\right) } \left( \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) } \cdot \frac{17n_{10}}{36} - \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10}\right) } \cdot \frac{6n_{10}}{36}\right) \\&\qquad + \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=1\right) } \left( \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10} + 1\right) } \cdot \frac{17n_{10}+13}{36} - \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10} + 1\right) } \cdot \frac{6n_{10} + 24}{36}\right) . \end{aligned}$$

Writing the above as

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=0\right) } a_0 + \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=1\right) } a_1, \end{aligned}$$

we have \(a_0 = 0\) for \(n_{10}=0\) and \(a_0=1/9\) for \(n_{10}=1\). The maximum value is \(a_0 = 1/8\) for \(n_{10}=2\). This can be seen from evaluating \(a_0\) for \(0 \le n_{10} \le 10\) (shown in Table 1) and from the following analytical arguments for \(n_{10} \ge 11\). We use the bound on the largest binomial coefficient \(\left( {\begin{array}{c}2n_{10}\\ k\end{array}}\right) \le \left( {\begin{array}{c}2n_{10}\\ n_{10}\end{array}}\right) \le 2^{2n_{10}}/(\sqrt{\pi n_{10}})\) for all k according to Lemma 7 and the fact that \(\mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10}\right) } \ge 1/2\) by symmetry of the binomial distribution. Then \(a_0\) is at most

$$\begin{aligned}&\mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) } \cdot \frac{17n_{10}}{36} - \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10}\right) } \cdot \frac{6n_{10}}{36}\\&\quad \le \; \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) } \cdot \frac{17n_{10}}{36} - \frac{n_{10}}{12}\\&\quad \le \; \frac{1}{\sqrt{\pi n_{10}}} \cdot \frac{17n_{10}}{36} - \frac{n_{10}}{12}\\&\quad =\; \frac{1}{\sqrt{\pi }} \cdot \frac{17\sqrt{n_{10}}}{36} - \frac{n_{10}}{12}. \end{aligned}$$

This upper bound becomes negative for \(n_{10} \ge 11\) (the real value \(a_0\) that is being bounded from above already becomes negative for \(n_{10} \ge 7\)). The term \(a_1\) is 0 for \(n_{10} \le 1\) and at most 1/64 for \(n_{10} \ge 2\). This can be seen by evaluating the above formula for \(2 \le n_{10} \le 10\) (see Table 1) and bounding it above as follows for \(n_{10} \ge 11\), using that, by symmetry of the binomial distribution, \(\mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10}+1\right) } = (1 - \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) })/2 \ge (1 - 1/\sqrt{\pi n_{10}})/2 \ge 2/5\) for \(n_{10} \ge 11\).

$$\begin{aligned}&\mathord {{\mathrm {P}}}\mathord {\left( S = n_{10} + 1\right) } \cdot \frac{17n_{10}+13}{36} - \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10} + 1\right) } \cdot \frac{6n_{10} + 24}{36}\\&\quad \le \; \frac{1}{\sqrt{\pi n_{10}}} \cdot \frac{17n_{10}+13}{36} - \frac{2}{5} \cdot \frac{6n_{10} + 24}{36}, \end{aligned}$$

which is negative for all \(n_{10} \ge 11\); hence \(a_1 \le 1/64\) holds for all \(n_{10} \ge 2\). \(\square \)

Table 1 Numerical values for coefficients \(a_0\) and \(a_1\) defined in the proof of Lemma 17, for different values of \(n_{10}=n_{01}\)
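Table 1 is straightforward to reproduce. The following sketch (hypothetical code, not part of the paper) evaluates \(a_0\) and \(a_1\) exactly for \(0 \le n_{10} \le 10\) and confirms that the maxima \(a_0 = 1/8\) and \(a_1 = 1/64\) are attained at \(n_{10}=2\) and \(n_{10}=3\), respectively.

from fractions import Fraction
from math import comb

def pmf(m, s):  # P(Bin(2m, 1/2) = s), exactly
    return Fraction(comb(2 * m, s), 4 ** m) if 0 <= s <= 2 * m else Fraction(0)

def tail_geq(m, ell):  # P(Bin(2m, 1/2) >= ell)
    return sum(pmf(m, s) for s in range(max(ell, 0), 2 * m + 1))

a0s, a1s = [], []
for m in range(0, 11):  # m plays the role of n_10 = n_01
    a0s.append(pmf(m, m) * Fraction(17 * m, 36)
               - tail_geq(m, m) * Fraction(6 * m, 36))
    a1s.append(pmf(m, m + 1) * Fraction(17 * m + 13, 36)
               - tail_geq(m, m + 1) * Fraction(6 * m + 24, 36))
    print(m, a0s[-1], a1s[-1])

assert max(a0s) == Fraction(1, 8)    # attained at n_10 = 2
assert max(a1s) == Fraction(1, 64)   # attained at n_10 = 3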

7.3 Selecting Different Parents of Unequal Fitness (\(n_{10}>n_{01}\))

The case of \(P_{12}\), i.e., different parents being selected, with no common 0-bits flipping (\(F_{00}=0\)) and parents of unequal fitness is easy to handle, as in this scenario the potential drift is always non-positive. This is shown in the following lemma, which also states a closed formula for the potential drift that will be used later on, in the proof of Lemma 22.

Lemma 18

For all \(n_{10} > n_{01}\) and all \(F_{11} \in {\mathbb {N}}_0\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}=0, F_{11}\right) }\\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S> \ell _1\right) } \frac{-2F_{11}-n_{10}+n_{01}}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S=\ell _1\right) }\frac{-F_{11}+n_{01}}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S > \ell _2\right) } \frac{- n_{01}}{6} \le 0. \end{aligned}$$

The proof is not very insightful, hence it is placed in the appendix.

7.4 Combined Drift Bound When No Zero Flips

Assembling the drift bounds derived under the various conditions, we obtain the following combined bounds.

Lemma 19

Assume \(c \le 56/9\) and \(n_{11} \le n-c\).

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 0\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{11}, F_{0*} = 0\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 0\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{22}, F_{*0} = 0\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 0\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{12}, F_{00} = 0\right) }\\&\quad \le {\left\{ \begin{array}{ll} \frac{cn_{10}^2}{9n} &{} \text {if }n_{10}=n_{01}\\ -\frac{n_{01}}{12} \left( 1 - \frac{c}{n}\right) ^n &{} \text {if }n_{10} > n_{01}. \end{array}\right. } \end{aligned}$$

Proof

For \(n_{10}=n_{01}\), we argue that Lemma 17 implies

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 0\right) } \le \frac{n_{01}}{9} \left( 1 - \frac{c}{n}\right) ^{n_{11}}. \end{aligned}$$

This is obvious for \(n_{10} \le 1\); for \(n_{10} \ge 2\) it follows from \(\mathord {{\mathrm {P}}}\mathord {\left( F_{11}=1\right) } = cn_{11}/n \cdot \left( 1 - c/n\right) ^{n_{11}-1} = cn_{11}/(n-c) \cdot \left( 1-c/n\right) ^{n_{11}} \le c\left( 1-c/n\right) ^{n_{11}}\) and \(1/8 + c/64 \le 2/9 \le n_{01}/9\) using \(c \le 56/9\) and \(n_{11} \le n-c\).

By Lemmas 15 and 17, along with \(\mathord {{\mathrm {P}}}\mathord {\left( P_{12}\right) } = \mathord {{\mathrm {P}}}\mathord {\left( P_{11} \cup P_{22}\right) } = 1/2\) and \(\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=0\right) }=(1-c/n)^{n_{00}}\), the left-hand side is at most

$$\begin{aligned}&\frac{n_{10}}{18} \left( 1 - \frac{c}{n}\right) ^{n_{11}+n_{00}} - \frac{n_{10}}{18} \left( 1 - \frac{c}{n}\right) ^n\\&\quad =\;\frac{n_{10}}{18} \left( 1 - \frac{c}{n}\right) ^{n_{11}+n_{00}} \left( 1 - \left( 1 - \frac{c}{n}\right) ^{n_{10}+n_{01}}\right) \\&\quad \le \;\frac{n_{10}}{18} \left( 1 - \frac{c}{n}\right) ^{n_{11}+n_{00}} \left( 1 - \left( 1 - \frac{c(n_{10}+n_{01})}{n}\right) \right) \\&\quad =\;\frac{n_{10}}{18} \left( 1 - \frac{c}{n}\right) ^{n_{11}+n_{00}} \frac{c(n_{10}+n_{01})}{n} \le \frac{cn_{10}^2}{9n}. \end{aligned}$$

The bound for the case \(n_{10}>n_{01}\) follows immediately from Lemmas 15 and 18, along with \(\mathord {{\mathrm {P}}}\mathord {\left( P_{11}\right) } = 1/4\). \(\square \)

8 Potential Drift When One Zero Flips

In this section we show that the drift is bounded by a term that yields the leading constant we are aiming for in our main result. Note that here we can afford to include error terms of lower order.

We first consider the case that two equal parents are selected.

Lemma 20

For all populations with \(n_{10}=n_{01}\),

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \le \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) } \left( 1 - \frac{n_{01}}{6}\right) + \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=1\right) } \left( \frac{2}{9} - \frac{n_{01}}{9} \right) \\&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \le \mathord {{\mathrm {P}}}\mathord {\left( F_{*1}=0\right) } \left( 1 - \frac{n_{01}}{6}\right) + \mathord {{\mathrm {P}}}\mathord {\left( F_{*1}=1\right) } \left( \frac{2}{9} - \frac{n_{01}}{9} \right) . \end{aligned}$$

For all populations with \(n_{10} > n_{01}\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \le \;&{\left\{ \begin{array}{ll} \frac{1}{3} + \frac{2}{3} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) } &{} \text {if }n_{01}=0\\ \left( 1 - \frac{n_{01}}{3}\right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) } + \left( \frac{1}{3} - \frac{n_{01}}{3}\right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=1\right) } &{} \text {otherwise}. \end{array}\right. }\\ \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \le \;&\frac{1}{3} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*1} = 0\right) } + \frac{1}{6} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*1}=1\right) }. \end{aligned}$$

Proof

First assume \(n_{10} = n_{01}\) and consider the event \(P_{11}\). Given \(F_{0*} = 1\), the offspring is accepted with certainty if no 1-bit is flipped. With probability 1/2 the parent survives and then the new potential is at most \(n_{11}+n_{10}+1\). Consequently, the potential changes by at most \(1-n_{01}/3\) due to the loss of diversity. With the remaining probability 1/2, the potential increases by at most 1. The overall expected change in potential in this case is thus at most \(1-n_{01}/6\). If a single 1-bit also flips together with the 0-bit, then the offspring has the same fitness as both parents and the individual to be removed is selected uniformly at random. If the offspring is removed, the potential does not change. If the parent is removed, the potential increases by at most 1/3. If the other population member is removed, the potential increases by at most \(1/3-n_{10}/3\) due to the loss of diversity. The expected change in potential in this case is thus at most \(2/9 - n_{10}/9\). If more than one 1-bit flips, then the offspring has lower fitness than both other members and will be rejected. Summing up the terms proves the first claim, and the case of \(P_{22}\) follows analogously, considering \(F_{*0}\) and \(F_{*1}\) instead.

For \(n_{10} > n_{01}\), given \(F_{0*} = 1\) and that the fitter parent \(x^1\) is selected twice, we consider separate cases according to the size of \(n_{01}\). If \(n_{01}=0\), then no diversity can be lost. Thus, if no 1-bits flip, the potential increases by 1 because the new offspring has higher fitness than \(x^1\), and \(x^2\) is removed. If at least one 1-bit flips, then the best fitness does not change and the potential increases by at most 1/3. Overall, for the case \(n_{01}=0\), the potential changes in expectation by at most \(\mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) } + 1/3 \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}>0\right) } = 1/3 + 2/3 \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) }\). If \(n_{01}>0\) and no 1-bits flip, then the potential changes by at most \(1 - n_{01}/3\) since the diversity is lost because \(x^2\) is removed. If instead at least one 1-bit flips, then the potential changes by at most \(1/3 - n_{01}/3\) since the best fitness does not change and the diversity may be lost if \(x^2\) is removed. Since for \(n_{01}>0\) these terms are non-positive and the offspring is accepted with probability 1 if \(F_{1*} \le 1\), the claim follows by summing up the two terms.

If \(x^2\) is selected twice and no 1-bits are flipped, then the potential increases by at most 1/3 (i.e., if an \(n_{00}\) bit is flipped) since the parent is removed and the diversity is kept. If a single 1-bit is flipped then the potential increases again by at most 1/3. However, since the offspring has the same fitness as its parent, it is necessary that the parent is removed, which happens with probability 1/2. If more than one 1-bit is flipped, then the offspring is rejected. Summing up the terms completes the proof. \(\square \)

The proof of Lemma 20 has revealed a counterintuitive effect. A population of individuals with very different fitness values \(f(x^1) \gg f(x^2)\) can have an advantage over a population where both members have the same fitness \(f(x^1)\). This is because, conditioning on a 0-bit flipping, if the fitter parent is chosen twice, a near-arbitrary number of 1-bits can flip at the same time and the outcome may still be accepted. This increases the potential and explains why Lemma 20 contains an unexpectedly large potential drift in the case \(n_{10} > n_{01} = 0\).

Now we consider the case \(P_{12}\) that two different parents are selected.

When \(F_{11} \ge 2\) and the parents have equal fitness, the drift under \(P_{12}\) is non-positive for \(n_{10} \ge 10\), except for \(F_{11}=n_{10}-1\), where it is exponentially small in \(n_{10}\). The proof of the following lemma is given in the appendix.

Lemma 21

For all \(n_{10}=n_{01}\), \(n_{10} \ge 10\) and all \(2 \le i \le n_{10}-1\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}=1, F_{11}=i\right) } \le {\left\{ \begin{array}{ll} \frac{5}{3} \cdot n_{10}^3 \cdot 2^{-2n_{10}} &{} \text {if }i = n_{10}-1\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

When the two parents have unequal fitness, the following uniform upper bound on the potential drift applies.

Lemma 22

For all \(n_{10}>n_{01}\) and all \(F_{11} \in {\mathbb {N}}_0\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}=1, F_{11}\right) } \le \frac{2}{3}. \end{aligned}$$

Proof

We start again with the drift formula from Lemma 14 and plug in \(F_{00}=1\). Then

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}=1, F_{11}\right) } =\;&\mathord {{\mathrm {P}}}\mathord {\left( S> \ell _1\right) } \cdot \frac{2-2F_{11}-n_{10}+n_{01}}{3}\\&+ \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _1\right) } \cdot \frac{1-F_{11}+n_{01}}{3}\\&+ \mathord {{\mathrm {P}}}\mathord {\left( S > \ell _2\right) } \cdot \frac{2 - n_{01}}{6}\\&+ \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) } \cdot \frac{1}{6}. \end{aligned}$$

For \(n_{10}<12\) the claim is verified numerically for all pairs \(n_{01}<n_{10}\), see Table 4. We now turn to an analytical argument for \(n_{10}\ge 12\).

From Lemma 18 we already know that

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( S> \ell _1\right) } \frac{-2F_{11}-n_{10}+n_{01}}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S=\ell _1\right) }\frac{-F_{11}+n_{01}}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S > \ell _2\right) } \frac{- n_{01}}{6} \le 0 \end{aligned}$$

regardless of \(F_{11}\). Hence,

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}=1, F_{11}\right) }\\&\quad \le \; \mathord {{\mathrm {P}}}\mathord {\left( S> \ell _1\right) } \cdot \frac{2}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _1\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S> \ell _2\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) } \cdot \frac{1}{6} \\&\quad \le \; \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S > \ell _2\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = \ell _2\right) } \cdot \frac{1}{6} \end{aligned}$$

since \(\mathord {{\mathrm {P}}}\mathord {\left( S\ge \ell _1\right) }\le 1/2\) (using that \(\ell _1\) is strictly greater than the mean of S). Altogether,

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00}=1, F_{11}\right) }&\le \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S\ge \ell _2\right) }\frac{1}{3} \le \frac{2}{3} \end{aligned}$$

no matter how \(\ell _1\) and \(\ell _2\) turn out. \(\square \)
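The numerical part of the proof (Table 4) can be reproduced along the following lines. The sketch (hypothetical verification code) evaluates the drift formula with \(F_{00}=1\) exactly for all \(n_{01} < n_{10} < 12\) and a range of values of \(F_{11}\), and checks that the maximum never exceeds 2/3.

from fractions import Fraction
from math import comb

def pmf(n, s):  # P(Bin(n, 1/2) = s)
    return Fraction(comb(n, s), 2 ** n) if 0 <= s <= n else Fraction(0)

def tail_gt(n, ell):  # P(Bin(n, 1/2) > ell)
    return sum(pmf(n, s) for s in range(max(ell + 1, 0), n + 1))

def drift(n10, n01, f11):
    n = n10 + n01
    l1, l2 = n10 - 1 + f11, n01 - 1 + f11  # thresholds for F00 = 1
    return (tail_gt(n, l1) * Fraction(2 - 2 * f11 - n10 + n01, 3)
            + pmf(n, l1) * Fraction(1 - f11 + n01, 3)
            + tail_gt(n, l2) * Fraction(2 - n01, 6)
            + pmf(n, l2) * Fraction(1, 6))

worst = max(drift(n10, n01, f11)
            for n10 in range(1, 12)
            for n01 in range(0, n10)
            for f11 in range(0, n10 + n01 + 2))
assert worst <= Fraction(2, 3)
print("maximum drift found:", worst)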

To obtain a combined drift formula, we need to consider probabilities for flipping a certain number of 1-bits. The following simple lemma gives bounds for these.

Lemma 23

For every mutation rate c/n and every i,

$$\begin{aligned} \left( \frac{n_{11}-i+1}{n-c}\right) ^i \cdot \frac{c^i}{i!}\left( 1 - \frac{c}{n}\right) ^{n_{11}} \le \mathord {{\mathrm {P}}}\mathord {\left( F_{11} = i\right) } \le \left( \frac{n_{11}}{n-c}\right) ^i \cdot \frac{c^i}{i!} \left( 1 - \frac{c}{n}\right) ^{n_{11}}. \end{aligned}$$

Proof

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( F_{11} = i\right) }&=\; \left( {\begin{array}{c}n_{11}\\ i\end{array}}\right) \left( \frac{c}{n}\right) ^i \left( 1 - \frac{c}{n}\right) ^{n_{11}-i}\\&=\; \left( {\begin{array}{c}n_{11}\\ i\end{array}}\right) \left( \frac{c}{n-c}\right) ^i \left( 1 - \frac{c}{n}\right) ^{n_{11}}\\&=\; \left( \frac{n_{11}}{n-c} \cdot \frac{n_{11}-1}{n-c} \cdot \dots \cdot \frac{n_{11}-i+1}{n-c}\right) \cdot \frac{c^i}{i!} \cdot \left( 1 - \frac{c}{n}\right) ^{n_{11}}. \end{aligned}$$

The first bracket is bounded as

$$\begin{aligned} \left( \frac{n_{11}-i+1}{n-c}\right) ^i \le \left( \frac{n_{11}}{n-c} \cdot \frac{n_{11}-1}{n-c} \cdot \dots \cdot \frac{n_{11}-i+1}{n-c}\right) \le \left( \frac{n_{11}}{n-c}\right) ^i. \end{aligned}$$

\(\square \)
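Both bounds are easy to check numerically. The sketch below (hypothetical code; the integer mutation parameter \(c\) and the values of n and \(n_{11}\) are chosen for illustration) compares them with the exact binomial probability using rational arithmetic.

from fractions import Fraction
from math import comb, factorial

def check(n, n11, c):
    # c is an integer here so that all quantities stay exact fractions
    p = Fraction(c, n)
    for i in range(0, n11 + 1):
        exact = comb(n11, i) * p ** i * (1 - p) ** (n11 - i)
        base = Fraction(c ** i, factorial(i)) * (1 - p) ** n11
        lower = Fraction(n11 - i + 1, n - c) ** i * base
        upper = Fraction(n11, n - c) ** i * base
        assert lower <= exact <= upper

for n11 in (5, 20, 60):
    check(100, n11, 2)
print("Lemma 23 bounds verified")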

Using Lemmas 20 and 21 and considering the drift under \(P_{12}\) separately for \(F_{11} \in \{0,1\}\) and for all \(F_{11}\) when \(n_{10} \le 10\), we get:

Lemma 24

Assume \(c \le 2.71\), \(n_{10}+n_{01} \le \log ^2 n\), \(n_{11} \ge n - n/\log ^3 n\) and \(n_{00} \ge n_{10}\log n\).

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{11}, F_{0*} = 1\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{22}, F_{*0} = 1\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{12}, F_{00} = 1\right) }\\&\quad \le {\left\{ \begin{array}{ll} \frac{c(2c+9)}{9e^c} \cdot \frac{n_{00}}{n} \cdot (1+O(1/\log n)) &{} \text {if }n_{10}=n_{01}\\ \frac{c}{e^c} \left( \frac{1}{4} + \frac{c}{48} + \frac{e^c}{4} + O(1/\log n)\right) \frac{n_{00}}{n} &{} \text {if }n_{10}> n_{01} = 0\\ \frac{c}{e^c} \left( \frac{1}{4} + \frac{c}{24} + \frac{e^c}{3} + O(1/n)\right) \frac{n_{00}}{n} &{} \text {if }n_{10}> n_{01} > 0. \end{array}\right. } \end{aligned}$$

Proof

We consider different cases.

Case \({{\varvec{n}}}_{{{\varvec{10}}}}={{\varvec{n}}}_{{{\varvec{01}}}}\): If both parents have equal fitness,

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{11}, F_{0*} = 1\right) }&=\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{22}, F_{*0} = 1\right) } \end{aligned}$$

and we bound the sum of both terms, using \(\mathord {{\mathrm {P}}}\mathord {\left( P_{11}\right) } = \mathord {{\mathrm {P}}}\mathord {\left( P_{22}\right) } = 1/4\) and Lemma 20 (noting that the formulas for \(P_{11}\) and \(P_{22}\) are identical) as

$$\begin{aligned}&\frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=1\right) }}{2} \left( \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) } \left( 1 - \frac{n_{01}}{6}\right) + \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=1\right) } \left( \frac{2}{9} - \frac{n_{01}}{9} \right) \right) . \end{aligned}$$
(6)

For \(P_{12}\) we consider probabilities \(\mathord {{\mathrm {P}}}\mathord {\left( F_{11} = \cdot \right) }\), hence we aim to relate these to the probabilities \(\mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=\cdot \right) }\) as follows. (Note that, for \(n_{10}=0\), \(F_{1*}=F_{11}\).)

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) } = \left( 1 - \frac{c}{n}\right) ^{n_{11}+n_{10}} = \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=0\right) } \left( 1 - \frac{c}{n}\right) ^{n_{10}} \end{aligned}$$

and the term \(\left( 1 - \frac{c}{n}\right) ^{n_{10}}\) is at least \(1 - \frac{cn_{10}}{n}\) and at most 1. Similarly,

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=1\right) } = \frac{c(n_{11}+n_{10})}{n}\left( 1 - \frac{c}{n}\right) ^{n_{11}+n_{10}-1} = \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=1\right) } \frac{n_{11}+n_{10}}{n_{11}} \left( 1 - \frac{c}{n}\right) ^{n_{10}} \end{aligned}$$

and the term \(\frac{n_{11}+n_{10}}{n_{11}} \left( 1 - \frac{c}{n}\right) ^{n_{10}}\) is at least \(1 - \frac{cn_{10}}{n}\) and at most \(1+O(n_{10}/n)\). Likewise,

$$\begin{aligned} \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=1\right) }}{\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }} = \frac{n_{00}+n_{01}}{n_{00}} \left( 1 - \frac{c}{n}\right) ^{n_{01}}, \end{aligned}$$

which is at least \(1-cn_{01}/n\) and at most \(1+O(1/\log n)\) owing to the assumption \(n_{00} \ge n_{10}\log n \ge n_{01}\log n\).

Thus, (6) can be written as

$$\begin{aligned}&\frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }}{2} \left( \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=0\right) } \left( 1 - \frac{n_{01}}{6}\right) + \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=1\right) } \left( \frac{2}{9} - \frac{n_{01}}{9} \right) \right) + \xi \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } \end{aligned}$$
(7)

for an error term \(\xi \in [-O(1/\log n), +O(1/\log n)]\), regardless of the signs of the factors the terms \(\mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) }\) and \(\mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=1\right) }\) are multiplied with. Here we used that these factors are in \([-O(n_{10}), O(n_{10})]\) and the absolute value of the error from replacing each term \(\mathord {{\mathrm {P}}}\mathord {\left( F_{1*} = \cdot \right) }\) with \(\mathord {{\mathrm {P}}}\mathord {\left( F_{11} = \cdot \right) }\) is at most \(O(n_{10}^2/n) = O(1/\log n)\).

The third term from the statement is

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{12}, F_{00} = 1\right) }\nonumber \\&\quad =\;\frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }}{2} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) }\nonumber \\&\quad =\;\frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }}{2} \sum _{i=0}^{n_{10}+1} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1, F_{11}=i\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) }. \end{aligned}$$
(8)

We simplify the expectations in the above terms for \(i \in \{0, 1\}\) using Lemma 13 and add them to the terms in (7). This yields an upper bound of

$$\begin{aligned}&\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1, F_{11}=0\right) } \cdot a_0 + \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1, F_{11}=1\right) } \cdot a_1 \end{aligned}$$

for coefficients \(a_0, a_1\) defined as follows:

$$\begin{aligned} a_0&:=\; \frac{1}{2} \left( 1 - \frac{n_{10}}{6} + \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10}-1\right) }\left( 1 - \frac{n_{10}}{6}\right) + \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}-1\right) } \cdot \frac{17n_{10}-17}{36}\right) \\ a_1&:=\; \frac{1}{2} \left( \frac{2}{9} - \frac{n_{10}}{9} + \mathord {{\mathrm {P}}}\mathord {\left( S \ge n_{10}\right) }\left( \frac{1}{3} - \frac{n_{10}}{6}\right) + \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) } \cdot \frac{17n_{10}-4}{36}\right) . \end{aligned}$$

It is easy to verify that \(a_0 = 1\) for \(n_{10}=0\) and \(a_0 \le 1\) for \(1 \le n_{10} \le 9\). For \(n_{10} \ge 10\), \(a_0\) becomes negative since the term \(\mathord {{\mathrm {P}}}\mathord {\left( S > n_{10}-1\right) }\left( 1 - \frac{n_{10}}{6}\right) \) is non-positive and using \(\mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}-1\right) } \le 1/(\sqrt{\pi n_{10}})\) according to Lemma 7, we get

$$\begin{aligned} a_0 \le \;&\frac{1}{2} \left( 1 - \frac{n_{10}}{6} + \frac{1}{\sqrt{\pi n_{10}}} \cdot \frac{11n_{10}+19}{36}\right) < 0. \end{aligned}$$

Similarly, it is easy to check that \(a_1 = 2/9\) for \(n_{10}=0\) and \(a_1 \le 2/9\) for \(1 \le n_{10} \le 6\). For \(n_{10} \ge 7\), \(a_1\) becomes negative as

$$\begin{aligned} a_1&=\; \frac{1}{2} \left( \frac{2}{9} - \frac{n_{10}}{9} + \mathord {{\mathrm {P}}}\mathord {\left( S > n_{10}\right) }\left( \frac{1}{3} - \frac{n_{10}}{6}\right) + \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) } \cdot \frac{11n_{10}+8}{36}\right) \\&\le \; \frac{1}{2} \left( \frac{2}{9} - \frac{n_{10}}{9} + \frac{1}{\sqrt{\pi n_{10}}} \cdot \frac{11n_{10}+8}{36}\right) < 0. \end{aligned}$$

By (8), for larger values of \(F_{11}\), the coefficients \(a_i\) of \(\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1, F_{11}=i\right) }\), with \(2 \le i \le n_{10}+1\), can be bounded by considering only

$$\begin{aligned} a_i =\;&\frac{\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1, F_{11}=i\right) }}{2}. \end{aligned}$$

Then we can bound the sought expression as

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{11}, F_{0*} = 1\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{22}, F_{*0} = 1\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{12}, F_{00} = 1\right) }\\&\quad \le \xi \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } + \sum _{i=0}^{n_{10}+1} a_i \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1,F_{11}=i\right) }\\&\quad = \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } \left( \xi + \sum _{i=0}^{n_{10}+1} a_i \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) }\right) . \end{aligned}$$

We bound \(a_i\) by \(\max \{0, a_i\}\) to ensure non-negative terms, which enables us to use the upper bound on \(\mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) }\) provided by Lemma 23:

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) } \le \left( \frac{n_{11}}{n-c}\right) ^i \cdot \frac{c^i}{i!} \left( 1 - \frac{c}{n}\right) ^{n_{11}} \le \frac{c^i}{i!} \left( 1 - \frac{c}{n}\right) ^{n_{11}} \end{aligned}$$

where the last inequality follows from \(n_{11} \le n-c\).

Thus,

$$\begin{aligned}&\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } \left( \xi + \sum _{i=0}^{n_{10}+1} a_i \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) }\right) \\&\quad \le \; \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } \left( \xi + \sum _{i=0}^{n_{10}+1} \max \{a_i, 0\} \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) }\right) \\&\quad \le \; \frac{cn_{00}}{n} \left( 1 - \frac{c}{n}\right) ^{n_{00}-1} \left( \xi + \sum _{i=0}^{n_{10}+1} \max \{a_i, 0\} \cdot \frac{c^i}{i!} \cdot \left( 1 - \frac{c}{n}\right) ^{n_{11}}\right) \\&\quad =\; \frac{cn_{00}}{n} \left( 1 - \frac{c}{n}\right) ^{n_{00}+n_{11}-1} \left( \xi \left( 1 - \frac{c}{n}\right) ^{-n_{11}} + \sum _{i=0}^{n_{10}+1} \max \{a_i, 0\} \cdot \frac{c^i}{i!}\right) . \end{aligned}$$

Using

$$\begin{aligned}&\left( 1 - \frac{c}{n}\right) ^{n_{00}+n_{11}} = \left( 1 - \frac{c}{n}\right) ^{n-n_{10}-n_{01}} \le \; e^{-c} \left( 1+\frac{c}{n-c}\right) ^{n_{10}+n_{01}}\\&\quad \le \; e^{-c} \left( 1+\frac{2cn_{10}}{n-c}\right) , \end{aligned}$$

and recalling \(\xi \le O(1/\log n)\) we can write this as

$$\begin{aligned} \;&\frac{cn_{00}}{e^c n} \left( 1 + O(1/\log n)\right) \sum _{i=0}^{n_{10}+1} \max \{a_i, 0\} \cdot \frac{c^i}{i!} \end{aligned}$$

assuming that \(\sum _{i=0}^{n_{10}+1} \max \{a_i, 0\} \cdot \frac{c^i}{i!} = \varOmega (1)\) (if this sum is 0 or o(1), we have already proved the claim).

Now we obtain the claimed bound if we can show that

$$\begin{aligned} \sum _{i=0}^{n_{10}+1} \max \{a_i, 0\} \cdot \frac{c^i}{i!} \le 1 + \frac{2}{9} \cdot c. \end{aligned}$$
(9)

For \(n_{10}=0\) we have \(a_0 = 1\), \(a_1 = 2/9\), hence (9) holds with equality for \(n_{10}=0\). For \(0 \le n_{10} \le 10\) we evaluate the coefficients \(a_0, \dots , a_{n_{10}+1}\) with the closed formulas given above and verify (9) for \(c \le 2.71\). The coefficients and the ranges of c for which (9) holds are shown in Table 2. Since we assume \(c \le 2.71\), the inequality holds for all considered \(n_{10}\).

For \(n_{10} > 10\), we already established above that \(a_0 \le 0\) and \(a_1 \le 0\). By Lemma 21, \(a_i \le 0\) for all \(2 \le i \le n_{10}+1\), except for \(i=n_{10}-1\), where it is at most \(5n_{10}^3 2^{-2n_{10}}/3\). For \(n_{10} \ge 11\), this is at most 0.00053 and thus we get an upper drift bound of

$$\begin{aligned} \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } \left( \xi + 0.00053\right) \le \frac{cn_{00}}{n} \cdot \left( |\xi | + 0.00053\right) \le \frac{c(2c+9)}{9e^c} \cdot \frac{n_{00}}{n} \cdot (1+O(1/\log n)) \end{aligned}$$

where the last inequality holds for \(c \le 6.09\). This completes the proof for the case \(n_{10}=n_{01}\).

Table 2 Values for coefficients \(a_i\) from the proof of Lemma 24 under \(P_{12}, F_{00}=1\) and \(n_{10}=n_{01}\), rounded to 4 digits
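Inequality (9) can be spot-checked with exact arithmetic at the boundary value \(c = 2.71\). The following sketch (hypothetical code) evaluates \(a_0\) and \(a_1\) from their closed formulas above, obtains \(a_i\) for \(i \ge 2\) from Lemma 13 with \(F_{00}=1\) as described, and verifies \(\sum _i \max \{a_i, 0\} \cdot c^i/i! \le 1 + 2c/9\) for all \(n_{10} \le 10\).

from fractions import Fraction
from math import comb, factorial

def pmf(m, s):  # P(Bin(2m, 1/2) = s)
    return Fraction(comb(2 * m, s), 4 ** m) if 0 <= s <= 2 * m else Fraction(0)

def tail_geq(m, ell):  # P(Bin(2m, 1/2) >= ell)
    return sum(pmf(m, s) for s in range(max(ell, 0), 2 * m + 1))

def coeff(m, i):
    if i == 0:
        return (1 - Fraction(m, 6) + tail_geq(m, m - 1) * (1 - Fraction(m, 6))
                + pmf(m, m - 1) * Fraction(17 * m - 17, 36)) / 2
    if i == 1:
        return (Fraction(2, 9) - Fraction(m, 9)
                + tail_geq(m, m) * (Fraction(1, 3) - Fraction(m, 6))
                + pmf(m, m) * Fraction(17 * m - 4, 36)) / 2
    # i >= 2: only the P_12 contribution, via Lemma 13 with F00 = 1, F11 = i
    ell = m - 1 + i
    e12 = (tail_geq(m, ell) * (1 - Fraction(2 * i, 3) - Fraction(m, 6))
           + pmf(m, ell) * Fraction(17 * m - 17 + 13 * i, 36))
    return e12 / 2

c = Fraction(271, 100)
for m in range(0, 11):  # m plays the role of n_10 = n_01
    total = sum(max(coeff(m, i), Fraction(0)) * c ** i / factorial(i)
                for i in range(0, m + 2))
    assert total <= 1 + Fraction(2, 9) * c, m
print("inequality (9) verified for c = 2.71 and all n10 <= 10")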

Case \({{\varvec{n}}}_{{{\varvec{10}}}}>{{\varvec{n}}}_{{{\varvec{01}}}}={{\varvec{0}}}\): Now we consider unequal fitness, \(n_{10}>n_{01}\), in the special case \(n_{01}=0\) (no diversity). By Lemma 20

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{11}, F_{0*} = 1\right) }\\&\quad =\; \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=1\right) }}{4} \cdot \left( \frac{1}{3} + \frac{2}{3} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) }\right) \\&\quad =\; \frac{1}{12} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=1\right) } + \frac{1}{6} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=1,F_{1*}=0\right) } \end{aligned}$$

and

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{22}, F_{*0} = 1\right) }\\&\quad =\; \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{*0}=1\right) }}{4} \cdot \left( \frac{1}{3} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*1} = 0\right) } + \frac{1}{6} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*1}=1\right) }\right) \\&\quad =\; \frac{1}{12} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*0}=1,F_{*1} = 0\right) } + \frac{1}{24} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*0}=1,F_{*1}=1\right) }. \end{aligned}$$

Moving from events \(F_{0*}\)/\(F_{*0}\) and \(F_{1*}\)/\(F_{*1}\) to events \(F_{00}\) and \(F_{11}\) at the expense of an additive error term \(\xi ' \in [-O(1/\log n), +O(1/\log n)]\) as before and adding these two expressions shows that the first two lines of the formula from the statement are bounded by

$$\begin{aligned} \left( \frac{1}{12} + \xi '\right) \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } + \frac{1}{4} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1, F_{11}=0\right) } + \frac{1}{24} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1, F_{11}=1\right) }. \end{aligned}$$
(10)

Note that the first of these summands is multiplied by \(\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }\), which does not include a statement on \(F_{11}\).

Reusing the above calculations, the third term from the statement is

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{12}, F_{00} = 1\right) }\nonumber \\&\quad =\;\frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }}{2} \sum _{i=0}^{n_{10}+1} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1, F_{11}=i\right) } \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) }. \end{aligned}$$

Using Lemma 14, the expectation simplifies to

$$\begin{aligned}&\;\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1, F_{11}=i\right) }\\&\quad =\; \mathord {{\mathrm {P}}}\mathord {\left( S> n_{10}+F_{11}\right) } \cdot \frac{2-2F_{11}-n_{10}}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}+F_{11}\right) } \cdot \frac{1-F_{11}}{3}\\&\qquad + \mathord {{\mathrm {P}}}\mathord {\left( S > F_{11}\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = F_{11}\right) } \cdot \frac{1}{6}. \end{aligned}$$

Since \(n_{10}+n_{01}=n_{10} \le n_{10}+F_{11}\), the event \(S > n_{10}+F_{11}\) is impossible. The event \(S = n_{10}+F_{11}\) is only possible for \(F_{11}=0\), where \(\mathord {{\mathrm {P}}}\mathord {\left( S = n_{10} + F_{11}\right) } = \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) } = \mathord {{\mathrm {P}}}\mathord {\left( S = 0\right) }\) by symmetry of the binomial distribution. For \(F_{11}=0\) we thus get

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1, F_{11}=0\right) }&=\; \mathord {{\mathrm {P}}}\mathord {\left( S = n_{10}\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S > 0\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = 0\right) } \cdot \frac{1}{6}\\&=\; \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = 0\right) } \cdot \frac{1}{6}. \end{aligned}$$

For all \(i > 0\), we get

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1, F_{11}=i\right) } =\;&\mathord {{\mathrm {P}}}\mathord {\left( S > F_{11}\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = F_{11}\right) } \cdot \frac{1}{6}. \end{aligned}$$

Now we can bound the sought expression as

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{11}, F_{0*} = 1\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{22}, F_{*0} = 1\right) }\\&\qquad +\; \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{12}, F_{00} = 1\right) }\\&\quad \le \mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) } \left( \frac{1}{12} + \xi ' + \frac{1}{2}\sum _{i=0}^{n_{10}+1} a_i \mathord {{\mathrm {P}}}\mathord {\left( F_{11}=i\right) }\right) \end{aligned}$$

where, putting together (10) and the above bounds on \(\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1, F_{11}=i\right) }\),

$$\begin{aligned} a_0&=\; \frac{7}{12} + \mathord {{\mathrm {P}}}\mathord {\left( S = 0\right) } \cdot \frac{1}{6}\\ a_1&=\; \frac{1}{24} + \mathord {{\mathrm {P}}}\mathord {\left( S> 1\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = 1\right) } \cdot \frac{1}{6}\\ \forall i&\ge 2 :a_i =\; \mathord {{\mathrm {P}}}\mathord {\left( S > i\right) } \cdot \frac{1}{3} + \mathord {{\mathrm {P}}}\mathord {\left( S = i\right) } \cdot \frac{1}{6}. \end{aligned}$$

Since all coefficients are positive, we can bound \(\mathord {{\mathrm {P}}}\mathord {\left( F_{11} = i\right) }\) as before and obtain a drift bound of

$$\begin{aligned}&\frac{cn_{00}}{e^c n} \left( 1 + O(1/\log n)\right) \left( \frac{1}{12} + \xi ' + \frac{1}{2}\sum _{i=0}^{n_{10}+1} a_i \cdot \frac{c^i}{i!}\right) . \end{aligned}$$

Bounding \(a_0 \le 2/3\) (since \(\mathord {{\mathrm {P}}}\mathord {\left( S = 0\right) } \le 1/2\)), \(a_1 \le 1/24 + 1/3 = 3/8\) and \(a_i \le 1/3\) for \(i \ge 2\), we have

$$\begin{aligned} \sum _{i=0}^{n_{10}+1} a_i \cdot \frac{c^i}{i!}&\le \; \frac{2}{3} + \frac{3}{8} \cdot c + \frac{1}{3} \sum _{i=2}^{\infty } \frac{c^i}{i!}\\&=\; \frac{2}{3} + \frac{3}{8} \cdot c + \frac{1}{3} \cdot (e^c - c - 1)\\&=\; \frac{1}{3} + \frac{1}{24} \cdot c + \frac{1}{3} \cdot e^c. \end{aligned}$$

Plugging this in yields an upper drift bound of

$$\begin{aligned}&\frac{cn_{00}}{e^c n} \left( 1 + O(1/\log n)\right) \left( \frac{1}{12} + \xi ' + \frac{1}{6} + \frac{c}{48} + \frac{e^c}{6}\right) \\&\quad \le \; \frac{cn_{00}}{e^c n} \left( \frac{1}{4} + \frac{c}{48} + \frac{e^c}{4} + O(1/\log n)\right) . \end{aligned}$$

Case \({\varvec{n}}_{{\varvec{10}}}>{\varvec{n}}_{{\varvec{01}}}>{\varvec{0}}\): Finally, we deal with the case \(n_{10}> n_{01} > 0\). By Lemma 20

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{11}, F_{0*} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{11}, F_{0*} = 1\right) }\\&\quad =\; \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=1\right) }}{4} \left( \left( 1 - \frac{n_{01}}{3}\right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) } + \left( \frac{1}{3} - \frac{n_{01}}{3}\right) \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=1\right) }\right) \\&\quad \le \; \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{0*}=1\right) }}{4} \left( 1 - \frac{n_{01}}{3}\right) \mathord {{\mathrm {P}}}\mathord {\left( F_{1*}=0\right) }\\&\quad \le \; \frac{cn_{00}}{n} \left( 1 - \frac{c}{n}\right) ^{n-1} \left( \frac{1}{4} - \frac{n_{01}}{12}\right) . \end{aligned}$$

and

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{22}, F_{*0} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{22}, F_{*0} = 1\right) }\\&\quad =\; \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{*0}=1\right) }}{4} \cdot \left( \frac{1}{3} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*1} = 0\right) } + \frac{1}{6} \cdot \mathord {{\mathrm {P}}}\mathord {\left( F_{*1}=1\right) }\right) \\&\quad =\; \frac{cn_{00}}{n} \left( \left( 1 - \frac{c}{n}\right) ^{n-1} \frac{1}{12} + \frac{c(n_{11}+n_{01})}{n} \left( 1 - \frac{c}{n}\right) ^{n-1} \frac{1}{24}\right) \\&\quad \le \; \frac{cn_{00}}{n} \left( \left( 1 - \frac{c}{n}\right) ^{n-1} \frac{1}{12} + \left( 1 - \frac{c}{n}\right) ^{n} \frac{c}{24}\right) \\&\quad =\; \frac{cn_{00}}{n} \left( 1 - \frac{c}{n}\right) ^{n-1} \left( \frac{1}{12} + \frac{c}{24}\right) . \end{aligned}$$

We use Lemma 22 to bound the third term in the statement as

$$\begin{aligned}&\mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) } \cdot \mathord {{\mathrm {P}}}\mathord {\left( P_{12}, F_{00} = 1\right) }\\&\quad =\; \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }}{2} \cdot \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \mid P_{12}, F_{00} = 1\right) }\\&\quad \le \; \frac{\mathord {{\mathrm {P}}}\mathord {\left( F_{00}=1\right) }}{3} \le \frac{cn_{00}}{n} \cdot \frac{1}{3}. \end{aligned}$$

Together, we get a drift bound of

$$\begin{aligned}&\left( \frac{1}{3} + \left( 1 - \frac{c}{n}\right) ^{n-1}\left( \frac{1}{3} - \frac{n_{01}}{12} + \frac{c}{24}\right) \right) \frac{cn_{00}}{n}\\&\quad \le \; \left( \frac{e^c}{3} + \frac{1}{4} + \frac{c}{24} + O(1/n)\right) \frac{cn_{00}}{e^c n}. \end{aligned}$$

\(\square \)

9 Putting Everything Together

Combining results from previous sections, for different numbers of flipping zeros, yields the following unconditional drift bound.

Lemma 25

If \(c \le 1.422\), \(n_{10}+n_{01} \le \log ^2 n\), \(n_{11} \ge n-n/\log ^3 n\) and \(n_{00} \ge \log ^5 n\), then

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta \right) } \le \frac{c(2c+9)}{9e^c} \cdot \frac{n_{00}}{n} \cdot (1+O(1/\log n)). \end{aligned}$$

Proof

This follows from adding the drift bounds from Lemmas 10, 19 and 24. For \(n_{10}=n_{01}\), Lemma 24 gives the stated bound. The terms \(O(n_{10}^2/n) = O((\log ^4 n)/n)\) from Lemma 19 and \(O(n_{0*}/(n\log n)) = O(n_{00}/(n \log n))\) can both be absorbed in the \(O(1/\log n)\) term since \(n_{00}/n \ge \log ^5 n/n\).

For \(n_{10}>n_{01}=0\), the bound from Lemma 24 is at most the bound from the same lemma for \(n_{10}=n_{01}\) since

$$\begin{aligned} \frac{c}{e^c} \left( \frac{1}{4} + \frac{c}{48} + \frac{e^c}{4}\right) \le \frac{c(2c+9)}{9e^c} \end{aligned}$$

for \(c \le 1.422\). Then the claim follows as above.
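The threshold 1.422 can be confirmed numerically; the sketch below (hypothetical code) checks the inequality at \(c=1.422\) and shows that it fails slightly above this value.

import math

def lhs(c):  # bound for n10 > n01 = 0, without the error term
    return (c / math.exp(c)) * (0.25 + c / 48 + math.exp(c) / 4)

def rhs(c):  # bound for n10 = n01
    return c * (2 * c + 9) / (9 * math.exp(c))

assert lhs(1.422) <= rhs(1.422)
assert lhs(1.43) > rhs(1.43)  # the inequality fails slightly above 1.422
print("lhs(1.422) =", lhs(1.422), "<= rhs(1.422) =", rhs(1.422))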

For \(n_{10}>n_{01}>0\) note that Lemma 19 yields a negative upper bound of \(-n_{01}/12 \cdot (1-c/n)^n = -\varOmega (1)\). Since \(n_{00} \le n/\log ^3 n\), we obtain a negative drift bound if n is large enough. \(\square \)

To translate the upper bound from Lemma 25 into a lower bound on the expected runtime, we use the lower-bound version of the multiplicative drift theorem stated in Theorem 4. This theorem requires an upper bound on the drift of the potential function and a sufficiently small probability of large jumps in its value. Such large jumps can occur if the two individuals of the (2+1) GA have a large Hamming distance. Recall that Lemma 9 shows this to be unlikely.

The following lemma shows that a drift of the potential can be translated into a lower bound on the expected optimisation time. This is not immediate since the potential function is a weighted combination of two quantities.

Lemma 26

Let \(N_t\) denote the value of \(n_{00}\), i.e., the number of positions in which both individuals of the (2+1) GA have a zero-bit, at time t, and let T be the first point in time where \(n_{00}\le \log ^5 n\). Assume that \(n_{01}+n_{10}\le \log ^2 n\) for all points in time before T, and \(N_0\ge \log ^5 n\). If \( \mathord {{\mathrm {E}}}\mathord {\left( \varphi _{t+1}-\varphi _t\mid \varphi _t\right) } \le \delta N_t \) for some \(n^{-O(1)} \le \delta <1\) and all \(t<T\), then

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( T\mid \varphi _0\right) } \ge (1-O(1/\log n)) \frac{\ln (n)-O(\ln \ln n)}{\delta }. \end{aligned}$$

Proof

We introduce a distance function \({\overline{\varphi }}_t :=n - \varphi _t = n_{00} + \frac{2}{3} n_{01}\) as the mirror image of our potential to obtain a function to be minimised, as required in Theorem 4. Moreover, we write \(\varDelta _t={\overline{\varphi }}_t-{\overline{\varphi }}_{t+1}\). The key idea is to show that the statement on \(\mathord {{\mathrm {E}}}\mathord {\left( T\mid \varphi _0\right) }\) holds under the slightly different drift condition

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta _t\mid {\overline{\varphi }}_t\right) } \le \delta {\overline{\varphi }}_t \end{aligned}$$
(11)

and then to prove that the actual drift condition \(\mathord {{\mathrm {E}}}\mathord {\left( \varphi _{t+1}-\varphi _t\mid \varphi _t\right) } \le \delta N_t\) leads to the same result, up to lower-order terms.

We consider the process from the first point in time where \({\overline{\varphi }}_t\le n/\log ^3 n\), assume (11) to hold for all \(t<T\) and estimate the remaining expected time to minimise the distance \({\overline{\varphi }}_t\). Lemma 9 and the facts that at most \(\log n\) bits flip per generation and that the distance does not drop below \(n/\log ^3 n\) within the first \(\log ^2 n\) generations (each happening with probability \(1-n^{-\varOmega (\log n)}\)) imply that \({\overline{\varphi }}_{0} \ge n/(2\log ^3 n)\) with respect to our time count. We assume this to happen. By Lemma 9, with probability \(1-n^{-\varOmega (\log n)}\) it holds that \(n_{10}+n_{01}\le \log ^2 n\) for any polynomial number of steps. Assuming this for a sufficiently long period obtained from applying Markov’s inequality to \(\mathord {{\mathrm {E}}}\mathord {\left( T\mid {\overline{\varphi }}_0\right) }\), our assumptions only change the bound on the expected value by a factor of \(1-n^{-\varOmega (\log n)}\).

Clearly, since \(n_{10}+n_{01}\le \log ^2 n\), crossover followed by a neutral mutation can change the \({\overline{\varphi }}\)-value by at most \(\log ^2 n\). Moreover, each mutation flips k or more bits with probability at most

$$\begin{aligned} \left( {\begin{array}{c}n\\ k\end{array}}\right) \left( \frac{c}{n}\right) ^k\le \frac{c^k}{k!}; \end{aligned}$$
(12)

in particular it flips at most \(\log ^2 n\) bits with probability \(1-n^{-\varOmega (\log n)}\). Adding up these effects, we arrive at \( \mathord {{\mathrm {P}}}\mathord {\left( {\overline{\varphi }}_t-{\overline{\varphi }}_{t+1} \ge 2\log ^2 n\right) } = n^{-\varOmega (\log n)}. \) The time to minimise the distance function is no smaller than the time to reach a distance of at most \(x_{\min }:=\log ^5 n\) (we stop at this point to fulfil the condition on \(n_{00}\) in Lemma 25). Along with \(\beta :=2/\log n\) and using \(X_t:={\overline{\varphi }}_t\), we verify the second condition of Theorem 4 by estimating

$$\begin{aligned}&\mathord {{\mathrm {P}}}\mathord {\left( X_t-X_{t+1}\ge \beta X_t\right) } \le \\&\quad \ \ \mathord {{\mathrm {P}}}\mathord {\left( X_t-X_{t+1}\ge \beta x_{\min }\right) } \le \mathord {{\mathrm {P}}}\mathord {\left( X_t-X_{t+1}\ge 2\log ^2 n\right) } = n^{-\varOmega (\log n)}, \end{aligned}$$

where the second inequality uses \(\beta x_{\min } = 2\log ^4 n \ge 2\log ^2 n\). Finally, we estimate this bound as \(n^{-\varOmega (\log n)} \le \beta \delta /\log n\le \beta \delta /\!\log (X_t)\) since both \(\beta \) and \(\delta \) are at least inversely polynomial in n. Hence, \(\mathord {{\mathrm {P}}}\mathord {\left( X_t-X_{t+1}\ge \beta X_t\right) }\le \beta \delta /\!\log (X_t)\), which satisfies the condition. Applying the theorem and recalling our assumption \({\overline{\varphi }}_{0} \ge n/(2\log ^3 n)\),

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( T\mid X_0\right) }\ge \frac{1-\beta }{1+\beta } \cdot \frac{\ln ({\overline{\varphi }}_0/x_{\min })}{\delta } \ge (1-O(1/\log n))\frac{\ln (n/(2\log ^8 n))}{\delta } \end{aligned}$$

which is \((1-O(1/\log n))\frac{\ln (n)-O(\ln \ln n)}{\delta }\). Recall that the unconditional expected time is smaller only by a factor of \(1-n^{-\varOmega (\log n)}\).

Finally, we relate \(N_t\) to the distance function \({\overline{\varphi }}_t\). Note that \(N_t = n_{00} \le n_{00}+\frac{2}{3} n_{01} = {\overline{\varphi }}_t\); moreover, for \(t<T\), our assumptions imply \({\overline{\varphi }}_t \le (1+1/\log n)N_t\), so the two quantities agree up to lower-order terms. Hence the prerequisite

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varphi _{t+1}-\varphi _{t}\mid \varphi _t\right) } \le \delta N_t \end{aligned}$$

along with the fact \(\varDelta _t=\varphi _{t+1}-\varphi _{t}\) implies

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( \varDelta _t \mid {\overline{\varphi }}_t\right) } \le \delta N_t \le \delta {\overline{\varphi }}_t. \end{aligned}$$

Therefore, (11) has been established with parameter \(\delta \), so that

$$\begin{aligned} \mathord {{\mathrm {E}}}\mathord {\left( T\mid \varphi _0\right) }\ge (1-O(1/\log n))\frac{\ln (n/(2\log ^8 n))}{\delta }, \end{aligned}$$

which is again \((1-O(1/\log n))\frac{\ln (n)-O(\ln \ln n)}{\delta }\) as claimed. \(\square \)
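The elementary tail bound (12) used in the proof is easy to verify numerically. The following Python sketch (our own sanity check, not taken from the analysis) compares the exact binomial tail probability with the bound \(c^k/k!\):

```python
from math import comb, factorial

def exact_tail(n, p, k):
    """P(Bin(n, p) >= k), i.e. the probability that at least k bits flip."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, c = 1000, 1.2
for k in (2, 5, 10):
    bound = c**k / factorial(k)
    tail = exact_tail(n, c / n, k)
    print(f"k={k:2d}: exact tail {tail:.3e} <= bound {bound:.3e}: {tail <= bound}")
```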

We are now ready to prove our main result.

Proof of Theorem 2

We apply drift analysis to a “typical” run, that is, a run where events mentioned in the following that happen with overwhelming probability do occur. The contrary is called a failure event and if a failure occurs, we pessimistically assume that the runtime is 0. By the law of total probability, the expected runtime is bounded from below by the expected runtime in a typical run, multiplied by the probability that no failure occurs.

With probability \(1-2^{-\varOmega (n)}\), every individual of the initial population has at most (2/3)n one-bits. The random number of one-bits that crossover creates among the \(n_{10}+n_{01}\) positions in which the parents differ follows a binomial distribution with parameters \(n_{10}+n_{01}\) and 1/2. The expected value of this number equals \((n_{10}+n_{01})/2\le n/2\), and using Chernoff bounds (Theorem 5) with \(\delta =2n^{-1/3}\) and the upper bound n/2 on the expectation, the number of one-bits of any fixed offspring among these positions is at most \(U:=(n_{10}+n_{01})/2 + n^{2/3}\) with probability \(1-2^{-\varOmega (n^{1/3})}\). By a union bound, the probability that at least one offspring has more than U one-bits there is still \(2^{-\varOmega (n^{1/3})}\). Since one parent has at least \((n_{10}+n_{01})/2\) one-bits among these positions, this means that the number of one-bits at the relevant positions, and thereby the number of one-bits of the fitter offspring, grows by at most \(n^{2/3}\) with probability \(1-2^{-\varOmega (n^{1/3})}\). The same holds for each mutation with constant rate c, as seen in (12). Hence, the subsequent \(2\log ^2 n\) generations do not increase the OneMax-value to more than (3/4)n with probability \(1-2^{-\varOmega (n^{1/3})}\).

We consider the first point in time when \(n_{11} \ge n - n/\log ^3 n\). By the same arguments as before, we then have \(n_{11} \le n-n/(2\log ^3 n)\) with overwhelming probability. Now Lemma 9 is in force, implying that we can apply Lemma 26 with

$$\begin{aligned} \delta \le \frac{c(2c+9)}{9e^c n} \cdot (1+O(1/\log n)), \end{aligned}$$

according to the drift bound from Lemma 25. Hence, if none of the above-mentioned failure events occur, the expected optimisation time is bounded from below by

$$\begin{aligned} \left( 1-O\!\left( \frac{1}{\log n}\right) \right) \frac{9e^c n\ln (n)-O(n\ln \ln n)}{c(2c+9)}, \end{aligned}$$

which is \(\frac{9e^c}{c(2c+9)} \cdot n\ln (n) -O(n \ln \ln n)\) as claimed. This still holds when multiplying the above with the probability that no failure event occurs, since the probability of the union of all failure events is superpolynomially small. \(\square \)

10 Experiments

Our lower bound is tight up to lower-order terms. We ran experiments to see how close the empirical average runtimes are to the dominant term of \(\frac{9e^c}{c(2c+9)} \cdot n \ln n\) when varying c and n.
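For concreteness, the following minimal Python sketch shows the kind of simulation underlying these experiments. It assumes a standard steady-state (2+1) GA: two parents are selected uniformly at random with replacement, an offspring is created by uniform crossover followed by standard bit mutation with rate c/n, and the offspring replaces a uniformly chosen worst individual if it is at least as fit. Details such as tie-breaking may differ from the precise algorithm definition analysed above, and runtime is counted in generations.

```python
import math
import random

def onemax(x):
    return sum(x)

def run_two_plus_one_ga(n, c, rng):
    """Return the number of generations until OneMax is maximised."""
    pop = [[rng.randrange(2) for _ in range(n)] for _ in range(2)]
    fits = [onemax(x) for x in pop]
    gens = 0
    while max(fits) < n:
        gens += 1
        p, q = rng.randrange(2), rng.randrange(2)            # parents, with replacement
        child = [pop[p][i] if rng.random() < 0.5 else pop[q][i]
                 for i in range(n)]                          # uniform crossover
        child = [b ^ (rng.random() < c / n) for b in child]  # standard bit mutation
        f = onemax(child)
        worst = min(fits)
        if f >= worst:                                       # replace a uniformly chosen worst
            idx = rng.choice([i for i in (0, 1) if fits[i] == worst])
            pop[idx], fits[idx] = child, f
    return gens

rng = random.Random(42)
n, c, runs = 100, 1.0, 20
avg = sum(run_two_plus_one_ga(n, c, rng) for _ in range(runs)) / runs
print(avg, 9 * math.exp(c) / (c * (2 * c + 9)) * n * math.log(n))  # empirical vs dominant term
```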

We first fixed \(n=1000\) and varied c in steps of 0.05. For each value of c, we ran the (2+1) GA 10,000 times to obtain smooth curves. For the range of c-values, \(c \in (0, 1.422]\), covered by Theorem 2, Fig. 1 shows that the empirical averages are very close to the dominant term. The vertical line indicates the c-value of \(\frac{\sqrt{97}-5}{4} \approx 1.2122\) that was determined to be optimal (up to small-order terms) in Theorem 3.
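The value \(\frac{\sqrt{97}-5}{4}\) can be reproduced by minimising the leading constant \(9e^c/(c(2c+9))\) over c: setting the derivative of its logarithm, \(1 - 1/c - 2/(2c+9)\), to zero yields \(2c^2+5c-9=0\), whose positive root is \((\sqrt{97}-5)/4\). A short sympy check (ours):

```python
import sympy as sp

c = sp.symbols('c', positive=True)
leading = 9 * sp.exp(c) / (c * (2 * c + 9))             # leading constant of the runtime bound
crit = sp.solve(sp.simplify(sp.diff(sp.log(leading), c)), c)
print(crit)                                             # [-5/4 + sqrt(97)/4]
print(sp.N(crit[0]), sp.N(leading.subs(c, crit[0])))    # 1.2122...  2.18417...
print(sp.N(leading.subs(c, 1)))                         # 2.22405... (default rate c = 1)
```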

Fig. 1

Average optimisation times over 10,000 runs of the (2+1) GA on OneMax with \(n=1000\) bits. The mutation rate is set to c/n with \(c \in \{0.05, 0.1, \ldots , 1.4\}\). The thin blue lines show mean ± standard deviation (Color figure online)

An obvious question arises: what happens for values of c larger than the limit of 1.422 in Theorem 2? Fig. 2 shows the results of the previous experiment for a larger range of c-values, \(c \in \{0.1, 0.2, 0.3, \dots , 5.0\}\). The first vertical line in Fig. 2 again indicates the c-value of \(\frac{\sqrt{97}-5}{4} \approx 1.2122\); the second vertical line indicates the limit of 1.422 from Theorem 2.

Fig. 2

Average optimisation times over 10,000 runs of the (2+1) GA on OneMax with \(n=1000\) bits. The mutation rate is set to c/n with \(c \in \{0.1, 0.2, \dots , 5.0\}\). The thin blue lines show mean ± standard deviation (Color figure online)

We see that as c increases beyond 1.422, a gap starts to appear between the dominant term and the average runtime. The average runtime seems to be smaller than \(\frac{9e^c}{c(2c+9)} \cdot n \ln n\). This could indicate that the lower bound does not hold for \(c > 1.422\), or that there are small-order terms that affect the plots for \(n=1000\).

To see how quickly the runtime approaches the dominant term of \(\frac{9e^c}{c(2c+9)} \cdot n \ln n\) as n grows, we ran experiments with n increasing in powers of 2 from \(n=2^3=8\) to \(n=2^{13}=8192\). Figure 3 shows the normalised average runtimes, i.e., the average runtime divided by \(n \ln n\). One can see that, for both the default and the optimal mutation rates, \(c=1\) and \(c=\frac{\sqrt{97}-5}{4}\), respectively, the curves approach their respective dominant terms. However, this convergence is quite slow. Even for \(n=8192\) there is still a significant gap to the leading constant of the dominant term. This might indicate that the average runtime includes a significant negative small-order term.

Recall that our lower bound (Theorem 2) includes a negative term of \(-O(n \log \log n)\), while the upper bound (Theorem 1) contains an additive term \(+O(n)\). The term \(-O(n \log \log n)\) is an artefact of the fact that we only considered around \(n/\log ^3 n\) fitness levels and we excluded the most difficult \(\log ^5 n\) ones. It is easy to show that optimising these fitness levels takes expected time \(O(n \log \log n)\) for both the (1+1) EA and the (2+1) GA. We conjecture that the expected running time is close to \(\frac{9e^{c}}{c(2c+9)} \cdot n \ln (n) + bn\) for some negative constant b. This is based on related work on the (1+1) EA whose expected runtime on OneMax is known to be \(en\ln (n) -1.8925\dots n + O(\log n)\) (cf. [14], who even determine the expected runtime up to additive errors of \(O(\log n/n)\)). Intuitively, the negative linear term appears in the bound for the (1+1) EA since the algorithm does not start with the largest possible value of the underlying potential function, which is the number of one-bits. More precisely, the initial number of one-bits \(X_0\) has an expected value of n/2 and the multiplicative drift theorems both for lower and upper bounds involve the term \(\ln (X_0/x_{\min })\). For \(X_0\approx n/2\) this becomes roughly \(\ln (n) - \ln 2\), so that the crucial term \(\ln (X_0/x_{\min })/\delta \), where \(\delta =\varTheta (1/n)\), becomes at most \((\ln n)/\delta - (\ln 2)/\delta = \varTheta (n\ln n)-O(n)\). The expected initial value of our potential function for the (2+1) GA equals \((1/4+1/4+1/12)n = (7/12)n\), so we believe there is a non-negligible negative term in the expression for the expected running time as well.
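As a small sanity check of the value (7/12)n, one can estimate the expected initial potential by Monte Carlo simulation. The sketch below (ours) uses the convention that \(n_{01}\) counts positions with a zero-bit in the first and a one-bit in the second individual; by symmetry of the random initialisation, the expectation does not depend on this ordering:

```python
import random

def potential(x, y):
    """phi = n - n00 - (2/3) * n01 for the pair (x, y)."""
    n00 = sum(a == 0 and b == 0 for a, b in zip(x, y))
    n01 = sum(a == 0 and b == 1 for a, b in zip(x, y))
    return len(x) - n00 - (2 / 3) * n01

rng = random.Random(1)
n, runs = 200, 20000
est = sum(potential([rng.randrange(2) for _ in range(n)],
                    [rng.randrange(2) for _ in range(n)])
          for _ in range(runs)) / runs
print(est / n, 7 / 12)   # both approximately 0.5833
```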

Fig. 3

Average optimisation times over 10,000 runs of the (2+1) GA and (1+1) EA on OneMax for different values of n and different mutation rates, normalised by dividing by \(n \ln n\). The solid lines show the best fit obtained by a non-linear regression to the model \(a n \ln (n) + bn\), using the non-normalised running times. The dashed lines show the leading constants provided by theoretical results

To see how closely the empirical data matches a conjectured bound with a negative linear term, we performed a non-linear regression to fit the average runtimes to a model of \(a n \ln (n) + bn\), for parameters a and b. We used the software R, version 4.0.3, calling the nls command with starting values of \(a=1\) and \(b=0\). The results of the regression were the values of a and b that minimised the residual sum of squares.
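For readers who prefer Python, an equivalent fit can be obtained with scipy.optimize.curve_fit. The sketch below uses synthetic placeholder data, since the measured averages are not reproduced here; with the actual data it plays the role of the nls call:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(n, a, b):
    return a * n * np.log(n) + b * n

ns = np.array([2.0**k for k in range(3, 14)])   # n = 8, 16, ..., 8192
# placeholder averages for illustration only; substitute the measured runtimes
avg_runtime = model(ns, 2.24, -0.8) + np.random.default_rng(0).normal(0.0, ns)

(a, b), _ = curve_fit(model, ns, avg_runtime, p0=(1.0, 0.0))
print(f"a = {a:.4f}, b = {b:.4f}")
```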

For the (2+1) GA with \(c=1\), this resulted in a fitted function of \(2.2381 n \ln (n) -0.7995n\). The leading constant 2.2381 is close to the value 2.224 from our theoretical analysis. Likewise, for the optimal value \(c=\frac{\sqrt{97}-5}{4} \approx 1.2122\), the best fit is \(2.1532 n \ln (n) -0.5883n\); the leading constant is close to the theoretical value of 2.18417 and smaller than the leading constant for \(c=1\). In both cases, the linear term has a negative sign. For comparison, the same regression for the (1+1) EA returned the best-fit function \(2.745 n \ln (n) -2.116n\). The leading constant is close to the theoretically proven one of \(e\approx 2.71828\) and the coefficient of the linear term is also close to the theoretical one of \(-1.8925\dots \) [14].

11 Conclusions

Proving lower bounds for crossover-based GAs is a notoriously hard problem. We have provided such a lower bound for the (2+1) GA on OneMax through a careful analysis of a potential function that captures both the current best fitness and the potential for finding improvements through crossover, which recombines different “building blocks” of good solutions. Our lower bound is tight up to small-order terms. This proves rigorously, for the first time, that populations are beneficial for standard steady-state genetic algorithms. We also identified the optimal mutation rate for the (2+1) GA as \((\sqrt{97}-5)/(4n)\) within the considered range of mutation rates c/n.

Our lower bound applies for \(c \le 1.422\), and an obvious open question is whether the leading constant in the expected runtime remains at \(9e^c/(c(2c+9))\) when this threshold is exceeded. Our empirical results suggest that the expected runtime is smaller than the stated bound, although it is not clear whether these results are affected by negative small-order terms. We conjecture that the runtime is very close to \(\frac{9e^c}{c(2c+9)} \cdot n \ln (n) + bn\), where bn is a linear term with a negative constant b. Our drift estimates subsumed error terms that are smaller than the dominant term by a factor of \(O(1/\log n)\); hence, tighter analyses would be required to obtain rigorous bounds on the conjectured linear term. Another avenue for future work would be to simplify our approach, possibly by exploiting that states with \(n_{10} > 1\) are rarely reached in the late stages of a run, or to generalise our analysis to larger parent population sizes.