The Complex Parameter Landscape of the Compact Genetic Algorithm

The compact Genetic Algorithm (cGA) evolves a probability distribution favoring optimal solutions in the underlying search space by repeatedly sampling from the distribution and updating it according to promising samples. We study the intricate dynamics of the cGA on the test function OneMax, and how its performance depends on the hypothetical population size K, which determines how quickly decisions about promising bit values are fixated in the probabilistic model. It is known that the cGA and the Univariate Marginal Distribution Algorithm (UMDA), a related algorithm whose population size is called λ, run in expected time O(n log n) when the population size is just large enough (K = Θ(√n log n) and λ = Θ(√n log n), respectively) to avoid wrong decisions being fixated.
The UMDA also shows the same performance in a very different regime (λ = Θ(log n), equivalent to K = Θ(log n) in the cGA) with a much smaller population size, but for very different reasons: many wrong decisions are fixated initially, but then reverted efficiently. If the population size is even smaller (o(log n)), the time is exponential. We show that population sizes in between the two optimal regimes are worse as they yield larger runtimes: we prove a lower bound of Ω(K^{1/3} n + n log n) for the cGA on OneMax for K = O(√n / log^2 n).
For K = Ω(log^3 n) the runtime increases with growing K before dropping again to O(K√n + n log n) for K = Ω(√n log n). This suggests that the expected runtime for the cGA is a bimodal function in K with two very different optimal regions and worse performance in between.


Introduction
Estimation-of-distribution algorithms (EDAs) are general metaheuristics for black-box optimisation that represent a more recent alternative to classical approaches like evolutionary algorithms (EAs). EDAs typically do not directly evolve populations of search points but build probabilistic models of promising solutions by repeatedly sampling and selecting points from the underlying search space. Hence, information about the search can be stored in a relatively compact way, which can make EDAs space-efficient.
Recently, there has been significant progress in the theoretical understanding of EDAs, which supports their use as an alternative to evolutionary algorithms. It has been shown that EDAs are robust to noise [6] and that they have at least comparable runtime behaviour to EAs. Different EDAs like the cGA [22], EDA-like ant colony optimisers (ACO) [17,22], and the UMDA [2,12,14,24] have been investigated from this perspective.
In this paper, we pick up recent research about the runtime behaviour of the compact Genetic Algorithm (cGA) [9]. The behaviour on the theoretical benchmark function OneMax(x) := Σ_{i=1}^n x_i is of particular interest since this function tests basic hill-climbing properties and serves as a basis for the analysis of more complicated functions. Already early analyses of GAs [8] similar to the cGA indicate for OneMax that a population size of Ω(√n) is necessary to prevent premature convergence of the system; together with convergence time analyses of such systems [16] this suggests a runtime that grows not much slower than linearly in this case (O(n log n) for ideal parameter settings). These analyses rely on simplified models of GAs; they yield good predictions for the behaviour of the cGA in some regimes, but behave very differently from the cGA in other regimes. In particular, the simplified models do not resemble the cGA in the regime of medium population sizes that we consider in this paper, and so the performance of the cGA in this regime remained unknown. See also the survey [13] for further results from the theory of EDAs in the last 25 years.
Droste [3] was the first to prove rigorously that the cGA is efficient on OneMax by providing a bound of O(n^{1+ε}) on the runtime. Recently, this bound was refined to O(n log n) by Sudholt and Witt [21,22]. However, this bound only applies to a very specific setting of the hypothetical population size K, which is an algorithm-specific parameter of the cGA. Parameters equivalent to K exist in other EDAs, including the UMDA mentioned above.
The choice of the parameter K is crucial for EDAs. It governs the speed at which the probabilistic model is adjusted towards the structure of recently sampled good solutions; more precisely, at hypothetical population size K the algorithm makes steps of size 1/K. If this step size is too large, the adjustment is too greedy: the algorithm is too likely to adapt to incorrect parts of sampled solutions and the system behaves chaotically. If it is too small, adaptation takes very long. However, the dependency of the runtime of the cGA and the UMDA on the population size is very subtle. For both the cGA and the UMDA, it is possible to pick some small step size that leads to optimal performance, where with high probability all decisions are made correctly, but still as fast as possible. For the UMDA it was shown that there is another, much bigger step size (corresponding to a smaller population size) that allows incorrect decisions to be reflected in the probabilistic model for a while, but this is compensated by faster updates.
More concretely, the results from [22] show that for K ≥ c√n log n, where c is an appropriate constant, the cGA optimises OneMax efficiently since all marginal probabilities of the model, the so-called frequencies, i.e. the probabilities of sampling a one, increase smoothly towards their optimal value because of the small step size 1/K. The same holds for the UMDA (with K replaced by the corresponding parameter λ), leading to runtime bounds O(λn) and O(λ√n), respectively [2,24], where for some parameter ranges the results rely on an additional assumption relating λ to the selection size. In these regimes the dynamics of the algorithm can also be well described by gambler's ruin dynamics [8,9]. At K = c√n log n (resp. λ = c√n log n) both algorithms optimise OneMax in expected time O(n log n). For smaller step sizes (larger K), at least for the cGA it is known that the runtime increases as Ω(K√n) [22]. On the other hand, it has been independently shown in [2,14] and [23,24] that the UMDA achieves the same runtime O(n log n) for λ = c′ log n for a suitable constant c′. The analysis of these very large step sizes indicates that the search dynamics proceed very differently from the dynamics at small step sizes. Namely, for many frequencies the model first learns incorrectly that the optimal value is 0 and then efficiently corrects this decision. The results in [2] and [24] show a general runtime bound of O(λn) for all λ ≥ c′ log n and λ = o(√n log n) (again under an additional assumption on the selection size for part of this range). We call this regime the medium step size regime; it is separated from the other regimes by two phase transitions: one for small step sizes, corresponding to K > c√n log n as discussed above, and one for even larger step sizes, corresponding to K = o(log n), where the system behaves so chaotically that correct decisions are regularly forgotten and the expected runtime on OneMax becomes exponential.
We also know that the runtime of the cGA is Ω(n log n) for all K [22]. However, it remained an open question whether the runtime is Θ(n log n) throughout the whole medium step size regime, or whether the runtime increases with K as suggested by the upper bound O(λn) for the UMDA.
Here we show that the runtime of the cGA does indeed increase, where we formally define runtime as the number of function evaluations until the optimum is sampled for the first time. To simplify the presentation, we assume throughout the paper that K ∈ 𝒦 for a suitable set 𝒦 of admissible parameter values. Where results for both algorithms exist, they coincide. Thus we take results for the UMDA as strong indication for analogous behaviour of the cGA, and vice versa. Our main result, Theorem 1, can be summarised as follows:
– If K ≤ 0.3 log n then the runtime of the cGA on OneMax is exponential with overwhelming probability and in expectation.
– If K = O(n^{1/2}/(log(n) log log n)) then the runtime is Ω(K^{1/3} n + n log n) with probability 1 − o(1) and in expectation.
– If K = O(n^{1/2}/(log(n) log log n)) and K = Ω(log^3 n) then, for a suitable constant ζ < 1, even the time to create a solution with fitness at least ζn is Ω(K^{1/3} n) with probability 1 − o(1) and in expectation.
This result suggests that the runtime and the underlying search dynamics depend in an astonishingly complex way on the step size: as long as the step size is in the large regime (K ≤ 0.3 log n), the expected runtime is exponential. Assuming that the upper bound for the UMDA also holds for the cGA, it then decreases to O(n log n) at the point where the medium regime is entered. Then the runtime grows with K in the medium regime, where it grows up to Ω(n^{7/6}/log n). Before entering the small step size regime (K = c√n log n) the runtime drops again to O(n log n) [22]. For even smaller step sizes (larger K) the runtime increases again [22]. See Fig. 1 for a simplified illustration of Theorem 1, highlighting the different runtime regimes studied. Experiments conducted for different values of n and K in Sect. 6 confirm that the runtime indeed shows this complex bimodal behaviour.
In addition, the last statement in Theorem 1 shows that even finding a solution within a linear Hamming distance to the optimum takes time Ω(K^{1/3} n). This is remarkable as many other lower bounds, like the general Ω(n log n) bound [22], rely on the fact that optimising the final few incorrect frequencies takes the claimed time (cf. the coupon collector's theorem).
Fig. 1: The runtime landscape of the cGA on OneMax (simplified)

A probability p is called overwhelming if 1/(1 − p) is exponential.

The proof of our main theorem is technically demanding: we obtain insights into the probabilistic process governing the cGA through careful drift analysis. In very rough terms, we analyse the drift of a potential function that measures the distance of the current sampling distribution to the optimal distribution. However, the drift depends on the sampling variance, which is a random variable as well. This leads to a complex feedback system between sampling variance and the drift of the potential function that tends to self-balance. We are confident that the approach and the tools used here yield insights that will prove useful for analysing other stochastic processes where the drift is changing over time.
This paper is structured as follows. Section 2 defines the cGA and presents fundamental properties of its search dynamics. Section 3 elaborates on the intriguing search dynamics of the cGA in the medium parameter range, including a proof of the fact that many probabilities in the model initially are learnt incorrectly. Section 4 is the heart of our analysis and presents the so-called Stabilisation Lemma, proving that the sampling variance and, thereby, the drift of the potential approach a steady state during the optimisation. It starts with a general road map for the proof. Section 5 puts the whole machinery together to prove the main result. Finally, Sect. 6 contains experiments showing the average runtime across the whole parameter range for K.

The Compact Genetic Algorithm and Its Search Dynamics
The cGA, defined in Algorithm 1, uses marginal probabilities (which, as mentioned above, are also known as frequencies) p_{i,t} that correspond to the probability of setting bit i to 1 in iteration t. In each iteration, two solutions x and y are created independently using the sampling distribution p_{1,t}, …, p_{n,t}. Then the fitter offspring amongst x and y is determined, and the frequencies are adjusted by a step size of ±1/K in the direction of the better offspring for bits where the two offspring differ. Here K determines the strength of the update of the probabilistic model.
The frequencies are always restricted to the interval [1/n, 1 − 1/n] to avoid fixation at 0 or 1. This ensures that there is always a positive probability of reaching a global optimum. Throughout the paper, we refer to 1/n and 1 − 1/n as (lower and upper) borders. We call frequencies off-border if they do not take one of the two border values, i.e., they are not in {1/n, 1 − 1/n}. Overall, we are interested in the cGA's number of function evaluations until the optimum is sampled; this number is typically called runtime or optimisation time. Note that the runtime is twice the number of iterations until the optimum is sampled.
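The update rule just described can be sketched in Python as follows. This is a minimal, unoptimised rendering of Algorithm 1 as summarised above; function and variable names are ours, and ties between the two offspring are broken in favour of x, which the paper's pseudocode may handle differently.

```python
import random

def onemax(x):
    """OneMax(x) = number of one-bits in x."""
    return sum(x)

def cga(n, K, max_iters=10**7, seed=0):
    """Run the cGA on OneMax and return the number of fitness
    evaluations until the all-ones string is sampled (the runtime),
    or None if max_iters iterations pass without sampling it."""
    rng = random.Random(seed)
    lo, hi = 1.0 / n, 1.0 - 1.0 / n        # borders against fixation
    p = [0.5] * n                          # frequencies p_{i,t}
    evals = 0
    for _ in range(max_iters):
        x = [1 if rng.random() < pi else 0 for pi in p]
        y = [1 if rng.random() < pi else 0 for pi in p]
        evals += 2
        if onemax(x) == n or onemax(y) == n:
            return evals
        if onemax(x) < onemax(y):          # make x the fitter offspring
            x, y = y, x
        for i in range(n):                 # move by 1/K where they differ
            if x[i] != y[i]:
                p[i] += 1.0 / K if x[i] == 1 else -1.0 / K
                p[i] = min(hi, max(lo, p[i]))
    return None
```

For instance, `cga(50, 30)` runs the algorithm for n = 50 near the small step size regime; sweeping K over several orders of magnitude, as in the experiments of Sect. 6, makes the bimodal runtime behaviour visible empirically.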
The behaviour of the cGA is governed by V_t := Σ_{i=1}^n p_{i,t}(1 − p_{i,t}), the sampling variance at time t. We know from previous work [17,22] that V_t plays a crucial role in the drift of the frequencies. The following lemma makes this precise by stating transition probabilities and showing that the expected drift of a frequency towards higher values is inversely proportional to √V_t. Recall that all results in this paper tacitly assume K ∈ 𝒦.
We only need to bound the probability for p′_{i,t+1} = p_{i,t} + 1/K, as it implies the symmetric bound on the probability for p′_{i,t+1} = p_{i,t} − 1/K. Consider the fitness difference D_{i,t} = Σ_{j≠i}(x_j − y_j) on all other bits. If |D_{i,t}| ≥ 2 then bit i does not affect the decision whether to update with respect to x or y. Thus, p_{i,t} increases if and only if bit i is set to 1 in the fitter individual and to 0 in the other, which happens with conditional probability p_{i,t}(1 − p_{i,t}). Such steps are called random walk steps (rw-steps) in [22]. If D_{i,t} = 0 then there is a higher probability of increasing p_{i,t}: in that case, if x_i ≠ y_i, then bit i does determine the decision whether to update with respect to x or y, as the offspring with a bit value of 1 will be chosen for the update. In this scenario, selection between x and y yields a bias towards increasing p_{i,t}. Such steps are called biased steps (b-steps) in [22].
In [17, proof of Lemma 1] it was shown that Pr(D_{i,t} = 0) = O(1/√V_t). To bound Pr(D_{i,t} = 0) from above, imagine that the cGA first creates all bits x_j for j ≠ i, such that these bits are given, while the y_j for j ≠ i are random variables. Then D_{i,t} = 0 is equivalent to Σ_{j≠i} y_j = Σ_{j≠i} x_j. Note that Σ_{j≠i} y_j follows a Poisson-Binomial distribution on n − 1 bits. Using the general probability bound for such distributions from [1] (see Theorem 3.2 in [14]), the probability of hitting any fixed value k is O(1/√V_t). The remaining cases D_{i,t} = −1 and D_{i,t} = +1 fall into one of the above cases and can be handled in the same way. Together, this proves the claimed probability bounds.
The statement on the expectation follows easily from the probability bounds and verifying the statement for boundary values, noting that K ∈ 𝒦. ◻

Remark 1 A statement very similar to Lemma 2 also holds for the UMDA on OneMax, even though the latter algorithm uses a sampling and update procedure that is rather different from the cGA, as it can in principle lead to large changes in a single iteration. However, the expected change of a frequency follows the same principle as for the cGA. Roughly speaking, the results from [12] and [23] together show that the UMDA's frequencies evolve with a drift that is by a factor of K larger than in the cGA. However, since each iteration of the UMDA entails λ fitness evaluations, where λ is a parameter that can be compared to K in the cGA, the overall runtime is the same for both algorithms.

The progress of the cGA can be measured by considering a natural potential function: the function φ_t := Σ_{i=1}^n (1 − p_{i,t}) measures the distance to the "ideal" distribution where all p_{i,t} are 1. While the drift on individual frequencies is inversely proportional to the root of the sampling variance, √V_t, the following lemma shows that the drift of the potential is proportional to √V_t. It also provides a tail bound for the change of the potential.
…as at least one offspring needs to sample a one. In both cases, the drift for one frequency is bounded as claimed. To bound the step size in one iteration, note that φ_t can only be changed by frequencies that are sampled differently in the two offspring, and in that case each such frequency changes φ_t by at most 1/K. Hence |φ_t − φ_{t+1}| is stochastically dominated by a sum of indicator variables, one for each frequency i, that take on value 1/K with probability 2p_{i,t}(1 − p_{i,t}) and 0 otherwise. These variables are independent (though not identically distributed) and their sum has expectation 2V_t/K.
We estimate the contribution of off-border frequencies to |φ_t − φ_{t+1}| separately from the contribution of frequencies at a border, showing that both quantities are at most (√V_t log n)/2 with the claimed probability. Let m denote the number of off-border frequencies at time t. Frequencies at a border only change with probability 2(1 − 1/n)/n. The expected number of frequencies at borders that change is 2(1 − 1/n)(n − m)/n ≤ 2(1 − 1/n), and the probability that at least (1 − 1/n)(K/2) log n frequencies at borders change is at most (K log n)^{−Ω(K log n)} ≤ n^{−Ω(K log log n)}, which follows from the well-known Chernoff bound with 1 + δ := K(log n)/4. As V_t ≥ 1 − 1/n and hence √V_t ≥ 1 − 1/n, with overwhelming probability fewer than (1 − 1/n)(K/2) log n ≤ (√V_t log n)K/2 frequencies at borders change. Since every change alters φ_t by ±1/K, the total contribution of frequencies at borders to |φ_t − φ_{t+1}| is at most (√V_t log n)/2. For the m off-border frequencies we note that every such frequency contributes at least (1/K)(1 − 1/K) to V_t, hence V_t = Ω(m/K). Recall from above that the sum of all indicator variables has expectation 2V_t/K, hence the expectation of just the off-border contributions is at most 2V_t/K. Using the assumption V_t = O(K^2), which by V_t = Ω(m/K) implies m = O(K^3), we apply Chernoff-Hoeffding bounds (Lemma 23) with Σ_i b_i^2 = m/K^2. Denoting by X the contribution of frequencies that are off-border at time t to |φ_t − φ_{t+1}|, we obtain that X ≤ (√V_t log n)/2 with the claimed probability. Taking the union bound over all failure probabilities completes the proof. ◻
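The quantities appearing in this proof are easy to check numerically. The following sketch (our own illustration, not part of the paper's analysis) computes V_t and φ_t for a random frequency vector and estimates the expected number of positions sampled differently in the two offspring; scaled by 1/K, its mean matches the bound 2V_t/K used above.

```python
import random

def sampling_variance(p):
    """V_t = sum_i p_i (1 - p_i), the sampling variance of the model."""
    return sum(pi * (1.0 - pi) for pi in p)

def potential(p):
    """phi_t = sum_i (1 - p_i), distance to the ideal all-ones model."""
    return sum(1.0 - pi for pi in p)

rng = random.Random(42)
n, K = 100, 20
p = [rng.uniform(1.0 / n, 1.0 - 1.0 / n) for _ in range(n)]

# Each frequency i is sampled differently in the two offspring with
# probability 2 p_i (1 - p_i); each differing position moves phi_t by
# at most 1/K, so E|phi_t - phi_{t+1}| <= 2 V_t / K.
trials = 4000
total = 0.0
for _ in range(trials):
    differing = sum(1 for pi in p
                    if (rng.random() < pi) != (rng.random() < pi))
    total += differing / K

estimate = total / trials
bound = 2.0 * sampling_variance(p) / K
print("phi_t =", potential(p), " V_t =", sampling_variance(p))
print("estimated step size", estimate, "<= bound", bound)
```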

Dynamics with Medium Step Sizes
As described in the introduction, the cGA in the medium step size regime, corresponding to K = o(√n log n) and K = Ω(log n), behaves less stably than in the small step size regime. In particular, many frequencies will be reinforced in the wrong way and will walk to the lower border before the optimum is found, resulting in an expected runtime of Ω(n log n) [22]. With respect to the UMDA it is known [23] that such wrong decisions can be "unlearned" efficiently; more precisely, the potential φ_t improves by an expected value of Ω(1) per iteration. This implies the upper bound O(λn) in the medium regime, which becomes minimal for λ = Θ(log n). Even though formally we have no upper bounds on the runtime of the cGA on OneMax in the medium regime, we strongly conjecture that it exhibits the same behaviour as the UMDA and has expected runtime O(Kn). We finally recall the first statement of Theorem 1: for extremely large step sizes, K ≤ 0.3 log n, the runtime becomes exponential. This statement will be shown in Sect. 5; the main reason for the exponential time is that the system contains too few states to build a reliable probabilistic model.
The following lemma shows that a linear number of frequencies tends to reach the upper and lower borders in the initial phase of a run.
A proof of Lemma 4 is given in the Appendix as it repeats many arguments from the proof of Theorem 8 in [22], where calculations can be simplified because of the assumption on K.
Frequencies at any border tend to remain there for a long time. The following statement shows that in an epoch of length r = o(n) the fraction of frequencies at a border only changes slightly.
Definition 1 Let α(t) denote the fraction of frequencies at the lower border at time t.

Both statements also hold for the fraction of frequencies at the upper border.
Proof The first statement follows from the fact that a frequency at a border has to sample the opposite value in one offspring to leave its respective border. Taking a union bound over the two created search points, the probability of leaving the border is at most 2/n; hence the expected number of frequencies leaving a border during r steps is at most 2r. The probability that at least 4r frequencies leave the border is e^{−Ω(r)} by Chernoff bounds. This implies the first inequality.
The second statement follows from Lemma 4 and the first statement: Ω(n) frequencies will hit the lower and the upper border, respectively, within the first t_0 = O(K^2) steps with probability 1 − 2^{−Ω(n)}, and, with probability 1 − e^{−Ω(r)}, fewer than 4r = o(n) of the frequencies hitting a border before time t_0 will leave the border again before time t_0. ◻

We now show that with high probability, every off-border frequency will hit one of the borders after a small number of iterations. The proof of the following lemma uses that the probability of increasing a frequency is always at least the probability of decreasing it. Hence, if every iteration actually changed the frequency, the time bound O(K^2) would follow by standard arguments on the fair random walk on K states. However, the probability of changing the state is only 2p_{i,t}(1 − p_{i,t}), and the additional log K factor accounts for the fact that the process has to travel through states with a low probability of movement before hitting a border.

Proof We consider the process X_t, t ≥ 0, on the state space {q(0), q(1), …, q(K′)}, where q(i) = 1/n + i/K and K′ = K(1 − 2/n); note that K′ is an integer since K ∈ 𝒦. Obviously, T equals the first hitting time of q(0) or q(K′) for the X_t-process. To analyse T, we only use that X_t is stochastically at least as large as a fair random walk with self-loop probability 1 − 2q(i)(1 − q(i)) at state q(i). The aim is to show that state q(0) or q(K′) is reached by random fluctuations due to the variance of the process, even in the case that the transition probabilities are completely fair in both directions. Since we do not have a lower bound on the probability of going from state i to state i − 1 in the actual process, it may happen that the actual process is unable to hit state q(0), whereas the general class of processes considered here may well be able to hit this state.
Therefore, we make state q(0) reflecting by defining Pr(X_{t+1} = q(1) | X_t = q(0)) := 1. Then we estimate the first hitting time of state q(K′) for the process modified in this way. Since hitting state q(0) is included in the stopping event, this modification can only overestimate T.
We introduce a potential function g on {0, …, K′} and estimate the first hitting time of state q(K′) through drift analysis. To this end, we need an upper bound on g(K′) and a lower bound on the drift.
By expanding the recursive definition of g and representing g(K′) − g(0) as a telescoping sum, we obtain an upper bound of O(K^2 ln K) on g(K′) for i ≤ K′/2, and symmetrically for i > K′/2; the last inequality holds if K is larger than some constant. Combining the bound on g(K′) with the lower bound on the drift yields E[T | X_0] = O(K^2 ln K). For the tail bound, we note that the upper bound on E[T | X_0] holds for all starting points. Hence, we obtain from Markov's inequality that T ≤ 8K^2 ln K with probability at least 1/2. The probability that the target is not reached within m ≥ 1 phases of length 8K^2 ln K each is then bounded by 2^{−m}. The claimed tail bound now follows for all r ≥ 8. ◻
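The random walk analysed above is easy to simulate. The sketch below is our own illustration: for simplicity it drops the 1/n offset, i.e. it uses q(i) = i/K, runs the fair walk with self-loop probability 1 − 2q(i)(1 − q(i)), treats state 0 as reflecting as in the proof, and measures the hitting time of state K. Its empirical mean is of order K^2 log K, matching the bound derived here.

```python
import random

def hitting_time(K, rng):
    """Fair random walk on {0, ..., K} with self-loop probability
    1 - 2 q (1 - q) at state i, where q = i / K. The lower state is
    reflecting, as in the proof (reflection is applied immediately
    after a down-step from state 1). Returns the number of iterations
    until state K is hit, starting from state 1."""
    i, t = 1, 0
    while i < K:
        t += 1
        q = i / K
        if rng.random() < 2.0 * q * (1.0 - q):   # a non-self-loop step
            i += 1 if rng.random() < 0.5 else -1
            if i == 0:                           # reflect at the border
                i = 1
    return t

rng = random.Random(11)
K, runs = 16, 300
mean = sum(hitting_time(K, rng) for _ in range(runs)) / runs
print(mean)   # of order K^2 log K
```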

Stabilisation of the Sampling Variance
Now that we have collected the basic properties of the cGA, we can give a detailed road map of the proof. We want to use a drift argument for the potential φ_t (recall that φ_t = Σ_{i=1}^n (1 − p_{i,t})). After a short initial phase, most of the frequencies are at the borders, but since a linear fraction is at the lower border, we start with φ_t = Θ(n). As we have seen, the drift of φ_t is O(√V_t / K), so the heart of the proof is to study how V_t evolves. However, the behaviour of V_t is complex. It is determined by the number and position of the frequencies in the off-border region (the other frequencies contribute only negligibly). By Lemma 2, each p_{i,t} performs a random walk with (state-dependent) drift proportional to 1/√V_t. Therefore, V_t affects itself in a complex feedback loop. For example, if V_t is large, then the drift of each p_{i,t} is weak (not to be confused with the drift of φ_t, which is strong for large V_t). This has two opposing effects. Consider a frequency that leaves the lower border. On the one hand, the frequency has a large probability of being re-absorbed by this border quickly. On the other hand, if it does gain some distance from the lower border, then it spends a long time in the off-border region, due to the weak drift. For small V_t and large drift, the situation is reversed. Frequencies that leave the lower border are less likely to be re-absorbed, but also need less time to reach the upper border. Thus the number and position of frequencies in the off-border region depend in a rather complex way on V_t.
To complicate things even more, the feedback loop from V t to itself has a considerable lag. For example, imagine that V t suddenly decreases, i.e. the drift of the p i,t increases. Then frequencies close to the lower border are less likely to return to the lower border, and this also affects frequencies which have already left the border earlier. On the other hand, the drift causes frequencies to cross the off-border region more quickly, but this takes time: frequencies that are initially in the off-border region will not jump to a border instantly. Thus the dynamics of V t play a role. For instance, if a phase of small V t (large drift of p i,t ) is followed by a phase of large V t (small drift of p i,t ), then in the first phase many frequencies reach the off-border region, and they all may spend a long time there in the second phase. This combination could not be caused by any static value of V t .
Although the situation appears hopelessly complex, we overcome these obstacles using the following key idea: the sampling variance V_t of all frequencies at time t can be estimated accurately by analysing the stochastic behaviour of one frequency i over a period of time. More specifically, we split the run of the algorithm into epochs of length K^2 ℓ(n) = o(n/log log n), with ℓ(n) = C log^2 n for a sufficiently large constant C, long enough that the value of V_t may take effect on the distribution of the frequencies. We assume that in one such epoch we know bounds V_min ≤ V_t ≤ V_max, and we show, by analysing the dynamics of a single frequency, that (stronger) bounds V′_min ≤ V_t ≤ V′_max hold for the next epoch. The following key lemma makes this precise.
To understand where the values of V′_min and V′_max come from, we recall that V_t = Σ_{i=1}^n p_{i,t}(1 − p_{i,t}), and we regard the terms p_{i,t}(1 − p_{i,t}) from an orthogonal perspective. For a fixed frequency i that leaves the lower border at some time t_1, we consider the total lifetime contribution of this frequency to all V_t until it hits a border again at some time t_2, i.e., P_i = Σ_{t=t_1}^{t_2} p_{i,t}(1 − p_{i,t}). Note that V_t and P_i are conceptually very different quantities, as the first one adds up contributions of all frequencies for a fixed time, while the second quantifies the total contribution of a fixed frequency over its lifetime. Nevertheless, we show in Sect. 4.1 that their expectations are related: roughly, E[V_t] is the product of E[P_i] and the expected number of frequencies that leave the lower border in each round. Crucially, E[P_i] is much easier to analyse: we link E[P_i] to the expected hitting time E[T] of a rescaled and loop-free version of the random walks that the frequencies perform. In Sect. 4.2 we then derive upper and lower bounds on E[T] that hold for all random walks with given bounds on the drift, which then lead to upper and lower bounds on E[V_t].

To turn these bounds into bounds that hold with high probability, we would like to use the Chernoff bound. Unfortunately, all the random walks of the frequencies are correlated, so the p_{i,t} are not independent. However, we show by an elegant argument in Sect. 4.3 that we may still apply the Chernoff bound. We partition the set of frequencies into m batches, and show that the random walks of the frequencies in each batch do not substantially influence each other. This allows us to show that the contribution of each batch is concentrated with exponentially small error probabilities. The overall proof of Lemma 7 is then by induction: given that we know bounds V_min and V_max for one epoch, we show by induction over all times t in the next epoch that V_t satisfies even stronger bounds V′_min and V′_max. In Sect. 5 we then apply Lemma 7 iteratively to show that the bounds V_min and V_max become stronger with each new epoch, until we reach V_min = Ω(K^{2/3}) and V_max = O(K^{4/3}). At this point the approach reaches its limit, since then the new bounds V′_min and V′_max are no longer sharper than V_min and V_max. Still, the argument shows that V_t = O(K^{4/3}) from this point onwards, which gives us an upper bound of O(K^{−1/3}) on the drift of φ_t and a lower bound of Ω(K^{1/3} n) on the runtime of the algorithm.
As the proof outline indicates, the key step is to prove Lemma 7, and the rest of the section is devoted to it.

Connecting V_t to the Lifetime of a Frequency
In this section we lay the foundation for analysing E[V_t]. We consider the situation of Lemma 7, i.e., we assume that we know bounds V_min ≤ V_t ≤ V_max from the previous epoch. The main result of this section (and one of the main insights of the paper) is that the contribution of the off-border frequencies can be described by E[V_t] = Θ(E[T]), where T is the lifetime of a random variable that performs a rescaled and loop-free version of the random walk that each p_{i,t} performs.
First we introduce the rescaled and loop-free random walk. It can be described as the random walk that p_{i,t} performs for an individual frequency if we ignore self-loops, i.e., if we assume that in each step p_{i,t} either increases or decreases by 1/K. Moreover, it will be convenient to scale the random walk by roughly a factor of K so that the borders are 0 and K instead of 1/n and 1 − 1/n. The exact scaling is given by the formula X_{i,t} = (p_{i,t} − 1/n)/(1/K − 2/(nK)), which maps the lower border 1/n to 0 and the upper border 1 − 1/n to K. Formally, assume that X_t is a random walk on {0, …, K} where the following bounds hold whenever 0 < X_t < K:

Pr(X_{t+1} = X_t + 1) = (1 + δ_t)/2 and Pr(X_{t+1} = X_t − 1) = (1 − δ_t)/2, where δ_min ≤ δ_t ≤ δ_max. (2)

Note that by Lemma 2, if we condition on p_{i,t+1} ≠ p_{i,t}, then p_{i,t} follows a random walk that increases with probability 1/2 + Θ(1/(K√V_t)), so this loop-free random walk of p_{i,t} follows the description in (2) after scaling. Therefore, we will refer to the random walk defined by (2) as the loop-free random walk of a frequency. We remark that it is a slight abuse of terminology to speak of the loop-free random walk, since (2) actually describes a class of random walks. Formally, when we prove upper and lower bounds on the hitting time of "the" loop-free random walk, we prove bounds on the hitting time of any random walk that follows (2).
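To build intuition for the walks described by (2), they are easy to simulate. The following Python sketch is ours and purely illustrative (the paper proves its bounds analytically); it runs the loop-free walk with a constant drift delta and records the hitting time of {0, K}.

```python
import random

def loopfree_hitting_time(K, delta, x0, rng):
    """One run of the loop-free random walk on {0, ..., K}: from an
    interior state, move +1 with probability (1 + delta) / 2 and -1
    otherwise, until state 0 or K is hit. Returns the number of steps."""
    x, t = x0, 0
    while 0 < x < K:
        x += 1 if rng.random() < (1 + delta) / 2 else -1
        t += 1
    return t

rng = random.Random(42)
K = 10
# Sanity check in the driftless case (delta = 0): the expected hitting
# time from x0 is exactly x0 * (K - x0), here 5 * 5 = 25.
avg = sum(loopfree_hitting_time(K, 0.0, 5, rng) for _ in range(20000)) / 20000
```

With a small positive drift, the walk started at 1 either falls back to 0 quickly or, with probability proportional to the drift, escapes towards K and then needs of the order of K further steps; this dichotomy is the mechanism behind the bounds of Sect. 4.2.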
To link E[V_t] and E[T], we need one more, seemingly unrelated concept. Consider a frequency i that leaves the lower border at some time t₀, i.e., p_{i,t₀−1} = 1/n and p_{i,t₀} = 1/n + 1/K, and let t′ > t₀ be the first point in time when p_{i,t} hits a border again, so p_{i,t′} = 1/n or p_{i,t′} = 1 − 1/n. Then we call

P_i := Σ_{t=t₀}^{t′} p_{i,t}(1 − p_{i,t}) (3)

the lifetime contribution of the i-th frequency. Analogously, if frequency i leaves the upper border, i.e., p_{i,t₀−1} = 1 − 1/n and p_{i,t₀} = 1 − 1/n − 1/K, we denote the lifetime contribution by

P′_i := Σ_{t=t₀}^{t′} p_{i,t}(1 − p_{i,t}). (4)

Note that V_t and P_i are both sums over terms of the form p_{i,t}(1 − p_{i,t}). But while V_t sums over all i for fixed t, P_i sums over some values of t for a fixed i. Nevertheless, as announced in the proof outline, we will show that the expectations E[V_t] and E[P_i] are closely related, and this will be the link between E[V_t] and E[T]. More precisely, we show the following lemma.

Lemma 8 Consider the situation of Lemma 7. Let S_low be the set of all frequencies i with p_{i,t} ∉ {1/n, 1 − 1/n} such that their last visit of a border was in [t₁, t], and it was at the lower border. Formally, we require that t₀ := max{τ ∈ [t₁, t] | p_{i,τ} ∈ {1/n, 1 − 1/n}} exists and that p_{i,t₀} = 1/n. Let S_upp be the analogous set, where the last visit was at the upper border, and let V_{t,low} and V_{t,upp} be the contributions of the frequencies in S_low and S_upp to V_t, respectively. Then

(a) E[V_{t,low}] = Θ(E[P_i]),
(b) E[V_{t,upp}] = Θ(E[P′_i]),
(c) the expected contribution of all remaining frequencies to V_t is O(1).

Proof (a) Recall that we assume α(t₁) = Θ(1), where α(t)n is the number of frequencies at the lower border at time t. With high probability α(t) changes only slowly by Lemma 5, and α(t) ≤ 1 always holds trivially, so with high probability α(t) = Θ(1) throughout the epoch. More precisely, since t₃ − t₁ is polynomially bounded in n, the error probability in Lemma 5 is superpolynomially small, and since p_{i,t} ≥ 1/n is polynomially bounded from below in n, such small error probabilities are negligible. In particular, we may assume that for every t′ ∈ [t₁, t₃], the expected number s(t′) of frequencies which leave the lower border at time t′ is Θ(1).

Consider a frequency that leaves the lower border at time 0, and let π_τ := p_τ(1 − p_τ) if the frequency does not hit a border during [1, τ], and π_τ := 0 otherwise. Hence, π_τ is similar to the contribution of a frequency to V_τ, but only up to the point where the frequency hits a border for the first time; in particular, E[P_i] = Σ_{τ≥0} E[π_τ], since π_τ is zero after the frequency hits a border. On the other hand, consider a fixed t ∈ [t₂, t₃], and assume that frequency i leaves the border at some time t − τ ∈ [t₁, t]. If it does not hit a border until time t, then it contributes π_τ to V_{t,low}. The same is true if it does hit a border and does not leave the lower border again in the remainder of the epoch, since then i ∉ S_low and π_τ = 0. For the remaining case, assume that i leaves the lower border several times t − τ₁, t − τ₂, …, t − τ_k, with τ₁ > τ₂ > … > τ_k. Then all of the terms π_{τ_j} vanish except possibly the one corresponding to the most recent departure, and by the same argument as before, the contribution of i to V_{t,low} equals Σ_{j=1}^k π_{τ_j}, which may or may not be zero.
Therefore, we can compute E[V_{t,low}] by summing up a term E[π_τ] for every frequency that leaves the lower border at time t − τ, counting frequencies multiple times if they leave the lower border multiple times. Recall that the number s(t) of frequencies that leave the lower border at time t has expectation E[s(t)] = Θ(1). Therefore,

E[V_{t,low}] = Σ_{τ=0}^{t−t₁} E[s(t − τ)] · E[π_τ] = Θ(Σ_{τ=0}^{t−t₁} E[π_τ]).

The sum on the right-hand side is almost E[P_i] = Σ_{τ≥0} E[π_τ], except that the sum only goes up to τ = t − t₁. For the missing part, we may apply Lemma 6 and obtain that the probability that a frequency does not hit a border state within τ > t − t₁ rounds is e^{−Ω(τ/(K² log K))}. Hence, we may split the range [t − t₁ + 1, ∞) into subintervals of length K² log K, whose contributions decrease geometrically. Therefore, setting i₀ := γ(n)/log K ≥ C log n, where we may assume C > 3, the missing part of the sum is superpolynomially small, using K ≤ n in the last step. This is clearly smaller than the rest of the sum, as required. For (b), the proof is the same as for (a), except that the number s′(t) of frequencies that leave the upper border at time t is given by (2 − o(1))β(t), where β(t)n is the number of frequencies at the upper border at time t. Since β(t) = Θ(1), the same argument as in (a) applies.
For (c), a frequency i ∈ {1, …, n} ⧵ (S_low ∪ S_upp) is either at a border at time t, or it has never been at a border throughout the whole epoch. The former frequencies contribute 1/n · (1 − 1/n) each, which sums to less than 1. For the other frequencies, similarly as before, by Lemma 6 the probability that a frequency does not hit a border within t − t₁ ≥ K²γ(n) rounds is e^{−Ω(γ(n)/log K)} = o(1/n), since γ(n) = C log² n for a sufficiently large constant C. Therefore, the expected number of such frequencies is o(1), and their expected contribution is o(1). This proves (c). ◻

The next lemma links the lifetime contributions P_i and P′_i to the hitting time T of the loop-free random walk.

Lemma 9
Consider the situation of Lemma 7. Assume for a = 1 or a = K − 1 that T_{a,min} and T_{a,max} are a lower and an upper bound, respectively, on the expected hitting time of {0, K} of every random walk as in (2) with X₀ = a. Then the lifetime contributions P_i and P′_i defined in (3) and (4) satisfy E[P_i] = Ω(T_{1,min}) and E[P_i] = O(T_{1,max}), and analogously for P′_i with a = K − 1.

Corollary 10 Consider the situation of Lemma 7, and let T be the hitting time of {0, K} of the loop-free random walk (2). Assume T_{1,min} = Ω(1). Then for all t ∈ [t₂, t₃],

E[V_t] = Θ(E[T]).

By Corollary 10, in order to understand E[V_t] it suffices to analyse the expected hitting time E[T] of the loop-free random walk.

Bounds on the Lifetime of a Frequency
We now give upper and lower bounds on the expected lifetime of every loop-free random walk, assuming that we only have lower and upper bounds δ_min and δ_max on the drift that hold at all times. We start with the upper bound.

Lemma 11
Consider a stochastic process {X_t}_{t≥0} on {0, 1, …, K} with variables δ_t that may depend on X₀, …, X_t, and with δ_min > 0 and 1/(2K) ≤ δ_max ≤ 1/2, such that for all t with 0 < X_t < K,

Pr(X_{t+1} = X_t + 1) = (1 + δ_t)/2 and Pr(X_{t+1} = X_t − 1) = (1 − δ_t)/2, where δ_min ≤ δ_t ≤ δ_max.

Let T be the hitting time of states 0 or K. Then, regardless of the choice of the δ_t,

E[T | X₀ = 1] = O(δ_max · min{K/δ_min, K²}) and E[T | X₀ = K − 1] = O(min{1/δ_min, K}).

Remark 2 The most important term for us is the bound E[T | X₀ = 1] = O(K δ_max/δ_min). This is tight, i.e., there is a scheme for choosing the δ_t that yields a time of Ω(K δ_max/δ_min) if δ_min = Ω(1/K). Consider δ_t = δ_max for states X_t ≤ K/2 and δ_t = δ_min for states X_t > K/2. Then with probability Ω(δ_max) the random walk never reaches 0. Once it reaches K/2, it can be shown that the expected time to reach K or 0 from there is Ω(K/δ_min) for δ_min = Ω(1/K). (The latter condition is needed since, if δ_min = o(1/K), the random walk would be nearly unbiased and would reach a border in expected time O(K²) = o(K/δ_min), which contradicts the claimed lower bound of Ω(K/δ_min).) We omit the details.
Proof We first give a brief overview of the proof. For X₀ = 1 we fix an intermediate state k₀ = Θ(1/δ_max) and show, using martingale theory and the upper bound δ_max on the drift, that (1) the time to reach either state 0 or state k₀ is O(1/δ_max), and (2) the probability that k₀ is reached is O(δ_max). In that case, using the lower bound δ_min on the drift, the remaining time to hit state 0 or state K is O(K/δ_min) by additive drift. The time from k₀ is also bounded by O(K²), as it is dominated by the expected time a fair random walk would take if state 0 were made reflecting. The statement for X₀ = K − 1 is proved using similar arguments, starting from K − 1 instead of k₀.
We first show the upper bound for X₀ = 1. Let k₀ = 1/(2δ_max) and note that k₀ ≤ K since δ_max ≥ 1/(2K). Let τ be the first point in time when we either hit 0 or k₀, and let p₀ and p_h be the probabilities to hit 0 and k₀, respectively, at time τ. Now consider Y_t := X_t² during the time before we hit 0 or k₀. Then Y_t has a positive drift, more precisely

E[Y_{t+1} − Y_t | X_t] = 1 + 2δ_t X_t ≥ 1.

Therefore, Z_t := Y_t − t is a submartingale (has non-negative drift). By the optional stopping theorem [7, page 502], at time τ we have

E[τ] ≤ E[Y_τ] − 1 ≤ p_h k₀². (7)

On the other hand, since X_t − t·δ_max is a supermartingale (has non-positive drift), we can do the same calculation and obtain p_h k₀ − δ_max E[τ] ≤ X₀ = 1. Solving for E[τ] in both equations, we get

(p_h k₀ − 1)/δ_max ≤ E[τ] ≤ p_h k₀².

Now we ignore the term in the middle and solve for p_h: p_h k₀ − 1 ≤ δ_max p_h k₀² = p_h k₀/2, which is equivalent to p_h ≤ 2/k₀ = 4δ_max, and plugging this into (7) yields E[τ] ≤ p_h k₀² ≤ 4δ_max k₀² = 1/δ_max. If state k₀ is reached, we use that the drift is always at least δ_min. Then a distance of K − k₀ ≤ K has to be bridged, and by additive drift (Theorem 24) the expected remaining time until state K or state 0 is reached is O(K/δ_min).
It is also bounded by O(K²), as the first hitting time of either state 0 or state K is stochastically dominated by the first hitting time of state K for a fair random walk starting in k₀ when state 0 is made reflecting. This is equivalent to a fair gambler's ruin game with 2K dollars (imagine a state space of {−K, …, 0, …, K} where −K and +K are both ruin states), where the game starts with K − k₀ dollars. The expected duration of the game is (K − k₀)(K + k₀) ≤ K². Together, we obtain an upper bound of O(1/δ_max + δ_max · min{K/δ_min, K²}), where the 1/δ_max can be absorbed since δ_max ≥ 1/(2K) implies 1/δ_max ≤ 2K ≤ 4δ_max K². For X₀ = K − 1, an upper bound of O(1/δ_min) follows from additive drift, as only a distance of 1 has to be bridged and the drift is at least δ_min. To show an upper bound of O(K), we again use that the aforementioned fair gambler's ruin game stochastically dominates the sought hitting time. As X₀ = K − 1, the game starts with 1 dollar and the expected duration of the game is 1 · (2K − 1) = O(K). ◻

The following lemma gives a lower bound on the lifetime of every loop-free random walk.
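The gambler's-ruin durations used above follow from the standard closed form: on {0, …, N}, the fair game started with a dollars lasts a(N − a) steps in expectation. The following Python check (ours, purely illustrative) verifies exactly that this closed form satisfies the defining one-step recurrence.

```python
from fractions import Fraction

def duration(a, N):
    """Expected duration of a fair gambler's ruin game on {0, ..., N}
    started with a dollars: the closed form a * (N - a)."""
    return Fraction(a * (N - a))

K = 20
N = 2 * K  # the game with 2K dollars from the proof
# duration is 0 at the ruin states 0 and N, and every interior state
# must satisfy the one-step recurrence f(a) = 1 + (f(a-1) + f(a+1)) / 2.
ok = all(duration(a, N) == 1 + (duration(a - 1, N) + duration(a + 1, N)) / 2
         for a in range(1, N))
```

In particular, started with K − k₀ dollars the game lasts (K − k₀)(K + k₀) ≤ K² steps in expectation, and started with 1 dollar it lasts 2K − 1 = O(K) steps, exactly as used in the proof.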

Lemma 12
Consider a stochastic process {X_t}_{t≥0} on {0, 1, …, K} with variables δ_t that may depend on X₀, …, X_t, and with δ_min ≥ 0 and δ_max ≥ (4 ln K)/K, such that for all t with 0 < X_t < K,

Pr(X_{t+1} = X_t + 1) = (1 + δ_t)/2 and Pr(X_{t+1} = X_t − 1) = (1 − δ_t)/2, where δ_min ≤ δ_t ≤ δ_max.

Let T be the hitting time of states 0 or K when started at X₀ = 1. Then, regardless of the choice of the δ_t,

Pr(T ≥ (1/2)·K/δ_max) = Ω(√(δ_max/K) + δ_min), and consequently E[T] = Ω(√(K/δ_max) + δ_min K/δ_max).

Remark 3 There is a scheme for choosing the δ_t such that the bound on the expectation from Lemma 12 is asymptotically tight. The scheme uses the minimum drift δ_min until state 0 or state √(K/δ_max) is reached for the first time. In the latter case we switch to the maximum drift δ_max. By gambler's ruin, the probability of reaching state √(K/δ_max) can be shown to be at most 1/√(K/δ_max) + 4δ_min, and in this case the remaining time to reach state 0 or K is O(K/δ_max) by additive drift. We omit the details.
Proof The lower bound on the expectation follows immediately from the lower bound on the probability. We first give an overview of the proof. We couple the process with two processes X_t^min and X_t^max that always use the minimum and maximum drift δ_min and δ_max, respectively. The coupling ensures that X_t^min ≤ X_t ≤ X_t^max, hence as long as X_t^min > 0 and X_t^max < K, the process cannot have reached a border state. We show for both coupled processes that the probability of reaching their respective border in time (1/2)K/δ_max is small, and then apply a union bound. For the X_t^max process, a negligibly small failure probability follows from additive drift with tail bounds [11] and the condition δ_max ≥ (4 ln K)/K. For the X_t^min process we show that the fair random walk on the integers, starting in state 1, does not reach state 0 in time (1/2)K/δ_max with probability Ω(√(δ_max/K)). In addition, the X_t^min process on the integers never reaches state 0 with probability Ω(δ_min) [5, page 351], which yields the second term in the claimed probability.
More specifically, we show that all schemes for choosing the δ_t lead to the claimed probability bound. We couple the random walk with two processes: X_t^min is a random walk on {0, 1, …} (i.e. with the border K removed) with the minimum drift, i.e. using the minimum possible values for δ_t: δ_t^min := δ_min for all t. Moreover, X_t^max is a process on {…, K − 1, K} (i.e. with the border 0 removed) with the maximum drift, δ_t^max := δ_max for all t. The coupling works as follows: in each step, draw a uniform random variable r from [0, 1]. If r ≤ (1 − δ_t)/2, then X_t decreases its current state, and the same applies to X_t^min if r ≤ (1 − δ_min)/2 and to X_t^max if r ≤ (1 − δ_max)/2. Otherwise, the respective random walk increases its current state. This coupling and δ_min ≤ δ_t ≤ δ_max ensure that for every time step t we have X_t^min ≤ X_t ≤ X_t^max. This implies in particular that, as long as X_t^min > 0 and X_t^max < K, X_t will not have hit any border. Let T_0^min be the first hitting time of state 0 by the X_t^min process, and T_K^max the first hitting time of state K by the X_t^max process. Thus the first hitting time T of the X_t process hitting either state 0 or state K is bounded from below by T ≥ min{T_0^min, T_K^max}. In particular, by the union bound we have

Pr(T < (1/2)K/δ_max) ≤ Pr(T_0^min < (1/2)K/δ_max) + Pr(T_K^max < (1/2)K/δ_max), (8)

and we proceed by bounding the last two probabilities from above.
By additive drift, it is easy to show that E[T_K^max] = Θ(K/δ_max), and this time is highly concentrated. Using Theorem 25, we have

Pr(T_K^max < (1/2)K/δ_max) ≤ e^{−Ω(K δ_max)} ≤ K^{−4}, (9)

as K δ_max ≥ 4 ln K. It remains to analyse T_0^min, that is, the time until a random walk with drift δ_min on the positive integers, starting at X₀ = 1, hits state 0. This time stochastically dominates the time until a fair random walk (with no drift) hits state 0.

For the fair random walk, the probability that state 0 is hit for the first time at time t is [5, III.7, Theorem 2]

Pr(T₀ = t) = (1/t) · binom(t, (t+1)/2) · 2^{−t},

where the binomial coefficient is 0 in case the second argument is non-integral. Hence Pr(T₀ ≥ (1/2)K/δ_max) is the sum of these terms over all t ≥ (1/2)K/δ_max. The binomial coefficient (for odd t) is at least Ω(2^t/√t). Hence we get a lower bound of Ω(Σ_{odd t} t^{−3/2}). Including terms for even t, as 1/t^{3/2} ≥ (1/2)·1/t^{3/2} + (1/2)·1/(t+1)^{3/2}, and using Σ_{t≥T} t^{−3/2} = Ω(T^{−1/2}), leads to a lower bound of Pr(T_0^min ≥ (1/2)K/δ_max) = Ω(√(δ_max/K)), and plugging this and (9) into (8) yields Pr(T ≥ (1/2)K/δ_max) = Ω(√(δ_max/K)). We only need to prove a lower probability bound of Ω(δ_min) in case δ_min = Ω(√(δ_max/K)) = Ω(1/K). The sought probability bound then follows from observing that, according to [5, page 351], the X_t^min process never reaches 0 with probability Ω(δ_min); in that case T ≥ T_K^max, which is at least (1/2)K/δ_max with high probability by (9), so the claim follows from (8) and (9). ◻
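The first-passage formula from [5] can be cross-checked mechanically. The following Python sketch (ours, purely illustrative) computes the exact first-passage probabilities of the fair walk by dynamic programming over the unabsorbed positions and compares them with the closed form (1/t)·binom(t, (t+1)/2)·2^(−t).

```python
from fractions import Fraction
from math import comb

def first_passage_formula(t):
    """Pr(fair random walk started at 1 first hits 0 at time t):
    (1/t) * binom(t, (t+1)/2) * 2^(-t), which vanishes for even t."""
    if t % 2 == 0:
        return Fraction(0)
    return Fraction(comb(t, (t + 1) // 2), t * 2**t)

def first_passage_dp(t_max):
    """Exact Pr(T_0 = t) for t = 1, ..., t_max via a dynamic program
    over the distribution of the not-yet-absorbed walk."""
    dist, out = {1: Fraction(1)}, []
    for _ in range(t_max):
        new, hit = {}, Fraction(0)
        for pos, p in dist.items():
            for nxt in (pos - 1, pos + 1):
                if nxt == 0:
                    hit += p / 2      # absorbed at 0 in this step
                else:
                    new[nxt] = new.get(nxt, Fraction(0)) + p / 2
        out.append(hit)
        dist = new
    return out

dp = first_passage_dp(15)
agree = all(dp[t - 1] == first_passage_formula(t) for t in range(1, 16))
```

Summing the formula over t ≥ T gives the tail bound Pr(T₀ ≥ T) = Θ(1/√T), which with T = (1/2)K/δ_max is exactly the Ω(√(δ_max/K)) term in the proof.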

Establishing Concentration
Our major tool for showing concentration is the Chernoff bound [4] together with the Chernoff–Hoeffding bound [4].
The basic idea is that for fixed t, we define for each frequency i a random variable X_i := p_{i,t}(1 − p_{i,t}) to capture the contribution of the i-th frequency to V_t = Σ_{i=1}^n X_i. In the previous sections we have computed E[V_t] by studying the expected lifetime E[T]. Concentration of V_t would follow immediately from the Chernoff bound if the random walks of the different frequencies were independent of each other. Unfortunately, this is not the case. However, for the initial case of the stabilisation lemma, Lemma 7 (a), we show that the random walks behave almost independently, which allows us to show the following lemma.

Lemma 13 Assume the situation of Lemma 7 (a). Then with probability 1 − e^{−Ω(√K)}, we have V_t = Ω(√K) for all t ∈ [t₂, t₃].
Proof We use an inductive argument over t ∈ [t₂, t₃]. Note that in Lemma 7 the statement gets weaker with increasing C′ and C″, so we may assume that they are as large as we want. We claim that if they are chosen appropriately, then for part (b) of the lemma we have V′_min ≥ V_min and V′_max ≤ V_max. Therefore, by the induction hypothesis we may assume that V_min ≤ V_t ≤ V_max. To check the claim for V′_min, we write the first statement from Lemma 7 (b) as V′_min ≥ c√K·V_min^{1/4} for some c > 0, making the Ω-notation explicit. Then we use the condition V_min ≤ K^{2/3}/C′, or equivalently √K ≥ C′^{3/4}V_min^{3/4}, and obtain V′_min ≥ cC′^{3/4}V_min ≥ V_min for C′ sufficiently large. This proves the claim for V′_min. After fixing C′, we inspect the second statement from Lemma 7 (b), which implies in particular V′_max ≤ c′K√V_max/√V_min for some c′ > 0, since replacing a minimum by one of its terms can only make it larger. We plug in the two conditions on V_min and V_max, the latter in the equivalent form K^{2/3} ≤ √V_max/√C″, and obtain V′_max ≤ V_max, where the last step holds for fixed C′ if we choose C″ sufficiently large. Thus we may assume V′_min ≥ V_min and V′_max ≤ V_max.

As mentioned above, we know that E[V_t] = Ω(E[T]) = Ω(√K) by Corollary 10 and Lemma 12 with trivial drift bounds δ_min = 0 and δ_max = 1/2, so it remains to show concentration. Fix i ∈ {1, …, n}, and consider the random walk that p_{i,t} performs over time. More precisely, we consider one step of this random walk, from t to t + 1. If the offspring x and y have the same i-th bit, then p_{i,t+1} = p_{i,t}, so assume that x and y differ in the i-th bit. We want to understand how the drift of p_{i,t} changes if we condition on what the other frequencies do.
So assume that we have already drawn all bits of the two offspring x and y at time t + 1 except for the i-th bit. Let f′(x) := f(x) − x_i and f′(y) := f(y) − y_i be the number of one-bits among the n − 1 uncovered bits of x and y, respectively. Assume also that someone tells us which of x, y is the selected offspring. In some cases, for example if f′(x) ≥ f′(y) + 2 and x is selected, the probability that x_i = 1 is exactly 1/2, since the one-bit is equally likely in x and y, and it does not have any influence on the selection process. In other cases, for example if f′(x) = f′(y) + 1 and y is selected, then Pr(y_i = 1) = 1, because this is the only scenario in which y can be selected. However, in all cases the selected offspring has probability at least 1/2 of having a one-bit at position i, because the selection process can never decrease the probability that the selected offspring has a one-bit at position i. Therefore, even after conditioning on the steps that all other p_{j,t} perform, we still have a non-negative drift for p_{i,t}, i.e., for any collection (q_j)_{j∈{1,…,n}⧵{i}} of frequencies,

E[p_{i,t+1} − p_{i,t} | p_{j,t+1} = q_j for all j ≠ i] ≥ 0.

On the other hand, we have the upper bound p_{i,t+1} − p_{i,t} ≤ 1/K by definition of the algorithm. Therefore, we can use the following uncovering procedure to force independence between the contributions of different i. In each step, we first uncover for each bit whether the two offspring coincide in this bit or not, which is independent across bits. Then we uncover for all 1 ≤ i ≤ n, one after the other, whether the value of p_{i,t+1} increases, decreases, or stays the same. Crucially, even conditioned on all the prior information, p_{i,t} still has non-negative drift. Therefore, the associated loop-free random walk still follows the description in Lemma 12 with δ_min = 0 and δ_max = 1.
Hence, if we uncover the random walks one by one as described above, then the contributions of the frequencies still sum up to Ω(√K) in expectation, and the contribution of the i-th frequency can be bounded independently of the contributions of the previous frequencies.⁵ Therefore, for a fixed t ∈ [t₂, t₃] we may apply the Chernoff bound (Lemma 22 with δ = 1/2) and obtain that V_t = Ω(√K) with probability 1 − e^{−Ω(√K)}. Then the claim follows by a union bound over all t ∈ [t₂, t₃]. ◻

We would like to use a similar argument also in the cases with non-trivial δ_min and δ_max. Unfortunately, it is no longer true that the drift remains lower bounded by δ_min > 0 if we uncover the random walk steps of all other frequencies. However, the bound still remains true if we condition on only a few of the other frequencies.
More precisely, if we consider a batch of r frequencies b₁, …, b_r for a suitably chosen r ∈ ℕ, then even if we condition on the values that the two offspring have in the bits b₁, …, b_{r−1}, the frequency of b_r will still perform a random walk where the drift in each round is in Θ(1/(K√V_t)). Hence, we can couple the random walks of b₁, …, b_r to r independent random walks, and apply the Chernoff bound to show that the contribution of this batch is concentrated. Afterwards we use a union bound over all batches.
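As a toy illustration of why the batching helps (our sketch, not the paper's argument): once the walks inside a batch are coupled to independent ones, a batch contribution is a sum of independent terms p(1 − p) ∈ [0, 1/4], so Hoeffding-type concentration applies. The Python snippet below checks empirically how sharply such a sum concentrates; drawing the p's uniformly is an arbitrary stand-in for the true frequency distribution.

```python
import random

rng = random.Random(0)
n_batch, trials = 1000, 300

def batch_contribution():
    """Sum of n_batch independent terms p * (1 - p); each term lies in
    [0, 1/4], so Hoeffding's inequality bounds the sum's deviation."""
    total = 0.0
    for _ in range(n_batch):
        p = rng.random()      # stand-in for one off-border frequency
        total += p * (1 - p)
    return total

mean = n_batch / 6            # E[p * (1 - p)] = 1/6 for uniform p
devs = [abs(batch_contribution() - mean) for _ in range(trials)]
# Hoeffding: Pr(|sum - mean| > eps * n) <= 2 * exp(-32 * eps**2 * n),
# already below 1e-5 for eps = 0.02 and n = 1000.
worst = max(devs)
```

In the actual proof the terms are not identically distributed, but Hoeffding's inequality only needs boundedness and independence, which is exactly what the coupling of Sect. 4.3 provides within a batch.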
Formally, we show the following pseudo-independence lemma. Note that there are two types of error events in the lemma. One is the explicit event E; the other is the event that B ∉ 𝓑, i.e., that the other frequencies in the batch display an atypical distribution. However, both events are very unlikely if V_t is large, which we may assume after one application of Lemma 13.

Lemma 14 Consider a vector of probabilities p_t with potential V_t. Let m = m(n) ≥ 3. Let S ⊆ {1, …, n} be a random set which contains each position independently with probability 1/m. Then there is an error event E of probability Pr(E) = e^{−Ω(V_t/m)} such that, conditioned on ¬E, the following holds for all i₀ ∈ S. Let b_i¹ and b_i² be the i-th bit in the first and second offspring, respectively, and let B := (b_i^j)_{i∈S⧵{i₀}, j∈{1,2}}. There is a set 𝓑 ⊆ {0, 1}^{2(|S|−1)} such that Pr(B ∈ 𝓑) = 1 − e^{−Ω(min{m, V_t/m})} and such that for all B₀ ∈ 𝓑,

E[p_{i₀,t+1} − p_{i₀,t} | B = B₀, p_{i₀,t+1} ≠ p_{i₀,t}] = Θ(1/(K√V_t)). (10)

Before we prove the lemma, we remark briefly on how we apply it. Recall that our overall proof strategy is to show that V_t is between V_min = Ω(K^{2/3}) and V_max = O(K^{4/3}), and then stays in this regime for the remaining runtime. For this regime, by choosing m = √V_min, both error events (the event E and the event B ∉ 𝓑) have probability e^{−Ω(K^{1/3})} = e^{−Ω(C^{1/3} log n)}, where C is the constant from the assumption K ≥ C log³ n. So for any n^{O(1)} iterations, the error events will not happen if C is sufficiently large.

Proof of Lemma 14
The error event E is that the contribution of S to V_t deviates from its expectation V_t/m by more than a factor of 3/2, more precisely E := {|Σ_{i∈S} p_{i,t}(1 − p_{i,t}) − V_t/m| > (1/2)V_t/m}. To estimate its probability, note that the contributions of all frequencies sum to V_t, so the contribution of the frequencies in S sums to V_t/m in expectation. We apply the Chernoff bound to random variables where the i-th random variable takes value p_{i,t}(1 − p_{i,t}) if i ∈ S (with probability 1/m), and value 0 otherwise. Hence we apply Lemma 22 with b = 1. We obtain that the probability that the contribution of the frequencies in S deviates from its expectation by more than (1/2)V_t/m is at most e^{−Ω(V_t/m)}, as required.
We uncover the offspring in three steps. First we uncover all bits in S ⧵ {i₀}, then we uncover the bits in S̄ := {1, …, n} ⧵ S, and finally we uncover i₀. We call d₁ and d₂ the difference of the fitnesses of the uncovered bits in the first and second uncovering steps, respectively. Assume first that |d₁ + d₂| ≥ 2. Then the values of i₀ in the two offspring do not have an effect on the selection step, and by symmetry p_{i₀,t} performs a (possibly stagnating) unbiased random walk step. On the other hand, assume that d₁ + d₂ = 0, and that the two i₀-bits in the offspring are different. Then the offspring which has a one-bit in i₀ will always be selected. (The case d₁ + d₂ = ±1 contributes similarly to the case of zero difference, but is not needed for the argument.) For the upper bound on the drift, assume that d₁ = k for some k ∈ ℤ. Note that the frequencies in S̄ contribute at least V_t/2 to V_t, with room to spare. In particular, by the general probability bound for Poisson–Binomial distributions [1], Pr(d₂ = −k) = O(1/√V_t). Since this holds for any value of k, analogously to Lemma 2 we obtain an upper bound of O(1/(K√V_t)) on the conditional drift. For the lower bound, we use a similar argument, but we need to be more careful since Pr(d₂ = −k) = Ω(1/√V_t) holds only if |k| ≤ ε√V_t for a sufficiently small constant ε > 0 [23, Lemma 2.5]. Thus the claim will follow as before if we define 𝓑 := {B₀ : |d₁(B₀)| ≤ ε√V_t}. It only remains to check that Pr(B ∉ 𝓑) = e^{−Ω(min{m, V_t/m})}. To this end, we proceed in two steps. First, let S′ be the set of all positions i ∈ S ⧵ {i₀} such that the two offspring differ in the i-th bit. We claim that |S′| ≤ 4V_t/m with probability 1 − e^{−Ω(V_t/m)}. Indeed, this follows from the Chernoff bound by using |S| indicator random variables X_i, where X_i = 1 if i ∈ S′ and X_i = 0 otherwise. In a second step, we use |S′| random variables Y_i, where for i ∈ S′ we set Y_i = +1 if the first offspring has a one-bit in i and the second offspring has a zero-bit in i, and Y_i = −1 otherwise.
(Recall that by definition of S′, the offspring differ in the bits in S′.) By symmetry, E[Y_i] = 0 for all i, and d₁(B) = Σ_{i∈S′} Y_i. Now we apply the Chernoff–Hoeffding bound (Lemma 23) to the random variables Y_i and obtain the desired bound on Pr(|d₁(B)| > ε√V_t). ◻

We note that from Lemma 14 we may derive the following corollary.

Corollary 15
In the situation of Lemma 7 with V_min = Ω(log² K) and V_min ≤ K², we may split the set of frequencies randomly into m = √V_min batches of size Θ(n/m) such that for every batch S there are independent random walks (L_{i,t})_{i∈S,t≥0} and (U_{i,t})_{i∈S,t≥0}, which both satisfy the recurrence (1), and such that L_{i,t} ≤ p_{i,t} ≤ U_{i,t} holds for all off-border frequencies i ∈ {1, …, n} and all t₁ ≤ t ≤ t₂ with probability at least 1 − e^{−Ω(√V_min)}.

Proof For each frequency we decide randomly and independently to which batch it belongs. Then each batch satisfies the description of Lemma 14, and with sufficiently large probability all batches have size Θ(n/m) by the Chernoff bound (since V_min ≤ V_t ≤ n, we have m ≤ √n). The coupling is an immediate consequence of Lemma 14, which states that for any values of the other frequencies in the batch, the frequency i₀ still performs a random walk whose drift satisfies (10). It just remains to check the error probabilities. ◻
Corollary 15 allows us to partition the frequencies randomly into m batches, such that in each batch the frequencies perform random walks that can be coupled to independent random walks. In particular, we will be able to apply the Chernoff-Hoeffding bounds to each batch. This gives concentration of the V t as follows.

Lemma 16 Assume the situation of Lemma 7 (b), in particular V_min ≤ V_t ≤ V_max, where we may choose the hidden constants suitably. Then with probability 1 − e^{−Ω(min{√V_min, √K/V_min^{1/4}})}, the bounds V′_min ≤ V_t ≤ V′_max hold throughout the next epoch.

Proof Apart from the complication with the batches, the proof is analogous to the proof of Lemma 13. For simplicity we abbreviate q := min{√V_min, √K/V_min^{1/4}}, and note that K^{O(1)}·e^{−Ω(q)} = e^{−Ω(q)}. Therefore, it suffices to show all statements for a single t, since we can afford a union bound over all K^{O(1)} values of t. More precisely, as for Lemma 13 we use induction on t ∈ [t₂, t₃], and we may choose the constants C′, C″ in Lemma 7 such that V′_min ≥ V_min and V′_max ≤ V_max. Therefore, by the induction hypothesis we may assume that V_min ≤ V_t ≤ V_max. For every 1 ≤ i ≤ n, we define a random variable X_i := p_{i,t}(1 − p_{i,t}), and we are interested in V_t = Σ_{i=1}^n X_i. The frequencies perform a random walk with drift between Ω(1/(K√V_max)) and O(1/(K√V_min)). Therefore, the loop-free random walk with state space {0, …, K} has drift between δ_min = Ω(1/√V_max) and δ_max = O(1/√V_min). Let T be the lifetime of a random walk on {0, …, K} with drift between δ_min and δ_max. By Lemma 11, and by Lemma 12 (where the precondition δ_max ≥ (4 ln K)/K follows from V_min = O(K^{2/3}) with room to spare), we have

E[T] = O(K√V_max/√V_min) (11)

and

E[T] = Ω(√K · V_min^{1/4}). (12)

Now we split the set {1, …, n} of frequencies into m := √V_min batches as in Corollary 15. Since each frequency enters a given batch with probability 1/m, the contribution X_S := Σ_{i∈S} X_i of the frequencies in a batch S satisfies E[X_S] = Θ(E[V_t]/m) = Θ(E[T]/m). Even after conditioning on the random walks of the other frequencies in the batch, by Corollary 15 the i-th frequency of the batch still performs a random walk with drift between Ω(1/(K√V_max)) and O(1/(K√V_min)), with an error probability of e^{−Ω(√V_min)}. Thus its loop-free random walk still has drift between δ_min and δ_max. Therefore, the expected contribution of the i-th frequency stays the same even after conditioning on the contributions of the other frequencies in the batch. Hence, we may apply the Chernoff bound, and the probability that X_S deviates from its expectation by more than a factor of 2 is at most e^{−Ω(q)}.

By a union bound over all K^{O(1)} batches, the contribution of every batch is within a factor of 2 of its expectation. Therefore, V_t = Θ(E[T]), and the lemma follows from (11) and (12). ◻

Altogether, we have proven the Stabilisation Lemma 7: part (a) is proven in Lemma 13, and part (b) is proven in Lemma 16.

Proof of the Main Result
With the Stabilisation Lemma in place, we now prove the three statements in our main result, Theorem 1. We first show the first statement in Theorem 1 about too large step sizes, which is implied by the following slightly more detailed theorem.
The condition K ≤ log n makes sense since we suspect, based on closely related results for the UMDA [2, 24], that the cGA optimises OneMax in expected time O(n log n) if K ≥ c log n for a sufficiently large constant c > 0.
The main idea behind the proof of Theorem 17 is that if the step size 1/K is too large, then frequencies hit the lower border frequently, due to the large variance in the stochastic behavior of the frequencies. To keep the paper streamlined and focused on the medium step size regime, the proof of Theorem 17 is deferred to the Appendix.
The following lemma is used to prove the remaining two statements in Theorem 1.

Lemma 18
With probability 1 − exp(−Ω(K^{1/4})), we have V_min = Ω(K^{2/3}) and V_max = O(K^{4/3}) after the first i*r rounds, where i* = O(log log K) and r is the epoch length. Moreover, for any fixed t ≥ i*r, as long as α(τ) = Θ(1) for all τ ∈ [i*r, t − 1], V_max and V_min are bounded in the same way during [i*r, t], with a failure probability of at most t/r · exp(−Ω(K^{1/3})), and with probability 1 − tn·exp(−Ω(γ(n)/log n)) the number of off-border frequencies at any time in [i*r, t] is at most 4K²γ(n). In particular, if t = n², γ(n) = C log² n, and K ≥ C log³ n for a sufficiently large constant C > 0, then the error probability is o(1).
Proof By Lemma 4, we know that the initial fraction of frequencies at the lower border is Θ(1), with probability 1 − e^{−Ω(√n)}. We apply the first statement of the Stabilisation Lemma 7 (a) with respect to an initial epoch of length r and obtain that with probability 1 − e^{−Ω(√K)} we have V_t = Ω(K^{1/2}) in an epoch [t₂, t₃] of length at least r. Applying the statement again, now with respect to this epoch and with the assumption V_min = Ω(K^{1/2}), we obtain V_min = Ω(K^{5/8}) for the next epoch, with error probability exp(−Ω(min{√V_min, √K/V_min^{1/4}})) = exp(−Ω(K^{1/4})). Iterating this argument i times, we have V_min = Ω(K^{2/3−(2/3)(1/4)^{i+1}}) after i epochs of length r, and each error probability is at most exp(−Ω(K^{1/4})). In particular, choosing i* = c ln ln K for a sufficiently large constant c > 0, we get V_min = Ω(K^{2/3−1/log K}) = Ω(K^{2/3}) after i*/2 iterations, with error probability exp(−Ω(K^{1/4})) in each step.
Applying the second statement of the Stabilisation Lemma 7 with respect to the i*-th epoch, we obtain with error probability exp(−Ω(K^{1/3})) that V_max = O(K²) for the next epoch. We apply the statement again, and the next epoch satisfies V_max = O(K√V_max/√V_min) = O(K · K/K^{1/3}) = O(K^{5/3}). Iterating this argument with the new value of V_max and still V_min = Ω(K^{2/3}) for O(log log K) epochs, similarly as above, we arrive at V_max = O(K^{4/3}), with an error probability of i*/2 · exp(−Ω(K^{1/3})) = exp(−Ω(K^{1/3})).
For t ≥ i*r, we may apply the same argument again, getting an error probability of exp(−Ω(K^{1/3})) for each epoch. The statement on V_min and V_max then follows from a union bound over all epochs. For the number of off-border frequencies, by Lemma 6 every frequency hits a border after at most K²γ(n) rounds with probability 1 − exp(−Ω(γ(n)/log n)). By a union bound over all frequencies and all rounds, the probability that there is ever a frequency that does not hit a border within K²γ(n) rounds is at most tn·exp(−Ω(γ(n)/log n)). Therefore, for every time τ, the only off-border frequencies at time τ are frequencies that left a border within the last K²γ(n) rounds. The expected number of such frequencies is at most 2K²γ(n), and by the Chernoff bound (Lemma 22), this number exceeds 4K²γ(n) with probability at most exp(−Ω(K²γ(n))), which is negligible compared to exp(−Ω(γ(n)/log² n)). This proves the statement on the number of off-border frequencies.
Finally, the statement for $t = n^2$ follows since $n^2 e^{-\Omega(\log n)} = o(1)$ if the hidden constant is large enough. ◻

We are finally ready to prove our main result.

Proof of Theorem 1
As mentioned earlier, the first statement follows from Theorem 17. Concerning the second statement, a lower bound of $\Omega(\sqrt{n}K + n\log n)$ was shown in [22]. Hence it suffices to show a lower bound of $\Omega(K^{1/3}n)$ for $K \ge C\log^3 n$, where we may choose the constant C to our liking. In the following, we assume that all events that occur with high probability do occur.
Recall that the potential $\varphi_t := \sum_{i=1}^n (1 - p_{i,t})$ is the total distance of all frequencies from the optimal value of 1. By Lemma 5, we have an $\alpha_0 = \Omega(1)$ fraction of frequencies at the lower border at some time within the first $O(K^2)$ iterations with probability $1 - e^{-\Omega(K^2\gamma(n))} - e^{-\Omega(\sqrt{n})}$. In particular, this implies $\varphi_t \ge \alpha_0(n-1)$. Let $\beta := 1 - \alpha_0/8$. We show that the time until either $\varphi_t$ has decreased to $\alpha_0/4 \cdot (n-1)$ or a solution with fitness at least $\beta n$ is found is $\Omega(K^{1/3}n)$ with high probability. This implies the second and third statements, since in an iteration where $\varphi_t > \alpha_0/4 \cdot (n-1)$ the expected fitness is at most $n - \alpha_0/4 \cdot (n-1)$ and the probability of sampling a solution with fitness at least $\beta n$ is $2^{-\Omega(n)}$ by Chernoff bounds. This still holds when considering a union bound over $O(K^{1/3}n)$ steps.
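As a small illustration (ours, not from the paper): the expected OneMax value of a sample drawn from the frequency vector is exactly $n - \varphi_t$, so a lower bound on the potential translates directly into an upper bound on the expected fitness.

```python
import random

def potential(p):
    # phi_t: total distance of all frequencies to the optimal value 1
    return sum(1 - pi for pi in p)

def expected_fitness(p):
    # expected OneMax value of a product-distribution sample is sum(p)
    return sum(p)

rng = random.Random(42)
p = [rng.random() for _ in range(100)]
assert abs(expected_fitness(p) - (len(p) - potential(p))) < 1e-9
```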
Moreover, also by Lemma 18, if we can show that the fraction of frequencies at the lower border remains $\Omega(1)$, then the bound $V_t = O(K^{4/3})$ remains true for the next $K^{1/3}n$ rounds, with probability $1 - o(1)$. So it remains to show that this fraction is $\Omega(1)$ for $t \in [T, \Theta(K^{1/3}n)]$. Note that the prerequisites of Lemma 18 only concern times strictly before t, so we can use the statement of the lemma inductively. By Lemma 18, the number of off-border frequencies in each epoch is at most $4K^2\gamma(n)$, with $\gamma(n)$ as in Lemma 6.

Experiments
We have carried out experiments for the cGA on OneMax to gain some empirical insights into the relationship between K and the runtime. The algorithm was implemented in the C programming language using the WELL512a random number generator. The experiments supplement our asymptotic analyses and confirm that the algorithm indeed exhibits a bimodal runtime behavior also for small problem sizes. We ran the cGA with n = 1000 (Fig. 2), n = 2000 (Fig. 3), n = 3000 (Fig. 4), all averaged over 3000 runs, and n = 10000 (Fig. 5), averaged over 500 runs, as detailed in the figures. In all four cases, we observe the same picture: the empirical runtime starts out from very high values, takes a minimum when K is around 10 and then increases again, e.g., up to K = 30 for n = 1000. Thereafter it falls again, e.g., until K ≈ 130 for n = 1000, and finally increases rather steeply for the rest of the range. The location of the first minimum does not change much across the four problem sizes, but the location of the second minimum clearly grows with n, from roughly K = 130 at n = 1000 via roughly K = 210 at n = 2000 to finally roughly K = 590 at n = 10000. As n grows, the relative difference between the local maximum and the second minimum increases as well, from roughly 23% at n = 1000 to roughly 45% at n = 10000. Close inspection of the left part of the plots also shows that the range left of the first minimum leads to very high runtimes. We could not plot even smaller values of K due to exploding runtimes. This is consistent with our exponential lower bounds for K ≤ 0.3 log n.
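A minimal re-implementation of the algorithm used in these experiments might look as follows. This is a sketch in Python with Python's standard Mersenne Twister instead of C/WELL512a, with borders at 1/n and 1 − 1/n and ties resolved in favour of the first sample; the function name and the cutoff parameter are our own choices.

```python
import random

def cga_runtime(n, K, rng, max_iters=10**7):
    """Run the cGA on OneMax; return the number of iterations until
    the all-ones string is sampled (or max_iters as a cutoff)."""
    lo, hi = 1.0 / n, 1.0 - 1.0 / n
    p = [0.5] * n  # frequency vector of the probabilistic model
    for t in range(1, max_iters + 1):
        x = [1 if rng.random() < pi else 0 for pi in p]
        y = [1 if rng.random() < pi else 0 for pi in p]
        if sum(x) < sum(y):
            x, y = y, x  # x is now the fitter of the two samples
        if sum(x) == n:
            return t
        for i in range(n):
            if x[i] != y[i]:  # move p_i by 1/K towards the winner's bit
                step = 1.0 / K if x[i] == 1 else -1.0 / K
                p[i] = min(hi, max(lo, p[i] + step))
    return max_iters
```

Averaging `cga_runtime(1000, K, ...)` over many runs for K ranging over, say, [5, 400] reproduces the bimodal shape described above.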
The right-hand sides of the pictures also illustrate that the number of times the lower frequency border is hit seems to decrease exponentially with K. The phase transition where the behavior of the frequencies turns from chaotic to stable is empirically located close to the value of K at which the second minimum of the runtime is attained.

Conclusions
We have investigated the complex parameter landscape of the cGA, highlighting how performance depends on the step size 1/K. In addition to an exponential lower bound for too large step sizes ($K \le 0.3\log n$), we presented a novel lower bound of $\Omega(K^{1/3}n + n\log n)$ for the cGA on OneMax that at its core has a very careful analysis of the dynamic behaviour of the sampling variance and of how it stabilises in a complex feedback loop that exhibits a considerable lag. A key idea to handle this complexity was to show that the sampling variance $V_t$ of all frequencies at time t can be estimated accurately by analysing the stochastic behaviour of one frequency i over a period of time.
Assuming that the cGA has the same upper bound as the UMDA for $K = \Theta(\log n)$, the expected runtime of the cGA is a bimodal function in K, with worse performance between its two minima.
We believe that our analysis can be extended towards an upper bound of $O(K^{2/3}n + n\log n)$, using that typically $V_t = \Theta(K^{2/3})$ after an initial phase, which implies a drift of $\Omega(\sqrt{V_t}/K) = \Omega(K^{-2/3})$ for $\varphi_t$. This would require additional arguments to deal with $\varphi_t$ decreasing to sub-constant values, where showing concentration becomes more difficult. Another avenue for future work would be to investigate whether the results and techniques carry over to the UMDA, where the frequencies can make larger steps.

Proof of Lemma 20

Each b-step changes the frequency by 1/K. A necessary condition for increasing the frequency by a total of at least 1/6 is that we have at least K/6 b-steps among the first t steps. Choosing $\varepsilon$ small enough to make $\varepsilon c_3 K \le 1/2 \cdot K/6$, by Chernoff bounds the probability to get at least K/6 b-steps in t steps is at most $(e/4)^{K/12} \le (e/4)^{1/12} < 0.97$.
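The numerical constant at the end can be verified directly (a quick check of ours, using that $K \ge 1$):

```python
import math

# (e/4)^(K/12) is maximised over K >= 1 at K = 1, since e/4 < 1
val = (math.e / 4) ** (1 / 12)
assert val < 0.97  # hence the failure probability is below 0.97
```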
So we conclude that each frequency in S′ satisfies the condition in (b) with probability at least 0.03. Moreover, by choice of S′, every such frequency automatically also satisfies the condition in (a). Thus the expected number of frequencies that satisfy the conditions in (a) and (b) is at least $0.03|S'|$. It remains to show concentration, i.e., we show that the number of frequencies which have at most K/6 b-steps among the first t steps is concentrated.
The number of b-steps is not independent for different frequencies, so we cannot apply Chernoff bounds directly. However, we can use the same argument as in the proof of Corollary 15, which we repeat briefly. We split the set S′ randomly into $\sqrt{n}$ batches, assigning each frequency independently to a batch. Then each batch satisfies the description of Lemma 14, and with probability $1 - e^{-\Omega(\sqrt{n})}$ a constant fraction of the frequencies in each batch has at most K/6 b-steps among the first t steps; a union bound over all batches then completes the proof. ◻

Proof of Lemma 4
The proof follows closely arguments from the proof of Theorem 8 in [22], using our improved Lemma 20. For concentration we again need the batch argument as in Corollary 15. We will focus on proving that frequencies are likely to hit the lower border. Since the probability of a frequency hitting the upper border is no smaller than the probability of hitting the lower border, a symmetric statement also holds for frequencies hitting the upper border. Let $T := \varepsilon K^2$ for a small enough constant $\varepsilon > 0$. We first fix one frequency, and we use Lemma 19 to show that some frequencies are likely to walk down to the lower border. Note that Lemma 19 applies for an arbitrary (even adversarial) mixture of rw-steps and b-steps over time. Lemma 20 states that there are $\Omega(n)$ frequencies whose displacement owing to b-steps during the first T steps is at most 1/6. We focus on these frequencies in the following and show that a constant fraction of them reaches the lower border.
We shall fix such a frequency i and focus on the effect of its rw-steps during the first T steps. We will apply both statements of Lemma 19 to prove that $p_i$ walks to its lower border with a not too small probability. First we apply the second statement of the lemma for a positive displacement of $s := 1/6$ within T steps, using $\alpha := T/((sK)^2)$. The random variable $T_s$ describes the first point in time at which the frequency reaches a value of at least $1/2 + 1/6 + s = 5/6$ through a mixture of b- and rw-steps. This holds since we work under the assumption that the b-steps only account for a total displacement of at most 1/6 during the phase. Lemma 19 now gives us a probability of at least $1 - e^{-1/(4\alpha)} = \Omega(1)$ (using $\alpha = O(1)$) for the event that the frequency does not exceed 5/6. In the following, we condition on this event.
We then revisit the same stochastic process and apply Lemma 19 again to show that, under this condition, the random walk achieves a negative displacement. Note that the event of not exceeding a certain positive displacement is positively correlated with the event of reaching a given negative displacement (formally, the state of the conditioned stochastic process is always stochastically smaller than that of the unconditioned process), allowing us to apply Lemma 19 again despite the dependencies between the two applications.
We now apply the first statement of Lemma 19 for a negative displacement of $s := -1$ through rw-steps within T steps, using $\alpha := T/((sK)^2)$. Since we still work under the assumption that the b-steps only account for a total displacement of at most 1/6 during the phase, the total displacement is then no more than $s + 1/6 \le -5/6$, implying that the lower border is hit, as the frequency does not exceed 5/6. We note that $\alpha = \Theta(1)$ by definition and that $1/\alpha = \Theta(1) = o(K)$ under our assumption $K = \omega(1)$. Now Lemma 19 states that the probability of the random walk reaching a total displacement of $-5/6$ (or hitting the lower border before) is at least the bound given in (13). Since $K = \omega(1)$, $\alpha = \Theta(1)$ and $|s| = 1$, (13) is $\Omega(1)$. Combining this with the probability of not exceeding 5/6, which we have proved to be constant, the probability of the frequency hitting the lower border within T steps is $\Omega(1)$. Therefore the expected number of frequencies which reach the lower border is $\Omega(n)$. To show the whp statement, we use the same trick as in the proofs of Corollary 15 and of Lemma 20, and split the set of frequencies into batches of size $\Theta(\sqrt{n})$. Then by Lemma 14, the frequencies in each batch can be coupled to independent random walks. This allows us to apply the Chernoff bound and to conclude that, within each batch, with probability $1 - e^{-\Omega(\sqrt{n})}$ a constant fraction of the frequencies reaches the lower border. The statement of the lemma is then obtained by a union bound over all batches. We omit the details as they are analogous to Lemma 20. ◻
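The core of this argument, that an unbiased ±1/K random walk started at 1/2 reaches a border within O(K²) steps with constant probability, can be illustrated empirically. The sketch below is our own simplification: rw-steps only, no b-steps.

```python
import random

def hits_lower_border(K, n, T, rng):
    # unbiased rw-steps of size 1/K; stop at either border
    p = 0.5
    for _ in range(T):
        p += rng.choice((-1.0, 1.0)) / K
        if p <= 1.0 / n:
            return True   # lower border hit
        if p >= 1.0 - 1.0 / n:
            return False  # upper border hit first
    return False

rng = random.Random(0)
runs = 2000
hits = sum(hits_lower_border(20, 1000, 4 * 20 * 20, rng) for _ in range(runs))
frac = hits / runs  # by symmetry, roughly half the walks go down
```

Within 4K² steps nearly every walk is absorbed at one of the borders, and by symmetry about half of them hit the lower one, matching the $\Omega(1)$ bound in the proof.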

A.2 Proof of Theorem 17
Here we prove an exponential lower bound for too large step sizes as stated in Theorem 17.
The following lemma shows that with a good probability, a frequency will reach the lower border through a sequence of steps that are all decreasing. Such a sequence was called landslide sequence in the context of a simple ACO algorithm in [17].

Lemma 21
Consider the cGA on OneMax at a point in time when the number of frequencies at the lower border is at most $n - \Omega(n)$. Then with probability at least $\Omega(10^{-K})$, within the following $O(K\log K)$ steps one of the remaining frequencies will reach the lower border.
Proof We first estimate transition probabilities for a frequency i with $1/n < p_i < 1 - 1/n$, using arguments from the proof of Lemma 2 but providing bounds on the constants hidden in the $\Theta(\sqrt{V_t})$ terms. For every $t \ge t^*$, the bits $x_i$ and $y_i$ need to be sampled differently for $p_{i,t}$ to change. A sufficient condition for $p_{i,t}$ to decrease is that $x_i$ is sampled as 0 (probability $1 - p_{i,t}$), $y_i$ is sampled as 1 (probability $p_{i,t}$), and the fitness difference on all other bits, $D_{i,t} = \sum_{j\ne i}(x_j - y_j)$, is at least 1. By symmetry, $\Pr(D_{i,t} \ge 1) = \Pr(D_{i,t} \le -1) = 1/2 \cdot \Pr(D_{i,t} \ne 0)$. Using the general bound for Poisson binomial distributions from [1] (see Theorem 22 in [22]), $\Pr(D_{i,t} \ne 0)$ is bounded below by a constant for all $p_j$, $j \ne i$. Together, we obtain for large enough n that the conditional probability of $p_{i,t}$ decreasing, given that it changes, is at least 1/10. For the remainder we choose a frequency i with $1/n < p_i < 1 - 1/n$, if such a frequency exists. If no such frequency exists, there must be $\Omega(n)$ frequencies at the upper border, and the probability that at least one such frequency detaches from the upper border in the next iteration is at least $\Omega(n) \cdot 1/n \cdot (1 - 1/n) \cdot 1/5 = \Omega(1)$, reusing arguments from above. We assume that this happens, keeping in mind an $\Omega(1)$ factor in the claimed probability (and absorbing the additional iteration in the time bound), and choose one such frequency i.
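The conditional decrease probability can be probed empirically. The sketch below (ours; ties resolved in favour of the first sample) estimates $\Pr(p_i \text{ decreases} \mid p_i \text{ changes})$ at a uniform frequency vector, where symmetry puts the value near 1/2, comfortably above the 1/10 used in the proof.

```python
import random

def conditional_decrease_prob(p, i, trials, rng):
    dec = changes = 0
    for _ in range(trials):
        x = [1 if rng.random() < pj else 0 for pj in p]
        y = [1 if rng.random() < pj else 0 for pj in p]
        if x[i] == y[i]:
            continue  # p_i only changes when the samples disagree at i
        winner = x if sum(x) >= sum(y) else y
        changes += 1
        if winner[i] == 0:
            dec += 1  # winner's bit is 0, so p_i moves down by 1/K
    return dec / changes

rng = random.Random(7)
q = conditional_decrease_prob([0.5] * 31, 0, 20000, rng)
```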
A sufficient condition for frequency i to reach the lower border before returning to the upper border is that $p_{i,t}$ decreases every time it changes. This needs to happen at most K times, yielding a probability of at least $10^{-K}$ as claimed. The expected time for this sequence of events to happen is at most $10K(\ln(K) + 1)$. By Markov's inequality, the probability that the time is at most $20K(\ln(K) + 1)$ is at least 1/2. Absorbing this factor in the term $\Omega(10^{-K})$ completes the proof. ◻

Proof of Theorem 17
According to Lemma 4, with probability $1 - e^{-\Omega(\sqrt{n})}$ at least $\alpha_0 n$ frequencies reach their lower border within the first $t^* = O(K^2)$ iterations, for some constant $\alpha_0 > 0$. As argued in Lemma 5, frequencies that hit the lower border before time $t^*$ have a chance to leave the border again. However, since the probability of a frequency detaching from the lower border is at most 2/n irrespective of the other frequencies, the probability that at time $t^*$ there will be at least $\alpha_0 n/2$ frequencies at the lower border is $1 - 2^{-\Omega(n)}$ by Chernoff bounds.
Let $L_t$ denote the number of frequencies at the lower border at iteration t. We consider periods of $T = O(K\log K)$ iterations, where the O-term is the one from Lemma 21, and how the number of frequencies at the lower border changes in expectation during such a period. By Lemma 21, if $L_t \le n - \Omega(n)$, the number increases by 1 within the period with probability at least $p^+ = \Omega(10^{-K})$, while every frequency at the lower border detaches only with probability at most 2/n per iteration. Note that for every frequency i, every time t and all remaining frequencies, the probability that frequency i is at the lower border at time t + 1 is maximised if frequency i is already at the lower border at time t. More formally, the sought probability is at least $1 - 2/n$ if $p_{i,t} = 1/n$, it is at most $(1/n + 1/K)(1 - 1/n - 1/K) \ll 1 - 2/n$ if $p_{i,t} = 1/n + 1/K$ (by Lemma 2), and it is 0 otherwise. Hence we are being pessimistic if we underestimate the number of frequencies at the lower border.
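The choice $b + 1 = p^+n/(4T)$ made below balances these two rates: the expected number of detachments per period of T iterations is then exactly half the probability $p^+$ of gaining a frequency. A quick numeric check of this arithmetic (with hypothetical values of $p^+$, n and T):

```python
# expected detachments per period: (b+1) * T * 2/n = p_plus / 2
p_plus, n, T = 0.1, 10**5, 50
b_plus_1 = p_plus * n / (4 * T)
down_mass = b_plus_1 * T * 2 / n
assert abs(down_mass - p_plus / 2) < 1e-12
```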
We argue in the following that the number of frequencies at the lower border stochastically dominates a simpler Markov chain $Z_0, Z_1, Z_2, \ldots$ defined as follows. One step of the Z-process reflects a simplified view of T iterations of the cGA, with $Z_t$ being defined so that it is stochastically dominated by the number of frequencies at the lower border after $t \cdot T$ iterations of the cGA. We will define the Z-process so that it is capped: $Z_t \in [0, b+1]$ for a value $b \le n - \Omega(n)$ chosen later. The value of $Z_{t+1}$ is determined by starting with $Z_t$, subtracting $Z_t \cdot T$ independent Bernoulli variables with parameter 2/n each and, if and only if $Z_t \le b$, adding the outcome of a Bernoulli trial with parameter $p^+$.
The simpler process $Z_t$ is stochastically dominated by $L_{t^*+tT}$ since $Z_0 = \min\{L_{t^*}, b+1\}$ (thus in particular $Z_0 \le L_{t^*}$) and all transition probabilities are estimated pessimistically: for all $d \ge 1$ and all $i \le b+1$ we have $\Pr(Z_{t+1} = i + d \mid Z_t = i) \le \Pr(L_{t^*+(t+1)T} = i + d \mid L_{t^*+tT} = i)$, as the left-hand side is 0 for $d > 1$ or $i = b+1$, and $p^+ \cdot (1 - 2/n)^{iT}$ otherwise, which is a lower bound for $\Pr(L_{t^*+(t+1)T} = L_{t^*+tT} + 1 \mid L_{t^*+tT} = i)$ by Lemma 21 and the fact that all i frequencies at the lower border remain there for T steps with probability at least $(1 - 2/n)^{iT}$. Furthermore, for all $d \ge 1$, $\Pr(Z_{t+1} \le i - d \mid Z_t = i) \ge \Pr(L_{t^*+(t+1)T} \le i - d \mid L_{t^*+tT} = i)$, where the last inequality follows from the same arguments as above. Note that state b + 1 is a reflecting state, but this does not affect the drift estimates for states $Z_t \le b$ as the process can only increase by 1 in each step. We apply the negative drift theorem [18,19] in the variant with self-loops [20], stated as Theorem 26 in Sect. B.2, to the process $Z_1, Z_2, \ldots$. The interval is chosen as [a, b] with $a := b/2$ and $b + 1 := \min\{p^+n/(4T), \alpha_0 n/2\}$, such that we start at a state at least b. This implies $\mathrm{E}(Z_{t+1} - Z_t \mid Z_t) \ge p^+ - (b+1) \cdot T \cdot 2/n \ge p^+/2$ for $a \le Z_t \le b$, and also that the converse of the self-loop probability is $\Pr(Z_{t+1} \ne Z_t \mid Z_t) \le 3p^+/2$ by a union bound over all Bernoulli trials. This establishes the first condition of the negative drift theorem with self-loops.
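The capped Z-process is easy to simulate directly. The sketch below is ours, with hypothetical parameter values chosen so that $b + 1 = p^+n/(4T)$; it illustrates that the chain stays close to its reflecting upper state for a long time.

```python
import random

def simulate_z(b, p_plus, n, T, steps, rng):
    # capped chain: subtract Binomial(Z_t * T, 2/n); add Bernoulli(p_plus)
    # only while Z_t <= b; state b+1 is reflecting
    z = b + 1  # start at the top, as guaranteed by the choice of b
    lowest = z
    for _ in range(steps):
        down = sum(1 for _ in range(z * T) if rng.random() < 2 / n)
        up = 1 if (z <= b and rng.random() < p_plus) else 0
        z = min(z - down + up, b + 1)
        lowest = min(lowest, z)
    return lowest

p_plus, n, T = 0.1, 10**5, 50
b = int(p_plus * n / (4 * T)) - 1  # so that b + 1 = p_plus * n / (4T)
lowest = simulate_z(b, p_plus, n, T, 500, random.Random(3))
```

With the upward drift $p^+/2$ established above, excursions below the cap are shallow, so `lowest` remains far from 0 in a typical run.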
To establish the second condition, note that $\Pr(Z_{t+1} \ne Z_t \mid Z_t)$ is bounded from below by $p^+$ if $Z_t \le b$, and by $1 - (1 - 2/n)^{(b+1)T} = 1 - (1 - 2/n)^{p^+n/4} = \Omega(p^+)$ if $Z_t = b + 1$ and $b + 1 = p^+n/(4T)$; if $b + 1 = \alpha_0 n/2$, a lower bound of $\Omega(1) = \Omega(p^+)$ follows in the same way. Hence $\Pr(Z_{t+1} = Z_t - d \mid Z_t) \le ((b+1)T \cdot 2/n)^d \le (p^+/2)^d \le \frac{r}{2^d} \cdot \Pr(Z_{t+1} \ne Z_t)$ for all $Z_t$ and all $d \ge 1$ when choosing $r = O(1)$ appropriately. Along with $\Pr(Z_{t+1} = Z_t + 1 \mid Z_t) \le p^+$ and $\Pr(Z_{t+1} = Z_t + d \mid Z_t) = 0$ for $d > 1$, this establishes the second condition of the negative drift theorem. Invoking said theorem and noting that $(b - a)/r = \Omega(p^+n/T) = \Omega(10^{-K}n/(K\log K)) = \Omega(n^{1-\varepsilon\log(10)}/(\log(n)\log\log n))$ shows that with probability $1 - 2^{-\Omega(n^{1-\varepsilon\log(10)}/(\log(n)\log\log n))}$ the time to reduce the number of frequencies at the lower border below $a = \Omega(n^{1-\varepsilon\log(10)}/(\log(n)\log\log n))$ is at least $2^{cn^{1-\varepsilon\log(10)}/(\log(n)\log\log n)}$ for a suitable constant $c > 0$. Note that while $L_t \ge a$, the probability of sampling the optimum in one iteration is at most $2n^{-a}$ since at least a frequencies at the lower border would have to be sampled as 1 in one of the two search points. Taking a union bound over $2^{cn^{1-\varepsilon\log(10)}/(\log(n)\log\log n)}$ iterations still yields a failure probability that is absorbed in the term $1 - 2^{-\Omega(n^{1-\varepsilon\log(10)}/(\log(n)\log\log n))}$.