Haldane’s formula in Cannings models: the case of moderately strong selection

For a class of Cannings models we prove Haldane's formula, π(s_N) ∼ 2s_N/ρ², for the fixation probability of a single beneficial mutant in the limit of large population size N and in the regime of moderately strong selection, i.e. for s_N ∼ N^{−b} and 0 < b < 1/2. Here, s_N is the selective advantage of an individual carrying the beneficial type, and ρ² is the (asymptotic) offspring variance. Our assumptions on the reproduction mechanism allow for a coupling of the beneficial allele's frequency process with slightly supercritical Galton–Watson processes in the early phase of fixation.


Introduction
Analysing the probability of fixation of a beneficial allele that arises from a single mutant is one of the classical problems in population genetics; see Patwa and Wahl (2008) for a historical overview. A rule of thumb known as Haldane's formula states that the probability of fixation of a single mutant of beneficial type with small selective advantage s > 0 and offspring variance ρ² in a large population of individuals, whose total number N is constant over the generations, is approximately equal to 2s/ρ². Originally, this was formulated for the (prototypical) model of Wright and Fisher, in which the next generation arises by multinomial sampling from the previous one (which leads to ρ² = 1 − 1/N in the neutral case), with the "reproductive weight" of an individual of beneficial type being increased by the (small) factor 1 + s. A natural generalization of the Wright–Fisher model is the class of Cannings models; here one assumes exchangeable offspring numbers in the neutral case (Cannings 1974; Ewens 2004), and separately exchangeable offspring numbers within the sets of all individuals of the beneficial and the non-beneficial type in the selective case (Lessard and Ladret 2007).
The reasoning in the pioneering papers by Fisher (1922), Haldane (1927) and Wright (1931) was based on the insight that, as long as the beneficial type is rare, the number of individuals carrying the beneficial type is (approximately) a slightly supercritical branching process, for which the survival probability obeys π(s) ∼ 2s/ρ² as s → 0, where 1 + s is the offspring expectation and ρ² is the offspring variance (see Athreya 1992, Theorem 3). The heuristic then is that the branching process approximation should be valid until the beneficial allele has either died out or has reached a fraction of the population that is substantial enough for the law of large numbers to dictate that this fraction rises to 1. Notably, Lessard and Ladret (2007) obtained (for fixed population size N) the result as a special case of their explicit analytic representation of π(s) within a quite general class of Cannings models and selection mechanisms. An interesting parameter regime as N → ∞ is that of moderate selection, which lies between the classical regimes of weak and strong selection. Is the Haldane asymptotics valid in the regime (2)?
If one could bound in this regime the o(s)-term in (1) by o(N −b ), then (1) would turn into (3). Such an estimate seems, however, hard to achieve in the analytic framework of Lessard and Ladret (2007).
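The branching-process heuristic above can be probed numerically. The following sketch is ours, not from the paper; Poisson(1 + s) offspring is an arbitrary illustrative choice (for which ρ² = 1 + s), and reaching a large cap is used as a proxy for survival.

```python
import numpy as np

rng = np.random.default_rng(1)

def gw_survival_prob(s, runs=10000, cap=2000):
    """Monte Carlo estimate of the survival probability of a
    Galton-Watson process with Poisson(1 + s) offspring, started
    from a single individual.  Reaching `cap` individuals is taken
    as a proxy for survival (extinction is then very unlikely)."""
    hits = 0
    for _ in range(runs):
        z = 1
        while 0 < z < cap:
            # sum of z i.i.d. Poisson(1+s) offspring numbers
            z = rng.poisson((1 + s) * z)
        hits += z >= cap
    return hits / runs

s = 0.05
est = gw_survival_prob(s)
haldane = 2 * s / (1 + s)   # 2s / rho^2, with rho^2 = 1 + s here
```

For s = 0.05 the estimate comes out close to the exact survival probability, the root of q = 1 − e^{−(1+s)q}, which Haldane's 2s/ρ² approximates up to a relative error of order s.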
The main result of the present paper is a proof of the Haldane asymptotics using an approximation by Galton–Watson processes in the regime of moderately strong selection, which corresponds to (2) with 0 < b < 1/2. Here we assume that the Cannings dynamics admits a paintbox representation whose random weights are exchangeable, of Dirichlet type, and fulfil a certain moment condition; see Sect. 3. The effect of selection is achieved by a decrease of the reproductive weights of the non-beneficial individuals by the factor 1 − s_N.
An approximation by Galton-Watson processes was used in González Casanova et al. (2017) to prove the asymptotics (3) in the regime of moderately strong selection for a specific Cannings model that arises in the context of experimental evolution, with the next generation being formed by sampling without replacement from a pool of offspring generated by the parents.
In the case b ≥ 1/2 the method developed in the present paper would fail, because then the Galton–Watson approximation would be controllable only up to a time at which the fluctuations of the beneficial allele (that are caused by the resampling) still dominate the trend that is induced by the selective advantage. However, in Boenkost et al. (2021) we proved the Haldane asymptotics (3) for the case of moderately weak selection, i.e. under Assumption (2) with 1/2 < b < 1. There a backward point of view turned out to be helpful, which uses a representation of the fixation probability in terms of sampling duality via the Cannings ancestral selection graph developed in Boenkost et al. (2021) (see also González Casanova and Spanó 2018).
The results of the present paper together with those of Boenkost et al. (2021) do not cover the boundary case b = 1/2 between moderately strong and moderately weak selection. We conjecture that the Haldane asymptotics (3) is valid also in this case.

A class of Cannings models with selection
This section is a short recap of Boenkost et al. (2021), Section 2; we include it here to keep the paper self-contained.

Paintbox representation in the neutral case
Neutral Cannings models are characterized by the exchangeable distribution of the vector ν = (ν 1 , . . . , ν N ) of offspring sizes; here the ν i are non-negative integer-valued random variables which sum to N . An important subclass are the mixed multinomial Cannings models. Their offspring size vector ν arises in a two-step manner: first, a vector of random weights W = (W 1 , . . . , W N ) is sampled, which is exchangeable and satisfies W 1 + · · · + W N = 1 and W i ≥ 0, 1 ≤ i ≤ N .
In the second step, a die with N possible outcomes 1, . . . , N and outcome probabilities W = (W 1 , . . . , W N ) is thrown N times, and ν i counts how often the outcome i occurs. Hence, given the random weights W the offspring numbers ν = (ν 1 , . . . , ν N ) are Multinomial(N , W )-distributed. Following Kingman's terminology, we speak of a paintbox representation for ν, and call W the underlying (random) paintbox.
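The two-step construction can be sketched in a few lines (ours, for illustration only; the symmetric Dirichlet paintbox below is just one exchangeable example, not the general W of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

def offspring_numbers(W, rng):
    """Second step of the paintbox construction: throw the W-die N times;
    nu_i counts how often outcome i occurs, so nu | W ~ Multinomial(N, W)."""
    return rng.multinomial(len(W), W)

N = 10
W = rng.dirichlet(np.ones(N))     # step 1: an exchangeable random paintbox
nu = offspring_numbers(W, rng)    # step 2: offspring numbers given W
```

By construction ν_1 + ⋯ + ν_N = N in every generation, and the exchangeability of W carries over to ν.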
This construction is iterated over the generations g ∈ Z: let W^{(g)}, g ∈ Z, be independent copies of W, and denote the individuals in generation g by (i, g), i ∈ [N] := {1, …, N}. Assume that each individual (j, g + 1), j ∈ [N], in generation g + 1 chooses its parent (i, g) in generation g with conditional probability W_i^{(g)}, where given W^{(g)} the choices of the parents for the individuals {(j, g + 1), j ∈ [N]} are independent and identically distributed. This results in exchangeable offspring vectors ν^{(g)} which are independent and identically distributed over the generations g.
For notational simplicity we do not always display dependence of W (g) on the generation g, and write W instead. From time to time however we want to emphasise the dependence of W on N and therefore write W (N ) instead of W .

A paintbox representation with selection
Let W^{(g)}, g ∈ Z, be as in the previous section, and let s_N ∈ [0, 1). Assume each individual carries one of two types, either the beneficial type or the wildtype. Depending on the type of individual (i, g) we set W̃_i^{(g)} := W_i^{(g)} if (i, g) is of beneficial type, and W̃_i^{(g)} := (1 − s_N) W_i^{(g)} if (i, g) is of wildtype. The probability that (i, g) is chosen as the parent of (j, g + 1) is now given by W̃_i^{(g)} / (W̃_1^{(g)} + ⋯ + W̃_N^{(g)}) for all i, j ∈ [N]. Parents are chosen independently for all j ∈ [N], and the distribution does not change over the generations. If (i, g) is the parent of (j, g + 1), the child (j, g + 1) inherits the type of its parent. In particular, this reproduction mechanism leads to offspring numbers that are exchangeable among the beneficial as well as among the wildtype individuals.

The Cannings frequency process
In the previous section we gave a definition of a Cannings model which incorporates selection, by decreasing the random weight of each wildtype individual by the factor 1 − s_N. This allows us to define the Cannings frequency process X = (X_g)_{g≥0} with state space {0, 1, …, N} which counts the number of beneficial individuals in each generation g.
Assume there are 1 ≤ k ≤ N beneficial individuals in generation g; due to the exchangeability of W^{(g)} we may assume that the individuals (1, g), …, (k, g) are the beneficial ones. Given W^{(g)} = W, the probability that individual (j, g + 1) is of beneficial type is then, due to (4), equal to

p_k(W) := (W_1 + ⋯ + W_k) / (W_1 + ⋯ + W_k + (1 − s_N)(W_{k+1} + ⋯ + W_N)),   (5)

and is the same for all j ∈ [N]. Hence, given W^{(g)} = W and given there are k beneficial individuals in generation g, the number of beneficial individuals in generation g + 1 has the (mixed binomial) distribution

Bin(N, p_k(W));   (6)

this defines the transition probabilities of the Markov chain X.
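The transition mechanism just described can be sketched directly (a toy implementation of ours; the Dirichlet-type paintbox of Sect. 3 with exponential Y is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)

def next_generation(k, s, Y, rng):
    """One step of the Cannings frequency process X.

    k : number of beneficial individuals in the parent generation,
    s : selective advantage s_N (wildtype weights damped by 1 - s),
    Y : vector of N positive variables; paintbox W_i = Y_i / sum(Y).
    Given W, each of the N children is beneficial independently with
    probability p, so X_{g+1} | W ~ Binomial(N, p).
    """
    W = Y / Y.sum()
    ben = W[:k].sum()
    p = ben / (ben + (1 - s) * (1 - ben))
    return rng.binomial(len(Y), p)

N, s = 1000, 0.05
X = 1
for _ in range(200):
    if X == 0 or X == N:
        break
    Y = rng.exponential(size=N)       # fresh paintbox every generation
    X = next_generation(X, s, Y, rng)
```

Exchangeability is what allows the code (like the text) to treat the first k weights as those of the beneficial individuals.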

Main result
Before we state our main result we specify the assumptions on the paintbox and the strength of selection.
Definition 1 (Dirichlet-type weights) We say that a random vector W^{(N)} with exchangeable components is of Dirichlet type if

W_i^{(N)} = Y_i / (Y_1 + ⋯ + Y_N), 1 ≤ i ≤ N,   (7)

where Y_1, …, Y_N are independent copies of a random variable Y with P(Y > 0) = 1.
We assume that

E[e^{hY}] < ∞ for some h > 0,   (8)

which implies the finiteness of all moments of Y. The relevance (and possible relaxations) of Condition (8) are discussed further in Remark 2(a); see also the comment in Remark 1(a).
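For concreteness, here is a quick sketch of Definition 1 and Condition (8) (ours; the bounded uniform Y below is an arbitrary illustrative choice — any Y with P(Y > 0) = 1 and an exponential moment would do):

```python
import numpy as np

rng = np.random.default_rng(3)

def dirichlet_type_weights(Y):
    """W_i^(N) = Y_i / (Y_1 + ... + Y_N) for i.i.d. positive Y_i."""
    return Y / Y.sum()

# Y uniform on (0.5, 1.5): P(Y > 0) = 1, and E[exp(hY)] < infinity for
# every h > 0 since Y is bounded, so Condition (8) holds trivially.
N = 500
Y = rng.uniform(0.5, 1.5, size=N)
W = dirichlet_type_weights(Y)
# the weights are strictly positive and sum to 1;
# exchangeability forces E[W_i] = 1/N for every i
```
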
Remark 1 (a) The biological motivation for considering Dirichlet-type weights comes from seasonal reproductive schemes. At the beginning of a season a set (of size N) of individuals is alive. These individuals and their offspring reproduce and generate a pool of descendants within that season. Only a few individuals from this pool survive till the next season. The number N in the model is assumed to be the total number of individuals that make it to the next season. Dirichlet-type weights arise in the asymptotics of an infinitely large pool of offspring; then sampling with and without replacement coincide. Condition (8), which we will require for the proof of Theorem 1 (see also Remark 2), guarantees that the pool of descendants of a single individual is not too large in comparison to the pool of descendants generated by the other individuals. The simplifying assumption P(Y > 0) = 1 implies that the weight W_i^{(N)} of a parent cannot be equal to zero. Observe, however, that weights of single parents can be arbitrarily small if (e.g.) Y has a density which is continuous and strictly positive in zero.
(b) Theorem 1 in Huillet and Möhle (2021) gives a classification of a large class of Cannings models with a paintbox of the form (7) with regard to the convergence of their rescaled genealogies. (c) Let ν^{(N)} be a sequence of Cannings offspring numbers that are represented by the paintboxes W^{(N)}. It is well known (and easily checked) that

Var ν_1^{(N)} = N(N − 1) E[(W_1^{(N)})²].   (9)

If W^{(N)} is of the form (7) with E[Y²] < ∞, which is clearly implied by (8), then (see Huillet and Möhle (2021), Theorem 1(i)) the right-hand side of (9) converges as N → ∞.
In view of Remark 1(c) we have for the asymptotic neutral offspring variance

ρ² = E[Y²] / (E[Y])².   (10)

A multiplication of Y by a constant factor does not affect (7); hence we can and will assume E[Y] = 1 in the proofs, which simplifies (10) to ρ² = E[Y²] (and makes (10) consistent with the notation of Huillet and Möhle (2021)). Under the assumption E[Y⁴] < ∞, which is implied by (8) as well, the asymptotics (11) is valid. We will discuss the relevance of this asymptotics in Remark 2(b), and prove it in Lemma 2(b).
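The convergence of the neutral offspring variance Var ν_1^{(N)} = N(N − 1)E[(W_1^{(N)})²] to ρ² = E[Y²]/(E[Y])² can be checked by simulation. The sketch below is ours; Gamma-distributed Y (shape 2, so E[Y] = 2, E[Y²] = 6, ρ² = 1.5) is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)

def var_nu1(N, reps, rng):
    """Monte Carlo estimate of Var(nu_1) in the neutral mixed multinomial
    model with Dirichlet-type paintbox built from Y ~ Gamma(2)."""
    nu1 = np.empty(reps)
    for r in range(reps):
        Y = rng.gamma(2.0, size=N)
        W = Y / Y.sum()
        nu1[r] = rng.binomial(N, W[0])   # nu_1 | W ~ Bin(N, W_1)
    return nu1.var()

rho2 = 6.0 / 2.0 ** 2        # E[Y^2] / E[Y]^2 = 1.5
est = var_nu1(N=200, reps=20000, rng=rng)
```

For this particular Y the check can even be done in closed form: W_1 is Beta(2, 2(N − 1)), so Var ν_1^{(N)} = 3(N − 1)/(2N + 1), which is ≈ 1.49 at N = 200 and tends to ρ² = 1.5.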
Turning to the selective advantage, we assume that for a fixed η ∈ (0, 1/4) the sequence (s_N) obeys

N^{−1/2+η} ≤ s_N ≤ N^{−η},   (12)

which we call the regime of moderately strong selection, thus generalizing the corresponding notion introduced in Sect. 1. (Note that (12) has an analogue in the regime of moderately weak selection as discussed in Boenkost et al. (2021).) In order to connect to (2) we define

b_N := −log s_N / log N,   (13)

which is equivalent to s_N = N^{−b_N}, with (12) translating to η ≤ b_N ≤ 1/2 − η. We now state our main result on the asymptotics (as N → ∞) of the fixation probability of the Cannings frequency process (X_g)_{g≥0}. Note that the Markov chain (X_g) has the two absorbing states 0 and N, with the hitting time of {0, N} being a.s. finite for all N.
Theorem 1 (Haldane's formula) Assume that Conditions (7), (8) and (12) are fulfilled. Let (X_g)_{g≥0} be the number of beneficial individuals in generation g, with X_0 = 1, and let τ := inf{g ≥ 0 : X_g ∈ {0, N}}. Then

P(X_τ = N) = (2 s_N / ρ²)(1 + o(1)) as N → ∞.   (14)

We give the proof of Theorem 1 in Sect. 5, after preparing some auxiliary results in Sect. 4. Next we describe the strategy of the proof and its main ideas, with an emphasis on the role of Condition (12). In Remark 2 we discuss possible relaxations of Condition (8) and the boundary case b = 1/2. The proof of Theorem 1 is divided into three parts, corresponding to three growth phases of X. Concerning the first phase we show that the probability to reach the level N^{b+δ} is (2s_N/ρ²)(1 + o(1)), for some small δ > 0 and b := b_N; this is the content of Proposition 1. The proof is based on stochastic domination from above and below by slightly supercritical Galton–Watson processes with respective offspring distributions (29) and (30).
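Theorem 1 invites a numerical illustration. The sketch below is ours and can only indicate the asymptotics: N = 300 and s_N = 0.1 are finite, fixed choices, and Y uniform on (0.5, 1.5) (so ρ² = 13/12) is arbitrary. It estimates P(X_τ = N) from repeated runs of the frequency process and compares with 2s_N/ρ².

```python
import numpy as np

rng = np.random.default_rng(7)

def run_to_absorption(N, s, rng, max_gen=100000):
    """One run of the Cannings frequency process with Dirichlet-type
    paintbox (Y ~ Unif(0.5, 1.5)), started from X_0 = 1.
    Returns True on fixation (X hits N), False on loss (X hits 0)."""
    X = 1
    for _ in range(max_gen):
        if X == 0:
            return False
        if X == N:
            return True
        Y = rng.uniform(0.5, 1.5, size=N)
        W = Y / Y.sum()
        ben = W[:X].sum()
        p = ben / (ben + (1 - s) * (1 - ben))   # success prob of (6)
        X = rng.binomial(N, p)
    return X == N

N, s, runs = 300, 0.1, 3000
fixations = sum(run_to_absorption(N, s, rng) for _ in range(runs))
est = fixations / runs
rho2 = (1 + 1 / 12) / 1 ** 2   # E[Y^2]/E[Y]^2 for Y ~ Unif(0.5, 1.5)
haldane = 2 * s / rho2
```

At these parameter values the Monte Carlo estimate lands within a few percent of 2s_N/ρ² ≈ 0.185; the residual gap is the O(s_N²) error that disappears only in the limit of the theorem.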
To construct a Galton–Watson stochastic upper bound of X in its initial phase, we recall that the transition probabilities of X are mixed binomial, specified by (6). Using (7) we approximate (5) from above by the expression (15). As we will show in Lemma 4, this is possible with probability 1 − O(exp(−c N^{1−2α})) for some α < 1/2 and c > 0, and for k ≤ N^{b+δ} with b + δ < 1/2. We will then be able to dominate the mixed binomial distribution (6) by the mixed Poisson distribution with random parameter (15), again up to an error term of order o(s_N). Noting that (15) is a sum of independent random variables, we arrive at the upper Galton–Watson approximation for a single generation. For any small ε > 0 this can be repeated for N^{b+ε} generations, which (as an application of Lemma 1 will show) is enough to reach either the level 0 or the level N^{b+δ} with probability 1 − o(s_N).
To obtain a Galton–Watson stochastic lower bound of X in its initial phase, we adapt an approach that was used in González Casanova et al. (2017) in a related situation. As in Sect. 2.1, number the individuals in generation g by (i, g), now with (1, g), …, (X_g, g) being the beneficial individuals, and denote by ω_i^{(g)} the number of children of the individual (i, g), 1 ≤ i ≤ X_g. As will be explained in the proof of Lemma 5, as long as X_g has not reached the level N^{b+δ}, the distribution of ω_i^{(g)} can be bounded from below by a mixed binomial distribution of the form (30), for some sufficiently small ε > 0 and again for b + δ < 1/2. A suitable stopping and truncation at the level N^{b+δ} will give the Galton–Watson approximation from below for the first phase. We will verify in Sect. 4.1 that both slightly supercritical branching processes reach the level N^{b+δ} with probability (2s_N/ρ²)(1 + o(1)). As to the second phase, we will argue in Sect. 5.2 that, after reaching the level N^{b+δ}, the Cannings frequency process X will grow to a macroscopic fraction εN with high probability. If the frequency of beneficial individuals is at least N^{b+δ} (but still below εN), then in a single generation the frequency of beneficial individuals grows in expectation at least by the factor 1 + (1 − ε)s_N + o(s_N). Hence, c s_N^{−1} ln N generations after X has reached the level N^{b+δ}, the expected value of the process X reaches the level 2εN. Similarly one bounds the variance produced in a single generation and derives from this an estimate for the variance accumulated over c s_N^{−1} ln N generations. This bound being sufficiently small, an application of Chebyshev's inequality yields that, within c s_N^{−1} ln N generations after reaching the level N^{b+δ}, X crosses the level εN with probability tending to 1.
In Sect. 5.3 we deal with the last phase, and will show that the fixation probability tends to 1 as N → ∞ if we start with at least εN individuals of beneficial type. Here we use the representation for the fixation probability that is based on a sampling duality between the Cannings frequency process and the Cannings ancestral selection process (CASP) which was provided in Boenkost et al. (2021). For a subregime of moderately weak selection the claim will follow quickly from the representation formula combined with a concentration result for the equilibrium distribution of the CASP that was proved in Boenkost et al. (2021). To complete the proof we will then argue that both the CASP and the representation of the fixation probability depend on the selection parameter in a monotone way.
Remark 2 (a) With some additional work, the assumption (8) of the existence of an exponential moment of Y can be relaxed to a weaker moment condition. In order not to overload the present paper, we restrict ourselves here to a sketch. In Lemma 5 we couple the frequency process of the beneficial individuals with Galton–Watson processes for N^{b+δ} generations. By means of the estimates in Lemma 3 and Lemma 4 we show that these couplings hold for a single generation with probability 1 − O(exp(−N^c)) for some appropriate c > 0. Since we need the couplings to hold for N^{b+δ} generations, it suffices that the couplings hold in a single generation with probability 1 − O(N^{−2(b+δ)}) for some δ > 0 (since in this case the probability of the coupling to fail is o(s_N) and therefore can be neglected with regard to (14)). Such probability bounds can also be obtained under weaker assumptions on the distribution of the random variable Y. Assume e.g. that Y has a regularly varying tail, i.e. P(Y > x) ∼ x^{−β} L(x) for some β > 0 and a slowly varying function L. For the proof of Lemma 5 we need to estimate the probability of the event figuring in Lemma 3 with b < c ≤ 1 and the probability of the event figuring in Lemma 4 with b < α < 1/2. These probabilities can be shown to be of the required order by combining Hoeffding's inequality (Hoeffding 1963) with large-deviation estimates for sums of random variables with regularly varying tails (Mikosch and Nagaev 1998); this works for all choices of 0 < b < 1/2, provided that β ≥ 4. It would be nice to have a proof of the asymptotics (14) under the assumption that the 4th moment of Y is finite, even without the assumption of a regularly varying tail. The investigation of the analogue of (14) in the absence of finite second moments, i.e. for Cannings models with heavy-tailed offspring distributions, is the subject of ongoing research and will be treated in a forthcoming paper. (b) Relation (11) will be used in the proof of Lemma 6.
Moreover, this relation is also instrumental in the companion paper Boenkost et al. (2021) (on the regime of moderately weak selection). The special case n = 3 in Lemma 2(a) shows that the assumption E[Y³] < ∞ gives a rate of decay O(N^{−2}) for the triple coalescence probability (and yields the moment condition (3.6) in Boenkost et al. (2021)). Condition (8) (on the existence of an exponential moment of Y) guarantees the Haldane asymptotics (14) for Cannings models with weights of Dirichlet type also in the whole regime of moderately weak selection N^{−1+η} ≤ s_N ≤ N^{−1/2−η} without any further assumption. In particular, the assumption on the finiteness of a negative moment of Y in Boenkost et al. (2021), Lemma 3.7 b), is unnecessary. Indeed, in the proof of Lemma 2(a) we show that E[(W_1^{(N)})^n] = N^{−n} E[Y^n](1 + o(1)).
As shown in the proof of Lemma 3.7(b) in Boenkost et al. (2021), Condition (8) guarantees that the assumption required there is fulfilled. (c) It seems a mathematically intriguing question whether in the regime of moderate selection all Cannings models which admit a paintbox representation with Dirichlet-type weights and lie in the domain of attraction of Kingman's coalescent also follow the Haldane asymptotics (14). An example of a sequence of Cannings models (with weights not of Dirichlet type) which fulfil Möhle's condition but do not follow the Haldane asymptotics is the following. In each generation a randomly chosen individual gets weight N^{−γ}, 0 < γ < 1/2, and all the other individuals share the remaining weight 1 − N^{−γ} equally; by Möhle's criterion the genealogy then lies in the domain of attraction of Kingman's coalescent. However, the Haldane asymptotics would predict a survival probability of order s_N N^{2γ−1}, which for b > 2γ is of smaller order than 1/N. Since the fixation probability of a beneficial allele cannot be smaller than the fixation probability under neutrality (which is 1/N), (14) must be violated in this example. (d) The present work together with the approach in Boenkost et al. (2021) does not cover the boundary case b = 1/2. A quick argument why our approach cannot be extended simply to the boundary case is the following. We show that once the beneficial type exceeds (in the order of magnitude) the level s_N^{−1} = N^b it goes to fixation with high probability. In the regime b < 1/2 we use couplings with Galton–Watson processes to show that this threshold is reached with probability (2s_N/ρ²)(1 + o(1)). However, these couplings are not guaranteed as soon as collisions occur, i.e. when beneficial individuals are replacing beneficial individuals. By the well known "birthday problem", collisions become common as soon as order N^{1/2} individuals are of the beneficial type. Therefore we require N^b ≪ N^{1/2}, i.e. b < 1/2. In the light of the results of the present paper and of Boenkost et al.
(2021), there is little reason to believe that the assertion of Theorem 1 fails in the boundary case b = 1/2. However, the question remains open (and intriguing) whether the backward or the forward approach (or a combination of both) is the appropriate tool for the proof in this case.
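The N^{1/2} threshold invoked above is the classical birthday effect: with k parents chosen among N, some pair coincides with probability 1 − ∏_{i<k}(1 − i/N) ≈ 1 − e^{−k²/(2N)}, negligible for k = N^b with b < 1/2 but macroscopic at k of order N^{1/2}. A two-line check (ours; the particular N is arbitrary):

```python
import math

def collision_prob(k, N):
    """Probability that at least two of k independent uniform choices
    among N parents coincide (the 'birthday problem')."""
    return 1 - math.prod(1 - i / N for i in range(k))

N = 10 ** 6
p_small = collision_prob(round(N ** 0.25), N)  # k ~ N^(1/4): ~ 0.0005
p_large = collision_prob(round(N ** 0.5), N)   # k ~ N^(1/2): ~ 0.39
```
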

Slightly supercritical Galton-Watson processes
Throughout this subsection, (s_N)_{N∈N} is a sequence of positive numbers converging to 0, σ² is a fixed positive number, and (Z^{(N)})_{N∈N} is a sequence of Galton–Watson processes whose offspring distributions have expectation 1 + s_N + o(s_N), variance σ² + o(1) and uniformly bounded third moments. Writing π_N for the survival probability of (Z^{(N)}), we observe that

π_N ∼ 2 s_N / σ² as N → ∞.   (18)

The derivation and discussion of the asymptotics (18) has a venerable history, a few key references being Haldane (1927), Kolmogorov (1938), Eshel (1981), Hoppe (1992) and Athreya (1992). Lemma B.3 in González Casanova et al. (2017) gives a statement on the asymptotic probability that Z^{(N)} either quickly dies out or reaches a certain (moderately) large threshold. The following lemma improves on this in a twofold way: it dispenses with the assumption s_N ∼ cN^{−b} for a fixed b ∈ (0, 1) and, more substantially, it gives a quantitative estimate for the probability that, given non-extinction, the (moderately) large threshold is reached quickly.
Proof Observe the decomposition (20). In Part 1 of the proof we will estimate the first probability on the r.h.s. of (20); this will give the above-mentioned improvement of Lemma B.3 in González Casanova et al. (2017). The offspring distribution of the process Z of immortal lines arises from that of Z^{(N)} as in Lyons and Peres (2017), Proposition 5.28; in particular one has (21). Denote, as usual, for a random variable X and an event A, E[X; A] := E[X 1_A]. Furthermore, the assumed uniform boundedness of the third moments of Z_1^{(N)} yields a corresponding estimate. These two relations together with (16), (18) and the fact that E_1[Z_1; Z_1 = 1] ≤ 1 immediately give a lower bound for E_1[Z_1; Z_1 ≥ 2], implying that for any β ∈ (0, 1) and N large enough

P_1(Z_1 = 1) ≤ 1 − β s_N + o(s_N).   (22)

Hence the process Z^{(N)}, when conditioned on survival, is bounded from below by the counting process Z of immortal lines, which in turn is bounded from below by a process Z̃ = (Z̃_n)_{n≥0} with the offspring distribution given there. So far we have closely followed the proof in González Casanova et al. (2017), but now we deviate from that proof to obtain the rate of convergence claimed in (19). An upper bound for the time T̃ := inf{n ≥ 0 : Z̃_n ≥ (1/s_N)^{1+δ}} also gives an upper bound for the time T^{(N)}. The idea is now to divide an initial piece of k ≤ (1/s_N)^{1+ε} generations into (1/s_N)^{ε/2} parts, each of n_0 ≤ (1/s_N)^{1+ε/2} generations. Because of the immortality of Z̃ and the independence between these parts, we immediately obtain the corresponding product bound. We then bound P_1(Z̃_{n_0} > (1/s_N)^{1+δ}) from below by an application of the Paley–Zygmund inequality in the form

P(X > θ E[X]) ≥ (1 − θ)² E[X]² / E[X²], 0 ≤ θ ≤ 1,   (23)

where X is a non-negative random variable (with finite second moment). For a supercritical Galton–Watson process with offspring expectation m and offspring variance σ², the n-th generation expectation and variance σ²_n are given by m^n and σ² m^n (m^n − 1)/(m² − m), respectively (see Athreya and Ney (1972), p. 4).
Hence we obtain the second-moment bound needed in (23). We choose the smallest n_0 such that (25) is satisfied; observe that n_0 ≤ (1/s_N)^{1+ε/2} for N large enough. Applying (23) with X := Z̃_{n_0} yields an estimate which, because of (25), implies P_1(Z̃_{n_0} ≤ (1/s_N)^{1+δ}) ≤ 7/8. If after n_0 generations the process Z̃ is still smaller than our desired bound (1/s_N)^{1+δ}, we can iterate this argument (1/s_N)^{ε/2} times and arrive at the claimed bound with c = −log(7/8). This gives the desired bound for the first term in (20). Part 2. We now turn to the second term on the r.h.s. of (20), for which we define the relevant stopping time. This part of the proof follows closely the second part of Lemma B.3 in González Casanova et al. (2017); we include it here for completeness.
We observe a chain of equalities in which the first and the last follow from the branching property and from (21), respectively. We have shown in (22) that P_1(Z_1 = 1) ≤ 1 − β s_N + o(s_N), and hence we can conclude the corresponding geometric bound. Finally, an application of Markov's inequality yields (26).
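The Paley–Zygmund inequality invoked in Part 1, P(X > θE[X]) ≥ (1 − θ)² E[X]²/E[X²] for X ≥ 0 and θ ∈ [0, 1], is easy to sanity-check numerically (our sketch; the exponential X is an arbitrary test case):

```python
import numpy as np

rng = np.random.default_rng(5)

# Paley-Zygmund: P(X > theta * E[X]) >= (1 - theta)^2 * E[X]^2 / E[X^2]
X = rng.exponential(size=200000)
for theta in (0.1, 0.5, 0.9):
    lhs = (X > theta * X.mean()).mean()
    rhs = (1 - theta) ** 2 * X.mean() ** 2 / (X ** 2).mean()
    assert lhs >= rhs        # holds with plenty of slack for Exp(1)
```

For Exp(1) the left-hand side is approximately e^{−θ}, so the inequality is far from tight here; its strength in the proof is that it needs only the first two moments of Z̃_{n_0}.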

Estimates on the paintbox
The following lemma provides the asymptotics (11) as well as the moment bounds for the Dirichlet-type weights that were addressed in Remark 2(b).

Proof a) Consider the event F_N. Hoeffding's inequality applied to the sample mean of i.i.d. copies of the bounded random variable Y ∧ K implies that P(F_N) decays exponentially fast. Since E[Y^n] is bounded, this yields the claim.
b) Again, Hoeffding's inequality applied to the sample mean of i.i.d. copies of the bounded random variable Y ∧ K, together with monotonicity, implies that P(E_N) decays exponentially fast. We now prove a bound on the deviations of the total weight of k individuals.

Lemma 3 (Large deviations bound for a moderate number of random weights) Let (Y_i) and (W_i^{(N)}) satisfy (7) and (8).

Proof This follows by a combination of two Cramér bounds. Indeed, the l.h.s. of (28) is by assumption bounded from above as stated; we estimate the latter probability from above in terms of the rate function I(y) of Y. Due to (8), I(y) exists in a neighbourhood of E[Y] = 1 and is strictly positive for y ≠ 1 (see Dembo and Zeitouni (1994), Theorem 2.2.3). This yields an upper bound of the asserted exponential order.
The next lemma gives stochastic upper and lower bounds for the sums of the random weights in terms of sums of the independent random variables Y_i.

Lemma 4 (Bounds for the random weights) Assume that Conditions (7) and (8) hold. Then the asserted bounds hold with probability 1 − O(exp(−c N^{1−2α})) for some c > 0.

Proof It suffices to show the corresponding estimate in terms of the rate function I(y) of Y. Condition (8) ensures that I(c) > 0 for c ≠ E[Y_1] = 1; see Dembo and Zeitouni (1994), Theorem 2.2.3. For any a, a′ ≥ 1 one has |1/a − 1/a′| ≤ |a − a′|. This yields the required comparison. Using Cramér (1938), Theorem 1, the probability on the r.h.s. can, with a suitable c > 0, be estimated from above by a term of order exp(−c N^{1−2α}), which gives the desired result.
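The Cramér-type mechanism behind Lemmas 3 and 4 — when Y has an exponential moment, P(|(Y_1 + ⋯ + Y_n)/n − E[Y]| > ε) decays exponentially in n — can be seen empirically (our sketch; the bounded uniform Y is an arbitrary choice satisfying (8)):

```python
import numpy as np

rng = np.random.default_rng(6)

def deviation_prob(n, eps=0.1, reps=20000):
    """Empirical P(|sample mean of n copies of Y - 1| > eps) for
    Y ~ Unif(0.5, 1.5), so E[Y] = 1 and every exponential moment exists."""
    Y = rng.uniform(0.5, 1.5, size=(reps, n))
    return float((np.abs(Y.mean(axis=1) - 1.0) > eps).mean())

probs = [deviation_prob(n) for n in (10, 40, 160)]
# exponential decay in n: each quadrupling of n crushes the probability
```
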

Proof of the main result
Recall from (13) that we denote the order of the selection strength by b N = − log s N log N . To simplify notation we will drop the subscript and simply write b := b N . As mentioned already in the sketch of the proof of Theorem 1 we assume without loss of generality that E[Y ] = 1.
The proof of the Theorem is divided into three parts, which correspond to three phases of growth of the Cannings frequency process X. The initial phase is decisive: due to Proposition 1, the probability that X reaches the level N^{b+δ} for some sufficiently small δ is given by the r.h.s. of (14). Lemmas 6 and 7 then guarantee that, once having reached the level N^{b+δ}, the process X reaches N with high probability. The proof of the Theorem is then a simple combination of these three results and the strong Markov property. Indeed, with τ_1, τ_2, τ_3 as in Proposition 1 and Lemmas 6 and 7, and with δ, δ′, ε fulfilling the requirements specified there, the fixation probability on the l.h.s. of (14) can be rewritten as the product of the probabilities of successfully completing the three phases.

First phase: from 1 to N^{b+δ}
In this section we show that, as long as X_g ≤ N^{b+δ}, the process X can be bounded from above and from below (with sufficiently high probability) by two slightly supercritical branching processes. To construct the upper bound we take the highest per capita selective advantage, which occurs when only a single individual is beneficial. Using Lemmas 3 and 4, we will approximate the thus arising mixed binomial distribution by a mixed Poisson distribution, which leads to the offspring distribution (29), where Y_1 is the random variable figuring in (7). To arrive at the lower bounding Galton–Watson process we note that, as long as the process X has not reached the level N^{b+δ}, the per capita selective advantage is bounded from below by the one that applies when N^{b+δ} beneficial individuals are present in the parent generation. Again using Lemmas 3 and 4, we will show that the offspring distribution of the lower bounding process can be chosen as the mixed binomial distribution (30).

Lemma 5 (Coupling with Galton–Watson processes) Let δ and α be such that 0 < δ < η and 1/2 − η < α < 1/2, and put τ_1 := inf{g ≥ 0 : X_g ≥ N^{b+δ} or X_g = 0}. Then X can be defined on one and the same probability space together with two branching processes with offspring distributions (30) and (29), respectively, such that for j = 1, 2, … the coupling estimate (31) holds,
with c as in Lemma 4.
Applying the latter estimate g times consecutively immediately yields the following corollary. Corollary 1 Let δ, α, τ_1 and the two bounding processes be as in Lemma 5. If X_0 ≤ N^{b+δ}, then (32) holds for all g ∈ N_0.
Proof of Lemma 5 We proceed inductively, assuming that for g = 1, 2, … we have constructed X and the two bounding processes up to generation g − 1 such that (31) holds for j = 1, …, g − 1.
Together with X_g we will construct the g-th generation of the bounding processes and check the asserted probability bound for the coupling. Given {X_{g−1} = k} and the weights (W_i) in generation g − 1, the number of beneficial individuals X_g in generation g has the binomial distribution (6). Aiming first at the construction of the upper bound, we relate (6) to (29) in terms of stochastic order. For p, p′ ≥ 0, a Bin(N, p)-distributed random variable B is stochastically dominated by a Pois(N p′)-distributed random variable P if (33) holds; see (1.21) in Klenke and Mattner (2010). Indeed, in this case the probability of the outcome zero is not larger for a Pois(p′)-distributed random variable P_1 than for a Bernoulli(p)-distributed random variable B_1, which yields B_1 ⪯ P_1, where ⪯ denotes the usual stochastic ordering of random variables. Consequently B = B_1 + ⋯ + B_N ⪯ P_1 + ⋯ + P_N = P, with the B_i and P_i being independent copies of B_1 and P_1, respectively. In particular, for p ≥ 0 and p′ = p(1 + N^{b+2δ−1}), Condition (33) holds if (34) is satisfied. Given (W_i), the success probability of the binomial distribution (6) is bounded from above as in (34); thus by Lemma 3, (34) is fulfilled with probability 1 − O(exp(−c_ε N^{b+δ})), with c_ε as in Lemma 3. In this sense the number of beneficial offspring is dominated with high probability by a mixed Poisson distributed random variable. Applying Lemma 4 yields that with probability 1 − exp(−c N^{1−2α})(1 + o(1)) the required chain of inequalities is valid. In this way X can be coupled with a branching process with a mixed Poisson offspring distribution of the form (29). The lower bound also uses a comparison with a Galton–Watson process, now with a mixed binomial offspring distribution: number the individuals in generation g − 1 by (i, g − 1), with (1, g − 1), …, (X_{g−1}, g − 1) being the beneficial individuals. Given W, we use a sequence of coin tosses to determine which of the individuals from generation g are the children of (i, g − 1).
The first $N$ tosses determine which individuals are the children of $(1, g-1)$. Denoting the number of these children by $\omega^{(g-1)}_1$, the next $N - \omega^{(g-1)}_1$ tosses (with an updated success probability) determine which individuals are the children of $(2, g-1)$, etc. Observe that as long as $X_{g-1} \le N^{b+\delta}$, and given $W$ and $\sum_{\ell=1}^{i-1} \omega^{(g-1)}_\ell =: h$, the conditional distribution (35) applies. Note that the success probability in (35) can be estimated from below; as long as $X_{g-1} \le N^{b+\delta}$, Lemma 3 ensures that for $\varepsilon > 0$ this estimate holds with probability $1 - O(\exp(-c_\varepsilon N^{b+\delta}))$. Lemma 4, in turn, yields a lower bound on the r.h.s. of (36). If $\omega^{(g-1)}_1 + \cdots + \omega^{(g-1)}_{i-1} = h > N^{b+\delta}$, then we have $\underline{Z}_{g \wedge \tau_1} \wedge N^{b+\delta} \le X_{g \wedge \tau_1} \wedge N^{b+\delta}$. Consequently $X$ can be coupled with a Galton–Watson process $\underline{Z}$ with offspring distribution of the form (30) such that also the lower estimate in (32) is fulfilled. This completes the proof of Lemma 5.
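The Bernoulli-to-Poisson comparison used in the construction of the upper bound can be checked numerically: $\mathrm{Bin}(n,p)$ is stochastically dominated by $\mathrm{Pois}(\lambda)$ as soon as $\lambda \ge -n\log(1-p)$. The following minimal sanity check (not part of the proof; the values of $n$ and $p$ are illustrative) verifies the resulting ordering of the distribution functions:

```python
import math

def binom_cdf(k, n, p):
    # P(Bin(n, p) <= k), computed from the exact pmf
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def pois_cdf(k, lam):
    # P(Pois(lam) <= k)
    return math.exp(-lam) * sum(lam**j / math.factorial(j) for j in range(k + 1))

n, p = 50, 0.1
lam = -n * math.log(1 - p)  # smallest Poisson mean dominating Bin(n, p)
# stochastic domination Bin(n, p) <= Pois(lam) means
# P(Bin <= k) >= P(Pois <= k) for every k (up to float rounding)
ok = all(binom_cdf(k, n, p) >= pois_cdf(k, lam) - 1e-12 for k in range(120))
print(ok)  # True
```

At $\lambda = -n\log(1-p)$ the two zero-probabilities coincide, which is exactly the borderline case of the comparison of $B_1$ and $P_1$ above.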
We are now ready to prove that $X$ reaches the level $N^{b+\delta}$ with probability $\frac{2s_N}{\rho^2}(1+o(1))$.
Proposition 1 (Probability to reach the critical level) Assume Conditions (7), (8) and (12) are fulfilled, and define $\tau_1 = \inf\{g \ge 0 : X_g \ge N^{b+\delta} \text{ or } X_g = 0\}$ with $0 < \delta < \eta$. Then
$$P\big(X_{\tau_1} \ge N^{b+\delta}\big) = \frac{2s_N}{\rho^2}\,(1+o(1)).$$
Proof We use the couplings of $X$ with the slightly supercritical branching processes $\underline{Z}$ and $\overline{Z}$ from Corollary 1 and show that both processes reach the level $N^{b+\delta}$ with probability $\frac{2s_N}{\rho^2}(1+o(1))$. Let $\delta > 0$ and let $E$ be the event that the stochastic ordering between $\underline{Z}$, $X$ and $\overline{Z}$ holds until generation $n_0 = N^{b+\delta}$. We show below that the stopping time $\tau_1$ fulfils (38). For some $g$ that is polynomially bounded in $N$, the r.h.s. of (32) is bounded from above by $1 - o(s_N)$. Thus, combining Corollary 1 and (38), we deduce (39). We now bound (37) from above by estimating the corresponding probability for $\overline{Z}$ and the stopping time $\overline{\tau}_1 = \inf\{g \ge 0 : \overline{Z}_g \ge N^{b+\delta} \text{ or } \overline{Z}_g = 0\}$. To obtain an upper bound for the probability of $\overline{Z}$ to reach the level $N^{b+\delta}$, it suffices to estimate the survival probability of $\overline{Z}$. For notational simplicity we write $\{\overline{Z} \text{ survives}\}$ for the event $\{\forall g \ge 0 : \overline{Z}_g > 0\}$, and similarly $\{\overline{Z} \text{ dies out}\}$ for the event $\{\exists g \ge 0 : \overline{Z}_g = 0\}$. The survival probability of $\overline{Z}$ will now be estimated by means of (18). To this purpose we calculate the expectation and variance of the offspring distribution (29).
The expectation is $1 + s_N + o(s_N)$, and the variance turns out to be $\rho^2(1+o(1))$. Equation (18) then yields that the survival probability of the process $\overline{Z}$ is $\frac{2s_N}{\rho^2}(1+o(1))$. The lower bound in (37) follows by similar arguments, considering the process $\underline{Z}$ instead.
It remains to show (38). Define $\tau^{(0)}$ and $\tau^{(u)}$ as the stopping times at which the process $\underline{Z}$ reaches $0$ and the process $\overline{Z}$ reaches the upper bound $N^{b+\delta}$, respectively, i.e.
with the convention that the infimum of the empty set is infinity. Then the desired estimate follows by an application of Lemma 1 and Corollary 1, with $\alpha < \frac{1}{2}$ as defined there. To keep the notation simple, we denote by $e_N$ terms of the order $\exp(-N^c)$ for some $c > 0$. Proceeding with (39), we obtain (40), again by an application of Lemma 1. Note that
$$P(\underline{Z} \text{ dies out}, \overline{Z} \text{ survives}, E) + P(\underline{Z} \text{ survives}, \overline{Z} \text{ survives}, E) = P(\overline{Z} \text{ survives}, E).$$
In order to show that (40) is $o(s_N)$, it suffices to prove that $P(\underline{Z} \text{ dies out}, \overline{Z} \text{ survives}, E) = o(s_N)$.
Considering again the event $\{\tau^{(u)} \le N^{b+\delta}\}$ and applying (18), one obtains the required bound, since the events $E$ and $\{\tau^{(u)} \le N^{b+\delta}\}$ imply that $\overline{Z}_g \ge N^{b+\delta}$ for some $g \le N^{b+\delta}$, and the probability for $\overline{Z}$ to die out after reaching $N^{b+\delta}$ is $\big(1 - \frac{2s_N}{\rho^2}(1+o(1))\big)^{N^{b+\delta}} = e_N$.
One more application of (18) yields the remaining estimate, which finishes the proof.
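The Haldane-type survival probability invoked in this proof can be illustrated numerically. For a Galton–Watson process with $\mathrm{Pois}(m)$ offspring, the extinction probability $q$ is the smallest root of $q = e^{m(q-1)}$; for $m = 1+s$ the survival probability $1-q$ is close to $2s/\rho^2$ with $\rho^2 = \operatorname{Var}(\mathrm{Pois}(1+s)) = 1+s$. A minimal sketch (the Poisson offspring law and the value of $s$ are illustrative stand-ins for the mixed Poisson law (29)):

```python
import math

def gw_survival(m, iters=20000):
    # extinction probability of a GW process with Pois(m) offspring is the
    # smallest fixed point of q = exp(m * (q - 1)); iterate upward from q = 0
    q = 0.0
    for _ in range(iters):
        q = math.exp(m * (q - 1.0))
    return 1.0 - q

s = 0.01
pi_exact = gw_survival(1.0 + s)
pi_haldane = 2.0 * s / (1.0 + s)  # 2 s / rho^2 with rho^2 = 1 + s
print(pi_exact, pi_haldane)
```

For small $s$ the two values agree up to a relative error of order $s$, in line with the asymptotics $\frac{2s_N}{\rho^2}(1+o(1))$.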

Second phase: from $N^{b+\delta}$ to $\varepsilon N$
In this section we show that $X$, once having reached the level $N^{b+\delta}$, will reach the level $\varepsilon N$ with probability tending to $1$ as $N \to \infty$.
Lemma 6 (From $N^{b+\delta}$ to $\varepsilon N$ with high probability) Assume $X_0 \ge N^{b+\delta}$ with $0 < \delta < \eta$, let $0 < \varepsilon < \frac{\delta}{2-2\eta-\delta}$, and define the stopping time $\tau_2 := \inf\{g \ge 0 : X_g \ge \varepsilon N \text{ or } X_g = 0\}$. Then there exists some $\delta' > 0$ such that
$$P\big(X_{\tau_2} \ge \varepsilon N\big) = 1 - O(N^{-\delta'}).$$
Proof By monotonicity it is enough to prove the claim for $X_0 = N^{b+\delta}$. By definition we have (41). Next we bound $X$ from below by the process $\underline{X} = (\underline{X}_g)_{g \ge 0}$ with $\underline{X}_0 = X_0$ and conditional distribution (42), as long as $\underline{X}_g \le \varepsilon N$. If $\underline{X}_g > \varepsilon N$, we let $\underline{X}_{g+1}$ be distributed as a slightly supercritical branching process with $\mathrm{Pois}(Y_1 q_N)$-distributed offspring, where $q_N$ is the per-generation growth factor. We will see that with this definition the expectation of $\underline{X}$ increases in each generation by the factor $q_N$, see (45) and (46). The generation-wise increase of the variance, conditioned on the current state, can be estimated from above by a factor $\rho^2(1+o(1))$, see (45) and (47), leading to an iterative estimate on the variance of the form (48).
As long as $X_g \ge \underline{X}_g$, the success probability in the mixed binomial distribution on the r.h.s. of (41) dominates the corresponding one on the r.h.s. of (42). Consequently, starting $X$ and $\underline{X}$ both in $N^{b+\delta}$, we can couple them such that $\underline{X}_g \le X_g$ as long as $X$ has not crossed the level $\varepsilon N$. In particular, for the stopping time $\underline{\tau} = \inf\{g \ge 0 : \underline{X}_g \notin \{1, 2, \ldots, \lceil \varepsilon N \rceil\}\}$ we have (44). To show $P(\underline{X}_{\underline{\tau}} \ge \varepsilon N) = 1 - O(N^{-\delta'})$, we will estimate the first and second moment of $\underline{X}_{g_0}$ for a suitably chosen $g_0 \in \mathbb{N}$ and then use Chebyshev's inequality to show that $\underline{X}_{g_0}$ lies above $\varepsilon N$ with sufficiently high probability. For this purpose we consider $m(x)$ and $v(x)$, the one-step conditional expectation and variance of $\underline{X}$ at $x \in \mathbb{N}$. From the definition of $\underline{X}$ as a branching process above $\varepsilon N$, we have (45) for $x > \varepsilon N$. Next we show that $m(x)$ and $v(x)$ fulfil relations similar to (45) also for $x \le \varepsilon N$, which will allow us to estimate the expectation and the variance of $\underline{X}_{g_0}$. For $x \le \varepsilon N$ this follows from (42); in the penultimate equality of that computation we used (11). Consequently, recalling (43), we obtain (46) for all $x \in \mathbb{N}$. Next we analyse $v(x)$, again for $x \le \varepsilon N$. In view of (42), a decomposition of the variance applies; because of the negative correlation of the $W_i$, the sum of the first and the second term is not larger than $x N^2 \operatorname{Var}(W_1)$, which because of (11) is of the required order. Combining (46) and (47) allows us to estimate the variance $\operatorname{Var}(\underline{X}_g)$ for $g \in \mathbb{N}$, again by decomposing the variance; iterating this argument yields (48).
Choose the minimal $g_0 \in \mathbb{N}$ such that $2\varepsilon N \le E[\underline{X}_{g_0}] = q_N^{g_0} \underline{X}_0$; recalling the initial condition $\underline{X}_0 = N^{b+\delta}$, this determines $g_0$.
Applying Chebyshev's inequality with $\underline{X}_0 = X_0$, we obtain for some small $\delta' > 0$ that the deviation probability is $O(N^{-\delta'})$, due to the assumptions on $\varepsilon$.
Since $E[\underline{X}_{g_0}] \ge 2\varepsilon N$, this implies $P(\underline{X}_{\underline{\tau}} \ge \varepsilon N) = 1 - O(N^{-\delta'})$, and due to (44) this finishes the proof.
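The quantitative content of this Chebyshev step can be made concrete with a toy computation for a comparison Galton–Watson process. This is a sketch under our own illustrative assumptions: the parameter values below and the use of the standard Galton–Watson variance formula with Poisson offspring are ours, not the paper's.

```python
import math

# illustrative parameters (assumptions, not from the paper)
N = 10**12
b, delta, eps = 0.3, 0.15, 0.1
s = N**(-b)            # selection strength s_N = N^(-b)
m = 1.0 + s            # per-generation growth factor q_N ~ 1 + s_N
x0 = N**(b + delta)    # starting level N^(b + delta)

# minimal g0 with E X_g0 = m**g0 * x0 >= 2 eps N
g0 = math.ceil(math.log(2 * eps * N / x0) / math.log(m))
mean = x0 * m**g0

# variance after g0 generations of a GW process started from x0 particles,
# with Poisson offspring (offspring variance sigma^2 = m):
# Var = x0 * sigma^2 * m**(g0-1) * (m**g0 - 1) / (m - 1)
var = x0 * m * m**(g0 - 1) * (m**g0 - 1) / (m - 1)

# Chebyshev: P(X_g0 < eps N) <= Var / (E X_g0 - eps N)^2, of order N^(-delta)
bound = var / (mean - eps * N)**2
print(bound)
```

The resulting bound is small and decays like a negative power of $N$, mirroring the $O(N^{-\delta'})$ estimate in the proof.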

Third phase: from $\varepsilon N$ to $N$
Lemma 7 below concerns the last step of the proof, showing that once the process $X$ has reached the level $\varepsilon N$, it goes to fixation with high probability. Our proof relies on a representation (49) of the fixation probability of $X$ in terms of (a functional of) the equilibrium state $A_{\mathrm{eq}} := A^{(N)}_{\mathrm{eq}}$ of the ancestral selection process $A$. Intuitively, this says that $X$ goes extinct if and only if a random sample of (random) size $A_{\mathrm{eq}}$, drawn without replacement from the population of size $N$, avoids the $k$ beneficial individuals. Formula (49) implies (50). The representation of the transition probabilities of $A$ in Boenkost et al. (2021), Section 2.3, in terms of two half steps yields that for fixed $N$, CASPs with different selection parameters can be coupled in such a way that $A^{(N)}_{\mathrm{eq}}$ is increasing in $s_N$. Take a sequence $(\tilde{s}_N)$ satisfying $\tilde{s}_N \le s_N$ and Condition (1.2) in Boenkost et al. (2021), i.e. $N^{-1+\eta} \le \tilde{s}_N \le N^{-2/3+\eta}$.

LetÃ (N )
eq be the equilibrium state belonging tos N (and to the same Dirichlet-type paintbox as that of X ). The central limit result Boenkost et al. (2021), Corollary 6.10, implies thatÃ (N ) eq → ∞ in probability as N → ∞. Because of the just mentioned monotonicity in the selection coefficient, the same convergence holds true for the sequence A (N ) eq . The following lemma is thus immediate from (50) and dominated convergence: Lemma 7 (From εN to N with high probability) Let X be a Cannings frequency process with X 0 = k ≥ εN for some 0 < ε < 1/2. Assume that Conditions (7), (8) and (12) are fulfilled. Define τ 3 := inf{g ≥ 0 : X g ∈ {0, N }}. Then P k (X τ 3 = N ) = 1 − o(1).
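The sampling intuition behind Lemma 7 can be sketched numerically: when $k = \varepsilon N$ individuals are beneficial, a sample of size $a$ drawn without replacement avoids all of them with probability at most $(1-\varepsilon)^a$, which vanishes as $a \to \infty$; this is why $A^{(N)}_{\mathrm{eq}} \to \infty$ forces the extinction probability in (50) to $0$. A minimal sketch (parameter values illustrative):

```python
def avoid_prob(N, k, a):
    # probability that a sample of size a, drawn without replacement from a
    # population of size N, misses all k beneficial individuals
    p = 1.0
    for j in range(a):
        p *= (N - k - j) / (N - j)
    return p

N, eps = 10**6, 0.1
k = int(eps * N)
for a in (10, 100, 1000):
    print(a, avoid_prob(N, k, a))
```

Each factor $(N-k-j)/(N-j)$ is at most $1-\varepsilon$, so the avoidance probability is bounded by $(1-\varepsilon)^a$ and decays geometrically in the sample size.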

Discussion
The analysis of fixation probabilities of slightly beneficial mutants is at the heart of population genetics; some seminal and more modern references are given in the Introduction. Our main result concerns Haldane's asymptotics (3) for the fixation probability in a regime of moderate selection.
Our framework is that of Cannings models with selection (as reviewed in Sect. 2), where the corresponding neutral genealogies are assumed to belong to the domain of attraction of Kingman's coalescent. This class of models is motivated by seasonal reproduction cycles, in which within each season a large number of offspring is generated, but only a comparatively small number (concentrated around a carrying capacity $N$) of randomly sampled offspring survive to the next season. In this setting it is reasonable to approximate sampling without replacement by sampling with replacement. Thus, under the assumption of neutrality, the probability that the $j$-th offspring surviving to the next generation is a child of parent $i$ is approximately given by the random weight $W_i = Y_i/(Y_1 + \cdots + Y_N)$, where $Y_1, \ldots, Y_N$ are the sizes of the (potential one-generation) offspring of parents $1, \ldots, N$. These sizes are assumed to be independent and identically distributed in the present paper, leading to the concept of weights of Dirichlet type. The subsequent generation then arises by multinomial sampling with these random weights, and to add selection the weights of wildtype parents are decreased by the factor $(1 - s_N)$. For a closely related model with a specific distribution of the $Y_i$ (and sampling without replacement) in the context of Lenski's long-term evolution experiment, see González Casanova et al. (2017) and Baake et al. (2019). We prove Haldane's asymptotics in the case of moderately strong selection, see Theorem 1, in which the selection strength $s_N$ obeys $N^{-1/2+\eta} \le s_N \le N^{-\eta}$ for some $\eta > 0$ and a large population size $N$. In the companion paper Boenkost et al. (2021) the range of moderately weak selection was considered, i.e. the case $N^{-1+\eta} \le s_N \le N^{-1/2-\eta}$ for some $\eta > 0$. Since $s_N \gg N^{-1}$ in the regime of moderate selection, selection acts on a faster timescale than genetic drift.
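One generation of the reproduction mechanism just described can be sketched as follows. This is a toy implementation under our own simplifying assumptions: the exponential law for the $Y_i$ is an arbitrary choice of offspring-size distribution, and `next_generation` is a hypothetical helper, not the paper's formal construction.

```python
import random

def next_generation(N, x, s, rng=random):
    """One generation of a Cannings model with selection (sketch).

    Parents 0..x-1 carry the beneficial type; wildtype weights are damped
    by the factor (1 - s), and the next generation arises by multinomial
    sampling of N children with the resulting random weights.
    """
    y = [rng.expovariate(1.0) for _ in range(N)]            # iid sizes Y_i
    w = [y[i] if i < x else (1 - s) * y[i] for i in range(N)]
    children = rng.choices(range(N), weights=w, k=N)        # multinomial step
    return sum(1 for c in children if c < x)                # beneficial count

random.seed(1)
x_next = next_generation(N=500, x=50, s=0.05)
print(x_next)
```

Iterating this step from a single beneficial individual and recording the fraction of runs that fix would give a Monte Carlo handle on the fixation probability, which for small $s_N$ should be of order $2s_N/\rho^2$.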
In Boenkost et al. (2021) an ancestral selection graph for the just described class of Cannings models with selection was defined, and it was shown that the fixation probability $\pi_N$ is equal to the expected value $\frac{1}{N} E\big[A^{(N)}_{\mathrm{eq}}\big]$, where $A^{(N)}_{\mathrm{eq}}$ is the number of lines of the ancestral selection graph in its equilibrium. While we could analyse the asymptotics of that quantity directly in the regime of moderately weak selection, the fluctuations of $A^{(N)}_{\mathrm{eq}}$ were too large for this approach to succeed in the regime of moderately strong selection. Conversely, it turned out that the classical idea of branching process approximation is suitable precisely in that latter regime.
For highly skewed offspring distributions an asymptotics for the fixation probability arises that is different from (3). In cases where the neutral genealogy is attracted by a Beta$(2-\alpha, \alpha)$-coalescent, Okada and Hallatschek (2021) argue that the fixation probability is proportional to $s_N^{1/(\alpha-1)}$ if $1 \gg s_N \gg N^{-(\alpha-1)}$. Thus the probability of fixation is substantially smaller than in Haldane's asymptotics, which is plausible since the offspring variance diverges as $N \to \infty$. Notably, since the evolutionary timescale of Cannings models in the domain of attraction of Beta-coalescents is of the order $N^{\alpha-1}$, the case $1 \gg s_N \gg N^{-(\alpha-1)}$ corresponds to the regime of moderate selection; note also that the case of genealogies in the domain of attraction of a Kingman coalescent corresponds formally to $\alpha = 2$.