1 Introduction

While the theory of branching processes is undoubtedly one of the best developed areas of probability theory, stochastic branching models that incorporate effects of selection and mutation have only recently become the subject of rigorous mathematical analysis. This is despite the unquestionable relevance of these effects to the evolution of populations in nature and in the laboratory [1,2,3].

By contrast, deterministic high-density models of a population undergoing selection and mutation have been studied for quite some time [4]. The model most closely associated with our stochastic process is Kingman’s model [5]. This is a dynamical system on the space of probability measures describing the fitness distribution of a population. The fitness distribution \(p_t\) of the population at generation t is replaced in the next generation by

$$\begin{aligned} p_{t+1}(dx) = (1-\beta ) \frac{ x \, p_{t}(dx)}{\int y \, p_{t}(dy)} + \beta \mu (dx). \end{aligned}$$
(1)

Here a proportion \(1-\beta \) of the new generation has been selected from the current generation proportionally to their fitness and a proportion \(\beta \) are mutants that get a new fitness, sampled independently of their past using a fixed mutant fitness distribution (MFD) \(\mu \). Note that this model is only well-defined if the mean fitness of the population remains finite and therefore requires moment bounds for the MFD. For certain MFDs with bounded support Kingman’s model undergoes a condensation phase transition, which implies that a nonzero fraction of the total population attains the maximally possible fitness value when the mutation rate is low enough [6]. A rigorous analysis of the condensation transition can be found in [7], and variants of the model have been considered in [8,9,10].
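As a numerical illustration of (1) (not part of the original analysis), the recursion can be iterated once the fitness space is discretized. The sketch below assumes a uniform MFD on a finite grid in \((0,1]\) and a mutation probability \(\beta =0.2\); both choices are illustrative only.

```python
# Iterate Kingman's recursion (1) on a discrete fitness grid.
# Assumptions (illustrative only): uniform MFD on a grid in (0, 1], beta = 0.2.
def kingman_step(p, grid, beta, mu):
    mean_fit = sum(x * w for x, w in zip(grid, p))       # current mean fitness
    return [(1 - beta) * x * w / mean_fit + beta * m     # selection + mutation
            for x, w, m in zip(grid, p, mu)]

n = 50
grid = [(i + 1) / n for i in range(n)]    # fitness values in (0, 1]
mu = [1.0 / n] * n                        # uniform mutant fitness distribution
p, beta = mu[:], 0.2                      # start from the MFD itself
for _ in range(200):
    p = kingman_step(p, grid, beta, mu)

assert abs(sum(p) - 1.0) < 1e-9           # p remains a probability vector
assert sum(x * w for x, w in zip(grid, p)) > sum(x * w for x, w in zip(grid, mu))
```

Each step reweights the current distribution by fitness, renormalizes, and mixes in a fraction \(\beta \) of fresh mutants, exactly as in (1); the asserts confirm that normalization is preserved and that selection shifts mass toward high fitness.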

Kingman’s model is based on two main assumptions about the evolutionary process. First, the fitness of a mutant is assumed to be random and independent of the parental fitness, a setting often referred to as the “house-of-cards” model [4, 5, 11]. Second, each mutation gives rise to a new genetic type or allele, an assumption known as the infinite alleles model [12]. Park and Krug [13] studied a stochastic version of Kingman’s model which incorporates these two features. The population is updated in discrete generations following asexual Wright–Fisher dynamics, which can be viewed as a branching process conditioned on a constant population size N, see [1]. For unbounded MFDs the mean population fitness grows without bound. Since the MFD is time-independent, the fraction of beneficial mutations declines indefinitely over time. As a consequence, for long times beneficial mutants emerge and evolve independently of each other, and the dynamics can be analyzed rather straightforwardly in terms of a record-like point process [13]. In particular, for a MFD with an exponential tail the mean population fitness increases logarithmically, which is consistent with the behaviour observed in Lenski’s long-term evolution experiment with bacteria [2, 14]. A generalization of this model that includes the response of the immune system to a population of pathogens was considered in [15].

The decline of the supply of beneficial mutations distinguishes the fixed population version of Kingman’s model from a related class of stochastic models where the selection coefficients of novel mutants, rather than their fitnesses, are drawn from a given, time-invariant distribution [1]. In these models the fitness distribution of the population converges to a fitness wave traveling at a constant speed, which is determined by the interference between competing mutant clones [16,17,18,19,20,21].

The branching process version of Kingman’s model considered in this paper is, in a sense, intermediate between the deterministic model (1) and the stochastic finite population model of [13]. The dynamics of the branching process is stochastic, but because of the unbounded growth of the population, competing clones can coexist at arbitrarily long times and the population retains a nontrivial type structure. While our motivation here is primarily conceptual and mathematical, we note that the clonal composition of growing populations is a problem of considerable interest for the modeling of proliferating tumours [22,23,24]. In this context, Durrett et al. [22] studied a branching process with selection and mutation where, similar to the fitness wave models described in the previous paragraph, the selection coefficients of beneficial mutations are drawn from a fixed, continuous probability distribution with bounded or unbounded support.

The first papers studying branching process models of Kingman type that express the selective advantage of a fit individual in terms of its offspring distribution are [25], which deals with Weibull type MFDs and puts the focus on the condensation phenomenon in that model, and [26] which looks at the growth of the fittest family in the case of Gumbel type MFDs. Both papers are limited to bounded MFDs and implicitly rely on the analogy to Kingman’s original model, though of course the methods of study are entirely different in a stochastic setting. The present paper initiates the study of Kingman type branching processes with selection and mutation for unbounded MFDs. We focus on the case of Fréchet type MFDs, where the mathematical challenge is linked to the fact that the analogous Kingman model is ill-defined [13].

The structure of the paper is as follows. In Sect. 2, we introduce the models and state the main theorem. Section 3 explains the heuristics behind the formal results and in Sect. 4 we present a rigorous proof of the theorem. Section 5 contains refined results for the empirical fitness distribution for one of our models. These results are not yet accessible by a complete rigorous mathematical analysis, so that we resort to a numerical and heuristic study and a rigorous analysis of an approximating deterministic system. In Sect. 6 we provide a short discussion that places our results into the context of previous work and points to directions for future research.

2 Models and Main Result

We study two models of a population evolving in discrete and non-overlapping generations. In both models all individuals are assigned a fitness value, which is a positive real number. As model parameters we fix a probability distribution \(\mu \) on \((0,\infty )\) from which the random fitness values F are sampled, referred to in the following as the mutant fitness distribution (MFD), and a mutation probability \(\beta \in (0,1)\). As was explained in the Introduction, we assume an infinite alleles model with a house-of-cards fitness landscape.

In both models we start from generation \(t=0\) with a single individual with fitness f. Each individual in the population in generation \(t\ge 0\) produces a Poisson random number of offspring with mean given by its fitness. With probability \(1-\beta \) an offspring individual inherits its parent’s fitness and is added to the population at generation \(t+1\). Otherwise, with probability \(\beta \), it is a mutant. The two models differ in the fate of the mutants.

  • Fittest mutant model (FMM) Every mutant is assigned a fitness sampled independently from \(\mu \). Only the fittest mutant (if there is one) is added to the population at generation \(t+1\). All other mutants die instantly.

  • Multiple mutant model (MMM) Every mutant is assigned a fitness sampled independently from \(\mu \) and is added to the population at generation \(t+1\).

We write X(t) for the number of individuals in generation t and study the growth of the population conditioned on the event of survival, i.e. when \(X(t) \ne 0\) for all times t. It is easy to see that the population size of the MMM dominates the population size of the FMM at all times. Because the growth is determined by the fittest mutants we expect both models to grow at the same rate and to show this, it suffices to find an upper bound for the MMM and a matching lower bound for the FMM.
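A single generation of both models can be sketched in code. The following is an illustration only: the Pareto-type MFD with tail \(G(x)=x^{-\alpha }\), the tail index, and the founder fitness are all assumed here for concreteness and are not prescribed by the models.

```python
import numpy as np

rng = np.random.default_rng(1)
ALPHA = 1.0                                   # assumed tail index of a Pareto-type MFD

def sample_mu(size):
    # Pareto MFD with tail G(x) = x^(-ALPHA), x >= 1 (a concrete stand-in)
    return (1.0 - rng.uniform(size=size)) ** (-1.0 / ALPHA)

def next_generation(fitnesses, beta, model="MMM"):
    """One generation: each individual has a Poisson(fitness) number of
    offspring, each of which is independently a mutant with probability beta."""
    survivors, mutants = [], []
    for f in fitnesses:
        n_off = rng.poisson(f)
        n_mut = rng.binomial(n_off, beta)
        survivors.extend([f] * (n_off - n_mut))      # non-mutants inherit f
        mutants.extend(sample_mu(n_mut).tolist())    # mutants draw from mu
    if model == "FMM" and mutants:
        mutants = [max(mutants)]                     # FMM: only the fittest mutant survives
    return survivors + mutants

pop = [5.0]                                          # founder with fitness f = 5
for _ in range(3):
    pop = next_generation(pop, beta=0.3, model="FMM")
assert all(v > 0 for v in pop)
```

Switching `model` between `"FMM"` and `"MMM"` toggles exactly the one difference between the models: whether all mutants or only the fittest mutant of a generation enter the population.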

Naturally, the rate of growth depends on the MFD \(\mu \). If \(\mu \) is an unbounded distribution, then in both models individuals of ever increasing fitness occur and hence the population will grow superexponentially fast. By contrast, if \(\mu \) is bounded we can only have exponential growth. Indeed, if \(\mu \) is continuous with essential supremum one, then for a closely related continuous time model of immortal individuals, it is shown in [25, Remark 1] that

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{\log X(t)}{t}= \lambda ^*, \end{aligned}$$

where \(\lambda ^*\in [1-\beta ,1)\) is the unique solution of the equation

$$\begin{aligned} \beta \int \frac{\lambda ^*}{\lambda ^*-(1-\beta )x} \, \mu (dx) = 1 \end{aligned}$$

if \(\beta \int \frac{1}{1-x} \, \mu (dx)\ge 1\), and otherwise \(\lambda ^*:=1-\beta \). Further details on the long term growth of the process in [25] depend on the classification of \(\mu \) according to its membership in the max domain of attraction of an extremal distribution. This also applies to other model variants and unbounded MFDs. By the celebrated Fisher–Tippett theorem there are three such universality classes, see for example [27, Proposition 0.3]. These are

  • the Weibull class, which roughly occurs if \(\mu \) is bounded with mass decaying slowly near the essential supremum,

  • the Gumbel class, which roughly occurs if the mass of \(\mu \) is decaying quickly near the essential supremum, which may be finite or infinite,

  • the Fréchet class, which roughly occurs if \(\mu \) is unbounded with mass decaying slowly near infinity.

Extreme value theory plays an important role in the interpretation of experimentally determined effect size distributions of beneficial mutations, and representatives of all three universality classes have been identified empirically [28,29,30,31].

In the present paper, we are mainly interested in the asymptotic behaviour of the population size X(t) in the case that \(\mu \) belongs to the Fréchet class (or, in short, is of Fréchet type). Precisely, this means that the tail function

$$\begin{aligned} G(x):=\mu ((x,\infty ))={\mathbb {P}}(F> x) \end{aligned}$$

is regularly varying with index \(-\alpha \) for some \(\alpha >0\). In other words, there exists a function \(\ell :(0,\infty ) \rightarrow {\mathbb {R}}\) which is slowly varying at infinity such that \(G(x)=x^{-\alpha } \ell (x)\). MFDs of Fréchet type have been found in several experimental studies [32,33,34,35], and appear to be typical for populations subjected to strong selection pressures, such as bacteria or viruses exposed to drugs.

As in this case \(\mu \) is an unbounded distribution, the process \((X(t) :t\ge 0)\) will grow superexponentially fast on survival and therefore our discussion will focus on the limiting quantity

$$\begin{aligned} \nu = \lim _{t\rightarrow \infty } \frac{\log \log X(t)}{t}. \end{aligned}$$

Our main result is stated in the following theorem.

Theorem 1

Given \(\alpha >0\), let \(T\in {\mathbb {N}}\) be the unique number such that

$$\begin{aligned} \frac{(T-1)^T}{T^{T-1}} <\alpha \le \frac{T^{T+1}}{(T+1)^{T}} \end{aligned}$$

and define (see Lemma 4 for another equivalent definition)

$$\begin{aligned} \nu (\alpha ) := \frac{1}{T} \log \frac{T}{\alpha }. \end{aligned}$$
(2)

Let \((X(t))_{t\ge 0}\) be the size of the population in either the FMM or the MMM. Then, almost surely on survival,

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{\log \log X(t)}{t}=\nu (\alpha ). \end{aligned}$$

We would like to emphasize that although the survival probability depends not only on the initial condition but also on the model (FMM or MMM), the almost sure convergence on survival in Theorem 1 holds irrespective of the actual value of the survival probability as long as it is nonzero. Before presenting the proof of the theorem in Sect. 4, in the next section we motivate the expression (2).
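The pair \((T, \nu (\alpha ))\) in Theorem 1 is straightforward to compute. The following sketch determines T from the defining inequality and cross-checks (2) against the maximum characterization of Lemma 4; the test values of \(\alpha \) are arbitrary.

```python
import math

def T_of(alpha):
    # unique integer T with (T-1)^T / T^(T-1) < alpha <= T^(T+1) / (T+1)^T
    T = 1
    while alpha > T ** (T + 1) / (T + 1) ** T:
        T += 1
    return T

def nu(alpha):
    T = T_of(alpha)
    return math.log(T / alpha) / T        # equation (2)

# cross-check against the characterization in Lemma 4:
# nu(alpha) = max over positive integers m of (1/m) log(m/alpha)
for alpha in (0.3, 1.0, 2.5, 10.0):
    best = max(math.log(m / alpha) / m for m in range(1, 200))
    assert abs(nu(alpha) - best) < 1e-12
```

For instance, \(\alpha =1\) gives \(T=3\) and \(\nu (1)=\tfrac{1}{3}\log 3\).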

3 Motivation of the Main Result

Here we explain the statement of Theorem 1 by a heuristic analysis of the FMM. For convenience we take the MFD \(\mu \) to be of Pareto form, \(G(x) = x^{-\alpha }\) for \(x \ge 1\) and \(G(x) = 1\) for \(x< 1\). Moreover, throughout this section we assume that the initial fitness f is so large that the fluctuations induced by Poisson sampling are negligible at all times, which implies that both the total population size and the sizes of subpopulations of mutants are well approximated by their expectations. Denoting the fitness of the mutant that is added to the population at generation t by \(W_t\), we can then write

$$\begin{aligned} X(t) \approx (1-\beta )^t f^t + \sum _{i=1}^t (1-\beta )^{t-i} W_i^{t-i}, \end{aligned}$$
(3)

where the factors \(1-\beta \) account for the fact that (apart from the added mutant) only the unmutated fraction of the population survives to the next generation. For the same reason the total number \(N_t\) of mutants produced in generation t (including the ones that die immediately) is approximately

$$\begin{aligned} N_t \approx \frac{\beta }{1-\beta } X(t). \end{aligned}$$

Since the probability that the largest fitness \(W_t\) among \(N_t\) independent and identically distributed random variables with common tail function G is smaller than x is \((1-x^{-\alpha })^{N_t}\), the random variable \(W_t\) can be sampled as

$$\begin{aligned} W_t = \left( 1 - Z_t^{1/N_t} \right) ^{-1/\alpha } \approx X(t)^{1/\alpha } Y_t, \quad Y_t:= \left( \frac{1-\beta }{\beta } \log \frac{1}{Z_t} \right) ^{-1/\alpha }, \end{aligned}$$

where \(Z_t\) is uniformly distributed in the interval (0, 1) and we have approximated \(Z_t^{1/N_t} \approx 1 + (1/N_t)\log Z_t\). Note that \(Y_t\) does not depend on X(t).
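The inverse-transform sampling formula for \(W_t\) and the large-\(N_t\) approximation can be compared directly. The sketch below uses illustrative values of \(\alpha \), \(\beta \) and X(t), which are assumptions made for the demonstration only.

```python
import math, random

random.seed(0)
alpha, beta = 1.5, 0.2            # illustrative parameters
X = 1e6                           # current population size, assumed large
N = beta / (1 - beta) * X         # approximate number of mutants N_t

for _ in range(100):
    Z = random.random()
    W_exact = (1.0 - Z ** (1.0 / N)) ** (-1.0 / alpha)
    Y = ((1 - beta) / beta * math.log(1.0 / Z)) ** (-1.0 / alpha)
    W_approx = X ** (1.0 / alpha) * Y
    # the expansion Z^(1/N) = 1 + (1/N) log Z is accurate for large N
    assert abs(W_exact / W_approx - 1.0) < 1e-3
```

Both expressions reduce to \((N_t/\log (1/Z_t))^{1/\alpha }\) up to a relative error of order \(|\log Z_t|/N_t\), which explains the close agreement for large populations.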

To proceed, we define \(\omega _t\) as

$$\begin{aligned} \omega _t := \frac{\log X(t)}{\log f}, \end{aligned}$$

which implies that \(X(t) = f^{\omega _t}\) and \(W_t \approx Y_t f^{\omega _t/\alpha }\). Inserting these relations into (3) we obtain

$$\begin{aligned} f^{\omega _t} \approx (1-\beta )^t f^t + \sum _{i=1}^t (1-\beta )^{t-i} Y_{i}^{t-i} f^{(t-i)\omega _{i}/\alpha }. \end{aligned}$$
(4)

In the limit \(f \rightarrow \infty \) with t fixed, the sum on the right hand side is dominated by the term with the largest exponent of f. Correspondingly, the \(\omega _t\) for large but finite f can be well approximated by the solution \(\chi _t\) of the recursion relation

$$\begin{aligned} \chi _t = \max \left\{ t,\frac{t-1}{\alpha }\chi _1, \frac{t-2}{\alpha }\chi _2,\ldots ,\frac{t-k}{\alpha }\chi _k, \ldots , \frac{1}{\alpha }\chi _{t-1}\right\} \end{aligned}$$
(5)

with \(\chi _1=1\). We now argue that the \(\chi _t\) grow at least exponentially. Since for any \(t_0 \ge 1\) and any positive integer m

$$\begin{aligned} \chi _{t_0+m} \ge \frac{m}{\alpha } \chi _{t_0}, \end{aligned}$$

we have, for any \(n \ge 1\)

$$\begin{aligned} \chi _{t_0+nm} \ge \left( \frac{m}{\alpha } \right) ^n \chi _{t_0}. \end{aligned}$$

Correspondingly

$$\begin{aligned} \lim _{t \rightarrow \infty } \frac{\log \chi _{t}}{t} =\lim _{n \rightarrow \infty } \frac{\log \chi _{t_0+nm}}{nm} \ge \frac{1}{m} \log \frac{m}{\alpha }, \end{aligned}$$
(6)

where we have assumed that the limit is well-defined. Since (6) is valid for any integer \(m \ge 1\), an optimal lower bound can be found by maximizing the right hand side. As shown by Lemma 4 in Sect. 4, the maximizer over the positive integers is precisely the function \(\nu (\alpha )\) in Theorem 1. As the population size depends exponentially on \(\omega _t\) or \(\chi _t\), the heuristic argument makes it plausible that \(\nu (\alpha )\) is a lower bound on the double-exponential growth rate of X(t). Remarkably, Theorem 1 states that the bound is tight, and moreover also applies to the MMM. Informally this implies that the population at time t is dominated by the fittest mutant that was generated at time \(t-T\). As a consequence the empirical fitness distribution changes periodically with period T (see Sect. 5 for further discussion).

In Fig. 1, we depict \(\nu (\alpha )\) together with the numerical solution of the recursion relation (5). The fact that \(\nu (\alpha )\) is the exact exponential growth rate of \(\chi _t\) is proven rigorously in Lemma 6 in Sect. 4. In the inset of Fig. 1, we compare (2) to an approximation obtained by treating m in (6) as a continuous variable. This yields

$$\begin{aligned} \max \left\{ \frac{1}{x} \log \frac{x}{\alpha } :x \ge 1 \right\} = {\left\{ \begin{array}{ll} 1/(e \alpha ), &{} \alpha e \ge 1,\\ -\log \alpha , &{} \alpha e < 1. \end{array}\right. } \end{aligned}$$
(7)

Although (7) is not exact, the relative error is less than 7% in all cases.
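Both the recursion (5) and the approximation (7) are easy to check numerically. The sketch below (time horizon, test values and tolerances are ad hoc choices) estimates the exponential growth rate of \(\chi _t\) and compares (7) with (2) over a grid of \(\alpha \) values.

```python
import math

def nu(alpha):
    # closed-form rate (2): first locate T, then return (1/T) log(T/alpha)
    T = 1
    while alpha > T ** (T + 1) / (T + 1) ** T:
        T += 1
    return math.log(T / alpha) / T

def chi_growth_rate(alpha, t_max=400):
    # iterate chi_t = max{ t, (t-1) chi_1 / alpha, ..., chi_{t-1} / alpha }
    chi = [None, 1.0]
    for t in range(2, t_max + 1):
        chi.append(max([float(t)] + [(t - i) * chi[i] / alpha
                                     for i in range(1, t)]))
    return math.log(chi[t_max]) / t_max   # log chi_t grows like nu * t

def approx(alpha):
    # continuous-m approximation (7)
    return 1 / (math.e * alpha) if alpha * math.e >= 1 else math.log(1 / alpha)

for alpha in (0.4, 1.0, 3.0):
    assert abs(chi_growth_rate(alpha) - nu(alpha)) < 0.05
for k in range(1, 101):                   # relative error of (7) stays below 7%
    a = k / 10
    assert abs(approx(a) / nu(a) - 1) < 0.07
```

The largest relative error of (7) occurs near the transition \(\alpha = 1/2\) from \(T=1\) to \(T=2\), where it is roughly 6%.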

For \(\alpha e < 1\) the expressions (2) and (7) actually coincide. In this regime of extremely heavy-tailed MFDs (more precisely, in the case of \(\alpha \le 0.5\) with \(T=1\); see Theorem 1) selection becomes irrelevant, in the sense that the double-exponential growth rate \(\nu (\alpha ) = \log (1/\alpha )\) persists in the extreme case \(\beta \rightarrow 1\) of the MMM, where all individuals are replaced by mutants in each generation and the process becomes a classical Galton–Watson process albeit with infinite mean, cf. [36]. In the case of the FMM, the extreme case would stop the population from growing but the fitness \(W_t\) of the single individual present approximately satisfies \(W_{t+1} \approx W_t^{1/\alpha }\), which gives

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{\log \log W_{t}}{t} = \log (1/\alpha ). \end{aligned}$$
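This iteration is immediate to check numerically; the sketch below works with \(\log W_t\) to avoid overflow, and the values of \(\alpha \) and f are illustrative assumptions.

```python
import math

# Illustrative parameters: alpha < 1 (heavy tail), founder fitness f > 1.
alpha, f, t_max = 0.5, 10.0, 30
log_W = math.log(f)
for _ in range(t_max):
    log_W = log_W / alpha        # W_{t+1} ~ W_t^(1/alpha) in log variables
rate = math.log(log_W) / t_max   # estimate of lim (log log W_t) / t
assert abs(rate - math.log(1.0 / alpha)) < 0.1
```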
Fig. 1 Plots of \(\nu \) vs \(\alpha \). The solid line depicts (2) and open circles represent numerical solutions of the recursion relation (5). Inset: plot of \((\nu e \alpha )^{-1}-1\) vs. \(\alpha \) with \(\nu \) from (2). The error of the approximation (7) is small and vanishes for \(\alpha \le 1/e\)

4 Proof of Theorem 1

4.1 Preparation for the Proof

In this subsection we collect some tools that will be used in the proofs of the lower and upper bounds in the estimate leading to Theorem 1. The lower bound will be verified in Sect. 4.2 and the upper bound in Sect. 4.3.

For \(t\in \mathbb {N}_0\) let \(W_t\) be the fitness of the fittest of the mutants in generation t and \(W_t=0\) if there are no mutants in generation t. Our first observation is that under the weak assumption \(G(x) >0\) for all large x (which always holds if \(\mu \) is of Fréchet type) either the sequence \((W_t)\) is unbounded or the branching process dies out in finite time. Heuristically speaking, on survival the accumulated number of mutants is unbounded almost surely, which naturally entails unbounded largest fitness.

Lemma 2

Almost surely on survival the sequence \((W_t)\) is unbounded.

Proof

We first show that the branching process can be coupled to a sequence \((\xi _1, \ldots , \xi _t)\) of independent Bernoulli variables with success parameter \(\beta \) and an independent sequence \((F_1,\ldots , F_t)\) of independent fitnesses with distribution \(\mu \) such that on survival up to generation t we have, for all \(1\le i \le t\),

  • \(\xi _i=1\) if there is at least one mutant in generation i, and

  • \(W_i \ge F_i \xi _i\).

Indeed, once the random variables \((F_1,\ldots , F_t)\) and \((\xi _1, \ldots , \xi _t)\) are generated with the given law, we generate the branching process as follows: produce the offspring in the nth generation as Poisson random variables with the parameters determined by the previous generation (possibly zero). If there is at least one offspring, use \(\xi _n\) to decide whether it is a mutant and, if so, give it fitness \(F_n\). Then use further, newly sampled Bernoulli variables with parameter \(\beta \) and fitnesses to decide which of the remaining offspring are mutants and, if they are, to determine their fitness. Then survival implies \(W_n\ge \xi _n F_n\) as required.

Now \(N:=\sum _{i=1}^t \xi _i\) is binomially distributed with parameters \(t\in \mathbb {N}\) and \(\beta >0\). We infer that, for any fixed \(x>1\),

$$\begin{aligned}&\mathbb {P}(W_i \le x~\text { for all } i\le t) \le \mathbb {P}(F_i \xi _i \le x~\text { for all } i\le t) + \mathbb {P}( \text {extinction}) \\&\quad = \sum _{i=0}^t \binom{t}{i} \beta ^{i} (1-\beta )^{t-i} \mathbb {P}(F \le x)^{i}+ \mathbb {P}( \text {extinction})\\&\quad = (\beta \mathbb {P}(F \le x) +(1-\beta ))^t + \mathbb {P}( \text {extinction}). \end{aligned}$$

Since \(\mathbb {P}(F \le x) < 1\) and \(\beta >0\), we get

$$\begin{aligned}&\mathbb {P}(W_i \le \, x~\text { for all } i) = \lim _{t\uparrow \infty } \mathbb {P}(W_i \le x~\text { for all } i\le t) \\&\quad \le \lim _{t\uparrow \infty } (\beta \mathbb {P}(F \le x) +(1-\beta ))^t + \mathbb {P}( \text {extinction}) = \mathbb {P}( \text {extinction}), \end{aligned}$$

hence \(\mathbb {P}\big ((W_t) \text { is unbounded}\big ) = \mathbb {P}( \text {survival})\) as claimed. \(\square \)

We next describe the distribution of \(W_t\) given the process at time \(t-1\).

Lemma 3

Suppose that at generation \(t-1\) there are n individuals with fitness \(F_1\), \(F_2\), ..., \(F_n\) and set \(\mathcal {X}:= \sum _{i=1}^n F_i\). Then, for all \(x \ge 0\),

$$\begin{aligned} \mathbb {P}(W_{t}>x ) =1- e^{-\beta \mathcal {X}G(x) }. \end{aligned}$$

Proof

First fix a positive integer n and suppose that \(W_t^{(n)}\) is the largest of n independently sampled fitnesses, with \(W^{(0)}_t=0\). Let \({\bar{G}}(x)=1-G(x)\) and note that

$$\begin{aligned} \mathbb {P}(W^{(n)}_{t}>x) = 1- \mathbb {P}(W^{(n)}_{t} \le x) = 1- {\bar{G}}(x)^n. \end{aligned}$$

Now let N be the number of mutants in generation t, which is Poisson distributed with mean \(\beta \mathcal {X}\). Hence, for \(x \ge 0\),

$$\begin{aligned} \mathbb {P}\left( W_t>x\right)&= \sum _{n=0}^\infty \mathbb {P}\left( W_t^{(n)}>x\right) \mathbb {P}(N=n) = \sum _{n=0}^\infty (1-{\bar{G}}(x)^n) \mathbb {P}(N=n) \\&= 1- \sum _{n=0}^\infty {\bar{G}}(x)^n \frac{(\beta \mathcal {X})^{n}}{n!} e^{-\beta \mathcal {X}} = 1 - e^{ -\beta \mathcal {X}(1-{\bar{G}}(x))}. \end{aligned}$$

As \(1-{\bar{G}}(x)=G(x)\) the proof is complete. \(\square \)
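The formula of Lemma 3 can be checked by Monte Carlo simulation. In the sketch below the Pareto MFD \(G(x)=x^{-\alpha }\), the fitness configuration and the threshold x are all illustrative assumptions.

```python
import math, random

random.seed(7)
alpha, beta = 2.0, 0.3                 # illustrative parameters
fits = [1.5, 2.0, 4.0]                 # fitnesses at generation t - 1
chi = sum(fits)                        # the quantity script-X of Lemma 3

def poisson(lam):                      # Knuth's Poisson sampler
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

x, trials, hits = 2.0, 20000, 0
for _ in range(trials):
    n = poisson(beta * chi)            # number of mutants is Poisson(beta * chi)
    W = max(((1 - random.random()) ** (-1 / alpha) for _ in range(n)),
            default=0.0)               # fittest mutant, 0 if there is none
    if W > x:
        hits += 1

predicted = 1 - math.exp(-beta * chi * x ** (-alpha))   # Lemma 3 with G(x) = x^-alpha
assert abs(hits / trials - predicted) < 0.02
```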

The next two results concern the potential limit \(\nu (\alpha )\). We first characterise \(\nu (\alpha )\) as a maximum and then as the growth rate in a recursion relation. Note that the first result easily implies that \(\nu (\alpha )\) is decreasing, as well as continuous and positive.

Lemma 4

We have

$$\begin{aligned} \nu (\alpha )=\max \left\{ \tfrac{1}{m} \log (m/\alpha ) :m\in {\mathbb {N}} \right\} . \end{aligned}$$

In particular, for all \(m \in \mathbb {N}\), we have

$$\begin{aligned} m \le \alpha e^{ \nu (\alpha ) m} \quad \text { and } \quad T = \alpha e^{\nu (\alpha ) T}. \end{aligned}$$
(8)

Proof

First recall \((T-1)^T/T^{T-1} < \alpha \le T^{T+1}/(T+1)^T\) and observe that \(\alpha > m^{m+1}/(m+1)^m\) for \(m<T\) and \(\alpha \le m^{m+1}/(m+1)^m\) for \(m \ge T\), where \(m \in \mathbb {N}\). Since

$$\begin{aligned} m(m+1)\left( \frac{1}{m} \log \frac{m}{\alpha } - \frac{1}{m+1} \log \frac{m+1}{\alpha } \right) = \log \frac{m^{m+1}}{(m+1)^m \alpha }, \end{aligned}$$

we have the desired result. \(\square \)

Remark 5

Let \(\alpha _T:= T^{T+1}/(T+1)^T\). The equality \(m = \alpha e^{\nu (\alpha )m}\) holds iff (\(m=T\)) or (\(m=T+1\) and \(\alpha = \alpha _T\)).

For the remainder of this subsection, we abbreviate \(\nu :=\nu (\alpha )\).

Lemma 6

For some positive sequence \((a_n)\) we define inductively

$$\begin{aligned} \chi _t:=\chi _t(\alpha , (a_n)):= \max \left\{ a_t, \tfrac{t-1}{\alpha }\chi _1, \ldots , \tfrac{1}{\alpha }\chi _{t-1} \right\} . \end{aligned}$$
(9)

Then, if \(\displaystyle \lim _{n\rightarrow \infty } a_n e^{-\nu n}=0\), there are positive constants c and \(c'\) such that

$$\begin{aligned} c' e^{\nu t} \le \chi _t \le c e^{\nu t} \quad \text { for all } t \ge 1, \end{aligned}$$

and therefore we have

$$\begin{aligned} \lim _{t\rightarrow \infty }\frac{\log \chi _t}{t}=\nu . \end{aligned}$$

Proof

Abbreviate \(\hat{\chi }_t:= \chi _t /c'\) with \(c'=e^{-\nu T}\min \{\chi _1,\chi _2,\ldots ,\chi _T\}\). Obviously, \(\hat{\chi }_t \ge e^{\nu t}\) for \(t \le T\). Now assume \( n \ge T\) and \(\hat{\chi }_t \ge e^{\nu t}\) for all \(t \le n\). By the assumption and (8), we have

$$\begin{aligned} \hat{\chi }_{n+1} \ge \frac{T}{\alpha } \hat{\chi }_{n+1-T} \ge \frac{T}{\alpha } e^{\nu (n+1-T)} =e^{\nu (n+1)}. \end{aligned}$$

Induction gives \(\hat{\chi }_t \ge e^{\nu t}\) and hence \(\chi _t \ge c' e^{\nu t}\) for all \(t \ge 1\).

Now, choose a positive integer \(n_0\) such that \(a_n \le e^{\nu n}\) for all \(n \ge n_0\). Let \(\bar{\chi }_t = \chi _t/c\) with \(c = \max \{1,\chi _1,\chi _2,\ldots ,\chi _{n_0}\}\). Obviously, \(\bar{\chi }_t \le 1 \le e^{\nu t}\) for all \(t \le n_0\). Now let \(n \ge n_0\) and assume that \(\bar{\chi }_t \le e^{\nu t}\) for all \(t \le n\). Then,

$$\begin{aligned} \bar{\chi }_{n+1}&= \max \left\{ \tfrac{a_{n+1}}{c}, \tfrac{n}{\alpha }\bar{\chi }_1,\tfrac{n-1}{\alpha }\bar{\chi }_2,\ldots , \tfrac{n-k+1}{\alpha }\bar{\chi }_k,\ldots ,\tfrac{1}{\alpha }\bar{\chi }_{n}\right\} \\&\le \max \left\{ e^{\nu (n+1)}, \tfrac{n}{\alpha }e^\nu ,\tfrac{n-1}{\alpha }e^{2\nu },\ldots \tfrac{n-k+1}{\alpha } e^{\nu k},\ldots ,\tfrac{1}{\alpha } e^{\nu n}\right\} \le e^{\nu (n+1)}, \end{aligned}$$

where we have used (8). By induction, we have \(\chi _t \le c e^{\nu t}\) for all \(t\ge 1\). \(\square \)

For later reference we define

$$\begin{aligned} \tilde{\chi }_i(t):= {\left\{ \begin{array}{ll}-\infty , &{} \text { if } i<0,\\ a_{t},&{} \text { if } i=0,\\ (t-i)\chi _i/\alpha , &{} \text { if } 1\le i \le t-1. \end{array}\right. } \end{aligned}$$
(10)

Lemma 7

Define

$$\begin{aligned} I_t:=\max \{i<t :\chi _t = \tilde{\chi }_i(t)\}. \end{aligned}$$
(11)

Then \(t-I_t\) is bounded.

Proof

By Lemma 6 we have \(c'e^{\nu t} \le \chi _t \le c e^{\nu t}\) for all t. Since there is \(t_0\) such that \(c'e^{\nu t} > a_t\) for all \(t \ge t_0\), we can write, for \(t > t_0\),

$$\begin{aligned} \chi _t = \max \left\{ \tfrac{t-1}{\alpha }\chi _1,\ldots ,\tfrac{1}{\alpha }\chi _{t-1} \right\} . \end{aligned}$$

Now it is enough to show that \(t - I_t\) is bounded for \(t> t_0\).

Note that, for \(1 \le m \le t-1\),

$$\begin{aligned} c e^{\nu t} A(m) \ge \tfrac{m}{\alpha } \chi _{t-m} \ge c' e^{\nu t} A(m), \end{aligned}$$

where \(A(m) = me^{-\nu m} /\alpha \) with \(A(T)=1\). Since \(\lim _{m\rightarrow \infty } A(m)=0\), there is \(m_0\) such that \(c'> c A(m)\) for all \(m \ge m_0\), and hence

$$\begin{aligned} \tfrac{m}{\alpha } \chi _{t-m} < c' e^{\nu t}, \text { for all } m \ge m_0. \end{aligned}$$

As the right hand side is a lower bound for \(\frac{T}{\alpha }\chi _{t-T}\), we get that \(t - I_t\) cannot be larger than \(\max \{m_0,t_0\}\), as desired. \(\square \)

Remark 8

If we choose \(T' > \sup \{t-I_t :t\in \mathbb {N}\}\), then, for all t,

$$\begin{aligned} \chi _t = \max \{a_t,(t-1)\chi _1/\alpha ,\ldots ,\chi _{t-1}/\alpha \} =\max \{\tilde{\chi }_{t-T'}(t),\ldots ,\tilde{\chi }_{t-1}(t)\}. \end{aligned}$$

In words, \(\chi _t\) is completely determined by \(\tilde{\chi }_i(t)\) for i within the window \(t-T' \le i \le t-1\). This fact will play an important role in the proof of Theorem 1.

We conclude the subsection with two estimates for classical Galton–Watson processes.

Lemma 9

Consider a supercritical Galton–Watson process \((\mathcal {X}_t)_{t\ge 0}\) with Poisson offspring distribution with mean \(\theta >1\), starting in generation 0 with a single individual. Fix \(0<x<1\) and an integer \(n \ge 1\). Then,

$$\begin{aligned} \mathbb {P}\left( \mathcal {X}_t \ge x^t \theta ^t \text { for all } 1\le t \le n \right) \ge 1 - n \frac{(1- x)^{-2}}{\theta -1}. \end{aligned}$$
(12)

Proof

First note that the mean and the variance of \(\mathcal {X}_t\) are (see, e.g., [37])

$$\begin{aligned} \mathbb {E}[\mathcal {X}_t] = \theta ^t,\quad \mathbb {V}[\mathcal {X}_t] = \frac{\theta ^{2t} - \theta ^{t}}{\theta -1}, \end{aligned}$$

respectively and that

$$\begin{aligned} \mathbb {P}\left( \mathcal {X}_t \ge x^t \theta ^t \text { for all }1\le t \le n \right) \ge 1 - \sum _{t=1}^{n} \mathbb {P}\left( \mathcal {X}_t \le x^t \theta ^t \right) , \end{aligned}$$
(13)

where we have used the sub-additivity of the probability measure. Using Chebyshev’s inequality, we get

$$\begin{aligned} \mathbb {P}\left( \mathcal {X}_t \le x^t \theta ^t \right)&= \mathbb {P}\left( \theta ^t - \mathcal {X}_t \ge \theta ^t-x^t \theta ^t \right) \le \mathbb {P}\left( \left| \mathcal {X}_t- \theta ^t\right| \ge \theta ^t-x^t \theta ^t \right) \\&\le \left( 1-x^t \right) ^{-2} \frac{\mathbb {V}[\mathcal {X}_t]}{ \theta ^{2t}} =\left( 1-x^t \right) ^{-2}\frac{1-\theta ^{-t}}{\theta -1} \le \frac{(1-x)^{-2}}{\theta -1}, \end{aligned}$$

which, along with (13), gives the claimed inequality. \(\square \)
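A quick Monte Carlo sanity check of (12) can be done without simulating individuals, using that a sum of \(\mathcal {X}_t\) independent Poisson(\(\theta \)) offspring counts is Poisson(\(\theta \mathcal {X}_t\)). The parameter choices below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, x, n, trials = 50.0, 0.5, 5, 400   # illustrative parameters

good = 0
for _ in range(trials):
    X, ok = 1, True
    for t in range(1, n + 1):
        # sum of X independent Poisson(theta) variables is Poisson(theta * X)
        X = rng.poisson(theta * X)
        if X < (x * theta) ** t:
            ok = False
            break
    good += ok

bound = 1 - n * (1 - x) ** (-2) / (theta - 1)   # right-hand side of (12)
assert good / trials >= bound
```

For these parameters the bound evaluates to roughly 0.59, while the empirical frequency is close to 1, consistent with (12) being a crude but valid estimate.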

Lemma 10

For a Galton–Watson process \((\mathcal {X}_t)\) with \(\mathcal {X}_0 = K_0\) and generation dependent offspring distribution \(N_t\) with \(\mathbb {E}[N_t] \le N\) for all t,

$$\begin{aligned} \mathbb {P}\big (\mathcal {X}_t \le K N^t B^t \text { for all } t\ge 1\big ) \ge 1 - \frac{K_0}{K(B-1)}, \end{aligned}$$

for all \(B>1\) and \(K>0\).

Proof

By Markov’s inequality, we have

$$\begin{aligned} \mathbb {P}\left( \mathcal {X}_t \ge K N^t B^t\right) \le \frac{\mathbb {E}[\mathcal {X}_t]}{K N^t} B^{-t}. \end{aligned}$$

Since \(\mathbb {E}[\mathcal {X}_{t+1}\vert \mathcal {X}_t] = \mathcal {X}_t \mathbb {E}[N_t]\), we have \(\mathbb {E}[\mathcal {X}_t] = K_0 \prod _{i=0}^{t-1} \mathbb {E}[N_i] \le K_0 N^t\), which gives

$$\begin{aligned} \mathbb {P}\left( \mathcal {X}_t \ge K N^t B^t\right) \le \frac{K_0}{KB^{t}}. \end{aligned}$$

Since

$$\begin{aligned} \mathbb {P}( \mathcal {X}_t \le K N^t B^t \text { for all }t\ge 1) \ge 1-\sum _{t=1}^\infty \mathbb {P}(\mathcal {X}_t \ge K N^t B^t), \end{aligned}$$

a geometric sum gives the claimed inequality. \(\square \)

4.2 Proof of the Lower Bound

In this subsection we show that, for given \(\alpha >0\) and all \(\alpha '>\alpha \), we have

$$\begin{aligned} \mathbb {P}\Big ( \liminf _{t\rightarrow \infty }\frac{\log \log X(t)}{t} \ge \nu (\alpha ') \,\Big \vert \, \text {survival}\Big ) = 1. \end{aligned}$$
(14)

In both models at each generation s, we can regard a lineage originating from the mutant with fitness \(W_s\) as a version \((\hat{X}_t(f))_{t\ge s}\) of the same model starting in generation s with a single individual of fitness \(f=W_s\). Since \(X(t) \ge \hat{X}_t(f)\) for \(t \ge s\), (14) is proved, if there is at least one s such that

$$\begin{aligned} \liminf _{t\rightarrow \infty }\frac{\log \log \hat{X}_t(W_s)}{t} \ge \nu (\alpha '). \end{aligned}$$

As \((W_t)\) is unbounded almost surely on survival it therefore suffices to show that

$$\begin{aligned} \lim _{f\rightarrow \infty } \mathbb {P}\left( \liminf _{t\rightarrow \infty }\frac{\log \log \hat{X}_t(f)}{t} \ge \nu (\alpha ')\right) = 1. \end{aligned}$$
(15)

For convenience, we use the convention \((\log \log \hat{X}_t(f))/t = -\infty \) if \(\hat{X}_t(f) = 0\). As \((\hat{X}_t(x))\) can be coupled to an FMM \((S_t(x))\) with the same initial condition such that \(\hat{X}_t(x)\ge S_t(x)\) for all \(t\ge 0\), the result follows by combining Lemma 6 with the following statement.

Lemma 11

Fix \(0<\epsilon <1/2\) and let

$$\begin{aligned} E(f):= \big \{ S_t(f) \ge (1-\beta )^t f^{\chi _t'} \text { for all } t\ge 1 \big \}, \end{aligned}$$

where \(\chi _t':= \chi _t(\alpha ', (n/2)_{n\ge 1})\) with \(\alpha ':=\alpha /(1-2 \epsilon )\). Then

$$\begin{aligned} \lim _{f \rightarrow \infty } \mathbb {P}(E(f)) =1. \end{aligned}$$

Proof

We define \(m_0:=f\), \(n_0:=f^{1/2}\), and (\(i \ge 1\))

$$\begin{aligned} m_i:=f^{(1-\epsilon )\chi _i'/\alpha },\quad n_i:=f^{(1-2\epsilon )\chi _i'/\alpha }=f^{\chi _i'/\alpha '}. \end{aligned}$$

For later reference, we also define \(U_i:= n_i/m_i\) for all \(i\ge 0\).

Set \(\epsilon _0=\epsilon /(2-2\epsilon )\). By our assumption on \(\mu \), there is \(f_0\) such that

$$\begin{aligned} G(x) \ge x^{-\alpha (1+\epsilon _0)} \text { for all } x > f_0. \end{aligned}$$

Since we are only interested in the limit as \(f\rightarrow \infty \), we may assume that f is so large that \((1-\beta )m_0 \ge 2\), \((1-\beta ) m_1 \ge 2\), \(U_0 < 1/2\), \(U_1 < 1/2\), and \(m_1 > f_0\). Notice that by assumption,

$$\begin{aligned} G(m_1)\ge G(m_i) \ge m_i^{-\alpha (1+\epsilon _0)} = f^{-(1-\epsilon /2)\chi _i'} \text { for all } i \ge 1. \end{aligned}$$

For \(\chi '_t\), we choose \(T'\) as in Remark 8. By \(N_{i,t}\) we denote the number of individuals with fitness \(W_i\) at generation t. Define events

$$\begin{aligned} A_i:= \{ N_{i,t} \ge (1-\beta )^{t-i} n_i^{t-i} \text { for all }i < t \le i+T' \}, \quad B_i:= \{ W_i \ge m_i \}. \end{aligned}$$

Let \(D_{-1}\) be the certain event and, for \(i\in \mathbb {N}_0\),

$$\begin{aligned} D_{i}:= D_{i-1} \cap A_i \cap B_i,\quad A:= \bigcap _{i=0}^\infty D_i. \end{aligned}$$

Now observe that

$$\begin{aligned} \mathbb {P}(A) = \lim _{i\rightarrow \infty } \mathbb {P}(D_i),\quad \mathbb {P}(D_i) = \mathbb {P}(A_i\vert B_{i}\cap D_{i-1}) \mathbb {P}(B_i\vert D_{i-1})\mathbb {P}(D_{i-1}). \end{aligned}$$

By Lemma 9 we have

$$\begin{aligned} \mathbb {P}(A_i \vert B_i \cap D_{i-1}) \ge 1 - \frac{T'(1-n_i/W_i)^{-2}}{(1-\beta ) W_i -1} \ge 1 -\frac{T'(1-U_i)^{-2}}{(1-\beta ) m_i -1}, \end{aligned}$$

where we have used (12).

To proceed, we bound the \(\mathcal {X}\) in Lemma 3 on the event \(D_{t-1}\) as

$$\begin{aligned} \mathcal {X}\ge \sum _{i=0}^{t-1} N_{i,t-1} W_i \ge (1-\beta )^t \sum _{i=t-T'-1}^{t-1} f^{\tilde{\chi }_i'(t)} \ge (1-\beta )^t f^{\chi _{t}'}, \end{aligned}$$

where we have used \(W_i \ge m_i \ge n_i\) and \(\tilde{\chi }_i'(t)\) as in (10) for parameters \(\alpha '\) and \(a_s=s/2\). Using Lemma 3 with \(G(m_i) \ge f^{-(1-\epsilon /2)\chi _i'}\), we have

$$\begin{aligned} \mathbb {P}(B_i\vert D_{i-1}) \ge 1- \exp \left( - \beta (1-\beta )^i f^{\epsilon \chi _i'/2} \right) . \end{aligned}$$

Now we define

$$\begin{aligned} b_i:= \frac{T'(1 - U_i)^{-2}}{(1-\beta ) m_i-1} + (1-\delta _{i0})\exp \left( - \beta (1-\beta )^i f^{\epsilon \chi _i'/2} \right) , \quad \phi (f) := \sum _{i=0}^\infty b_i, \end{aligned}$$

where \(\delta _{ij}\) is the Kronecker delta. Clearly, \(\lim _{f\rightarrow \infty } b_i = 0\) for all \(i\ge 0\). Since, for each fixed s, \(b_s\) is a bounded function of f that is decreasing for sufficiently large f, and since Lemma 6 gives

$$\begin{aligned} \lim _{s\rightarrow \infty } b_s 2^s = 0, \end{aligned}$$

there is \(s_0\) such that \(\vert b_s\vert < 2^{-s}\) for all \(s >s_0\) and all admissible values of f. Hence the series defining \(\phi (f)\) converges uniformly for sufficiently large f and \(\lim _{f \rightarrow \infty } \phi (f)=0\). Therefore, for sufficiently large f, we get

$$\begin{aligned} \mathbb {P}(A) \ge \prod _{i=0}^\infty (1-b_i) \ge 1-\phi (f),\quad \lim _{f\rightarrow \infty } \mathbb {P}(A) = 1, \end{aligned}$$

where we have used \((1-x)(1-y) \ge 1- x - y\) for \(x,y\ge 0\). As, on the event A,

$$\begin{aligned} S_t(f) \ge \sum _{i=t-T'}^t N_{i,t} \ge (1-\beta )^t f^{\chi _t'} \end{aligned}$$

where we have set \(N_{i,t} =0\) for \(i<0\), we see that \(A\subset E(f)\) and the proof is complete. \(\square \)

In fact, Lemma 11 and its proof apply verbatim to the MMM with \(S_t\) replaced by \(\hat{X}_t\); for a proof concerning only the MMM there is thus no need to introduce \(S_t\) at all.

4.3 Proof of the Upper Bound

In this subsection we show that, for given \(\alpha >0\) and all \(0<\alpha '<\alpha \), we have for the MMM denoted by \((M_t)\), or \((M_t(x))\) if in the initial generation there is a single individual with fixed fitness x, that

$$\begin{aligned} \mathbb {P}\Big ( \limsup _{t\rightarrow \infty }\frac{\log \log M_t}{t} \le \nu (\alpha ')\Big )= 1. \end{aligned}$$
(16)

In case of extinction the upper bound holds by convention. One can construct two processes with initial fitnesses \(x\le y\) on the same probability space such that \(M_t(x) \le M_t(y)\) for all t. Indeed, this can be done as follows. First construct \((M_t(y))\) and look at its genealogical tree truncated after the first mutant in every line of descent from the root. Removing any individual in that tree, together with all its offspring, from \((M_t(y))\) independently with probability \(1-x/y\) (i.e., retaining it with probability x/y), we obtain \((M_t(x))\).
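The thinning construction rests on the standard fact that retaining each of a Poisson-distributed number of points independently with probability x/y turns a count with mean proportional to y into one with mean proportional to x; offspring numbers here are Poisson, consistent with Lemma 13. A minimal numerical sketch of this fact (not the authors' code; all parameter values are illustrative):

```python
import math
import random

random.seed(1)

def poisson(lam):
    # Knuth's multiplicative algorithm for Poisson sampling.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < threshold:
            return k
        k += 1

x, y, n = 2.0, 5.0, 200_000

# Draw Poisson(y) offspring and keep each one independently with
# probability x/y; the thinned counts should be Poisson(x).
thinned = [sum(random.random() < x / y for _ in range(poisson(y)))
           for _ in range(n)]

mean = sum(thinned) / n
var = sum((z - mean) ** 2 for z in thinned) / n
print(mean, var)  # both close to x = 2.0
```

Both the empirical mean and variance match x, as expected for a Poisson(x) count.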

We now construct an MMM with special initial conditions. Fix \(\epsilon >0\). For given \(\alpha \), let \(\delta = (1+2\epsilon )/(1+3\epsilon )\), \(\alpha '=\alpha /(1+3\epsilon )\), \(\nu '=\nu (\alpha ')\), \(T = T(\alpha ')\), and

$$\begin{aligned} \Delta _0 = \frac{\epsilon }{\nu ' (1+3\epsilon )}, \end{aligned}$$

which is equivalent to \(\nu ' \Delta _0 +\delta = 1\). We choose \(\Delta \) such that \(0 < \Delta \le \Delta _0\) and

$$\begin{aligned} \hat{n}:=\frac{T}{\Delta } \end{aligned}$$

is an integer. We define, for a given \(f>0\),

$$\begin{aligned} \chi _t' := e^{ \nu 't},\quad \kappa _{n,t} := e^{\nu '(t-T + n \Delta )},\quad g_{n,t} := f^{\kappa _{n,t}/\alpha }, \quad h_{n,t} := f^{(1+\epsilon )\kappa _{n,t}/\alpha }. \end{aligned}$$

Note that \(\chi _t'\) above is different from that in Lemma 11.

We briefly explain the motivation for introducing \(h_{n,t}\) and the other quantities used to establish the upper bound. Unlike the proof of the lower bound in Lemma 11, where it sufficed to follow a single lineage \(\hat{X}_t\), here we must control all mutants. Since only an inequality is needed, we partition the fitness space of mutants by the levels \(h_{n,t}\) and treat the mutants appearing at generation t with fitness in the region \((h_{n,t},h_{n+1,t}]\) as a single mutant class whose growth rate is bounded by \(h_{n+1,t}\).

We consider the MMM \((\tilde{M}_t(f))_{t\ge T-1}\) starting in generation \(T-1\) with an initial condition such that there are T different mutant classes with fitness \(g_{\hat{n},m}\) for \(0\le m\le T-1\) and the number of individuals with fitness \(g_{\hat{n},m}\) is \(\lfloor (g_{\hat{n},m})^{T-m-1} \rfloor \). We only consider f sufficiently large so that \((1-\beta ) f>2\) and \((1-\beta ) f^{\epsilon /\alpha }>2\).

Now assume that we have proved, for all \(\alpha '< \alpha \),

$$\begin{aligned} \lim _{f\rightarrow \infty } \mathbb {P}\left( \limsup _{t\rightarrow \infty } \frac{\log \log \tilde{M}_t(f)}{t} \le \nu (\alpha ')\right) =1. \end{aligned}$$
(17)

Given an arbitrary \(f>0\) and \(\varepsilon >0\), pick \(f_\varepsilon \) such that the probability above exceeds \(1-\varepsilon \) and the smallest fitness in the initial condition of \(\tilde{M}_t(f_\varepsilon )\) is larger than f. Then (17) guarantees that

$$\begin{aligned} \mathbb {P}\left( \limsup _{t\rightarrow \infty } \frac{\log \log M_t(f)}{t} \le \nu (\alpha ') \right) >1-\varepsilon , \end{aligned}$$

which proves (16). So it is enough to prove (17). Once (17) is proved, we use the natural coupling such that \(S_t\le M_t\) for all t. Then, almost surely on survival,

$$\begin{aligned} \limsup _{t\rightarrow \infty } \frac{\log \log S_t}{t} \le \limsup _{t\rightarrow \infty } \frac{\log \log M_t}{t}\le \nu (\alpha ), \end{aligned}$$

which completes the proof of Theorem 1.

Lemma 12

In an MMM, let \(Z_t\) be the number of non-mutated descendants at generation t of \(\mathcal {X}\) individuals at generation m whose fitness values are within a bounded interval I with right endpoint b. Assume \(\mathcal {X}\le K\). Then, for all \(B>1\),

$$\begin{aligned} \mathbb {P}(Z_t \le K (1-\beta )^{t-m} b^{t-m} B^{t-m} \text { for all } t\ge m+1) \ge 1 - \frac{1}{B-1}. \end{aligned}$$

Proof

As the mean number of non-mutated offspring of an individual is bounded by \((1-\beta )b\) we get the result by applying Lemma 10. \(\square \)

Lemma 13

Suppose at generation \(t-1\) of an MMM the population consists of n individuals with fitness \(F_1,\ldots , F_n\). Let

$$\begin{aligned} Y_{t}:= \beta \sum _{i=1}^n F_i \end{aligned}$$
(18)

and let Z be the number of mutants in generation t with fitness in the interval \((a,b]\). Then, with \(p:= \mu ((a,b])\), we have

$$\begin{aligned} \mathbb {P}(Z> K) =e^{-Y_tp}\sum _{n=K+1}^{\infty } \frac{(Y_tp)^n}{n!}. \end{aligned}$$

Proof

Observe that Z is Poisson distributed with mean \(Y_tp\). \(\square \)

Remark 14

Using Markov’s inequality, we get

$$\begin{aligned} \mathbb {P}(Z > K) \le \frac{Y_t p}{K}, \end{aligned}$$
(19)

which is useful when \(K \gg Y_tp\). By Chebyshev’s inequality, for \(K > Y_t\),

$$\begin{aligned} \mathbb {P}(Z > K ) \le \mathbb {P}(\vert Z-Y_tp\vert \ge K - Y_tp) \le \frac{Y_tp}{(K-Y_tp)^2} \le \frac{Y_t}{(K-Y_t)^2}, \end{aligned}$$
(20)

which is useful when \((K-Y_t)^2 \gg Y_t\). For \(K=0\), we will use

$$\begin{aligned} \mathbb {P}(Z= 0) = e^{-Y_tp} \ge 1-Y_tp\ge 1- Y_t G(a). \end{aligned}$$
(21)

We denote the number of non-mutated descendants at generation \(t\ge T-1\) of initial individuals with fitness \(g_{\hat{n},m}\) by \(N_{m,T-1,t}\) and define

$$\begin{aligned} N_{T-1,t}:= \sum _{m=0}^{T-1} N_{m,T-1,t}. \end{aligned}$$

The number of mutants that appear at generation \(t\ge T\) with fitness in the interval \((h_{n-1,t}, h_{n,t}]\) is denoted by \(N_{n,t,t}\) for \(0 \le n \le \hat{n}+1\), where we have assumed \(h_{-1,t}:=0\) and \(h_{\hat{n}+1,t}:=\infty \). Typically, \(N_{\hat{n}+1,t,t}\) will be zero. The number of non-mutated descendants of \(N_{n,m,m}\) at generation \(t>m\) is denoted by \(N_{n,m,t}\). For \(t \ge m \ge T\) define

$$\begin{aligned} N_{m,t} := \sum _{n=0}^{\hat{n}+1} N_{n,m,t}, \end{aligned}$$

which gives

$$\begin{aligned} \tilde{M}_t(f) = \sum _{m=T-1}^t N_{m,t}. \end{aligned}$$

Let \((\theta _t)_{t\ge T-1}\) be a sequence satisfying \(\theta _{T-1} = T\) and, for \(t\ge T\),

$$\begin{aligned} \theta _t = (t-T) \hat{n} + \sum _{m=T-1}^{t-1} \theta _m. \end{aligned}$$

Since \(\theta _{t+1} - \theta _t = \theta _t + \hat{n}\) for \(t \ge T\), we have

$$\begin{aligned} \theta _t = {\left\{ \begin{array}{ll} 2^{t-T}(T + \hat{n}) - \hat{n}, &{} \text { for }t \ge T\\ T, &{} \text { for }t= T-1. \end{array}\right. } \end{aligned}$$
(22)
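The closed form (22) can be checked directly against the defining recursion; a quick sketch with illustrative values of T and \(\hat{n}\):

```python
T, n_hat = 3, 4  # illustrative values; n_hat plays the role of T / Delta

# Defining recursion: theta_{T-1} = T and, for t >= T,
# theta_t = (t - T) * n_hat + sum_{m = T-1}^{t-1} theta_m.
theta = {T - 1: T}
for t in range(T, T + 20):
    theta[t] = (t - T) * n_hat + sum(theta[m] for m in range(T - 1, t))

# Closed form (22): theta_t = 2^(t-T) * (T + n_hat) - n_hat for t >= T.
closed = {t: 2**(t - T) * (T + n_hat) - n_hat for t in range(T, T + 20)}
print(all(theta[t] == closed[t] for t in range(T, T + 20)))  # True
```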

Lemma 15

For \(T\le x\le m<t\) (t and m are integers and x is real), we have

$$\begin{aligned}&e^{\nu 'm } + \tfrac{t-m}{\alpha '} e^{\nu ' (m-T)} \le e^{\nu ' t},\end{aligned}$$
(23)
$$\begin{aligned}&e^{\nu 'm} - e^{\nu '(x-\Delta )} +\delta \tfrac{t-m}{\alpha '} e^{\nu ' x} \le e^{\nu ' t}. \end{aligned}$$
(24)

Proof

Using (8), we have

$$\begin{aligned} e^{\nu 'm} \left( 1 + \tfrac{t-m}{\alpha '} e^{-\nu ' T} \right) = e^{\nu 'm } \tfrac{T+t-m}{T} \le e^{\nu ' t} e^{\nu ' T} \tfrac{\alpha '}{T} = e^{\nu ' t}, \end{aligned}$$

which proves (23). If \(\delta (t-m) - \alpha ' e^{-\nu ' \Delta }\) is negative, then (24) is trivially valid. If \(\delta (t-m) - \alpha ' e^{-\nu ' \Delta }\) is positive, then the left-hand side of (24) attains its maximum at \(x=m\). Therefore, it is enough to prove (24) for \(x =m\). Plugging in \(x=m\), we have

$$\begin{aligned} e^{\nu 'm} - e^{\nu '(m-\Delta )} + \delta \tfrac{t-m}{\alpha '} e^{\nu ' m}&=e^{\nu 'm} ( 1 - e^{-\nu '\Delta } ) + \delta \tfrac{t-m}{\alpha '} e^{\nu ' m} \\ {}&\le e^{\nu ' t} \left( \nu ' \Delta + \delta \right) \le e^{\nu ' t}, \end{aligned}$$

where we have used \(1-e^{-x} \le x\), \(e^{\nu 'm} \le e^{\nu 't}\), and \(t-m \le \alpha ' e^{\nu '(t-m)}\). \(\square \)
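As a sanity check, the inequalities (23) and (24) can be probed numerically on a grid. The parameter values below are illustrative assumptions: \(\alpha'=1\) (for which \(T=3\) and \(\nu'=\tfrac{1}{3}\log 3\), so that \(e^{\nu'T}=T/\alpha'\)), \(\epsilon=1/2\), hence \(\delta=4/5\), and \(\Delta=1/2\le\Delta_0\) with \(T/\Delta\) an integer:

```python
import math

alpha_p = 1.0
T = 3
nu = math.log(T / alpha_p) / T            # e^{nu * T} = T / alpha'
eps = 0.5
delta = (1 + 2 * eps) / (1 + 3 * eps)     # = 0.8
Delta = 0.5                               # Delta <= Delta_0, T / Delta integer

ok23 = ok24 = True
for m in range(T, 40):
    for t in range(m + 1, 41):
        # inequality (23)
        lhs23 = math.exp(nu * m) + (t - m) / alpha_p * math.exp(nu * (m - T))
        ok23 &= lhs23 <= math.exp(nu * t) + 1e-12
        # inequality (24), with real x on a grid in [T, m]
        for j in range(51):
            x = T + (m - T) * j / 50
            lhs24 = (math.exp(nu * m) - math.exp(nu * (x - Delta))
                     + delta * (t - m) / alpha_p * math.exp(nu * x))
            ok24 &= lhs24 <= math.exp(nu * t) + 1e-12

print(ok23, ok24)  # True True
```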

Lemma 16

Let \(E(f):=\{\tilde{M}_t(f) \le \theta _{t+1} f^{\chi '_t}\) for all \(t \ge T \}\). Then

$$\begin{aligned} \lim _{f\rightarrow \infty } \mathbb {P}(E(f)) = 1, \end{aligned}$$

which implies (17).

Proof

Set \(\epsilon _0=\epsilon /(2+2\epsilon )\). By our assumption on \(\mu \) there is \(f_0\) such that \(G(f) \le f^{-\alpha (1-\epsilon _0)}\) for all \(f > f_0\). Now we assume \(h_{0,T}=f^{(1+\epsilon )/\alpha } > f_0\), which gives

$$\begin{aligned} G(h_{n,t}) \le f^{-(1+\epsilon /2)\kappa _{n,t}}, \quad \text {for all } 0\le n \le \hat{n} \text { and } t \ge T. \end{aligned}$$
(25)

Let

$$\begin{aligned} A_{n,T-1}:=\left\{ N_{n,T-1,t} \le \left\lfloor (g_{\hat{n},n})^{T-n-1} \right\rfloor (1-\beta )^{t-T+1} f^{(t-T+1)\chi '_n/\alpha '} \text{ for } \text{ all } t\ge T \right\} \end{aligned}$$

and define \(A_{T-1}:= \bigcap _{n=0}^{T-1} A_{n,T-1}\). By Lemma 12 with \(K= \lfloor (g_{\hat{n},n})^{T-n-1} \rfloor \), \(b = g_{\hat{n},n}=f^{\chi _n'/\alpha }\), and \(b B = f^{\chi _n'/\alpha '} = f^{(1+3\epsilon )\chi _n'/\alpha }\), we have

$$\begin{aligned} 1-\mathbb {P}(A_{n,T-1}) \le \left( f^{3 \epsilon \chi _n'/\alpha }-1\right) ^{-1},&\quad \mathbb {P}(A_{T-1}) \ge 1 - b_{T-1}, \\ b_{T-1}:= \sum _{n=0}^{T-1} \left( f^{3 \epsilon \chi _n'/\alpha }-1\right) ^{-1}. \end{aligned}$$

Since \(m e^{-\nu ' m} \le \alpha '\) and \(g_{\hat{n},n} \le f^{\chi '_n/\alpha '}\), on the event \(A_{T-1}\) we have

$$\begin{aligned} N_{n,T-1,t} \le f^{\chi _n'(t-n)/\alpha '} \le f^{\chi '_t}\quad \text {for all }n,t. \end{aligned}$$

Therefore,

$$\begin{aligned} \sum _{n=0}^{T-1}N_{n,T-1,t} \le T f^{\chi '_t},\quad \text {for all }t \ge T, \end{aligned}$$
(26)

on the event \(A_{T-1}\). Let

$$\begin{aligned} A_{n,m,t}:= \left\{ \begin{array}{ll} \big \{ N_{0,m,t} \le \theta _m f^{\chi _m'} (1-\beta )^{t-m} f^{(t-m)\kappa _{0,m}/\alpha '} \big \}, &{}\quad \text { for }n=0,\\ \big \{N_{n,m,t} \le \frac{f^{\chi _m'}}{(g_{n-1,m})^\alpha } (1-\beta )^{t-m} f^{\delta (t-m) \kappa _{n,m}/\alpha '} \big \}, &{}\quad \text { for }1\le n\le \hat{n}, \\ \big \{ N_{\hat{n}+1,m,t} = 0 \big \} &{}\quad \text { for }n = \hat{n} +1. \end{array} \right. \end{aligned}$$

Note that \(A_{n,t,t}\) carries information about the empirical fitness distribution of the mutants that appear at generation t. We define

$$\begin{aligned}&\tilde{A}_{n,m} = \bigcap _{t=m+1}^\infty A_{n,m,t},\quad A_{n,m}:=A_{n,m,m} \cap \tilde{A}_{n,m},\quad A_m := \bigcap _{n=0}^{\hat{n}+1} A_{n,m},\\&D_{T-1} := A_{T-1},\quad D_{n} := D_{n-1} \cap A_n,\quad A(f) := \bigcap _{n=T-1}^\infty D_n. \end{aligned}$$

By Lemma 15, we have on the event \(A_m\), that for all \(t\ge m \ge T\),

$$\begin{aligned} N_{m,t}=\sum _{n=0}^{\hat{n}+1} N_{n,m,t} \le (\theta _m + \hat{n}) f^{\chi '_t}, \end{aligned}$$

and, in turn,

$$\begin{aligned} \tilde{M}_t(f) \le \left( (t+1-T)\hat{n} + \sum _{m=T-1}^ t \theta _m \right) f^{\chi '_t} =\theta _{t+1} f^{\chi '_t} \end{aligned}$$

on the event A(f). Therefore, \(A(f) \subset E(f)\) and the proof is complete if we show

$$\begin{aligned} \lim _{f\rightarrow \infty } \mathbb {P}(A(f)) = 1. \end{aligned}$$

Now we investigate \(\mathbb {P}(A_m \vert D_{m-1})\). First note that

$$\begin{aligned} 1-\mathbb {P}(A_m\vert D_{m-1})&\le \sum _{n=0}^{\hat{n}+1} \left[ 1 - \mathbb {P}(A_{n,m}\vert D_{m-1}) \right] ,\\ \mathbb {P}(A_{n,m}\vert D_{m-1})&=\mathbb {P}(\tilde{A}_{n,m}\vert A_{n,m,m}\cap D_{m-1}) \mathbb {P}(A_{n,m,m}\vert D_{m-1}), \end{aligned}$$

and, on the event \(D_{m-1}\),

$$\begin{aligned} Y_m\le \beta \theta _m f^{\chi '_m}, \end{aligned}$$
(27)

where \(Y_m\) is defined in (18).

We begin with \(\mathbb {P}(A_{\hat{n}+1,m}\vert D_{m-1})\), which clearly equals \(\mathbb {P}(A_{\hat{n}+1,m,m}\vert D_{m-1})\). Using (21) with \(a = h_{\hat{n},m}\) and \(Y_m G(a) \le \theta _m f^{-\epsilon \chi '_m/2}\), we obtain

$$\begin{aligned} \mathbb {P}(A_{\hat{n}+1,m}\vert D_{m-1}) \ge 1 - b_{\hat{n}+1,m},\quad b_{\hat{n}+1,m}:= \theta _m f^{-\epsilon \chi '_m/2}. \end{aligned}$$

Now we consider \(\mathbb {P}(A_{0,m}\vert D_{m-1})\). Using (20) with \(K = \theta _m f^{\chi _m'}\) and (27), we have

$$\begin{aligned} \mathbb {P}(A_{0,m,m}\vert D_{m-1}) \ge 1- \frac{\beta }{(1-\beta )^2 \theta _{m} } f^{-\chi '_m}. \end{aligned}$$

Using Lemma 12 with \(B = f^{\kappa _{0,m}/\alpha '}/h_{0,m} = f^{2\epsilon \kappa _{0,m}/\alpha }\), we have

$$\begin{aligned} \mathbb {P}(\tilde{A}_{0,m}\vert A_{0,m,m}\cap D_{m-1}) \ge 1 - \left( f^{2 \epsilon \kappa _{0,m}/\alpha } -1 \right) ^{-1}. \end{aligned}$$

Therefore, defining

$$\begin{aligned} b_{0,m}:= \frac{\beta }{(1-\beta )^2 \theta _{m} } f^{-\chi '_m} + \left( f^{2 \epsilon \kappa _{0,m}/\alpha } -1 \right) ^{-1}, \end{aligned}$$

we have \(\mathbb {P}(A_{0,m}\vert D_{m-1}) \ge 1-b_{0,m}\).

Finally, we move on to \(\mathbb {P}(A_{n,m}\vert D_{m-1})\) for \(1 \le n \le \hat{n}\). Using (19) with \(K = f^{\chi _m' - \kappa _{n-1,m}}\), \(a = h_{n-1,m}\), and \(Y_m G(a) \le \theta _m f^{\chi _m' - (1+\epsilon /2) \kappa _{n-1,m}}\), we have

$$\begin{aligned} \mathbb {P}(A_{n,m,m}\vert D_{m-1}) \ge 1 - \theta _m f^{-\epsilon \kappa _{n-1,m}/2}. \end{aligned}$$

Using Lemma 12 with \(B = f^{\delta \kappa _{n,m}/\alpha '}/h_{n,m} = f^{\epsilon \kappa _{n,m}/\alpha }\), we have

$$\begin{aligned} \mathbb {P}\left( \tilde{A}_{n,m}\vert A_{n,m,m}\cap D_{m-1}\right) \ge 1 - \left( f^{\epsilon \kappa _{n,m}/\alpha } -1 \right) ^{-1}. \end{aligned}$$

Therefore, defining

$$\begin{aligned} b_{n,m}:= \theta _m f^{-\epsilon \kappa _{n-1,m}/2} +\left( f^{\epsilon \kappa _{n,m}/\alpha } -1 \right) ^{-1}, \end{aligned}$$

we have \( \mathbb {P}(A_{n,m}\vert D_{m-1}) \ge 1-b_{n,m}. \) We define

$$\begin{aligned} b_m:= \sum _{n=0}^{\hat{n}+1} b_{n,m} \text { for }m \ge T,\quad \text { and } \phi (f):= \sum _{m=T-1}^\infty b_m. \end{aligned}$$

Recall that we have assumed \((1-\beta ) f>2\) and \((1-\beta ) f^{\epsilon /\alpha }>2\). Since, for each given m, \(b_m\) is a bounded function of f that decreases to zero, and since

$$\begin{aligned} \lim _{m\rightarrow \infty } b_m 2^{m} = 0, \end{aligned}$$

there is \(m_0\) such that \(\vert b_m\vert < 2^{-m}\) for all \(m >m_0\). Hence the series defining \(\phi (f)\) converges uniformly for sufficiently large f and, accordingly, \(\lim _{f \rightarrow \infty } \phi (f) = 0\). Therefore, for sufficiently large f,

$$\begin{aligned} \mathbb {P}(A(f)) \ge \prod _{m=T-1}^\infty ( 1 - b_m) \ge 1-\phi (f), \end{aligned}$$

and \(\displaystyle \lim _{f\rightarrow \infty } \mathbb {P}(A(f)) = 1\), which completes the proof. \(\square \)

5 Empirical Fitness Distributions of the FMM

Apart from the fact that the population is dominated by a single mutant class at all times, the proof of the double-exponential growth rate \(\nu \) presented in Sect. 4 does not give any insight into the structure of the population. However, since the solution \(\chi _t\) of the recursion relation (9) correctly describes the asymptotic growth of X(t), it provides a natural starting point for addressing this question at least on a heuristic level. In this section, we analyze the recursion relation in more depth to understand the demographic structure of the FMM in the long time limit, which turns out to display a rather rich behaviour.

5.1 Numerical Solution of the Recursion Relation

To characterize the empirical fitness distribution we introduce the following quantities:

$$\begin{aligned} J_i(t)&:=\frac{\log W_i}{\log W_t} \approx \frac{\chi _i}{\chi _t}, \;\; P(t) := \frac{\log X(t+1)-\log X(t)}{\log W_t} \approx \alpha \frac{\chi _{t+1} - \chi _t}{\chi _t}, \\ R_i(t)&:= \frac{\log W_i^{t-i} - \log X(t)}{\log X(t)}= \frac{t-i}{\alpha }J_i(t) -1, \end{aligned}$$

where the second approximate relations in the definitions of \(J_i(t)\) and P(t) become equalities in the formal deterministic limit \(f \rightarrow \infty \) (see Sect. 3). The ratio \(J_i(t) \in [0,1]\) compares the log-fitness of the mutant class born at time i to the log-fitness of the current fittest mutant. Since \(X(t+1) \approx (1-\beta ) X(t) {\bar{F}}_t\) with \({\bar{F}}_t\) denoting the mean fitness of the population at generation t, P(t) quantifies the mean fitness at generation t on the same scale. The decomposition in Eq. (3) shows that the fraction of the population in mutant class i at time t is proportional to \(W_i^{t-i}\), and therefore \(R_i(t) \in [-1,0]\) serves as a proxy for the (logarithmic) empirical fitness distribution at generation t over mutant classes i.

Fig. 2

a Plots of \(R_i(t)\) vs. \(J_i(t)\) for \(\alpha = 1\) at generations \(t=16\), 17, \(\ldots \), 24. The vertical line in each panel indicates the location of the mean fitness P(t). b Data collapse plots of \(R_i(t)\) vs. \(J_i(t)\) for each column of a

In Fig. 2a, we plot \(R_i(t)\) against \(J_i(t)\) for nine consecutive generations, obtained by numerically solving the recursion relation (5) for \(\alpha = 1\) with \(a_n=n\). The salient feature is the periodic behaviour of the fitness distribution with period 3; note that \(T=3\) for \(\alpha =1\). To illustrate the accuracy of the periodic behaviour, we present data-collapse plots in Fig. 2b. In most regions of \(J_i\), the collapse is nearly perfect (since the number of mutant classes increases with the number of generations, the empirical fitness distributions at different times cannot be identical). For another illustration of the periodicity, we depict P(t) vs. t for various values of \(\alpha \) (with \(a_n = n\)) in Fig. 3. After an early transient, P(t) clearly shows periodic behaviour. A rigorous proof of the periodicity will be given in Sect. 5.2.

Fig. 3

Plots of P(t) vs. t for various \(\alpha \) with \(a_n = n\). The function P(t) exhibits complete periodicity for large t

The periodicity was taken into account in the numerical estimates of \(\nu \) reported in Fig. 1. Rather than monitoring \(\log \chi _t/t\), which converges very slowly, we computed the quantity

$$\begin{aligned} \hat{\nu }(t):= \frac{1}{T} \log \frac{\chi _{t+T}}{\chi _{t}}, \end{aligned}$$

which approaches a constant in a relatively short time.
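The computation can be sketched in a few lines of Python. We iterate a max-type recursion, \(\chi_t = \max\{a_t,\ \max_{1\le i<t}\tfrac{t-i}{\alpha}\chi_i\}\), which is our reading of (9) consistent with (29) and with the proof of Proposition 18, and monitor \(\hat\nu(t)\):

```python
import math

def chi_sequence(alpha, a, tmax):
    # Max-type recursion: chi_t = max(a_t, max_{1<=i<t} (t-i) chi_i / alpha).
    chi = [None, a(1)]                      # 1-indexed; chi_1 = a_1
    for t in range(2, tmax + 1):
        best = max((t - i) * chi[i] / alpha for i in range(1, t))
        chi.append(max(a(t), best))
    return chi

alpha, T = 1.0, 3                           # T = 3 for alpha = 1
chi = chi_sequence(alpha, lambda n: n, 60)  # a_n = n

def nu_hat(t):
    return math.log(chi[t + T] / chi[t]) / T

# nu_hat(t) settles quickly at nu = log(3)/3 for alpha = 1,
# reflecting the asymptotic periodicity chi_{t+T} = e^{nu T} chi_t.
print(nu_hat(40), math.log(3) / 3)
```

In contrast, \(\log \chi_t/t\) converges only slowly to \(\nu\), which is why \(\hat\nu(t)\) is the preferred estimator.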

5.2 Periodicity of \(\chi _t e^{-\nu t}\)

By Lemma 6, we know that

$$\begin{aligned} c_t := \chi _t e^{-\nu t} \end{aligned}$$

is bounded away from zero and infinity. Now we show that \(c_t\) is not only bounded, but eventually becomes periodic.

Proposition 17

For any sequence \((a_n)\) in the recursion relation (9), there is a \(t_1\) such that \(c_t = c_{t+T}\) for all \(t \ge t_1\).

Proof

In this proof, k and \(k'\) are exclusively used as integers in the range \(1 \le k,k' \le T\). Since \(\chi _{t+T} \ge e^{\nu T} \chi _t\) (see Sect. 3), the sequence \((c_{k+nT})_n\) is nondecreasing and bounded. Consequently,

$$\begin{aligned} C_k := \lim _{n\rightarrow \infty } c_{k + nT} \end{aligned}$$
(28)

is well defined. Note that \(\max \{C_k: 1 \le k \le T\}\) becomes the optimal upper bound in Lemma 6. If n satisfies \( n T> T'\) with \(T' > \max \{t-I_t\}\) (see Remark 8), then we have

$$\begin{aligned} c_{k+nT}&= e^{-\nu (k+nT)}\max \left\{ \frac{k+nT-t'}{\alpha }\chi _{t'} : k+nT-T' \le t' < k+nT\right\} \nonumber \\&= \max \left\{ \frac{s}{\alpha }e^{-\nu s}c_{k+nT-s} : 1 \le s \le T'\right\} . \end{aligned}$$
(29)

Taking n to infinity, we get

$$\begin{aligned} C_k = \max \left\{ \frac{s}{\alpha }e^{-\nu s}C_{k-s} : 1 \le s \le T'\right\} , \end{aligned}$$
(30)

and, by definition, \(C_{k+mT} = C_k\) for any integer m. Since \(T e^{-\nu T}/\alpha = 1\) and \(C_{k-T} = C_k\), we can rewrite (30) as

$$\begin{aligned} C_k = \max \left\{ \frac{s}{\alpha }e^{-\nu s}C_{k-s} : 1 \le s \le T+1\right\} . \end{aligned}$$

Comparing terms with \(s=T-1, T, T+1\) for any k, we have

$$\begin{aligned} \frac{T-1}{T} e^{\nu } C_{k+1} =\frac{T-1}{\alpha }e^{-\nu (T-1)} C_{k+1}\le C_{k},\\ \frac{T+1}{T} e^{-\nu } C_{k-1} = \frac{T+1}{\alpha }e^{-\nu (T+1)} C_{k-1} \le C_{k}, \end{aligned}$$

which gives

$$\begin{aligned} \frac{T+1}{T} \le \frac{C_{k+1} e^\nu }{C_k} \le \frac{T}{T-1}, \end{aligned}$$

for all k. Let \(\varphi _k = C_{k} e^\nu / C_{k-1}\) with \(\varphi _1 =\varphi _{T+1}= C_1 e^\nu / C_T\). Then,

$$\begin{aligned} C_k = C_1 e^{-\nu (k-1)}\prod _{j=2}^{k} \varphi _j, \end{aligned}$$

for \(k > 1\). Setting \(k=T+1\) and considering \(C_{T+1} = C_1\), we have

$$\begin{aligned} \prod _{j=1}^T \varphi _j = e^{\nu T} = \frac{T}{\alpha }. \end{aligned}$$

To sum up, \(C_k\) takes the form

$$\begin{aligned} C_k = C_0 e^{-\nu k} \prod _{j=1}^k \varphi _j, \end{aligned}$$

where \(C_0\) is a positive constant (note that \(\chi _t(\alpha ,(C_0 a_n)) = C_0 \chi _t(\alpha ,(a_n))\)) and \(\varphi _k\) satisfies

$$\begin{aligned} \frac{T+1}{T} \le \varphi _k \le \frac{T}{T-1},\quad \prod _{j=1}^T \varphi _j = \frac{T}{\alpha } = e^{\nu T}. \end{aligned}$$
(31)

If \(\alpha =\alpha _T\) (see Remark 5), then \(e^\nu = (T+1)/T\) and the only possible value of \(\varphi _k\) is \(\varphi _k = e^\nu \) for all k because of (31).

To simplify (29) for large n, we use the following observation. For \(p \in \mathbb {N}\), setting \(X:= 1/T\) and using \(C_{k - (T \pm p)} = C_{k \mp p}\), we have

$$\begin{aligned} \frac{T-p}{\alpha } e^{-\nu (T-p)}C_{k+p}&= \frac{T-p}{T} e^{\nu p} \frac{C_{k+p}}{C_k} C_k =C_k \frac{T-p}{T} \prod _{j=1}^p e^{\nu } \frac{C_{k+j}}{C_{k+j-1}}\\&\le C_k \frac{T-p}{T} \left( \frac{T}{T-1} \right) ^p = C_k \frac{1-pX}{(1-X)^p}, \end{aligned}$$

and

$$\begin{aligned} \frac{T+p}{\alpha } e^{-\nu (T+p)}C_{k-p}&= \frac{T+p}{T} e^{-\nu p} \frac{C_{k-p}}{C_k} C_k =C_k \frac{T+p}{T} \prod _{j=1}^p \frac{C_{k-j}}{e^\nu C_{k-j+1}}\\&\le C_k \frac{T+p}{T} \left( \frac{T}{T+1} \right) ^p = C_k \frac{1+pX}{(1+X)^p}. \end{aligned}$$

Since \(\sup _{p\ge 2} (1+ pX)/(1+X)^p<1\) for every nonzero \(X \ge -1\), relating s and p by \(p=\vert s-T \vert \), we can choose \( \epsilon > 0\) such that \(C_k - \epsilon > s e^{-\nu s} C_{k-s}/\alpha \) for all s with \(\vert s-T\vert > 1\) and for all k. By (28), for this \(\epsilon \), there is an integer \(m_0\) such that \(C_k - \epsilon < c_{k+nT} \le C_k\) for all \(n \ge m_0\) and for all k. If \(n> m_0\), then

$$\begin{aligned} c_{k+nT}> C_k - \epsilon > s e^{-\nu s} C_{k-s}/\alpha \ge s e^{-\nu s} c_{k-s+nT}/\alpha , \end{aligned}$$

for all s with \(\vert s - T \vert > 1\), which reduces (29) to

$$\begin{aligned} c_{k+(n+1)T} = \max \left\{ c_{k+nT}, \tfrac{T+1}{T}e^{-\nu }c_{k-1+nT}, \tfrac{T-1}{T}e^{\nu }c_{k+1+nT} \right\} . \end{aligned}$$
(32)

In the following, n is assumed so large that (32) is valid for all k. Defining \(\delta _{k,n}:= 1-c_{k+nT}/C_k\) with the convention \(\delta _{k + m T, n}:= \delta _{k,n+m}\) for integer m and using the definition of \(\varphi _k\), we can write

$$\begin{aligned} \delta _{k,n+1} = \min \left\{ \delta _{k,n} , 1- \tfrac{T+1}{T\varphi _k} (1 - \delta _{k-1,n}), 1 - \tfrac{T-1}{T}\varphi _{k+1} (1 - \delta _{k+1,n} )\right\} . \end{aligned}$$

As \(c_{k+nT} \rightarrow C_k\), we have \(\delta _{k,n} \rightarrow 0\) as \(n \rightarrow \infty \). If \((T+1)/(T\varphi _k)<1\), then \(1 - (T+1)/(T\varphi _k) (1-\delta _{k-1,n} )> 1 - (T+1)/(T\varphi _k)> 0\) for sufficiently large n and, therefore, the term with \((T+1)/(T\varphi _k)\) cannot be a minimum for large n. The same argument is applicable to the term with \((T-1)\varphi _{k+1}/T\).

If \(\alpha = \alpha _T\), then \(\varphi _k = (T+1)/T\) for all k and, accordingly, we have \(\delta _{k,n+1} = \min \{\delta _{k,n},\delta _{k-1,n}\}\) for all k and for all sufficiently large n. If there are m and \(k'\) such that \(\delta _{k',m}=0\), then \(\delta _{k',n}=0\) for all \(n \ge m\) and \(\delta _{k'+1,m+1} = 0\), which in turn gives \(\delta _{k'+2,m+2}=0\), and so on. Therefore, we have \(\delta _{k,n} =0\) for all \(n > m+T\) and all k. Hence, to complete the proof in this case, we need to derive a contradiction from the assumption that \(\delta _{k,n}\) is strictly positive for all n and for all k. Since \(\delta _{k,n}\) is a nonincreasing sequence in n, we have

$$\begin{aligned} \delta _{k,n+s}&= \min \{ \delta _{k,n+s-1}, \delta _{k-1,n+s-1}\} = \min \{\delta _{k,n+s-2}, \delta _{k-1,n+s-2}, \delta _{k-1,n+s-1}\}\\&= \min \{\delta _{k,n+s-2}, \delta _{k-1,n+s-1}\} = \min \{\delta _{k,n}, \delta _{k-1,n+s-1}\}, \end{aligned}$$

for all \(s \in \mathbb {N}\). Since \(\delta _{k-1,n+s-1}\) approaches zero monotonically as \(s \rightarrow \infty \), there is \(s_0\) such that \(\delta _{k-1,n+s-1} < \delta _{k,n}\) for all k and for all \(s > s_0\). Therefore, we get \(\delta _{k,n + s+T} = \delta _{k-1,n+s+T-1} = \delta _{k-2,n+s+T-2} = \delta _{k,n+s}\) for all \(s>s_0\). Since \(\delta \) cannot increase, we conclude that \(\delta _{k,n}\) is constant for all sufficiently large n. If \(\delta _{k,n}\) were strictly positive for all n as assumed, \(C_k\) could not be the limit, and we arrive at a contradiction. Therefore, there is \(t_1\) such that \(c_{t+T} = c_t\) for all \(t \ge t_1\) in this case.

If \(\alpha \ne \alpha _T\), then there is at least one \(\varphi _k\) such that \((T+1)/T < \varphi _k\). If \(\epsilon \) also satisfies \(\epsilon /C_k < 1 - (T+1)/ (T \varphi _k)\), then we can write

$$\begin{aligned} \delta _{k,n+1} = \min \left\{ \delta _{k,n} , 1 - \tfrac{T-1}{T}\varphi _{k+1} (1 - \delta _{k+1,n} ) \right\} , \end{aligned}$$

for all \(n > m_0\). If \(\varphi _{k+1} < T/(T-1)\), then \(\delta _{k,n}\) will eventually be smaller than \(1-(T-1)\varphi _{k+1}/T\) and we have \(\delta _{k,n+1} = \delta _{k,n}=0\) for all large n. On the other hand, if \(\varphi _{k+1} = T/(T-1) > (T+1)/T\), we have

$$\begin{aligned} \delta _{k,n+1}&= \min \left\{ \delta _{k,n}, \delta _{k+1,n} \right\} ,\nonumber \\ \delta _{k+1,n+1}&= \min \left\{ \delta _{k+1,n}, 1 - \tfrac{T-1}{T}\varphi _{k+2} (1 - \delta _{k+2,n} ) \right\} . \end{aligned}$$
(33)

Since it is impossible for all \(\varphi _k\) to be \(T/(T-1)\), there exists \(k'\) such that \(\varphi _{k+i} = T/(T-1)\) for \(1 \le i \le k'\) and \(\varphi _{k+k'+1} < T/(T-1)\). Therefore,

$$\begin{aligned} \delta _{k+k',n+1} = \delta _{k+k',n}, \end{aligned}$$

for all sufficiently large n. Once \(\delta _{k+k',m} = 0\), then \(\delta _{k+i,n} = 0\) for all \(0 \le i \le k'\) and for all \(n > m+T\) by (33). If \(\varphi _{k+k'+1} > (T+1)/T\), we can repeat the above procedure. If \(\varphi _{k+k'+1} = (T+1)/T\), we have

$$\begin{aligned} \delta _{k+k'+1,n+1} = \min \left\{ \delta _{k+k'+1,n}, \delta _{k+k',n}, 1 - \tfrac{T-1}{T}\varphi _{k+k'+2} ( 1 - \delta _{k+k'+2,n}) \right\} = 0 \end{aligned}$$

for all sufficiently large n. Hence, the proof is complete. \(\square \)

5.3 Non-uniqueness of Periodic Solutions

Proposition 17 and its proof show that the general periodic solution of the recursion relation (9) for \(t > t_1\) is of the form

$$\begin{aligned} \chi _t = \chi _{t_1} \prod _{k=t_1+1}^t \varphi _k,\quad c_t = c_{t_1} e^{-\nu (t-t_1)} \prod _{k=t_1+1}^t \varphi _k, \end{aligned}$$

where the \(\varphi _k\) satisfy \(\varphi _{T+k} = \varphi _k\) and (31). Since \((T+1)/T \le e^\nu <T/(T-1)\), setting \(\varphi _i = e^\nu \) for all i satisfies (31), which gives the constant sequence \(c_{t} = c_{t_1}\). We refer to this solution as the homogeneous state. Recall that the homogeneous state is the unique possibility for \(\alpha = \alpha _T\), as shown right after (31). By constructing an appropriate sequence \((a_n)\), we now show that any set \(\{\varphi _k\}\) that satisfies the conditions (31) can give rise to a periodic solution \(c_t\). Therefore, the periodic solution \(c_t\) is not unique and can vary substantially with \((a_n)\) unless \(\alpha = \alpha _T\) or \(T=1\).

Proposition 18

Let

$$\begin{aligned} a_t = \max \left\{ \frac{T-i+t-1}{\alpha } \psi _i :0\le i < T \right\} ,\quad \psi _i := \prod _{j=1}^{i} \varphi _j, \end{aligned}$$
(34)

where the \(\varphi _j\) are as in (31) with periodicity \(\varphi _{T+j} = \varphi _j\), and we use the convention that the empty product equals one, \(\prod _{j=1}^0 \varphi _j = 1\). Then

$$\begin{aligned} \chi _t = \psi _{t+T-1}:= \prod _{j=1}^{t+T-1} \varphi _j. \end{aligned}$$
(35)

Proof

To find \(a_1\), we observe that for \(0 \le i < T-1\)

$$\begin{aligned} \frac{T-i-1}{\alpha } \psi _{i+1}&= \frac{T-i}{\alpha } \psi _i \frac{T-i-1}{T-i} \varphi _{i+1}\\&\le \frac{T-i}{\alpha } \psi _i \frac{(T-i-1)T}{(T-i)(T-1)} =\frac{T-i}{\alpha } \psi _i \frac{T^2-(i+1)T}{T^2-(i+1)T+i} \le \frac{T-i}{\alpha }\psi _i, \end{aligned}$$

where we have used \(\varphi _i \le T/(T-1)\). Therefore, we get

$$\begin{aligned} \chi _1 =a_1 = \frac{T}{\alpha } = \prod _{j=1}^T \varphi _j =\psi _T, \end{aligned}$$

which is (35) for \(t=1\). Note that this \(\chi _1\) is trivially valid for \(T=1\).

Now assume (35) is valid up to \(t = n\). Then,

$$\begin{aligned} \chi _{n+1} = \max \left\{ \frac{T-i+n}{\alpha }\psi _i: 0 \le i \le n+T-1\right\} . \end{aligned}$$

For \(i \le n\), we have

$$\begin{aligned} \frac{T-i+n}{\alpha }\psi _i&= \frac{T-i+n+1}{\alpha }\psi _{i-1} \frac{T-i+n}{T-i+n+1} \varphi _i \\&\ge \frac{T-i+n+1}{\alpha }\psi _{i-1} \frac{(T-i+n)(T+1)}{(T-i+n+1)T}\\&=\frac{T-i+n+1}{\alpha }\psi _{i-1} \frac{T^2 + (n+1-i)T + n-i}{T^2 + (n+1-i)T } \\&\ge \frac{T-i+n+1}{\alpha }\psi _{i-1}, \end{aligned}$$

and for \( n+T-1>i \ge n\) and \(T>1\), we have

$$\begin{aligned} \frac{T-i+n}{\alpha }\psi _i&= \frac{T-i+n-1}{\alpha }\psi _{i+1} \frac{T-i+n}{T-i+n-1} \frac{1}{\varphi _{i+1}} \\&\ge \frac{T-i+n-1}{\alpha }\psi _{i+1} \frac{(T-i+n)(T-1)}{(T-i+n-1)T}\\&=\frac{T-i+n-1}{\alpha }\psi _{i+1} \frac{T^2 + (n-1-i)T + i-n}{T^2 + (n-1-i)T } \\&\ge \frac{T-i+n-1}{\alpha }\psi _{i+1}. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \chi _{n+1} = \frac{T}{\alpha } \psi _n = \psi _n \prod _{j=n+1}^{T+n} \varphi _j = \prod _{j=1}^{T+n} \varphi _j. \end{aligned}$$

Induction completes the proof. \(\square \)
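The induction above lends itself to a numerical check. The following sketch uses hypothetical values of T and \(\alpha \), and an arbitrary non-constant admissible sequence \(\varphi _j\) (none of these specific numbers appear in the text); it verifies that the maximum in the induction step always equals \(\psi _{n+T}\), as claimed:

```python
import math

# Hypothetical parameters: T = 3 and alpha chosen so that T/alpha lies
# between ((T+1)/T)^T and (T/(T-1))^T, which is necessary for a sequence
# phi_j obeying the bounds of (31) to have product T/alpha.
T = 3
ratio = 2.7                      # target value of T/alpha = prod_{j=1}^T phi_j
alpha = T / ratio

# A non-constant T-periodic sequence phi_1, ..., phi_T with each phi_j in
# [(T+1)/T, T/(T-1)] and product equal to T/alpha.
phi = [1.35, 1.4, ratio / (1.35 * 1.4)]
assert all((T + 1) / T <= p <= T / (T - 1) for p in phi)

def psi(i):
    """Partial product psi_i = phi_1 * ... * phi_i (empty product = 1)."""
    return math.prod(phi[j % T] for j in range(i))

# Proposition 18: the maximum over 0 <= i <= n+T-1 of (T-i+n)/alpha * psi_i
# (the induction step in the proof) equals psi_{n+T}, i.e. chi_{n+1} = psi_{n+T}.
for n in range(1, 20):
    chi = max((T - i + n) / alpha * psi(i) for i in range(n + T))
    assert math.isclose(chi, psi(n + T)), (n, chi, psi(n + T))
print("chi_{n+1} = psi_{n+T} verified for n = 1, ..., 19")
```

The proof shows that the maximum is attained at \(i = n\), and the check above confirms this for the sampled parameters.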

Now we illustrate that any allowed set of \(\varphi _j\)’s can appear in the actual branching process by choosing an appropriate initial condition. For a realization of (34) in the branching process, consider an initial condition such that there are T different mutant classes with fitness \(f_i:= f^{\psi _i/\alpha }\) (\(0\le i < T\)) and the number \(N_i\) of individuals with fitness \(f_i\) is

$$\begin{aligned} N_i = \left\lfloor f^{(T-i-1)\psi _i/\alpha } \right\rfloor . \end{aligned}$$

Notice that this initial condition with \(\varphi _j = e^{\nu '}\), combined with a shift in time, was used in the proof of Lemma 16. In the limit \(f\rightarrow \infty \) as in Sect. 3, X(t) is well approximated by \(f^{\chi _t}\) with \(a_t\) given by (34).

In the above discussion, we have illustrated that any permissible set of \(\varphi _j\)’s can be realized by choosing an appropriate initial condition. Now we argue that a surviving outcome with an arbitrary initial condition should approach such a permissible set, but which values of the \(\varphi _j\)’s are realized may depend on the stochasticity in the early time regime. In the original branching process, the sequence \((a_n)\) depends both on the initial condition and on the stochastic evolution in the early time regime, before the deterministic approximation through the recursion relation (9) becomes valid. To see this, we recall from Sect. 3 how the recursion relation arises from the stochastic process. Since on survival the total population size as well as the largest fitness increases indefinitely, for any preassigned K there is a generation \(t_0\) such that \(X(t_0) > K\). Let \(W_0\) be the largest fitness at generation \(t_0\), define \(Y = X(t_0)\) and introduce a shifted time variable \(t' = t - t_0\) with \(\tilde{X}(t'):= X(t'+t_0)\). If K is extremely large, \(\tilde{X}(t')\) can be well approximated as \(\tilde{X}(0) = Y\),

$$\begin{aligned}&{\tilde{X}}(1) = Y^{a_1}+1 \approx Y^{\chi _1},\quad W_1 \approx Y^{\chi _1/\alpha },\\&{\tilde{X}}(2) = Y^{a_2} + Y^{\chi _1/\alpha }+1 \approx Y^{\chi _2},\quad W_2 \approx Y^{\chi _2/\alpha },\\&{\tilde{X}}(3) = Y^{a_3} + Y^{2\chi _1/\alpha }+Y^{\chi _2/\alpha }+1 \approx Y^{\chi _3},\quad W_3 \approx Y^{\chi _3/\alpha } \end{aligned}$$

where \(Y^{a_n}\) is the total population size of all mutant classes that appeared prior to generation \(t_0\). Since \(Y^{a_n} \le Y W_0^n\), taking logarithms gives \(a_n \le 1 + n \log W_0/\log Y\), so \(a_n\) grows at most linearly in n and \(\lim _{n\rightarrow \infty } a_n e^{-\nu n} = 0\). Hence \((a_n)\) is a permissible sequence that can be used as input to the recursion relation (9).

5.4 Empirical Fitness Distribution for Large \(\alpha \)

Whereas the preceding subsection has shown that the empirical fitness distribution at long times is generally non-universal, we will now argue that it nevertheless has a well-defined limit for \(\alpha \rightarrow \infty \). Let us begin with the homogeneous state. In this case,

$$\begin{aligned} J_i(t)&= e^{-\nu (t-i)},\quad P(t) \equiv \alpha (e^\nu -1), \\ R_i(t)&= \frac{(t-i)}{\alpha }J_i(t) -1 = -\frac{1}{\nu \alpha } J_i(t) \log J_i(t) - 1. \end{aligned}$$

Since \(\nu \alpha \rightarrow 1/e\) as \(\alpha \rightarrow \infty \), the homogeneous state for all sufficiently large \(\alpha \) is well described by

$$\begin{aligned} R_i \approx -e J_i \log J_i - 1, \end{aligned}$$
(36)

and the mean log fitness converges to \(P = \frac{1}{e}\). Moreover, since

$$\begin{aligned} \frac{T}{T-1} - \frac{T+1}{T} = \frac{1}{(T-1) T} = O(T^{-2}) \end{aligned}$$

and \(T/\alpha \rightarrow e\) as \(\alpha \rightarrow \infty \), in this limit all periodic solutions that satisfy the constraints (31) become close to homogeneous, \(\varphi _i = e^{\nu } +O(\alpha ^{-2})\). Therefore, we conjecture that the empirical fitness distribution on survival has (36) as a limit distribution for \(\alpha \rightarrow \infty \). As an illustration, in Fig. 4 we compare (36) to numerical solutions of the recursion relation for \(\alpha =3\), 4, 5, 6. The numerical data are hardly distinguishable from (36) already for \(\alpha = 5\).
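As a consistency check on the homogeneous-state formulas, the following sketch verifies the algebraic identity behind (36) for placeholder values of \(\nu \) and \(\alpha \) (their actual relation \(\nu (\alpha )\) is fixed earlier in the paper and is not needed for the identity itself), together with the limiting form obtained by setting \(\nu \alpha = 1/e\):

```python
import math

# Identity used for the homogeneous state: with J_i(t) = exp(-nu*(t-i)),
#   (t-i)/alpha * J - 1  =  -(1/(nu*alpha)) * J * log(J) - 1,
# since log(J) = -nu*(t-i). The values of nu and alpha below are placeholders.
nu, alpha = 0.2, 5.0
for lag in range(1, 30):          # lag = t - i
    J = math.exp(-nu * lag)
    lhs = lag / alpha * J - 1
    rhs = -J * math.log(J) / (nu * alpha) - 1
    assert math.isclose(lhs, rhs)

# In the limit alpha -> infinity, nu*alpha -> 1/e, giving the limiting curve
# R(J) = -e * J * log(J) - 1, which is (36).
def R_limit(J):
    return -math.e * J * math.log(J) - 1

# The limiting curve attains its maximum at J = 1/e, where R = 0.
assert math.isclose(R_limit(1 / math.e), 0.0)
print("identity and limiting form (36) verified")
```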

Fig. 4

Plot of \(R_i\) vs \(J_i\) at \(t=500\) and 501 for \(\alpha =3\), 4, 5, 6. For comparison, the asymptotic prediction (36) is depicted as a solid curve. For \(\alpha =3\) and 4, the changes of the empirical fitness distributions between generations 500 and 501 are still visible, but the distributions become indistinguishable from the asymptotic form (36) for \(\alpha \ge 5\)

6 Summary and Discussion

In this article we have provided a detailed characterization of the superexponential population growth in two closely related stochastic models of evolution. To the best of our knowledge, this is the first rigorous analysis of a branching process with selection and mutations where the random fitness values (rather than the fitness differences [22]) are drawn from an unbounded probability distribution. A remarkable feature of the models considered here is the emergence of an integer-valued time scale T which depends (discontinuously) on the index \(\alpha \) of the underlying Fréchet distribution. As a consequence, the empirical fitness distribution displays oscillations with period T, a phenomenon that has been observed previously in certain models that include sexual reproduction [38, 39]. A partial understanding of the periodic behaviour of the population structure was achieved in a deterministic approximation. Further work on this problem is needed, addressing in particular how the stochastic initial phase of the process determines the non-universal aspects of the asymptotic population distribution.

It is instructive to compare our findings for the branching process to the earlier analysis of a stochastic fixed finite population version of Kingman’s model in [13]. In both cases the long-time behaviour is dominated by, and can be quantitatively understood in terms of, extremal mutation events in the past. However, in the fixed finite population model the likelihood of generating mutants that exceed the current population fitness declines with time, and the dynamics reduces to a modified record process, where the takeover of the population by a fit mutant is instantaneous compared to the waiting time for the next fitter mutant. As a consequence, the population at time t is dominated by a mutant that arose at a time of order t in the past. By contrast, in the branching process with Fréchet-type distributions, the declining probability of exceeding the current fitness is compensated by the rapid growth of the population in such a way that the time lag since the birth of the currently dominant mutant takes on a fixed value T. Moreover, the branching process never enters the regime of rare sequential fixation events associated with the decreasing supply of beneficial mutations in the finite population setting. Instead, the population attains a nontrivial stationary clonal structure which is approximately described in Sect. 5.

It is reasonable to expect that the growth of the population fitness in the branching process is intermediate between that of the fixed finite population model [13] and the deterministic infinite population model [5]. For Fréchet-type fitness distributions the deterministic model is ill-defined, but the analysis of the fixed finite population model predicts a polynomial increase of the fitness with exponent \(1/\alpha \) [13], which is indeed much slower than the superexponential growth in the branching process. For unbounded Gumbel-type distributions the growth law of the fitness is known for infinite as well as for finite populations [5, 13]. The corresponding behaviour of the branching process will be addressed in future work.