Separation of timescales for the seed bank diffusion and its jump-diffusion limit

We investigate scaling limits of the seed bank model when migration (to and from the seed bank) is ‘slow’ compared to reproduction. This is motivated by models for bacterial dormancy, where periods of dormancy can be orders of magnitude larger than reproductive times. Speeding up time, we encounter a separation of timescales phenomenon which leads to mathematically interesting observations, in particular providing a prototypical example where the scaling limit of a continuous diffusion will be a jump diffusion. For this situation, standard convergence results typically fail. While such a situation could in principle be attacked by the sophisticated analytical scheme of Kurtz (J Funct Anal 12:55–67, 1973), this will require significant technical efforts. Instead, in our situation, we are able to identify and explicitly characterise a well-defined limit via duality in a surprisingly non-technical way. Indeed, we show that moment duality is in a suitable sense stable under passage to the limit and allows a direct and intuitive identification of the limiting semi-group while at the same time providing a probabilistic interpretation of the model. We also obtain a general convergence strategy for continuous-time Markov chains in a separation of timescales regime, which is of independent interest.


Motivation and main results
In this extended introductory section, we first provide some background on the biological concept of dormancy and its relevance in particular in microbial communities. This is followed by a short review of modelling approaches for dormancy in population genetics, where we think that dormancy might be seen as an additional evolutionary force, interacting with other forces such as genetic drift in complex ways. Since dormancy periods vary over several orders of magnitude (depending on the underlying species and environmental conditions), we aim for a systematic classification of relevant timescales, leading to the distinction of three separate scaling regimes. While the first two regimes have been modelled and analysed in population genetics before, the last one, leading to a separation of timescales between genetic drift and dormancy periods, is new, and completes the picture (at least on the level of 'toy models') of modelling scenarios. Our results for this regime will be presented in this introduction both for the forward-in time population model as well as for the dual genealogical processes, leading to novel scaling limits, which are interesting also from a purely mathematical perspective.
The proofs of these results can be found in Sects. 2 and 3 for the results going backwards and forwards in time, respectively. We believe that our rather direct method of proof to obtain and characterise these limits, making extensive use of duality for Markov processes, can be applied in a variety of situations. In each section, we therefore first present the corresponding methodology in a general set-up and then discuss its application to our concrete motivation.

Background on dormancy
Dormancy is a complex trait that has developed independently in many species across the tree of life and comes in many different guises. Originally, theory for dormancy and the resulting seed banks has been developed in the context of bet-hedging strategies for plants (Cohen 1966). However, dormancy is also a highly common trait in microbial communities, with important consequences for their evolutionary, ecological and pathogenic properties.
Here, we define dormancy as the ability of (micro-) organisms to enter and leave a state of vanishing metabolic activity. It has been observed for many habitats that at any given time a large fraction of micro-organisms can be in such a dormant state. For example, more than 80% of bacteria in soil are reported to be metabolically inactive, forming large 'seed banks' comprised of dormant individuals, see Lennon and Jones (2011). While dormancy seems to be an efficient and wide-spread strategy, e.g. to withstand unfavourable environmental conditions, competitive pressure, or antibiotic treatment, it is at the same time a costly trait whose maintenance involves energy and a sophisticated 'switching machinery'.
Dormancy also plays a role in various (human) diseases. So-called persister cells, that may evade antibiotic treatment by remaining in a state of low activity, play a major role in chronic infections, cf. Fisher et al. (2017), and individual cell dormancy is linked to relapses in cancer, cf. Marx (2018), Endo and Inoue (2019).
In this paper, we will focus on microbial seed banks. Lennon and Jones (2011) and Shoemaker and Lennon (2018) provide a broad overview of this rich and fascinating field and serve as a motivation in the present paper. Given the relevance of biological systems exhibiting dormancy, investigating the mathematical implications of dormancy in large populations seems to be a timely and interesting task.

Classification of the duration of dormancy: Known models and motivation for this paper
As indicated above, dormancy comes in many different forms, specific to the involved species and environments. One variation lies in the duration of dormancy periods: While in some microbial species dormancy periods last at most a few days, others stay dormant for prolonged periods of time, and some, e.g. bacterial endospores, have been reported to successfully resuscitate from dormancy after millions of years (Shoemaker and Lennon 2018; Cano and Borucki 1995; Johnson et al. 2007; Morono et al. 2020). The theoretical derivation and analysis of mathematical models may help to identify, understand and classify the different effects of dormancy, on suitable timescales, on the population dynamics and genealogical processes of the underlying populations.
Hence, in this paper, we consider the consequences of dormancy and seed banks in the framework of population genetics. More precisely, we are interested in the interplay of dormancy and the classical evolutionary force of random genetic drift, in particular with respect to its sensitivity to the duration of dormancy periods.
In a bi-allelic, haploid population that reproduces according to the Wright-Fisher model, the frequency of a given allele converges to the Wright-Fisher diffusion, given as the solution to

$$\mathrm{d}Z(t) = \sqrt{Z(t)(1-Z(t))}\,\mathrm{d}B(t), \qquad (1)$$

where $(B(t))_{t\ge 0}$ is a standard Brownian motion, if one measures time in the coalescent timescale (also known as the evolutionary timescale), i.e. on the order of the population size as this tends to infinity. This diffusion is dual to the block-counting process of the Kingman coalescent which in turn describes the genealogy of the population. These objects serve as a reference for populations without dormancy and are widely studied and applied in biology and mathematics alike. See e.g. Wakeley (2009) or Etheridge (2011) for an overview. We will consider suitable extensions incorporating dormancy.
We propose to distinguish three regimes comparing the duration of dormancy periods to the coalescent timescale, i.e. the scale at which the random genetic drift acts.

1. Dormancy periods are small compared to the coalescent timescale.
Kaj et al. (2001) introduced a model for dormancy in the following fashion: instead of always choosing the ancestor in the preceding generation like in the Wright-Fisher model, individuals are allowed to choose an ancestor several generations in the past. Their lineages thus 'jump' this number of generations and can be interpreted as dormant during that time. If we denote by $B \ge 1$ the expected size of the 'jump', the genealogy of the model converges on the coalescent timescale to a delayed Kingman coalescent, depicted in Fig. 1b, where coalescences occur at rate $\beta^2$, where $\beta := 1/B$, instead of at rate 1, cf. Kaj et al. (2001), Blath et al. (2013). This in turn is dual to the delayed Wright-Fisher diffusion that again describes the frequency of a given allele in the population, cf. Fig. 2a. Note that $\beta$ does not depend on the population size, whence its qualitatively weak impact on the coalescent timescale.

2. Dormancy periods on the order of the coalescent timescale.
For microbial species, however, dormancy times can be much longer than just a few 'generations'. In this set-up, Lennon and Jones (2011) proposed a model based on two reservoirs, the 'active' and the 'dormant' population, between which individuals 'migrate/switch' via initiation of and resuscitation from dormancy, at fixed rates. A mathematical model for 'spontaneous/stochastic' switching (observed in nature under stable environmental conditions, cf. Epstein 2009; Shoemaker and Lennon 2018) was introduced and studied in Blath et al. (2016). This is reminiscent of the 'two-island model' (Wright 1931; Moran 1959) with the notable difference of the absence of reproduction on the second island.
If the sizes of the active and dormant population are proportional with the ratio given by some $K > 0$, the frequencies $X(t)$ and $Y(t)$ of a given allele in the active and dormant population, respectively, when time is measured on the coalescent timescale, are described by the seed bank diffusion, cf. Fig. 2b. This diffusion was first introduced in Corollary 2.5 in Blath et al. (2016). The existence of a unique strong solution that is Feller follows from Theorem 3.2 and Remark 3.2 in Shiga and Shimizu (1980), see also Greven et al. (2020) for a more general seed bank diffusion.

Definition 1.1 (Seed bank diffusion) Let $(B(t))_{t\ge 0}$ be a standard Brownian motion and $c, K$ finite positive constants. The $[0,1]^2$-valued continuous strong Markov process $(X(t), Y(t))_{t\ge 0}$ given as the unique strong solution of the initial value problem

$$\mathrm{d}X(t) = c\,\big(Y(t)-X(t)\big)\,\mathrm{d}t + \sqrt{X(t)(1-X(t))}\,\mathrm{d}B(t),$$
$$\mathrm{d}Y(t) = cK\,\big(X(t)-Y(t)\big)\,\mathrm{d}t, \qquad (2)$$

with $(X(0), Y(0)) = (x, y) \in [0,1]^2$, is called seed bank diffusion with parameters $c, K$, starting at $(x, y) \in [0,1]^2$.

The genealogy of such a population is given by the seed bank coalescent, introduced in Definition 3.2 in Blath et al. (2016). Here, lineages can switch between an active and a dormant state independently (hence 'spontaneous' switching) at a given rate $c > 0$. While the active lineages behave like the Kingman coalescent, dormant lineages are prohibited from coalescing, as depicted in Fig. 1c. That dormancy appears in such a prominent form in the coalescent and in the diffusion and therefore is visible on the coalescent timescale is due to the underlying scaling assumptions of the model. These imply that dormancy times are of the order of the population size and therefore on the coalescent timescale. Here, many population genetic quantities and statistics are affected in non-trivial ways, see Blath et al. (2015), Blath et al. (2016) and Blath et al. (2020b) for a discussion of the scaling assumptions and further extensions of the model.
Since the seed bank here has a major qualitative effect on both the diffusion and the coalescent, this is sometimes referred to as the strong seed bank model.
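The behaviour of the seed bank diffusion can be explored numerically. The following is a minimal sketch, assuming the standard form of the diffusion from Blath et al. (2016), namely dX = c(Y − X)dt + sqrt(X(1 − X))dB and dY = cK(X − Y)dt. The Euler-Maruyama discretisation, the clipping to [0, 1], the function name `simulate_seed_bank` and all parameter values are illustrative choices and not part of the model.

```python
import math
import random


def simulate_seed_bank(x0, y0, c, K, horizon, dt, rng):
    """Euler-Maruyama path of the (assumed) seed bank SDE on [0, horizon]."""
    x, y = x0, y0
    path = [(0.0, x, y)]
    steps = int(round(horizon / dt))
    for i in range(1, steps + 1):
        noise = rng.gauss(0.0, math.sqrt(dt))        # Brownian increment
        drift_x = c * (y - x)                        # migration into the active pool
        drift_y = c * K * (x - y)                    # migration into the seed bank
        diffusion = math.sqrt(max(x * (1.0 - x), 0.0))  # genetic drift, active pool only
        # Clipping keeps the discretised path inside [0, 1]; the exact
        # solution stays there without any such correction.
        x = min(1.0, max(0.0, x + drift_x * dt + diffusion * noise))
        y = min(1.0, max(0.0, y + drift_y * dt))
        path.append((i * dt, x, y))
    return path


rng = random.Random(2021)
path = simulate_seed_bank(x0=0.5, y0=0.5, c=1.0, K=1.0,
                          horizon=5.0, dt=1e-3, rng=rng)
final_time, final_x, final_y = path[-1]
print(f"t={final_time:.2f}  X={final_x:.3f}  Y={final_y:.3f}")
```

Rerunning the sketch with a smaller c and a horizon of order 1/c gives a rough impression of the regime studied below, although a fixed-step scheme becomes inaccurate near the boundaries.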
As in the previous models, an important mathematical tool in our analysis will be the formal duality relation between the seed bank diffusion $(X(t), Y(t))_{t\ge 0}$ and the block-counting process of the seed bank coalescent $(N(t), M(t))_{t\ge 0}$. Note that the notion of a 'block' comes from the mathematical definition of a coalescent as a partition-valued process. In the biological context, the process could as well be called the line-counting process, keeping track of the number of ancestral lines present at each time in the past.

Definition 1.2 (Block-counting process of the seed bank coalescent) Let $E := \mathbb{N}_0 \times \mathbb{N}_0$. Let $c, K > 0$. We define $(N(t), M(t))_{t\ge 0}$ to be the continuous-time Markov chain taking values in $E$ with conservative Q-matrix $R$ given by

$$R_{(n,m),(\bar n,\bar m)} = \begin{cases} \binom{n}{2} & \text{if } (\bar n,\bar m) = (n-1,m),\\ cn & \text{if } (\bar n,\bar m) = (n-1,m+1),\\ cKm & \text{if } (\bar n,\bar m) = (n+1,m-1), \end{cases} \qquad (3)$$

and zero for all other off-diagonal entries, with the convention $\binom{n}{2} = 0$ for $n \le 1$.

This continuous-time Markov chain, introduced in Definition 2.7 in Blath et al. (2016), satisfies the moment duality

$$\mathbb{E}_{x,y}\big[X(t)^n\,Y(t)^m\big] = \mathbb{E}^{n,m}\big[x^{N(t)}\,y^{M(t)}\big]$$

for every $t > 0$, for every $(x, y) \in [0,1]^2$ and for every $n, m \in \mathbb{N}_0$, see Theorem 2.8 in Blath et al. (2016). In other words, the distribution of the seed bank diffusion at any time $t$ is uniquely determined by the moment dual at said time.
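At the level of generators, the moment duality amounts to an algebraic identity that can be checked directly. The sketch below assumes the seed bank generator Af = c(y − x)∂ₓf + cK(x − y)∂ᵧf + ½x(1 − x)∂ₓₓf and the block-counting rates binom(n, 2) for coalescence, cn for initiation of dormancy and cKm for resuscitation. It applies the first to the monomial x^n y^m as a function of (x, y) and the second to the same monomial as a function of (n, m). The function names and the test values are illustrative.

```python
from math import comb


def diffusion_generator_on_monomial(x, y, n, m, c, K):
    """Apply the (assumed) seed bank generator to f(x, y) = x**n * y**m."""
    f_x = n * x ** (n - 1) * y ** m if n >= 1 else 0.0
    f_y = m * x ** n * y ** (m - 1) if m >= 1 else 0.0
    f_xx = n * (n - 1) * x ** (n - 2) * y ** m if n >= 2 else 0.0
    return c * (y - x) * f_x + c * K * (x - y) * f_y + 0.5 * x * (1 - x) * f_xx


def dual_generator_on_monomial(x, y, n, m, c, K):
    """Apply the (assumed) Q-matrix R to the map (n, m) -> x**n * y**m."""
    here = x ** n * y ** m
    total = 0.0
    if n >= 2:   # coalescence of two active lines
        total += comb(n, 2) * (x ** (n - 1) * y ** m - here)
    if n >= 1:   # an active line becomes dormant
        total += c * n * (x ** (n - 1) * y ** (m + 1) - here)
    if m >= 1:   # a dormant line resuscitates
        total += c * K * m * (x ** (n + 1) * y ** (m - 1) - here)
    return total


c, K = 0.7, 2.0
grid = (0.1, 0.35, 0.5, 0.9)
worst_gap = max(
    abs(diffusion_generator_on_monomial(x, y, n, m, c, K)
        - dual_generator_on_monomial(x, y, n, m, c, K))
    for x in grid for y in grid for n in range(5) for m in range(5)
)
print(f"largest discrepancy between the two generators: {worst_gap:.2e}")
```

The two expressions agree up to floating-point rounding, which is the infinitesimal version of the duality; the statement for all t > 0 additionally requires the arguments of Blath et al. (2016).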
3. Dormancy periods are large compared to the coalescent timescale.
In view of the (potentially) extreme duration of dormancy times of bacterial spores, it is natural to ask: What happens in the third natural scaling-regime, when dormancy times are long in comparison to the scale on which genetic drift acts? This is the question answered in this manuscript in the following subsections.
To this end, we consider scaling limits of the above seed bank/two-island model when migration between active and dormant states (say at rate c) and reproduction (say at rate 1) act on different timescales, that is c being much smaller than 1. Interesting limits can only be expected when switching to a 'fast' super-evolutionary timescale. Indeed, if one just lets c → 0, then one obtains the trivial limit where the active population follows a Wright-Fisher diffusion and a Kingman coalescent, respectively, and is completely separated from the dormant population, as can be readily seen from (2) and (3). Hence, in order to capture the effect of long dormancy times one needs to speed up time by a factor 1/c, as c → 0, thus switching to a new timescale, which we will refer to as the super-evolutionary timescale. At this super-evolutionary timescale migration between the active and the dormant population occurs at rate 1 while reproduction, and hence genetic drift, acts 'instantaneously'. Intuitively, fast reproduction should drive the X coordinate of the diffusion process immediately towards the boundaries 0 and 1, which then only rarely switches between these states due to immigration of 'ancient' alleles. This is indeed what we will see below.
This scaling regime also leads to mathematically appealing problems. The naïve scaling limit would lead to a coefficient of "∞" for the genetic drift in the seed bank diffusion and an infinite coalescent rate in the seed bank coalescent, respectively, and we thus need to find a way to rigorously identify and describe such a 'degenerate' mathematical limit.
Main results under separation of timescales: the frequency process
The following two theorems provide the main results for the frequency processes of Wright-Fisher models with seed banks, if dormancy times are sufficiently long for the timescales of dormancy and genetic drift to separate. Note that we switch to the super-evolutionary timescale.

Theorem 1.3 Let $(X^c(t), Y^c(t))_{t\ge 0}$ be the seed bank diffusion given in Definition 1.1 with migration rate $c > 0$. Assume that the initial distributions $(X^c(0), Y^c(0))$ converge weakly to some $(x, y) \in [0,1]^2$ as $c \to 0$. Then, there exists a Markov process $(\tilde X(t), \tilde Y(t))_{t\ge 0}$, started in $(\tilde X(0), \tilde Y(0)) = (x, y)$, with the property that for any sequence of migration rates $(c_\kappa)_{\kappa\in\mathbb{N}}$ with $c_\kappa \to 0$ as $\kappa \to \infty$,

$$\big(X^{c_\kappa}(t/c_\kappa),\, Y^{c_\kappa}(t/c_\kappa)\big)_{t\ge 0} \xrightarrow{\;f.d.d.\;} \big(\tilde X(t), \tilde Y(t)\big)_{t\ge 0},$$

and we may choose $(\tilde X(t), \tilde Y(t))_{t\ge 0}$ to be càdlàg and such that for every $t > 0$

$$\big(\tilde X(t), \tilde Y(t)\big) \in \{0,1\} \times [0,1] \quad \mathbb{P}\text{-a.s.}$$

Here, càdlàg stands for continue à droite, limite à gauche, i.e. the property of a path to be right-continuous for every $t \ge 0$ and to have a limit from the left for every $t > 0$.
Note that the above convergence is in the sense of the finite-dimensional distributions (f.d.d.), which uniquely determines the law of the limit. As indicated above, it will have jumps in the first component $\tilde X$, which is remarkable since the prelimiting processes all have continuous paths. In order to understand this, we prove in Proposition 3.8 that, if started in $\{0,1\} \times [0,1]$, $(\tilde X(t), \tilde Y(t))_{t\ge 0}$ coincides in distribution with a Feller process $(\bar X(t), \bar Y(t))_{t\ge 0}$ taking values in $\{0,1\} \times [0,1]$ which is defined via the generator that maps suitable test functions $f$ to

$$y\,\big[f(1,y) - f(0,y)\big]\,\mathbb{1}_{\{0\}}(x) + (1-y)\,\big[f(0,y) - f(1,y)\big]\,\mathbb{1}_{\{1\}}(x) + K(x-y)\,\frac{\partial f}{\partial y}(x,y).$$

The dynamics of the process $(\tilde X(t), \tilde Y(t))_{t\ge 0}$ are therefore as follows: The first component $\tilde X$ is a piecewise constant process, switching between the states 0 and 1. The switching rate at time $t$ for jumps from 0 to 1 is just given by the value of the second component $\tilde Y(t)$, and from 1 to 0 by the complementary rate $1 - \tilde Y(t)$. In-between jump times of $\tilde X$, the second component $\tilde Y$ behaves deterministically, following the equation

$$\mathrm{d}\tilde Y(t) = K\,\big(\tilde X(t) - \tilde Y(t)\big)\,\mathrm{d}t.$$

So while $\tilde X(t)$ is in state 0, $\tilde Y(t)$ decreases deterministically with derivative $-K\tilde Y(t)$, and while $\tilde X(t)$ is in state 1, $\tilde Y(t)$ increases with derivative $K(1 - \tilde Y(t))$. This is illustrated in Fig. 2c.
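The limiting dynamics described above can be simulated exactly, without time discretisation. The sketch below assumes the dynamics as stated in the text: the first component switches from 0 to 1 at rate Ỹ(t) and from 1 to 0 at rate 1 − Ỹ(t), while between switches Ỹ solves dỸ = K(X̃ − Ỹ)dt. Writing u for the current switching rate, u decays as u·exp(−Ks), so the integrated hazard up to time s is u(1 − exp(−Ks))/K, and the switching time is obtained by inverting this against a standard exponential variable. The function name `simulate_limit` and the parameters are illustrative.

```python
import math
import random


def simulate_limit(x0, y0, K, horizon, rng):
    """Switching times of the limiting process, started in {0, 1} x [0, 1]."""
    t, x, y = 0.0, x0, y0
    path = [(t, x, y)]
    while True:
        u = y if x == 0 else 1.0 - y     # current switching rate of the first component
        e = rng.expovariate(1.0)         # standard exponential threshold
        if K * e >= u:
            break                        # total hazard u / K is too small: no further switch
        wait = -math.log(1.0 - K * e / u) / K
        if t + wait > horizon:
            break
        t += wait
        u -= K * e                       # equals u * exp(-K * wait), the rate at the switch
        y = u if x == 0 else 1.0 - u
        x = 1 - x
        path.append((t, x, y))
    return path


switches = simulate_limit(x0=0, y0=0.5, K=1.0, horizon=50.0, rng=random.Random(1))
for time, x, y in switches:
    print(f"t={time:7.3f}  X={x}  Y={y:.3f}")
```

Since the total hazard from a given state is finite, each run performs only finitely many switches before the first component settles in 0 or 1, in line with the deterministic drift of Ỹ towards the current value of X̃.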

Interpretation: dormancy versus genetic drift on different timescales
In the classical Wright-Fisher model without dormancy, genetic drift drives the frequency process (Z (t)) t≥0 of a given allele towards the boundaries 0 and 1, where it fixates. This occurs on timescales of the order of the (effective total) population size.
In the weak seed bank regime frequencies are described by (Z (t)) t≥0 and genetic drift is 'slowed down' in a quantitative sense by a factor $\beta^2$, since dormant individuals may jump generations, increasing the effective population size accordingly. For example, expected fixation times will be stretched by the factor $\beta^{-2}$.
In the strong seed bank regime, dormancy times and genetic drift both act on the same timescale. The resulting additional seed bank 'island' in the diffusion (X (t), Y (t)) t≥0 will slow down the effect of genetic drift in a qualitative sense. In fact, although the active population may fixate briefly in 0 or 1, the seed bank component will then quickly reintroduce variability via the migration term, hence the memory in the seed bank prevents final fixation in finite time (at least for non-trivial initial states). This interesting effect is discussed in detail in Blath et al. (2019), where it is also shown that the seed bank introduces 'variability' into the population model in a suitable sense, by means of a delay-equation reformulation of the seed bank diffusion.
Finally, in the extreme case where dormancy periods are much longer than the timescale of genetic drift, if time is measured in the super-evolutionary scale, fixation/extinction in the active population of $(\tilde X(t), \tilde Y(t))_{t\ge 0}$ will happen instantaneously, and last for a finite time. The switches of the frequency in the active population between 0 and 1 can be explained as follows: When a single 'ancient' allele 'resuscitates', it will usually not be able to fixate in the population and will go extinct again. However, on the super-evolutionary timescale, these 'trials' reoccur many times, and eventually a resuscitating allele will fixate. If it is of the same type as the allele currently present in the active population, nothing changes and there will be no jump. However, if it is of the other type, this will cause $\tilde X$ to switch to the opposite boundary. The probabilities of the allele resuscitating at time $t$ being of the given type or of the opposite type are $\tilde Y(t)$ and $1 - \tilde Y(t)$, which explains the form of the switching rates stated after Theorem 1.3.

These observations regarding fixation or coexistence of types can be summed up as follows. In the Wright-Fisher diffusion without mutation (Z (t)) t≥0 , ultimately, one type will fixate. In the weak seed bank regime described by (Z (t)) t≥0 , there will also be one type that fixates, but the (expected) time until this happens is increased by a factor of $\beta^{-2}$. In the strong seed bank regime, we will occasionally see fixation of one type in the active population, but then the seed bank will reintroduce variability immediately, so that coexistence is visible almost all the time. Finally, in the case of dormancy on the super-evolutionary timescale, at any given time, the active population will always be monomorphic, but the dominant type will switch from time to time, and there are no visible periods of coexistence at all.

Duality and genealogical interpretation of the scaling regimes
As we have seen, the processes describing the forward-in-time frequency of a given allele in a Wright-Fisher model with seed bank have natural dual processes describing their genealogies. Such genealogical processes shed light on the effect of dormancy on the ancestral processes of samples, but are also useful tools for the proofs of the previous theorems, as they tend to be mathematically simpler objects. Our new scaling regime is no exception.
In the super-evolutionary scaling regime of Theorem 1.3 we obtain the block-counting process of the ancient ancestral lines process as a scaling limit of the genealogies (see Theorem 1.5 below). Intuitively, since we are considering a population for which dormancy times are of a larger order than the times of coalescences, at the super-evolutionary timescale, coalescences occur instantaneously, while migration between the active and the dormant state occurs at order 1, cf. Fig. 3. Hence, in the limit, for each time t > 0, there will be at most one active line. More formally, we obtain the following definition.

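Definition 1.4 (The ancient ancestral lines process) Let $(n_0, m_0) \in \mathbb{N} \times \mathbb{N}$. The ancient ancestral lines process $(\tilde N(t), \tilde M(t))_{t\ge 0}$ is the continuous-time Markov chain on $E^{(n_0,m_0)}$, started in $(n_0, m_0)$, with transition semi-group

$$\Pi(t) := P e^{tG}, \quad t > 0,$$

where $\Pi(0)$ is defined as $I_E$, the identity on $E^{(n_0,m_0)}$. $P$ is a projection ($P^2 = P$) given by

$$P_{(n,m),(\bar n,\bar m)} := \begin{cases} 1 & \text{if } n \ge 1,\ \bar n = 1,\ \bar m = m,\\ 1 & \text{if } n = \bar n = 0,\ \bar m = m,\\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

for all $(n, m), (\bar n, \bar m) \in E^{(n_0,m_0)}$ and $G$ is defined as

$$G_{(n,m),(\bar n,\bar m)} := \begin{cases} Km & \text{if } \bar n = 1,\ \bar m = m-1,\\ 1 & \text{if } n \ge 1,\ \bar n = 0,\ \bar m = m+1,\\ -(1+Km) & \text{if } n \ge 1,\ \bar n = 1,\ \bar m = m,\\ -Km & \text{if } n = \bar n = 0,\ \bar m = m,\\ 0 & \text{otherwise.} \end{cases}$$

Note the form of the semi-group of the Markov chain, which in particular is not standard, i.e. $\lim_{t\downarrow 0} \Pi(t) = P \ne \mathrm{Id}_E$ (cf. Chung 1960). Since the projection $P$ acts for all $t > 0$, this process takes values in the smaller space $\{0,1\} \times \{0, \ldots, m_0+1\}$ $\mathbb{P}$-a.s. for every (fixed) $t > 0$. The first two "rates" given in the definition of $G$ correspond to the events of resuscitation (with immediate coalescence if applicable) and initiation of dormancy. $G$ is, however, not a Q-matrix, since for any $n \ge 2$ it has negative values off the diagonal. These only concern states that will be collapsed by $P$ into the smaller state space.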
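The algebraic properties of P and G mentioned here can be verified on a small state space. The sketch below is built on assumptions: P is taken to collapse every state with at least one active line to a single active line; G is taken to have resuscitation (with immediate coalescence) at rate Km and initiation of dormancy at rate 1, with rows depending on (n, m) only through the collapsed state. The state space {(n, m) : n + m ≤ 3} and the helper names are illustrative choices.

```python
def states(total):
    """All (n, m) with n active and m dormant lines and n + m <= total."""
    return [(n, m) for n in range(total + 1) for m in range(total + 1 - n)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def limit_matrices(space, K):
    """Projection P and matrix G of the (assumed) limiting semi-group P exp(tG)."""
    index = {s: i for i, s in enumerate(space)}
    size = len(space)
    P = [[0.0] * size for _ in range(size)]
    G = [[0.0] * size for _ in range(size)]
    for (n, m), i in index.items():
        collapsed = index[(min(n, 1), m)]     # instantaneous coalescence
        P[i][collapsed] = 1.0
        if m >= 1:                            # resuscitation, then coalescence
            G[i][index[(1, m - 1)]] += K * m
        if n >= 1:                            # the single active line falls dormant
            G[i][index[(0, m + 1)]] += 1.0
        G[i][collapsed] -= K * m + (1.0 if n >= 1 else 0.0)
    return P, G


def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


space = states(3)
P, G = limit_matrices(space, K=2.0)
is_projection = close(matmul(P, P), P)
is_compatible = close(matmul(P, G), G) and close(matmul(G, P), G)
negative_off_diagonal = [
    (space[i], space[j])
    for i in range(len(space)) for j in range(len(space))
    if i != j and G[i][j] < 0
]
print(is_projection, is_compatible, negative_off_diagonal)
```

In this sketch the negative off-diagonal entries of G occur only in rows belonging to states with two or more active lines, which are exactly the states removed by P.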
The technical challenges due to the degenerate form of the semi-group of the scaling limit coming from "separation of timescales phenomena" (cf. for example Wakeley 2009, Chapter 6 from the population genetics perspective) require special care as we detail in Sect. 2.1. Subsequently, we apply the above strategy to our model in Sect. 2.2 proving that the ancient ancestral lines process arises as the scaling limit of the block-counting process of the seed bank coalescent in the sense of convergence of the finite-dimensional distributions.
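Theorem 1.5 Denote by $(N^c(t), M^c(t))_{t\ge 0}$ the block-counting process of the seed bank coalescent as defined in Definition 1.2 with migration rate $c > 0$ and assume that it starts at some $(n_0, m_0) \in \mathbb{N} \times \mathbb{N}$, $\mathbb{P}$-a.s.
Furthermore let $(\tilde N(t), \tilde M(t))_{t\ge 0}$ be the ancient ancestral lines process from Definition 1.4 with the same initial condition. Then, for any sequence of migration rates $(c_\kappa)_{\kappa\in\mathbb{N}}$ with $c_\kappa \to 0$ as $\kappa \to \infty$,

$$\big(N^{c_\kappa}(t/c_\kappa),\, M^{c_\kappa}(t/c_\kappa)\big)_{t\ge 0} \xrightarrow{\;f.d.d.\;} \big(\tilde N(t), \tilde M(t)\big)_{t\ge 0}.$$

Without loss of generality, we assume $(\tilde N(t), \tilde M(t))_{t\ge 0}$ to be càdlàg.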
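The convergence can be illustrated numerically by comparing one-dimensional marginals for a small migration rate. The sketch below assumes the block-counting rates binom(n, 2), cn and cKm and a limiting semi-group of the form P exp(tG), with P collapsing the active lines to at most one, resuscitation at rate Km and dormancy at rate 1. The matrix exponential is a plain scaling-and-squaring Taylor scheme written for this example; the state space n + m ≤ 3 and the values c = 0.001, K = 1, t = 1 are arbitrary.

```python
from math import comb


def states(total):
    return [(n, m) for n in range(total + 1) for m in range(total + 1 - n)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def expm(a, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    size = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    scaled = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def block_counting_q(space, c, K):
    """Q-matrix of the (assumed) block-counting process, restricted to `space`."""
    index = {s: i for i, s in enumerate(space)}
    q = [[0.0] * len(space) for _ in space]
    for (n, m), i in index.items():
        moves = []
        if n >= 2:
            moves.append(((n - 1, m), comb(n, 2)))       # coalescence
        if n >= 1:
            moves.append(((n - 1, m + 1), c * n))        # dormancy
        if m >= 1:
            moves.append(((n + 1, m - 1), c * K * m))    # resuscitation
        for target, rate in moves:
            q[i][index[target]] += rate
            q[i][i] -= rate
    return q


def limit_matrices(space, K):
    """Projection P and matrix G of the (assumed) limiting semi-group."""
    index = {s: i for i, s in enumerate(space)}
    P = [[0.0] * len(space) for _ in space]
    G = [[0.0] * len(space) for _ in space]
    for (n, m), i in index.items():
        collapsed = index[(min(n, 1), m)]
        P[i][collapsed] = 1.0
        if m >= 1:
            G[i][index[(1, m - 1)]] += K * m
        if n >= 1:
            G[i][index[(0, m + 1)]] += 1.0
        G[i][collapsed] -= K * m + (1.0 if n >= 1 else 0.0)
    return P, G


c, K, t = 1e-3, 1.0, 1.0
space = states(3)
R = block_counting_q(space, c, K)
prelimit = expm([[r * t / c for r in row] for row in R])   # law at time t / c
P, G = limit_matrices(space, K)
limit = matmul(P, expm([[g * t for g in row] for row in G]))
max_gap = max(abs(p - l) for pr, lr in zip(prelimit, limit) for p, l in zip(pr, lr))
print(f"largest entrywise gap for c={c}: {max_gap:.4f}")
```

Decreasing c should shrink the reported gap, which is a numerical counterpart of the convergence of one-dimensional marginals; it is not a substitute for the proof in Sect. 2.2.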

Spontaneous and simultaneous switching
One should note that for the above models, we assumed 'spontaneous' switching. 'Simultaneous' switching, where transitions to and from the dormant population are triggered by environmental cues, is currently an active area of research, see e.g. Blath et al. (2020a).

Scaling limits for continuous-time Markov chains
Motivated by the example of the super-evolutionary scaling in the introductory section, as a first step, we consider scaling limits of continuous-time Markov chains. Indeed, when speeding up time, some transition rates diverge to ∞, thus obstructing direct Q-matrix computations and producing states that are vacated immediately. This effect is frequently observed when dealing with "separation of timescales phenomena" and can in a 'well-behaved' scenario still lead to a scaling limit with potentially "degenerate", i.e. non-standard, transition semi-group of the form

$$\Pi(t) = P e^{tG}, \quad t > 0,$$

where $P$ is a projection to a subspace of the original state space as a result of "immediately vacated states" and satisfies $G = PG = GP$. For discrete-time Markov chains, this situation was considered e.g. in Möhle (1998), Birkner et al. (2013) and recently also Möhle and Notohara (2016). Since the handling of such situations for continuous-time Markov chains (such as the above block-counting process) might be of general interest and is somewhat more involved than the discrete case, we give a detailed "recipe" for such convergence proofs in Sect. 2.1. Note that all of these results can in principle be seen as specialised and ready-to-use variants of the general operator-theoretic scheme derived in Kurtz (1973) in the context of 'random evolutions' (see also Ethier and Kurtz 1986, Sect. 1.7). Recent applications of this scheme can also be found in Bobrowski (2015).

Separation of timescales phenomena for continuous-time Markov chains: a strategy
Given a sequence of continuous-time Markov chains (ξ κ (t)) t≥0 , κ ∈ N with finite state-space E (equipped with a metric d), suppose that our aim is to prove its convergence in finite-dimensional distributions under a suitable time-rescaling (C κ ) κ∈N to a continuous-time Markov chain (ξ(t)) t≥0 when κ → ∞.
Our programme to carry out such a proof has two steps: First, consider an appropriate time discretisation of (ξ κ (t)) t≥0 , κ ∈ N. Employing the machinery from Birkner et al. (2013), Möhle (1998) and Möhle and Notohara (2016) available in this context, one can prove convergence of a rescaling of the discretised processes to a continuous-time Markov chain (ξ(t)) t≥0 when κ → ∞ in the sense of weak convergence in finite-dimensional distributions.
Second, we prove a continuity result to show that the suitably rescaled original process converges in finite-dimensional distributions to the same limit.
In order to formulate the conditions on the time-rescaling and the original sequence of Markov chains, we rewrite the time-rescaling as C κ = b κ /a κ , where further assumptions on the non-negative sequences (a κ ) κ∈N and (b κ ) κ∈N will be specified below.

Step (i) Time discretisation and its convergence
The following lemma is an immediate application of Lemma 1.7 in Birkner et al. (2013) analogous to Theorem 1 in Möhle (1998). We rephrase it in this framework for the convenience of the reader and as reference for the examples we will consider below.
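Observe that for a non-negative sequence $(a_\kappa)_{\kappa\in\mathbb{N}}$, $(\xi^\kappa(i/a_\kappa))_{i\in\mathbb{N}}$ is a discrete-time Markov chain with finite state-space $E$ for each $\kappa \in \mathbb{N}$. We equip the matrices $A = (A_{e,\bar e})_{e,\bar e\in E}$ on $E$ with the matrix norm $\|A\| := \max_{e\in E} \sum_{\bar e\in E} |A_{e,\bar e}|$. Since $E$ is finite, convergence in the matrix norm is equivalent to pointwise convergence.

Lemma 2.1 Assume that for every $\kappa \in \mathbb{N}$ we have a representation of the transition matrix $\Pi_\kappa$ of $(\xi^\kappa(i/a_\kappa))_{i\in\mathbb{N}}$ of the form

$$\Pi_\kappa = A_\kappa + \frac{B_\kappa}{b_\kappa} \qquad (8)$$

such that the following holds: $A_\kappa$ is a stochastic matrix and condition (9) on the convergence of its powers holds for some matrix $P$. Furthermore, we require that the matrix limit with respect to the matrix norm

$$G := \lim_{\kappa\to\infty} P B_\kappa P \qquad (10)$$

exists. Then, we obtain the following convergence (with respect to the matrix norm):

$$\lim_{\kappa\to\infty} \Pi_\kappa^{\lfloor t b_\kappa \rfloor} = P e^{tG} =: \Pi(t) \quad \text{for all } t > 0. \qquad (11)$$

In particular, if we define $\Pi(0) := \mathrm{Id}_E$, then $(\Pi(t))_{t\ge 0}$ is a semi-group that generates a continuous-time Markov chain which we denote by $(\xi(t))_{t\ge 0}$. If $\xi^\kappa(0) \xrightarrow{w} \xi(0)$ as $\kappa \to \infty$, then the finite-dimensional distributions of $(\xi^\kappa(\lfloor b_\kappa t \rfloor / a_\kappa))_{t\ge 0}$ converge weakly to those of $(\xi(t))_{t\ge 0}$. Here, $\xrightarrow{w}$ denotes weak convergence.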
Before proceeding to the proof of this lemma, let us make a few remarks about the assumptions and results observed in it.
Remark 2.2
1. Since $\Pi_\kappa$ is the transition matrix of $(\xi^\kappa(t))_{t\ge 0}$ under a time-change by $a_\kappa^{-1}$, in a representation like (8), $A_\kappa$ is a stochastic matrix that contains only entries of order 1 and $a_\kappa^{-1}$, and $B_\kappa$ contains only entries of order 1 and $o(1)$. Since we then speed up time by a factor $b_\kappa$, we obtain a separation of timescales, where the entries in $A_\kappa$ give rise to a projection matrix $P$ acting on the probability distributions on $E$, while the entries in $B_\kappa$ give rise to a "Q-matrix". The $A_\kappa$ contain the transition rates of $(\xi^\kappa(t))_{t\ge 0}$ that occur at a faster rate than the new timescale, hence they occur "instantaneously" in the limit. The entries in $B_\kappa$ correspond to the transitions of $(\xi^\kappa(t))_{t\ge 0}$ that either occur on the new timescale or are slower, hence describing the transitions visible in the limit and those that vanish.
2. Note that given (9), the matrix $P$ is necessarily a projection on $E$, i.e. satisfies $P^2 = P$. Since $P = P^2$, we have $PG = GP = G$ and hence $Pe^{tG} = e^{tG}P = P - I + e^{tG}$ for any $t \ge 0$. In particular, $(\Pi(t))_{t\ge 0}$ is not standard, as $\lim_{t\downarrow 0} \Pi(t) = P \ne \Pi(0) = \mathrm{Id}_E$. $P$ effectively restricts the state-space of the limiting chain to a subspace of $E$.
Observe that $G$ differs from a normal Q-matrix as it may have negative entries off the diagonal.

Proof of Lemma 2.1 (8), (9) and (10) above are precisely conditions (36), (46) and (48) in Birkner et al. (2013). Hence (11) is the claim of (49) in Lemma 1.7 and Remark 1.8 in Birkner et al. (2013). Remark 2.2 in particular implies that the Chapman-Kolmogorov equations hold for $(\Pi(t))_{t\ge 0}$ and hence this generates a continuous-time Markov chain which we denote by $(\xi(t))_{t\ge 0}$ (see, for example, Kallenberg 2002, Thm. 8.4). The convergence in Eq. (11) and the Markov property then imply the convergence in finite-dimensional distributions.
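A three-state toy example may help to see the mechanism of Lemma 2.1 at work. The sketch below is not taken from the paper: it assumes a decomposition of the form Π = A + B/b in which A sends state 2 instantly to state 1 (so A is itself a projection and plays the role of P), and B lets the chain move from 0 to 2 and from 1 to 0 at rate 1. Then G = PBP acts on the states {0, 1} as a symmetric two-state generator, whose exponential is known in closed form, and the powers of Π can be compared with P exp(tG) directly.

```python
import math


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matpow(a, k):
    """k-th power of a square matrix by repeated squaring."""
    size = len(a)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    while k:
        if k & 1:
            result = matmul(result, a)
        a = matmul(a, a)
        k >>= 1
    return result


b, t = 10_000, 1.0

# Fast part: state 2 is vacated immediately in favour of state 1.
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0]]
# Slow part: 0 -> 2 and 1 -> 0, each with probability 1 / b per step.
B = [[-1.0, 0.0, 1.0],
     [1.0, -1.0, 0.0],
     [0.0, 0.0, 0.0]]

step = [[A[i][j] + B[i][j] / b for j in range(3)] for i in range(3)]
prelimit = matpow(step, int(t * b))

# P = A and G = P B P = [[-1, 1, 0], [1, -1, 0], [1, -1, 0]], so P exp(tG)
# is a two-state flip chain with state 2 identified with state 1.
stay = 0.5 * (1.0 + math.exp(-2.0 * t))
flip = 0.5 * (1.0 - math.exp(-2.0 * t))
limit = [[stay, flip, 0.0],
         [flip, stay, 0.0],
         [flip, stay, 0.0]]

max_gap = max(abs(p - l) for pr, lr in zip(prelimit, limit) for p, l in zip(pr, lr))
print(f"largest entrywise gap for b={b}: {max_gap:.2e}")
```

The third row of the limit differs from the third row of the identity, which is the non-standard behaviour at t = 0 described in Remark 2.2: the limiting semi-group jumps from Id to P immediately.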

Step (ii) Convergence of the continuous-time Markov chains
The previous step ensured the existence of a limit for suitably discretised versions of the original sequence of continuous-time Markov chains (ξ κ (t)) t≥0 . The following lemma tells us under what conditions such a discretisation is sufficiently fine to also imply the convergence of the (ξ κ (t)) t≥0 to the same limit.
Denote by G κ the Q-matrix of (ξ κ (t)) t≥0 for each κ ∈ N and set q κ := max e∈E −G κ e,e . If Proof When started at e ∈ E, the time to the first jump of ξ κ is exponentially distributed with parameter −G κ e,e . Hence on sees that condition a) was chosen precisely such that P (ξ κ (t)) t≥0 has a jump in 0 , Observe that for the distance between ξ κ b κ t a κ and ξ κ b κ t a κ at any time t ≥ 0 we have only if the process (ξ κ (t)) t≥0 has a jump in the interval b κ t a κ , b κ t a κ . Since the length of this interval can be estimated through and the Markov chains are time-homogeneous we can in turn estimate the probability of a jump in the interval using (12) and obtain In order to prove the convergence of the finite-dimensional distributions, recall that weak convergence of measures is equivalent to convergence in the Prohorov metric (see, e.g. Whitt (2002), Section 3.2). Hence, assumption (b) yields that for all time points 0 ≤ t 0 , . . . , t l < ∞, states e 0 , . . . , e l ∈ E and any ε > 0 sufficiently small there exists aκ ∈ N such that for all κ ≥κ: Combining this with (13) we see that for all time points 0 ≤ t 0 , . . . , t l < ∞, states e 0 , . . . , e l ∈ E and any ε > 0 sufficiently small there exists aκ ∈ N such that for all κ ≥κ This implies the convergence of the finite-dimensional distributions of ξ κ b κ a κ t t≥0 to the finite-dimensional distributions of (ξ(t)) t≥0 in the Prohorov metric and hence weakly, which completes the proof.

The ancient ancestral lines process (and other scaling limits)
Let us apply this machinery to the "ancestral lines process" introduced in Sect. 1. Indeed, consider the block-counting process of the seed bank coalescent defined in Definition 1.2 with vanishing migration rate c.
If we let c → 0 and simultaneously speed up time by a factor 1/c → ∞, we obtain a new structure given in Definition 1.4, thus uncovering a separation-of-timescales phenomenon. Theorem 1.5 formalises this heuristic and establishes the ancient ancestral lines process as scaling limit in finite-dimensional distributions of the block-counting process of the seed bank coalescent. Note that indeed P is a projection matrix and PG = G P = G, for P and G as in Definition 1.4.
Proof of Theorem 1.5 Let (c κ ) κ∈N be a positive sequence such that c κ → 0. Without loss of generality assume c κ ≤ 1 for all κ ∈ N. We prove the result using the machinery outlined in the previous section with a κ := c −2 κ and b κ := c −3 κ . Recall that (N c κ (t), M c κ (t)) t≥0 is the block counting process of the seed bank coalescent as defined in Definition 1.2 with migration rate c κ > 0 and assume that it starts at some (n 0 , m 0 ) ∈ N × N, P-a.s. Let N 0 be equipped with the discrete topology.
Step (i) In analogy to the notation in the previous section we abbreviate (ξ κ (t)) t≥0 := (N c κ (t), M c κ (t)) t≥0 and consider a discretised process with time steps of length a −1 κ = c 2 κ by defining Let κ be the transition matrix of the Markov chain (η κ (i)) i∈N 0 . The transition probabilities of this chain are for any sensible (n, m), (n,m) ∈ E (n 0 ,m 0 ) , recalling the convention of n 2 = 0 for n ≤ 1. This can be seen as follows.
Denote by $T_1$ the time of the first jump of $\xi^\kappa$ and by $T_2$ the time between the first and the second jump of $\xi^\kappa$. By the strong Markov property we know that $T_1$ and $\xi^\kappa(T_1)$, as well as $T_1$ and $T_2$, are independent. Conditioning on $\xi^\kappa$ to start in $(n, m)$, we also know that $T_1$ follows an exponential distribution with parameter $\binom{n}{2} + c_\kappa n + c_\kappa K m$ and that $T_2$ dominates an exponential random variable with parameter $2\binom{n-1}{2} + \binom{n+1}{2} + c_\kappa(3n + 1 + 3Km)$ (condition on the possible values of $\xi^\kappa(T_1)$, then take the minimum of the possible exponential random variables describing the waiting time to the next jump). Using this one can check that To calculate the transition probabilities in (14), note that (15) tells us that the probability of seeing more than one jump by $\xi^\kappa$ in the interval $[0, c_\kappa^2]$ is in $o(c_\kappa^3)$. In particular, this gives us the order of the transition probabilities for $\eta^\kappa$ to states summarised under "otherwise", i.e. those that require more than one jump by $\xi^\kappa$. The transitions that are possible with just one jump are "coalescence", "dormancy" and "resuscitation" in the order in which they appear in (14). We calculate the case of "coalescence": Note that in order to see such a transition at least one jump must have happened. Hence, where we used (15) for the third equality, the independence of $\xi^\kappa(T_1)$ and $T_1$ for the fourth and a Taylor expansion and the distribution of $T_1$ for the fifth equality. The transition probabilities for "dormancy" and "resuscitation" can be calculated analogously. The calculation of the transition probability to the same state the chain originated from is obvious. With the representation in (14) we now obtain the decomposition as in (8) In order to apply Lemma 2.1, we now need to check condition (9), i.e.
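The order of the error in the "coalescence" case can be checked numerically. Started in (n, m), the first jump is a coalescence with probability $\binom{n}{2}/\lambda$, independently of $T_1 \sim \mathrm{Exp}(\lambda)$ with $\lambda = \binom{n}{2} + c n + c K m$. The sketch below compares the resulting one-jump probability with its leading term $\binom{n}{2} c^2$ and divides the discrepancy by $c^3$. It deliberately ignores paths with two or more jumps, and the function names are hypothetical.

```python
from math import comb, expm1


def first_jump_coalescence_prob(n, m, c, K):
    """P(T1 <= c^2 and the first jump is a coalescence), started in (n, m), n >= 2."""
    lam = comb(n, 2) + c * n + c * K * m          # total jump rate in (n, m)
    # (coalescence share of the rate) * P(Exp(lam) <= c^2)
    return (comb(n, 2) / lam) * (-expm1(-lam * c**2))


def scaled_error(n, m, c, K):
    """|exact one-jump probability - binom(n,2) c^2| / c^3."""
    exact = first_jump_coalescence_prob(n, m, c, K)
    return abs(exact - comb(n, 2) * c**2) / c**3


for c in (0.1, 0.01, 0.001):
    print(c, scaled_error(5, 3, c, K=2.0))
```

The printed ratios shrink roughly linearly in c, consistent with a remainder of order $c^4$ and hence $o(c^3)$.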
for P given in (7). Since $A_\kappa$ is a stochastic matrix, let $(Z^\kappa_r)_{r\in\mathbb{N}_0}$ be the Markov chain associated to it. This is a pure death process in the first component and constant in the second. By definition of the matrix norm, we get Observe that for all $n \in \{2, \ldots, n_0\}$ (and all $m \in \{0, \ldots, m_0\}$) the probability of $Z^\kappa$ to jump to $(n - 1, m)$ in the next step can be bounded: Hence, the number of time-steps required for $Z^\kappa$ to reach $(1, m_0)$ if it is started in $(n_0, m_0)$ is dominated by the sum of $n_0 - 1$ independent geometric random variables $\gamma^\kappa_1, \ldots, \gamma^\kappa_{n_0-1}$ with success probability $c_\kappa^2$. More precisely, if we define T : By Markov's inequality, and since each $\gamma^\kappa_i$ has mean $c_\kappa^{-2}$, we get $P(\gamma^\kappa_1 + \cdots + \gamma^\kappa_{n_0-1} \ge r) \le \frac{1}{r} E[\gamma^\kappa_1 + \cdots + \gamma^\kappa_{n_0-1}] = \frac{n_0 - 1}{r c_\kappa^2}$. Combining these observations we obtain and (17) holds. We are now left to establish the matrix-norm limit (10) and show that the limit coincides with the G given in Definition 1.4. Notice that $B_\kappa$ itself converges when $\kappa \to \infty$ uniformly and in the matrix norm (recalling that the state space $E_{(n_0,m_0)}$ is finite): Simply multiplying the matrices on the left-hand side we obtain P B P = G and therefore where (Ñ (t),M(t)) t≥0 is the ancient ancestral lines process defined in Definition 1.4.

Condition (b) was proven in Step (i). Therefore we may conclude when κ → ∞ and the proof of Theorem 1.5 is complete.

Remark 2.4 (Imbalanced Island Size)
It is straightforward to pursue the same consideration for the two-island model and its structured coalescent (Herbots 1994; Notohara 1990). The two-island model considers two populations much like the seed bank model, but allows for coalescence in the second population. Its genealogy is then given by the structured coalescent, whose block-counting process allows for the same transition rates described in (3), adding $r_{(n,m),(\bar n,\bar m)} = \binom{m}{2}$ for $\bar n = n$ and $\bar m = m - 1$, i.e. coalescence in the second island (and adapting the diagonal entries accordingly).
Letting the migration rate converge c → 0 while speeding up time by 1/c → ∞ as we have done for the block counting process of the seed bank coalescent above will lead to a structure with instantaneous coalescences in both islands, leaving us with a single line migrating between them.
In this set-up it is much more interesting to consider a two-island model with different scalings of the coalescence rates in the islands. In order to do this, we introduce the parameters α and α′ such that the Q-matrix of the block-counting process of the structured coalescent now is α and α′ are associated with the notion of effective population size (cf. e.g. Wakeley 2009) so a different scaling corresponds to a significant difference in population size on the two islands. If, in addition to c → 0, we assume the coalescence rate α′ = α′(c) > 0 in the second island to scale as c, i.e. α′/c → 1, the result is a two-island model with instantaneous coalescences in the first island, but otherwise 'normal' migration and coalescence behaviour in the second. In order to formalise this heuristic observation, denote by (N c,α′ (t), M c,α′ (t)) t≥0 the block-counting process of the structured coalescent as defined by the rates in (19) with migration rate c > 0 and coalescence rate α′ > 0 in the second island and assume that it starts at some (n 0 , m 0 ) ∈ N × N, P-a.s. (The parameters α, K > 0 are arbitrary but fixed.) Define (N (t),M(t)) t≥0 to be the continuous-time Markov chain with initial value (N (0),M(0)) = (n 0 , m 0 ), taking values in the state space E (n 0 ,m 0 ) := {0, . . . , n 0 + m 0 }², with transition matrix (t) := P e^{tĜ}, for t > 0 and (0) = Id E , where P is given by (7) (as in the case of seed banks) and Ĝ is now a matrix of the form Then, for any sequence of migration rates (c κ ) κ∈N and any sequence of coalescence rates (α′ κ ) κ∈N with c κ → 0 and c κ /α′ κ → 1 when κ → ∞ This observation for the two-island model is analogous to Theorem 1.5 for seed banks. Its proof is a close parallel to that of Theorem 1.5. Considering, again, the sequences $a_\kappa := c_\kappa^{-2}$ and $b_\kappa := c_\kappa^{-3}$, the matrices $A_\kappa$ and P coincide with those in the proof of Theorem 1.5, hence the hardest work has already been done. Small alterations to $B_\kappa$ immediately yield the result and we therefore omit any further details.

Scaling limits for the diffusion
We would now also like to observe similar scaling limits for the diffusion (2). As we saw in the case of Markov chains, rescaling time may lead to a limiting process that is still Markovian, but whose semi-group is not standard, i.e. not continuous in 0. We can use moment duality to obtain this limit.

Convergence of the finite-dimensional distributions obtained from duality
We present a method to obtain convergence in finite-dimensional distributions of a sequence of Markov processes using moment duality and the convergence in finite-dimensional distributions of the dual processes. The result does not depend on whether or not time is rescaled as well. It is, however, of particular interest in the rescaled case, since it might lead to the identification of limiting objects which are rather "ill-behaved". Indeed, we will see examples in Sect. 3.2 where the limit does not have a generator with a sufficiently large domain and hence the common approach of proving convergence through generator convergence fails.
As usual, P n and P x denote the distributions for which ξ and ζ start in n and x, respectively. If (ξ κ ) κ∈N 0 converges to some Markov chain ξ in the f.d.d.-sense, then there exists a Markov process ζ with values in [0, 1] d such that it is the f.d.d.-limit of (ζ κ ) κ∈N 0 and the moment dual to ξ , i.e.
Remark 3.2 At first glance one might suspect that this result should also hold in a more general set-up as long as the employed duality function yields convergence determining families for the respective semi-groups. Indeed, most of the steps of the proof would still go through. However, note that we did not assume existence of a limiting Markov process beforehand. We can conclude this by the solvability of Hausdorff's moment problem on [0, 1] d (Hildebrandt and Schoenberg 1933), which precisely treats the existence (and uniqueness) of a distribution with a given sequence of moments and therefore "matches" the moment duality function in our theorem.

Proof of Theorem 3.1
The proof can roughly be split into three steps: we first use duality to prove the convergence of the one-dimensional distributions of (ζ κ ) κ∈N 0 . This, together with the Markov property, will give us the convergence of the finite-dimensional distributions of (ζ κ ) κ∈N 0 to a family of limiting distributions. Then we prove consistency of this family of distributions and hence, by Kolmogorov's extension theorem, the existence of a limiting process ζ , which must then be Markovian.
Since the mixed-moment function m is continuous and bounded as a function on N d 0 , the convergence of the finite-dimensional distributions of (ξ κ ) κ∈N and the assumed moment duality yield for any t ≥ 0, x ∈ [0, 1] d and n ∈ N d 0 .
The duality of ζ and ξ follows from the duality of the prelimiting processes.

Ancient ancestral material scaling regime
As an application of Theorem 3.1 we consider the diffusion (2) with the scaling regime of Sect. 2.2, namely, with the migration rate c → 0 while simultaneously speeding up time by a factor 1/c → ∞ and obtain Theorem 1.3 stating the convergence of the rescaled diffusions to a Markovian limit (X (t),Ỹ (t)) t≥0 .
Note that (X (t),Ỹ (t)) t≥0 makes sense for more general initial conditions in [0, 1]². In any case, the limiting process (X (t),Ỹ (t)) t≥0 instantaneously jumps into the smaller state space {0, 1} × [0, 1] at time 0+. The jump probabilities to 0 and 1 are the fixation probabilities of the ordinary Wright-Fisher diffusion. This corresponds to an instantaneous application of a projection operator P defined as the limit (in a suitable sense) P where (P t ) t≥0 is the semi-group associated to the classical Wright-Fisher diffusion, cf. Kurtz (1973) (or Bobrowski 2015). Intuitively, this can be explained as follows: in the regime where dormancy periods are significantly longer than the timescale of genetic drift, the population evolves according to a Wright-Fisher diffusion without dormancy and has the chance to be absorbed in 0 or 1 before ever seeing a resuscitation/migration into the population from the seed bank. Hence, on the super-evolutionary timescale the probabilities to immediately jump to 0 or 1 are precisely given by the corresponding fixation probabilities of the Wright-Fisher diffusion.
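The appearance of the fixation probabilities can be read off from the classical moment duality between the Wright-Fisher diffusion, written here as $(X(t))_{t\ge 0}$, and the block-counting process $(N(t))_{t\ge 0}$ of the Kingman coalescent; this notation is introduced only for the sketch. Since $N(t) \to 1$ almost surely when started from any $n \ge 1$:

```latex
\[
  \lim_{t\to\infty} E_x\bigl[X(t)^n\bigr]
  \;=\; \lim_{t\to\infty} E_n\bigl[x^{N(t)}\bigr]
  \;=\; x
  \qquad \text{for all } n \ge 1,\ x \in [0,1].
\]
```

All moments of order $n \ge 1$ of the limiting law coincide, so it is concentrated on {0, 1}, with mass $x$ at 1 and $1 - x$ at 0. These are the fixation probabilities of the Wright-Fisher diffusion started in $x$.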
Remark 3.4 (Convergence on path space?) Once convergence of the finite-dimensional distributions is established in Theorem 1.3, it is natural (at least for mathematicians) to ask whether it is possible to prove tightness on the space of càdlàg paths in order to obtain weak convergence. However, since the set of continuous paths forms a closed subset of the càdlàg paths in the classical Skorohod (J 1 ) topology (cf. Skorohod (1956)), and the solutions to our pre-limiting seed bank diffusions are continuous, convergence in this topology would imply a limit with continuous paths, which we know not to be correct at least in 0. This makes weak convergence on path space impossible in this topology. However, the set of jump times of the above process is finite on finite time intervals, and in particular has Lebesgue measure zero, so that we expect convergence to hold in weaker topologies, such as the Meyer-Zheng topology corresponding to convergence in measure (Meyer and Zheng 1984; Kurtz 1991). We refrain from going into these technicalities here, which we consider to be outside the scope of this manuscript.

Shiga and Shimizu (1980) implies that the unique strong solution to the SDE (1.1), which is the seed bank diffusion from Definition 2, is a Feller process. This is considered in more generality in Theorem 2.4 in Greven et al. (2020).

Proof of Theorem 3.3
Since the (X c κ (t/c κ ) , Y c κ (t/c κ )) t≥0 are constant time-changes of the seed bank diffusion introduced in Definition 1.1, they are Feller, as well.
Since the moment duality of the block-counting process of the seed bank coalescent and the seed bank diffusion (4) holds for every time t ≥ 0, it is preserved for the time-changed processes (N c κ (t/c κ ) , M c κ (t/c κ )) t≥0 and (X c κ (t/c κ ) , Y c κ (t/c κ )) t≥0 . Together with Theorem 1.5, all assumptions of Theorem 3.1 hold and we get the existence of a Markov process (X (t),Ỹ (t)) t≥0 that is the dual of (Ñ (t),M(t)) t≥0 . By the uniqueness of the solution to the Hausdorff moment problem (Theorem 2 in Hildebrandt and Schoenberg 1933), a distribution on [0, 1]² is uniquely determined by all its mixed moments. The moment duality of the limit with a process that does not depend on the scaling sequence (c κ ) κ∈N 0 therefore implies that the one-dimensional distributions of the limit do not depend on the choice of scaling sequence, either. Since the limit is a Markov process, the one-dimensional distributions uniquely determine its entire distribution. Hence, the distribution of the limit does not depend on the choice of scaling sequence (c κ ) κ∈N 0 .
So far we have characterised the process (X (t),Ỹ (t)) t≥0 only as the moment dual of the continuous-time Markov chain (Ñ (t),M(t)) t≥0 whose semi-group we could give explicitly in Definition 1.4. We now use this characterisation to better understand the process (X (t),Ỹ (t)) t≥0 itself. More precisely, since (20) holds in particular for t > 0, m = 0 and any n ≥ 1, x, y ∈ [0, 1] we see We used the fact that the first component of the ancient ancestral lines process (Ñ (t),M(t)) t≥0 takes values in {0, 1} for any t > 0 in the second equality and the definition of the projection in the last equality. Since the right-hand side does not depend on n ≥ 1, we can conclude that X̃(t) ∈ {0, 1} P x,y -a.s. for any t > 0 and any (x, y) ∈ [0, 1]².
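The final conclusion can be made explicit. Writing $\mu$ for the common value of the moments $E_{x,y}[\tilde X(t)^n]$, $n \ge 1$ (a symbol used only in this sketch), the cases $n = 1$ and $n = 2$ give:

```latex
\[
  E_{x,y}\bigl[\tilde X(t)\,\bigl(1 - \tilde X(t)\bigr)\bigr]
  \;=\; E_{x,y}\bigl[\tilde X(t)\bigr] - E_{x,y}\bigl[\tilde X(t)^2\bigr]
  \;=\; \mu - \mu \;=\; 0 .
\]
```

Since $\tilde X(t)(1 - \tilde X(t)) \ge 0$ on $[0, 1]$, the integrand vanishes almost surely, which forces $\tilde X(t) \in \{0, 1\}$ almost surely.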
We can use this observation together with (25) to obtain ) is the identity matrix on N 2 0 .) This small observation has an important consequence: Much like in the case of its dual (Ñ (t),M(t)) t≥0 , the semi-group of the ancient ancestral material process (X (t),Ỹ (t)) t≥0 is not right-continuous in 0.
Intuitively, the reproduction mechanism (in the active population) acts so fast that fixation (or extinction) in the active population happens instantaneously. Whenever there is an invasion from the seed bank, the chances that this is by an individual of the type extinct in the active population (thereby causing a change of type here) are given by the frequency of said type in the dormant population. The limit is thus a pure jump process in the active component that moves between the states 0 and 1 at rates proportional to the frequency in the dormant population of the allele that is extinct in the active population, while the seed bank component retains its classical behaviour. We can formalise this observation if we restrict the process to the smaller state space {0, 1} × [0, 1], see Proposition 3.8 below.

for any (n, m), (n,m) ∈ {0, 1} × N 0 . Furthermore, let (X (t),Ȳ (t)) t≥0 be the Markov process on {0, 1} × [0, 1] defined by the generator given in (6).

Proposition 3.7 (X (t),Ȳ (t)) t≥0 is well-defined, i.e. the closure of Ā given in (6) is indeed the generator of a Markov process and this process is Feller. Furthermore, we may assume that (X (t),Ȳ (t)) t≥0 is càdlàg on [0, ∞).
Proof Define E := {0, 1} × [0, 1] and We verify the conditions of the Hille-Yosida Theorem, cf. Theorem 19.11 in Kallenberg (2002), for (Ā, DĀ), where Ā is given in (6). First note that hence DĀ is dense in C. In order to verify the maximum principle choose an arbitrary f ∈ DĀ and let ( Since we assumed f to have a maximum in (x,ȳ), the first two summands are nonpositive. If ȳ ∈ (0, 1), a maximum in ( Hence, Ā f (x,ȳ) ≤ 0 and the maximum principle holds. We are left to prove that there exists a λ > 0 such that (λ − Ā)DĀ is dense in C. First, observe that f ∈ C if and only if it can be written in the form f (x, Since the polynomials are dense in the continuous functions on [0, 1] and Ā is a linear operator, it suffices to show that for any r ∈ N 0 we can find f (r ) In an intuitive abuse of notation, we will in the following denote maps of the form (x, y) → x y^r by x y^r and likewise for (1 − x)y^r . We begin by calculating, for any r ∈ N 0 Proceed by induction on the degree r , beginning with r = 0. Observe that (λ − Ā)1 = λ and Now let n ∈ N and assume that for any r ≤ n − 1 there exist f (r ) , f (r ) ∈ DĀ such that (λ − Ā) f (r ) (x, y) = (1 − x)y^r and (λ − Ā) f (r ) (x, y) = x y^r . Note that (λ − Ā)(y^n + nK f (n−1) )(x, y) = (λ + nK )y^n .
In addition, similarly to the above, Hence we may again obtain x(λ + nK )(−y n ) + x −λ λ + (n + 1)K y n = x y n and with this also This completes the proof that (λ − Ā)DĀ is dense in C. Hence, the closure of Ā generates a Feller semigroup on C. According to Kallenberg (2002, Proposition 19.14) this Feller semigroup then generates a Feller process, which we may assume to have càdlàg paths thanks to Kallenberg (2002, Theorem 19.15).
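To get a feeling for the paths of such a Feller process, one can simulate it as a piecewise deterministic process. The sketch below is based on an assumed reading of the verbal description of the limit, not on a quotation of the generator (6): between jumps the dormant frequency follows dY/dt = K(X − Y), and the active component jumps 0 → 1 at rate Y and 1 → 0 at rate 1 − Y. The function name, the unit constant in the jump rates and the use of thinning are illustrative choices.

```python
import math
import random


def simulate_limit(x0, y0, K, t_end, rng):
    """Simulate (X, Y) on {0, 1} x [0, 1] up to time t_end.

    Returns a list of (time, x, y) recorded at t = 0, at every jump of X,
    and at t_end.  Assumed dynamics (see the lead-in):
      * between jumps, dY/dt = K (X - Y), i.e. Y relaxes towards X;
      * X jumps 0 -> 1 at rate Y and 1 -> 0 at rate 1 - Y.
    Both rates are at most 1, so candidate times of a rate-1 Poisson process
    are accepted with probability equal to the current rate (thinning).
    """
    t, x, y = 0.0, x0, y0
    path = [(t, x, y)]
    while True:
        dt = rng.expovariate(1.0)                 # next candidate jump time
        if t + dt >= t_end:
            y = x + (y - x) * math.exp(-K * (t_end - t))
            path.append((t_end, x, y))
            return path
        t += dt
        y = x + (y - x) * math.exp(-K * dt)       # exact flow of dY/dt = K (x - Y)
        rate = y if x == 0 else 1.0 - y
        if rng.random() < rate:                   # accept the candidate jump
            x = 1 - x
            path.append((t, x, y))


path = simulate_limit(x0=0, y0=0.3, K=1.0, t_end=10.0, rng=random.Random(1))
print(len(path) - 2, "jumps; final state", path[-1][1:])
```

Under these assumptions the states (0, 0) and (1, 1) are absorbing, and every simulated path is piecewise constant in the first component with finitely many jumps on bounded time intervals.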
(28) Moment duality of the involved processes will be important for the proof of the last statement, which is crucial for the proof of Theorem 1.3.

Proof
We prove the claims in order of appearance.
The duality of (N (t),M(t)) t≥0 and (X (t),Ȳ (t)) t≥0 can be shown using the respective generators: where we continue to use 0^0 = 1, the fact that n ∈ {0, 1} and simply sorted the terms by powers of y for easier comparison in the last line.

Fig. 4 Strategy of the proof of Proposition 3.8. The moment duality of (Ñ (t),M(t)) t≥0 and (X (t),Ỹ (t)) t≥0 is a consequence of Theorem 3.3. The laws of (Ñ (t),M(t)) t≥0 and (N (t),M(t)) t≥0 agree when restricted to the reduced state-space {0, 1} × N 0 . We show the moment duality of (N (t),M(t)) t≥0 and (X (t),Ȳ (t)) t≥0 , which then allows us to conclude that the restricted laws of (X (t),Ỹ (t)) t≥0 and (X (t),Ȳ (t)) t≥0 also agree.
Combining these results we obtain the proof of Theorem 1.3.
Proof of Theorem 1.3 Theorem 3.3 already yields the existence of (X (t),Ỹ (t)) t≥0 as the limit in finite-dimensional distributions.
(5) is simply the observation of (27). Hence we are left to prove that we can choose a process with the above properties (determined only by the distribution!) with nice path-properties.