Rare event simulation for stochastic dynamics in continuous time

Large deviations for additive path functionals of stochastic dynamics and related numerical approaches have attracted significant recent research interest. We focus on the question of convergence properties for cloning algorithms in continuous time, and establish connections to the literature of particle filters and sequential Monte Carlo methods. This enables us to derive rigorous convergence bounds for cloning algorithms which we report in this paper, with details of proofs given in a further publication. The tilted generator characterizing the large deviation rate function can be associated to non-linear processes which give rise to several representations of the dynamics and additional freedom for associated numerical approximations. We discuss these choices in detail, and combine insights from the filtering literature and cloning algorithms to compare different approaches and improve efficiency.


Introduction
Large deviation simulation techniques based on classical ideas of evolutionary algorithms [1,2] have been proposed under the name of 'cloning algorithms' in [3] for discrete and in [4] for continuous time processes, in order to study rare events of dynamic observables of interacting lattice gases. This approach has subsequently been applied in a wide variety of contexts (see e.g. [5,6,7,8] and references therein), and more recently, the convergence properties of the algorithm have become a subject of interest. Analytical approaches so far are based on a branching process interpretation of the algorithm in discrete time [9], with limited and mostly numerical results in continuous time [10]. Systematic errors arise from the correlation structure of the cloning ensemble which can be large in practice, and several variants of the approach have been proposed to address those including e.g. a multicanonical feedback control [7], adaptive sampling methods [11] or systematic resampling [12]. A recent survey of these issues and different variants of cloning algorithms in discrete and continuous time can be found in [13], Section III.
In this paper we provide a novel perspective on the underlying structure of the cloning algorithm, which is in fact well established in the statistics and applied probability literature on Feynman-Kac models and particle filters [14,15,16]. The framework we develop here can be used to generalize rigorous convergence results in [17] to the setting of continuous-time cloning algorithms as introduced in [4]. Full mathematical details of this work are published in [18], and here we focus on describing the underlying approach and report the main convergence results. A second motivation is to use different McKean interpretations of Feynman-Kac semigroups (see Section 2.2) to highlight several degrees of freedom in the design of cloning-type algorithms that can be used to improve performance. We illustrate this with the example of current large deviations for the inclusion process (originally introduced in [19]), aspects of which have previously been studied [20]. Current fluctuations in stochastic lattice gases have attracted significant recent research interest (see e.g. [21,22,23] and references therein), and are one of the main application areas of cloning algorithms which are particularly challenging. In contrast to previous work in the context of cloning algorithms [9,10], our mathematical approach does not require a time discretization and works in a very general setting of a jump Markov process on a compact state space. This covers in particular any finite state Markov chain or stochastic lattice gas on a finite lattice.
The paper is organized as follows. In Section 2 we introduce notation, the Feynman-Kac semigroup and several representations of the associated non-linear process. In Section 3 we describe different particle approximations including the cloning algorithm, and summarize results published in [18] on convergence properties of estimators based on the latter. In Section 4 we describe a modification of the cloning algorithm for a particular class of stochastic lattice gases and apply it to the inclusion process as an example.
2 Mathematical setting

Large deviations and the tilted generator
We consider a continuous-time Markov jump process X(t) : t ≥ 0 on a compact state space E. To fix ideas we can think of a finite state Markov chain, such as a stochastic lattice gas on a finite lattice Λ with a fixed number of particles M . Here E is of the form S Λ with a finite set S of local states (e.g. S = {0, 1} or {0, . . . , M }), but continuous settings with compact E ⊂ R d for any d ≥ 1 are also included. One can in principle also generalize to separable and locally compact state spaces, including countable Markov chains and lattice gases on finite lattices with open boundaries. But this would require more effort and complicate not only the proof but also the presentation of the main results for technical reasons which we want to avoid here (see [18] for a more detailed discussion).
Jump rates are given by the kernel W (x, dy), such that for all x ∈ E and measurable subsets A ⊂ E P X(t + ∆t) ∈ A X(t) = x = ∆t A W (x, dy) + o(∆t) as ∆t → 0 , where o(∆t)/∆t → 0. We use the standard notation P and E for the distribution and the corresponding expectation on the usual path space for jump processes Ω = ω : [0, ∞) → E right continuous with left limits .
If we want to stress a particular initial condition x ∈ E of the process we write P x and E x . The process can be characterized by the infinitesimal generator acting on all continuous bounded functions f ∈ C b (E) on the state space. The adjoint L † of this operator acts on probability distributions µ on E, and determines their time evolution via where P[X t ∈ A] = A µ t (dy) for any regular A ⊂ E characterizes the distribution of the process at time t ≥ 0. In case of countable E, (2.3) is simply the usual master equation of the process for µ t (y) = P[X t = y], but we focus our presentation on the equivalent description via the generator (2.2), which leads to a more compact notation and applies in the general setting. As a technical assumption we require that the total exit rate of the process is uniformly bounded w(x) := E W (x, dy) ≤w < ∞ for all x ∈ E . (2.4) We are interested in the large deviations of additive path space observables A T : Ω → R of the general form where g ∈ C b (E 2 ) and h ∈ C b (E). Note that A T is well defined since the bound on w(x) implies that the sum in the first term almost surely contains only finitely many terms for any T > 0. The above functional, which recently appeared in this form in [24], assigns a weight via the function g to jumps of the process, as well as to the local time via the function h. Dynamics conditioned on such a functional have been studied in many contexts [5], including driven diffusions on periodic continuous spaces E [25]. As mentioned before, the simplest examples covered by our setting are Markov chains with finite state space E. This includes stochastic particle systems on a finite lattice with periodic or closed boundary conditions such as zero-range or inclusion processes [23,20,26], and also processes with open boundaries and bounded local state space such as the exclusion process [3]. Choosing g appropriately and h ≡ 0 the functional A T can, for example, measure the empirical particle current across a bond of the lattice or within the whole system up to time T .
We assume that A T admits a large deviation rate function, which is a lower semi-continuous function I : for all regular intervals U ⊂ R (see e.g. [27,28] for a more general discussion). Based on the graphical construction of the jump process and the contraction principle, existence and convexity of I can be established in a very general setting for countable state space (see e.g. [29] and references therein). If I is convex, it is characterized by the scaled cumulant generating function (SCGF) via the Legendre transform It is well known (see e.g. [26,24]) that λ k can be characterized as the principal eigenvalue of a tilted version of the generator (2.2) with modified rates for the jump partL k W k (x, dy) := W (x, dy)e kg(x,y) , (2.10) and potential for the diagonal part By (2.4) and the boundedness of g and h, for each k ∈ R there exist constants such that we have uniform bounds Note also that L 0 = L, but for any k = 0 the diagonal part of the operator does not vanish and Still, it generates a Feynman-Kac semigroup (see e.g. [17,24] for details), defined as which is the unique solution to the backward equation Due to the diagonal part of L k this does not conserve probability, i.e. for the constant function f ≡ 1 we get The associated logarithmic growth rate provides a finite-time approximation of the SCGF λ k , which depends on the initial condition x ∈ E. We require convergence of this approximation as t → ∞ as an asymptotic stability property of the process, discussed in detail in [18] and references therein. Under exponential mixing assumptions, which are mild in our contexts of interest, it can be shown that for some constant C > 0 This is for example the case if E is finite and the process necessarily has a spectral gap, as is the case for all finite state lattice gases mentioned earlier. Furthermore, exponential mixing implies that the modified finite-time approximation with a ∈ (0, 1) , with a 'burn-in' time period of length at, significantly improves the convergence in (2.18) to for some ρ ∈ (0, 1). This is of course routinely used in Monte-Carlo sampling where systems are allowed to relax towards stationarity before measuring. These intrinsic properties of the process and the related finite-time errors in the estimation of λ k are not the main subject of this paper. In the following we simply assume asymptotic stability and (2.18), and focus on the efficient numerical estimation of λ k (t) for any given t ≥ 0. λ k (at, t) can be treated completely analogously, which is discussed in more detail in [18].

McKean interpretation of the Feynman-Kac semigroup
As usual, for a given initial distribution ν k 0 on E the semigroup (2.14) determines a measure ν k t at times t ≥ 0 on E, which can be characterized weakly through integrals of bounded test functions f ∈ C b (E) as Here and in the following we use the common short notation ν k t (f ) for the integral of the function f under the measure ν k t to simplify notation. Note that we can write (2.17) as with a more general initial condition ν k 0 . But since P k t does not conserve probability, ν k t is not a normalized probability measure and it is consequently impossible to sample from it.
With (2.16) ν k t (1) > 0 for all t ≥ 0, and we can define normalized versions of the measures via Using (2.9), (2.13) and (2.15) on can derive the evolution equation with initial condition µ k 0 = ν k 0 . It can be shown by similar direct computation of d dt log ν k t (1), using (2.13) and (2.22) that So the finite-time approximation of λ k is given by an ergodic average with respect to the distribution µ k t , depending on the initial distribution µ k 0 , with an obvious modification for (2.19). The asymptotic stability of the original process implies that µ k t → µ k ∞ converges to a unique stationary distribution µ k ∞ on E, so that the SCGF (2.7) can be written as the stationary expectation of the potential Due to the non-linear nature of (2.24), µ k ∞ is characterized by stationarity as the solution of the non-linear equation Usually µ k ∞ cannot be evaluated explicitly, but from (2.24) it is possible to define a generic processes X k (t) : t ≥ 0 with time-marginals µ k t , and then use Monte Carlo sampling techniques. The first term of (2.24) already corresponds to a jump process with generator L k , and we have to rewrite the second non-linear part to be of the form of a generator. There is some freedom at this stage, and we report three common choices from the applied probability literature on particle approximations [30,17], one of which corresponds to the approach in [3,4], and to the best of the authors' knowledge the other two have not been considered in the computational physics literature so far.
For every probability distribution µ on E we can write using the standard notation a + = max{0, a} and a − = max{0, −a} for positive and negative part of a ∈ R. We have the freedom to introduce an arbitrary constant c ∈ R, possibly depending also on the measure µ (but not the state x ∈ E), since the left-hand side of (2.26) is invariant under renormalization of the potential The generators L − k,µ,c and L + k,µ,c describe jump processes on E with rates depending on the probability measure µ. V k (x) can be interpreted as a fitness potential for the process, and play exactly that role in the particle approximation of this process based on population dynamics, which is presented in Section 3. Generic choices are: • c = 0 is the default and simplest choice, but is usually not optimal as discussed in Section 4.
• c = µ k t (V k ) corresponding to the average potential: If the system in state x is less fit than c it jumps to state y chosen from the distribution µ k t (dy) according to (2.27), and independently, the system jumps to states fitter than c irrespective of its current state according to (2.28).
respectively, and only one of the two processes has to be implemented in a simulation.
Another representation of the non-linear part in (2.24) is (see e.g. [31], Section 5.3.1) which is particularly interesting for implementing efficient selection dynamics as discussed in Section 4. Here every jump from this part of the generator strictly increases the fitness of the process, which is a stronger version of the previous idea where the process on average increased its fitness above level c. The rate depends on departure state x and target state y, which is in general computationally more expensive to implement than rates in (2.27) and (2.28), but can still be feasible due to simplifications in many concrete examples as demonstrated in Section 4. A further improvement of that idea is given by which resembles a continuous-time version of selection processes which are known under the names of stochastic remainder sampling [32] or residual sampling [33] in discrete time. Here selection events change the process from states x of less than average fitness µ(V k ) to states y fitter than average, but we will see in Section 4 that this variant is harder to implement than (2.29) in our area of interest, and offers only limited extra gain on selection efficiency. In summary, the evolution equation (2.24) for µ k t can be written as where the first choice with (2.27) and (2.28) is included defining L V k,µ = L − k,µ,c + L + k,µ,c in that case. This defines a Markov process X k (t) : t ≥ 0 on the state space E with generator The process is non-linear since the transition rates in the generator L k,µ k t depend on the distribution µ k t of the process at time t, and in particular the process is also time-inhomogeneous. While the generator is still a linear operator acting on test functions f , the adjoint L † k,µ k t is a non-linear operator acting on measures µ k t , generating their time which is equivalent to (2.31). This microscopic mass transport description consistent with the macroscopic description provided by the Feynman-Kac semigroup P k t is also called a McKean representation [16,31]. It is well know that particle approximations of different McKean representations can have very different properties. The first part is similar to the original dynamics with modified rates W k (2.10), and the second non-linear part depending on the distribution µ k t arises from normalizing the measures ν k t . Note that µ k t and therefore the finitetime approximation λ k (t) in (2.25) are uniquely determined by (2.24), and thus independent of the different McKean representations, as are of course the limiting quantities µ k ∞ and λ k . Also, these interpretations do not make use of concepts from population dynamics such as branching, which will only come into play when using particle approximations of the measures µ k t as explained in the next section.
is fully determined by the state of the ensemble at time t, the particle approximation is a standard (linear) Markov process on E N . This leads to an estimator for the SCGF using (2.25) given by which is a random object depending on the realization of the particle approximation. The full dynamics can be set up in various different ways such that µ N (X k (t)) → µ k t converges as N → ∞ for any t ≥ 0. A generic version, directly related to the above McKean representations has been studied in the applied probability literature in great detail [30,17], providing quantitative control on error bounds for convergence. After describing this approach, we present a different approach known in the theoretical physics literature under the name of cloning algorithms [5,13], which provides some computational advantages but lacks general rigorous error control so far [9,10]. We will then set up a framework to identify common aspects of both approaches, which can be used to generalize existing convergence results to obtain rigorous error bounds for cloning algorithms as described in detail in [18], and to compare computational efficiency of both approaches.

Basic particle approximations
The most basic particle approximation is simply to run the McKean dynamics (2.32) in parallel on each of the particles, replacing the dependence on µ k t by µ N (X k (t)) in the jump rates. Mathematically, denoting by L N k the generator of the full N particle process X k (t) : t ≥ 0 acting on functions F : E N → R, this corresponds to Here L i k,µ N (x) is equivalent to (2.32) acting on particle i only, i.e. on the function x i → F (x) while x j , j = i remain fixed. The linear part L k of (2.32) does not depend on µ k t and follows the original dynamics for each particle, referred to as 'mutation' events in the standard population dynamics interpretation. In this context, the non-linear parts (2.27) and (2.28) can be interpreted as 'selection' events leading to meanfield interactions between the particles. Using the definition (3.1) of the empirical measures, we can write for the part (2.27) So with a rate depending on the fitness of particle i, it is 'killed' and replaced by a copy of particle j uniformly chosen from all particles. Analogously, we have for (2.28) which leads to the same transition x → x i,xj , but with a different interpretation. Each particle j in the system reproduces independently with a rate depending on its fitness (cloning event), and its offspring replaces a uniformly chosen particle, which is equal to i with probability 1/N . The different nature of killing and cloning events becomes clearer when we write out the full generator (3.3) and switch summation indices for the cloning part (3.5) in the second line, Analogously, the McKean representations (2.29) and (2.30) lead to basic N -particle systems with generators and (3.8) Here a particle i is replaced by a particle j with higher fitness, combining killing and cloning into a single event. In the case of (3.8), particle i is furthermore less fit and j is fitter than average. Note that these approximating systems can be seen as particle systems with mean-field or averaged pairwise interaction given by the selection dynamics. Following established results in [30,17,31], the (random) quantity Λ N k (t) is an asymptotically unbiased estimator of λ k (t) with a systematic error bounded by along with several rigorous convergence results. These include an estimate on the random error in L p norm for any p > 1, as well as other formulations including almost sure convergence. Note that these estimates are uniform in t ≥ 0, so are not affected by the choice of simulation time. The use of a finite simulation time, t, leads to an additional systematic error to the estimate of the SCGF λ k , of order 1/t as in (2.18) or ρ at /t as in (2.20). The bound (3.10) for p = 2 implies for the variance for any real-valued random variable Y . Therefore, error bars based on standard deviations are of the usual Monte Carlo order of 1/ √ N , and the random error dominates the systematic bias (3.9) for N large enough. Further remarks on possible unbiased estimators can be found at the end of the next subsection.

Essential properties of particle approximations
Following the standard martingale characterization of Feller-type Markov processes (see e.g. [34], Chapter 3), we know that for every bounded, continuous is a martingale on R with (predictable) quadratic variation where the associated carré du champ operator Γ N k is given by (3.14) In analogy to the decomposition of a random variable into its mean and a centred fluctuating part, the martingale (3.12) describes the fluctuations of the process t → F X k (t) . The strength of the noise depends on time and is given by the increasing process (3.13), whose time evolution is generated by the carré du champ operator (3.14). In contrast to the generator L N k , this is a quadratic (non-linear) operator and it is the main tool for studying the fluctuations of a process.
Elementary computations for approximations (3.6), (3.7) and (3.8) show that for marginal test functions F (x) = f (x i ) depending only on a single particle, we have So generator and carré du champ both coincide with the corresponding operators L k,µ N (x) and Γ k,µ N (x) for the McKean dynamics (2.32). This means that for large enough N and µ N X k (t) close to µ k t , each marginal process t → X i k (t) has essentially the same distribution as the corresponding McKean process t → X k (t). Note that due to selection events these (auxiliary) dynamics do not coincide with the original process conditioned on a large deviation event, and they are also not unique since there are various choices for McKean representations of Feynman-Kac semigroups, as discussed earlier. Trajectories in a particle approximation always correspond to the trajectories of the particular McKean interpretation they are based on, which is usually (2.26) in the context of cloning algorithms. Due to asymptotic stability the particle approximation converges as t → ∞ for fixed N to a unique stationary distribution µ N,k ∞ , and the single-particle marginals of this distribution converge to µ k ∞ as N → ∞. While the marginal processes for a given particle approximation are identically distributed they are not independent, and µ N,k ∞ exhibits non-trivial correlations between particles resulting from selection events, which we discuss again in Section 4. Now consider averaged observables of the form as they appear in the eigenvalue estimator (3.2). Since the generator L N k is a linear operator in F , we have the same identity as above for the generator, The carré du champ, on the other hand, is non-linear in F and the dependence between particles is captured by this operator. Since for all approximations (3.6), (3.7) and (3.8) in every selection event only a single particle is affected, another elementary, slightly more cumbersome computation shows (see [18] for details) The factor 1/N results from a self-averaging property of the mean-field interaction through selection dynamics, which is expected from results on other mean-field particle systems (see e.g. [35,36] and references therein), and is fully analogous to the central-limit type scaling of the empirical variance for the sum of N independent random variables. While this scaling remains the same for more general particle approximations with more than one particle being affected by selection events, the simple identity (3.16) does not hold exactly for any N ≥ 1 as we see in the next subsection.
Recall that the estimator (3.2) for the principal eigenvalue (2.7) is given by an ergodic integral of the average observable is also bounded and the carré du champ (3.16) vanishes as N → ∞. Therefore the martingale M N F (t) also vanishes 1 for all t ≥ 0, leading to a convergence of the measures µ N (X k (t)) → µ k t (t) and also of finite time approximations Λ N k (t) → λ k (t) as reported in the previous subsection. Due to the time-normalization in (3.2) and the assumed ergodicity, corresponding error bounds hold uniformly in t ≥ 0. In summary, bounds on the carré du champ are the main ingredient for the proof of convergence results as explained in detail in [18] and references therein. All above properties up to and including (3.15) are generic requirements for any particle approximation. These particle approximations can differ in their correlation structures and this freedom can be used to construct numerically more efficient particle approximations as discussed in the next subsection. To optimize sampling, particles should ideally evolve in as uncorrelated a fashion as possible; it is not possible to achieve completely independent evolution due to the non-linearity of the underlying McKean process and resulting selection events and mean-field interactions.
Remarks on unbiased estimators. Estimators based on expectations w.r.t. the empirical measures , which vanishes only asymptotically (3.9). This originates from the non-linear time evolution of µ k t (2.24) and associated McKean processes. It is straightforward to derive estimators based on the unnormalized measures ν k t (2.21) that are unbiased. Based on (2.25) and (3.2), we obtain an estimate of the normalization ν k t (1): and then introduce unnormalized empirical measures on E at time t based on the particle approximation The expected time evolution of observables f is then given by Now with (3.15) and the decomposition (2.32) of L k,µ N t into mutation and selection part, we have and with the general construction of McKean representations (2.26) Note that choosing f ≡ 1 implies that the normalization (3.17) is an unbiased estimator of e tλ k (t) , which we will see again in Section 3.4 in the context of cloning algorithms. However, in practice the random error dominates the accuracy of estimates of λ k (t), so N has to be chosen large and the bias of the estimator Λ N k (t) (3.2) is negligible.

Cloning algorithms
Cloning algorithms proposed in the theoretical physics literature [3,4] are similar to the particle approximation (3.6), using the same tilted rates W k for mutations, but combining the cloning and mutation part of the generator. We focus the following exposition around the algorithm proposed in [4], but other continuous-time versions can be analysed analogously. The idea is simply to sample the cloning process for each particle i together with the mutation process at the same rate In each combined event, a random number of clones is generated with a distribution p N k,xi (n) such that its expectation is These clones then replace n particles chosen uniformly at random (in the sense that all subsets of size n are equally probable) from the ensemble. In this way, the rate at which a clone of particle i replaces any given particle j is as required for L N k in (3.6). The only additional assumption on p N k,xi (n) is that the range of possible values for n has to be bounded by N for the cloning event to be well defined. Since its mean is bounded by with the correct mean and finite range will lead to a valid algorithm for sufficiently large N . The above cloning process is described by the generator Here we have used the notation x A,w; i,y for the vector z ∈ E N with and by linearity of generators also for all averaged functions of the form F (x) = µ N (x)(f ). One can also show that for marginal test functions the carré du champ operators coincide, so the cloning algorithm produces marginal processes or particle trajectories with the same distribution as the simple particle approximation (3.6). For averaged test functions the change in the correlation structure between particles is picked up by the carré du champ operator. Instead of (3.16) one can derive the following estimate for the mutation and cloning part see [18], Theorem 3.2, for a proof.
Here q N k (x i ) := N n=0 n 2 p N k,xi (n) denotes the second moment of the number of clones for the particle i, and we use the decomposition (2.32) where Γ k f is the carré du champ corresponding to the mutation dynamics L k , and Γ + k,µ N (x),c the one corresponding to the cloning part (2.28). This estimate holds, of course, only for N large enough that the cloning event is well defined (see discussion above). Note also that This is sufficient to carry out the full proof of the convergence results mentioned in Section 3 based on results in [17]. This is carried out in [18] in full detail, and here we only report the main result of that work. Recall the bounds (2.12) on V k and the total modified exit rate w k .
Theorem. Denote byΛ N k (t) the eigenvalue estimator (3.2) corresponding to the cloning algorithm (3.22). Then there exist constants α, γ > 0 and α p , γ p > 0 such that for all N large enough

24)
and for all p > 1 Remarks.
• Choosing the normalization of the potential c < inf x∈E V k (x) the killing rate in (3.6) vanishes and (3.21) describes the full generator L N,clone k for the cloning algorithm. This is computationally cheaper and simpler to implement, since only the mutation process has to be sampled independently for all particles, and cloning events happen simultaneously. However, as is discussed in Section 4, this choice in general reduces the accuracy of the estimator.
• A common choice in the physics literature for the distribution p N k,xi of the clone size event (see e.g. [3,13]) is otherwise (3.26) So the two adjacent integers to the mean are chosen with appropriate probabilities, which minimizes the variance of the distribution for a given mean. This choice therefore minimizes the contribution of the second moment q N k to the bound for the errors in (3.24) and (3.25), and is also simple to implement in practice.
• Due to (3.23), trajectories of individual particles follow the same law as the simple particle approximation (3.6) and therefore the same McKean process as explained in Section 2.2 The cloning approach can introduce additional correlations between particles due to large cloning events, which is quantified by the second moment q N k entering the error bounds in (3.24) and (3.25).

The cloning factor
In the physics literature an alternative estimator to Λ N t (3.2) is often used, based on a concept called the 'cloning factor' (see e.g. [3,5,13]). This is essentially a continuous-time jump process (C N k (t) : t ≥ 0) on R + with C N k (0) = 1, where at each selection event of size n ≥ −1 at a given time t the value is updated as Here n = −1 indicates a killing event, and n ≥ 0 a cloning event according to the two parts of the generator (3.22). This idea comes from a branching process interpretation of the cloning ensemble related to the unnormalized measure ν k t , since with (2.17) we have that ν k t (1) ≈ e λ k t as t → ∞ , so λ k corresponds to the volume expansion factor of the clone ensemble due to selection dynamics. In our setting, the dynamics of C N k (t) can be defined jointly with the cloning process via an extension of the generator (3.22) acting on functions that depend on the state x ∈ E N and the cloning factor ζ ∈ R + . With (3.21) we have for cloning events and with the third line of (3.6) for killing events So the joint process (X N k (t), C N k (t)) : t ≥ 0 is Markov, and observing only the cloning factor with the simple test function G(x, ζ) = ζ we getL In the last line we have used (3.20), and in a similar fashion we get for killing eventsL and analogously to (3.18), the expected time evolution of C N k (t) is then given by This is also the evolution of With initial conditions C N k (0) = 1 = ν N t (1), a Gronwall argument analogous to (3.19) gives So e tc C N k (t) is an unbiased estimator for ν t (1), which leads also to an alternative estimator for λ k (t) (2.17) given by Note that this is not itself unbiased as a consequence of the nonlinear transformation involving the logarithm. Since C N k (t) is defined as a product (3.27), we can use another simple test function G(x, ζ) = log ζ to analyze the convergence behaviour of Λ N k (t). Analogously to (3.28) and (3.29) we get where we have also used (3.20) and assumed that the support of p N k,xi is bounded independently of N (which is the case for common choices in the literature such as (3.26)). This allows us to approximate log(1 + n/N ) = n/N + O(1/N 2 ) as N → ∞, leading to error terms of order 1/N . Then, analogously to (3.12) we get with log C N k (0) = 0 that is a martingale. For the carré du champ we obtain from a straightforward computation that and since the potential V k is bounded (2.12), the quadratic variation of the martingale is bounded by Therefore the estimator (3.30) based on the cloning factor is asymptotically equal to the basic estimator Λ N k (t) (3.2), with corrections that vanish as 1/N in the L p -norm as N → ∞ uniformly in t ≥ 0, analogously to the discussion in Section 3.2. Therefore, the same convergence results as stated in the Theorem apply forΛ N k (t).
Similar convergence results can be shown to hold for e tc C N k (t) as an estimator of ν t (1) for fixed t > 0, but naturally cannot hold uniformly in time. Since the object of interest is usually the long-time limit λ k (2.17), the practical relevance of this is limited, in addition to the general point that random errors dominate the convergence as mentioned in Section 3.2. In practice, the basic ergodic average Λ N k (t) (3.2) is more useful than the cloning factor in the application areas we have in mind. In particular, for alternative particle approximations such as (3.7) or (3.8) where cloning and killing events are effectively combined, it is not clear how to define a cloning factor, whereas Λ N k (t) is always easily accessible.

Efficiency of algorithms
Selection events (cloning or killing) in a particle approximation increase the correlations among the particles in the ensemble, and thereby decrease the resolution in the empirical distribution µ N t = µ N (X k (t)), and ultimately the quality of the sample average in the estimator (3.2). Therefore it is desirable to minimize the total rate S k (x) of selection events for a particle approximation. For algorithm (3.6) this is given by and the same holds for the cloning algorithm (3.22), since the change in cloning rate is compensated exactly by the average number of clones created to obtain the same overall rate. It is easy to see that for a given state x of the clone ensemble, there is an optimal choice of c to minimize this expression, given by the median of the fitness distributions V k (x) := V k (x i ) : i = 1, . . . , N . If the distribution of X k (t) is unimodal with light enough tails, the median can be well approximated by the mean µ N t (V k ). Since both quantities can be computed with similar computational effort (or well approximated at reduced cost using only a subset of the ensemble), choosing should be computationally optimal. In particular, the simplest choice c = 0 in the cloning algorithm is in general far from optimal, so is choosing c = inf x∈E V k (x) to get rid of the killing part of the dynamics (see first remark in Section 3.3).
Intuitively, algorithms (3.7) and (3.8) should lead to even lower total selection rates since every selection event increases the fitness potential, while in algorithms based on (3.6) it increases only on average and may also decrease as the result of selection events. Indeed for (3.7) we have by symmetry of summations and the triangle inequality. The inequality is strict except for degenerate cases, e.g. if V k (x i ) takes only two values, and c lies in between the two. In practice, in the scenarios which we have investigated, it turns out that unless the distribution of X k (t) is seriously skewed, S 2 k is strictly smaller than S 1 k by a sizeable amount, as is illustrated later in Figure 2 for the inclusion process. Algorithm (3.8) provides further improvement with Here we have used and Jensen's inequality to compare with S 2 k (x), since v → |a − v| is convex for all a ∈ R. Note that the rate of change of the mean fitness µ N t (V k ) is given by the same expression in all the above particle approximations, The first term due to mutation dynamics L k can have either sign and is identical in all algorithms, while the second due to selection is positive and given by the empirical variance of V k . This follows from direct computations using the averaged test function F (x) = µ N (x)(V k ) in (3.6), (3.7), (3.8) and (3.22), and is consistent with the evolution equation (2.24). So the mean fitness evolves until a mutation selection balance is reached and the rate of change (4.4) vanishes, characterizing the stationary state of the particle approximation process. Note that this basic mechanism is identical in all particle approximations discussed here, so we expect the mean fitness to show a very similar behaviour. While finite size effects can lead to deviations also in the mean, the main difference between the algorithms is found on the level of variances and time correlations, which can be significantly reduced using (3.7) or (3.8) as illustrated in the next subsections. Since our main observable of interest Λ N k (t) is an ergodic time average of µ N t (V k ), this can lead to significant improvements in the accuracy of the estimator (3.2).
The correlations introduced by selection are counteracted by mutation dynamics, which occur independently for each particle and decorrelate the ensemble. The dynamics of correlation structures in cloning algorithms has been discussed in some detail recently in [38,7,8,13], and can be understood in terms of ancestry in the generic population dynamics interpretation. Those results also discuss important non-ergodicity effects in the measurement of path properties and the interpretation of particle trajectories, which were already pointed out in [3] and are also a subject of recent research [39]. This poses interesting questions for rigorous mathematical investigations which are left to future work. Here we simply conclude with a numerical test in the next subsections, which supports the intuition that approximation (3.7) with minimal selection rates leads to variance reduction in the relevant estimators compared to the cloning algorithm. Since the selection rate in (3.7) depends on potential differences between pairs, implementation is in general more involved than for algorithms based on (3.6). While the scaling tN log N of computational complexity with the size N of the clone ensemble is the same, the prefactor and computational cost in practice may be higher and this has to be traded off against gains in accuracy on a case by case basis. For the examples studied below we find a computationally efficient implementation of (3.7) providing a clear improvement over the standard cloning algorithm, which is the main contribution of this paper in this context. Algorithm (3.8), on the other hand, provides only marginal improvement over (3.7), but cannot be implemented as efficiently in our area of interest.

Current large deviations for lattice gases
In the following we consider one-dimensional stochastic lattice gases with periodic boundary conditions on the discrete torus T L with L sites and a fixed number of particles M . Within our general framework, they are simply Markov chains on the finite state space E of all particle configurations, which have been of recent research interest in the context of current fluctuations. We denote configurations by η = (η x : x ∈ T L ) where η x ∈ N 0 is interpreted as the mass (or number of monomers) at site x, and the process is denoted as (η(t) : t ≥ 0). In order to use standard notation for lattice gases, in this and the following subsection we change notation, and in particular the use of x, y ∈ T L is different to the use of those symbols in previous sections where they denoted states in E. Monomers jump to nearest neighbour sites with rates u(η x , η y ) ≥ 0 for y = x ± 1 depending on the occupation numbers of departure and target site, multiplied with a spatial bias p = 1 − q ∈ [0, 1]. The generator is of the form where σ x,y η results from the configuration η after moving one particle from x to y. The number of particles M = x∈T L η x is a conserved quantity, but otherwise we assume the process to be irreducible for any fixed M , which is ensured for example by positivity of the rates, i.e. for all k, l ≥ 0 u(k, l) = 0 ⇔ k = 0 .
This class includes various models that have been studied in the literature, for example the inclusion process introduced in [19], where with a positive parameter d > 0. Particles perform independent jumps with rate d and in addition are attracted by each particle on the target site with rate 1, giving rise to the 'inclusion' interaction. This model has attracted recent attention due the presence of condensation phenomena [40,41] and in the context of large deviations of the particle current [20], and we will use this as an example in Section 4.3. Other well-studied models covered by our set-up are the exclusion process with state space E ⊂ {0, 1} T L and u(η x , η y ) = η x (1 − η y ), or zerorange processes with E ⊂ N T L 0 and rates u(η x , η y ) = u(η x ) depending only on the occupation number on the departure site.
In terms of previous notation, the jump rates for a lattice gas of type (4.5) between any two configurations η and ζ are given as In the following we focus on lattice gases where x u(η x , η x+1 ) = x u(η x , η x−1 ) for all configurations η. While this is not true in general for models of type (4.5), it holds for many examples including inclusion, exclusion and zero-range processes mentioned above. With p + q = 1, the total exit rate out of configuration η is then simply given by We are interested in an observable A T measuring the total particle current up to time T , which is achieved by choosing h(η) ≡ 0 in (2.5) and g(η, ζ) = ±1 if ζ = σ x,x±1 η and g(η, ζ) = 0 otherwise .
Using (4.8) we see by direct computation that the potential (2.11) takes the simple form (4.9) Modified mutation rates W k (η, ζ) are given by (4.7) replacing (p, q) by (pe k , qe −k ), leading to modified total exit rates The similarity of V k and w k for lattice gases (4.5) that obey (4.8) provides a direct relation between mutation and selection rates, and allows us to set up an efficient rejection based implementation of a particle approximation (η k (t) : t ≥ 0) based on the efficient algorithm (3.7). In the following we omit the subscript k for configurations and write η(t) = (η i (t), i = 1, . . . , N ) to simplify notation. For given parameters p, q = 1 − p and fixed k ∈ R we distinguish two cases.
Q k < 1. We sample the ensemble of N clones at a total rate of W(η) := N i=1 w(η i ), and pick a clone i with probability w(η i )/W(η) for the next event. With probability Q k ∈ (0, 1) this is a simple mutation within clone i, and then we replace η i by ζ i with probability W k (η i , ζ i )/w k (η i ). Otherwise, with probability 1 − Q k we perform a selection event following the second line in (3.7): Pick a clone j uniformly at random (including i). If (with (4.9) and since Q k < 1), replace η i by η j with probability w(η i ) − w(η j ) /w(η i ). This procedure ensures that mutation and selection events are sampled with the correct rates as required in (3.7).
Q k > 1. We sample the ensemble of N clones at a total rate of Q k W(η), and pick a clone i with probability w(η i )/W(η) and a clone j uniformly at random. If w(η j ) < w(η i ) we replace η j by η i with prob- . Then we mutate clone i as above, combining the mutation and selection event as in the cloning algorithm.
Remarks. Note that Q k = 1 is equivalent to k = 0, which corresponds to the original process with λ 0 = 0 and does not require any estimation. For Q k > 1 we perform mutation and selection events simultaneously, in analogy to the cloning procedure explained in Section 3.3, but can use the efficient algorithm (3.7). For Q k < 1 no mutation or selection event occurs with probability (1 − Q k ) w(η j ) w(η i ) 1(w(η j ) < w(η i )), and a high rate of such rejections is not desirable for computational efficiency. But even for very small values of Q k the second factor is usually significantly smaller than 1 (or simply 0), since clone i was picked with probability proportional to w(η i ) and j uniformly at random.
Note also that if the cloning algorithm (3.22) is implemented with the common choice c = 0 for a lattice gas of the type discussed here, due to (4.9) and (4.10) the average number of clones per event (3.20) is and 0 for Q k < 1, where only killing occurs. In particular, this is independent of the state η of the clone ensemble, and the standard distribution of the form (3.26) is a simple Bernoulli random variable.
, which is also where the modified drift vanishes.
While with (4.10) the total mutation rate is Q k W(η), selection rates (4.1), (4.2) and (4.3) can be written as So for very small values of Q k close to 0 the mutation rate can become very small in comparison to selection, which means that significant computation time is devoted to re-weighting by selection, rather than advancing the dynamics via mutation events. This effect is typically much stronger for the standard cloning algorithm with c = 0, and occurs for example for totally asymmetric lattice gases with p = 1 and negative k conditioning on low currents. In Figure 1 we include a sketch of Q k for different values of asymmetry, including also the drift of the modified dynamics, which can be reversed in partially asymmetric systems.
In Figure 2 we compare the cloning algorithm to algorithms (3.7) and (3.8) for an inclusion process with d = 1, L = 64, M = 128 and asymmetry p = 0.7. It is known [20] that the SCGF λ k scales linearly with the system size L, and outside the convergent regime k ∈ [− ln( 1−p p ), 0] ≈ [−0.85, 0] the rescaled SCGF λ k /L diverges as L → ∞ (divergent regime). We compare estimates Λ N k (t) for the cloning algorithm (3.22) with c = 0 and algorithm (3.7) in the convergent regime. We use initial conditions where M particles are distributed on L lattice sites uniformly at random, and a burn-in time of 10 · L = 640 as discussed in (2.19) and (2.20). This leads to an obvious adaption  of the integration interval in the estimator Λ N k (t) (3.2), but we do not alter the notation here to keep it simple. Both algorithms perform very well and agree with a simple theoretical estimate based on bias reversal, which is not the main concern in this paper and we refer the reader to [20]. Enlarged error bars indicating 5 standard deviations reveal that (3.7) is significantly more accurate than (3.22). This is due to lower total selection rates S k illustrated at the bottom in the converging and diverging regime. While S 2 k for (3.7) is much lower than S 1 k with c = 0, S 3 k does not offer significant further improvement. Since the efficient rejection based implementation of (3.7) explained above does not work for (3.8), we focus on (3.7) in our context. The much higher selection rate for the cloning algorithm with c = 0 leads to a significantly higher time variation of the average potential in the convergent regime compared to algorithm (3.7), as is illustrated in Figure 3. So in comparison to standard cloning, algorithm (3.7) leads to reduced finite size effects and/or a significant variance reduction in this example, and a significant improvement of convergence of the estimator (3.2). We have checked that this also holds for zero-range processes with bounded rates. These promising first numerical results pose interesting questions for a systematic study of practical properties of the algorithms and associated time correlations for future work, also in comparison with various recent results on improvements of cloning algorithms [7,12,11]

Details for the inclusion process
We summarize the procedure outlined in the previous subsection for the inclusion process with rates (4.6) on the torus T L with M particles in pseudo-code given below. Besides fixing the model parameter d > 0 and the tilt k ∈ R, the specific parameters for the estimator are the ensemble size N and the total simulation time t, which lead to the estimator Λ N k (t) as given in (3.2). For simplicity we do not include any burn-in times in this description, which would obviously be used in practice. In this implementation we make a further simplification which is very common for continuous-time jump dynamics of large systems: We replace exponentially distributed random time increments by their expectation, given by ∆t = 1/W(η) for Q k < 1 and ∆t = 1/(Q k W(η)) for Q k > 1. Since with (4.9) we get for increments in the evaluation of the ergodic time integral in (3.2) These are independent of the actual state η of the clones, so evaluation of Λ N k (t) in (3.2) can be achieved by a simple integer counterΛ N k as explained in the pseudocode Algorithm 1 and 2. While this counter may appear similar to the cloning factor explained in Section 3.4 at first glance, we want to stress that here finer increments of +1 are added after every event (not only selections).

Discussion
We have presented an analytical approach to cloning algorithms based on McKean interpretations of Feynman-Kac semigroups that have been introduced in the applied probability literature. This allows us to establish rigorous error bounds for the cloning algorithm in continuous time, and to suggest a more efficient variant of the algorithm which can be implemented effectively for current large deviations in stochastic lattice gases. The latter is based on minimizing the selection rate in a standard population dynamics interpretation of particle approximations of non-linear processes. We include a first application of this idea in the context of inclusion processes, but its full potential will be explored in future more systematic studies of optimization of cloning-type algorithms. The rigorous results fully reported in [18] apply under very general conditions, demanding bounded jump rates and existence of a spectral gap for the underlying jump process. These impose no restriction for lattice gases with a fixed number of particles, which are essentially finite state Markov chains. We anticipate that these techniques can also be applied for more general processes including diffusive, piecewise deterministic, or possibly non-Markovian dynamics (see [42] for first heuristic results in this direction). Another interesting direction would be a rigorous analysis of the detailed ergodic properties of trajectories in the clone ensemble based on recent results in [7,8,39].