Fluctuations in the heterogeneous multiscale methods for fast-slow systems

How heterogeneous multiscale methods (HMM) handle fluctuations acting on the slow variables in fast-slow systems is investigated. In particular, it is shown via analysis of central limit theorems (CLT) and large deviation principles (LDP) that the standard version of HMM artificially amplifies these fluctuations. A simple modification of HMM, termed parallel HMM, is introduced and is shown to remedy this problem, capturing fluctuations correctly both at the level of the CLT and the LDP. Similar type of arguments can also be used to justify that the tau-leaping method used in the context of Gillespie's stochastic simulation algorithm for Markov jump processes also captures the right CLT and LDP for these processes.


Introduction
The heterogeneous multiscale methods (HMM) [EE03, VE03, EEL + 07, AEEVE12] provide an efficient strategy for integrating fast-slow systems of the type The method relies on an averaging principle that holds under some assumption of ergodicity and states that as ε → 0 the slow variables X ε can be uniformly approximated by the solution to the following averaged equationẊ = F (X) . (1.2) Here F (x) = f (x, y)µ x (dy) is the averaged vector field, with µ x (dy) being the ergodic invariant measure of the fast variables Y x with a frozen x variable. This averaging principle is akin to the law of large number (LLN) in the present context and it suggests to simulate the evolution of the slow variables using (1.2) rather than (1.1) when ε is small. This requires to estimate F (x), which typically has to be done on-the-fly given the current value of the slow variables. To this end, note that if Euler's method with time step ∆t is used as integrator for the slow variables in (1.1), we can approximate X ε (n∆t) by x n satisfying the recurrence where Y ε x denotes the solution to the second equation in (1.1) with X ε kept fixed at the value x. If ε is small enough that ∆t/ε is larger than the mixing time of Y ε x , the Birkhoff integral in (1.4) is in fact close to the averaged coefficient in (1.2), in the sense that (1.4) Therefore (1.3) can also be thought of as an integrator for the averaged equation (1.2). In fact, when ε is small, one can obtain a good approximation of F (x) using only a fraction of the macro time step. In particular, we expect that with λ ≥ 1 provided that ∆t/(ελ) remains larger than the mixing time of Y ε x . This observation is at the core of HMM-type methods -in essence, they amount to replacing (1.3) by x n+1 = x n + ∆t F n (x n ) . (1.6) Since the number of computations required to compute the effective vector field F n (x) is reduced by a factor λ, this is also the speed-up factor for an HMM-type method. From the argument above, it is apparent that there is another, equivalent way to think about HMM-type methods, as was first pointed out in [FVE04] (see also [VE07,ERVE09,ASST12, AEK + 13]. Indeed, the integral defining F n (x) in (1.5) can be recast into an integral on the full interval [n∆t, (n + 1)∆t] by a change of integration variables, which amount to rescaling the internal clock of the variables Y ε x . In other words, HMM-type methods can also be thought of as approximating the fast-slow system in (1.1) by If ε ≪ 1, we can reasonably replace ε with ελ, provided that this product still remains small -in particular, the evolution of the slow variables in (1.7) is still captured by the limiting equation (1.2). Hence HMM-type methods are akin to artificial compressibility [Cho67] in fluid simulations and Car-Parrinello methods [CP85] in molecular dynamics. The approximations in (1.5) or (1.7) are perfectly reasonable if we are only interested in staying faithful to the averaged equation (1.2) -that is to say, HMM-type approximations will have the correct law of large numbers (LLN) behavior. However, the fluctuations about that average will be enhanced by a factor of λ. This is quite clear from the interpretation (1.7), since in the original model (1.1), the local fluctuations about the average are of order √ ε and in (1.7) they are of order √ ελ. The large fluctuations about the average caused by rare events are similarly inflated by a factor of λ. This can be an issue, for example in metastable fastslow systems, where the large fluctuations about the average determine the waiting times for transitions between metastable states. In particular we shall see that an HMM-type scheme drastically decreases these waiting times due to the enhanced fluctuations.
In this article we propose a simple modification of HMM which corrects the problem of enhanced fluctuations. The key idea is to replace the approximation (1.5) with 1 ∆t where each Y ε,j x is an independent copy of Y ε x . By comparing (1.5) with (1.8), we see that the first approximation is essentially replacing a sum of λ weakly correlated random variables with one random variable, multiplied by λ. This introduces correlations that should not be there and in particular results in enhanced fluctuations. In (1.8), we instead replace the sum of λ weakly correlated random variables with a sum of λ independent random variables. This is a much more reasonable approximation to make, since these random variables are becoming less and less correlated as ε gets smaller. Since the terms appearing on the right hand side are independent of each other, they can be computed in parallel. Thus if one has λ CPUs available, then the real time of the computations is identical to HMM. For this reason, we call the modification the parallelized HMM (PHMM). Note that, in analogy to (1.7), one can interpret PHMM as approximating (1.1) by the system It is clear that this approximation will be as good as (1.7) in term of the LLN, but in contrast with (1.7), we will show below that it captures the fluctuations about the average correctly, both in terms of small Gaussian fluctuations and large fluctuations describing rare events. A similar observation in the context of numerical homogenization was made in [BJ11,BJ14]. The outline of the remainder of this article is as follows. In Section 2 we recall the averaging principle for stochastic fast-slow systems and describe how to characterize the fluctuations about this average, including local Gaussian fluctuations and large deviation principles. In Section 3 we recall the HMM-type methods. In Section 4 we show that they lead to enhanced fluctuations. In Section 5 we introduce the PHMM modification and in Section 6 show that this approximation yields the correct fluctuations, both in terms of local Gaussian fluctuations and large deviations. In Section 7 we test PHMM for a variety of simple models and conclude in Section 8 with a discussion.

Average and fluctuations in fast-slow systems
For simplicity we will from here on assume that the fast variables are stochastic. This assumption is convenient, but not necessary, since all the averaging and fluctuation properties stated below are known to hold for large classes of fast-slow systems with deterministically chaotic fast variables [Kif92,Dol04,KMb,KMa]. The fast-slow systems we investigate are given by , and W is a standard Wiener process in R e . We assume that for every x ∈ R d , the Markov process described by the SDE is ergodic, with invariant measure µ x , and has sufficient mixing properties. For full details on the necessary mixing properties, see for instance [FW12].
In this section we briefly recall the averaging principle for stochastic fast-slow systems and discuss two results that characterize the fluctuations about the average, the central limit theorem (CLT) and the large deviations principle (LDP).

Averaging principle
As ε → 0, each realization of X ε , with initial condition X ε (0) = x, tends towards a trajectory of a deterministic system dX dt where F (x) = f (x, y)µ x (dy) and µ x is the invariant measure corresponding to the Markov The convergence is in an almost sure and uniform sense: lim for every fixed T > 0, every choice of initial condition x and almost surely every initial condition Y ε (0) (a.s. with respect to µ x ) as well as every realization of the Brownian paths driving the fast variables. Details of this convergence result in the setting above are given in (for instance) [FW12, Chapter 7.2].

Small fluctuations -CLT
The small fluctuations of X ε about the averaged systemX can be understood by characterizing the limiting behavior of as ε → 0. It can be shown that the process Z ε converges in distribution (on the space of continuous functions C([0, T ]; R d ) endowed with the sup-norm topology) to a process Z defined by the SDE dZ = B 0 (X)Zdt + η(X)dV , Z(0) = 0 , (2.4) HereX solves the averaged system in (2.3), V is a standard Wiener process, B 0 := B 1 + B 2 with and We include next a formal argument deriving this limit, as it will prove useful when analyzing the multiscale approximation methods. We will replicate the argument given in [BGTVE15]; a more complete and rigorous argument can be found in [FW12,Chapter 7.3]. First, we write a system of equation for the triple (X, Z ε , Y ε ) in the following approximated form, which uses nothing more than Taylor expansions of the original system in (1.1): We now proceed with a classical perturbation expansion on the generator of the triple (X, Z ε , Y ε ).
In particular we have L ε = 1 ) and introduce the ansatz u ε = u 0 + √ εu 1 + εu 2 + . . . . By substituting u ε into ∂ t u ε = L ε u ε and equating powers of ε we obtain From the O(ε −1 ) identity, we obtain u 0 = u 0 (x, z, t), confirming that the leading order term is independent of y. By the Fredholm alternative, the O(ε −1/2 ) identity has a solution u 1 which has the Feynman-Kac representation where Y x denotes the Markov process generated by L 0 , i.e. the solution of (2.2). Finally, if we average the O(1) identity against the invariant measure corresponding to L 0 , we obtain Clearly, this is the forward Kolmogorov equation for the Markov process (X, Z) defined by with B 0 and η defined as above.

Large fluctuations -LDP
A large deviation principle (LDP) for the fast-slow system (2.1) quantifies probabilities of O(1) fluctuations of X ε away from the averaged trajectoryX. The probability of such events vanishes exponentially quickly and as a consequence are not accounted for by the CLT fluctuations, hence an LDP accounts for the rare events.
We say that the slow variables X ε satisfy a large deviation principle (LDP) with action functional whereΓ andΓ denote the interior and closure of Γ respectively. An LDP also determines many important features of O(1) fluctuations that occur on large time scales, such as the probability of transition from one metastable set to another. For example, suppose that X ε is known to satisfy an LDP with action functional S [0,T ] . Let D be an open domain in R d with smooth boundary ∂D and let x * ∈ D be an asymptotically stable equilibrium for the averaged systemẊ = F (X). When ε ≪ 1, we expect that a trajectory of X ε that starts in D will tend towards the equilibrium x * and exhibit O( √ ε) fluctuations about the equilibrium -these fluctuations are described by the CLT. On very large time scales, these small fluctuations have a chance to 'pile up' into an O(1) fluctuation, producing behavior of the trajectory that would be considered impossible for the averaged system. Such fluctuations are not accurately described by the CLT and requires the LDP instead. For example, the asymptotic behaviour of escape time from the domain D, can be quantified in terms of the quasi-potential defined by Under natural conditions, it can be shown that for any Hence the time it takes to pass from the neighborhood of one equilibrium to another may be quantified using the LDP. Details on the escape time of fast-slow systems can be found in [ where Y x denotes the Markov process governed by dY Then the action functional is given by It can also be shown that the function (2.10) Donsker-Varadhan theory tells us that the connection between Hamilton-Jacobi equations and LDPs is in fact much deeper. Firstly, Varadhan's Lemma states that if a process X ε is known to satisfy an LDP with some associated Hamiltonian H, then for any ϕ : where S t is the semigroup associated with the Hamilton-Jacobi equation Conversely, if it is known that (2.11) holds for all (x, t) and a suitable class of ϕ, then the inverse Varadhan's lemma states that X ε satisfies an LDP with action functional given by (2.8), (2.9). Hence we can use (2.11) to determine the action functional for a given process.
In the next few sections, we will exploit both sides of Varadhan's lemma when investigating the large fluctuations of the HMM and related schemes. More complete discussions on Varadhan's Lemma can be found in [DZ09, Chapters 4.3, 4.4].

HMM for fast-slow systems
When applied to the stochastic fast-slow system (2.1), HMM-type schemes rely on the fact that the slow X ε variables, and the coefficients that govern them, converge to a set of reduced variables as ε tends to zero. We will describe a simplest version of the method below, which is more convenient to deal with mathematically.
Before proceeding, we digress briefly on notation. When referring to continuous time variables we will always use upper case symbols (X ε , Y ε etc) and when referring to discrete time approximations we will always use lower case symbols (x ε n , y ε n etc). We will also encounter continuous time variables whose definition depends on the integer n for which we have t ∈ [n∆t, (n + 1)∆t). We will see below that such continuous time variables are used to define discrete time approximations. In this situation we will use upper case symbols with a subscript n (eg. X ε n ). Let us now describe a 'high-level' version of HMM. Fix a step size ∆t and define the intervals I n,∆t := [n∆t, (n + 1)∆t). On each interval I n,∆t we update x ε n ≈ X ε (n∆t) to x ε n+1 ≈ X ε ((n + 1)∆t) via an iteration of the following two steps: 1. (Micro step) Integrate the fast variables over the interval I n,∆t , with the slow variable frozen at X ε = x ε n . That is, the fast variables are approximated by for n∆t ≤ t ≤ (n+1/λ)∆t with some λ ≥ 1 (that is, we do not necessarily integrate the Y ε n variables over the whole time window). Due to the ergodicity of Y x , the initialization of Y ε n is not crucial to the performance of the algorithm. It is however convenient to use Y ε n+1 (0) = Y ε n ((n + 1/λ)∆t), since this reinitialization leads to the interpretation of the HMM scheme given in (3.5) below.
2. (Macro step) Use the time series from the micro step to update x ε n to x ε n+1 via Note that we do not require Y ε n over the whole ∆t time step, but only a fraction of the step large enough for Y ε n to mix. Indeed, if ε is small enough, we have the approximate equality λ ∆t since both sides are close the the ergodic mean f (x ε n , y)dµ x ε n (y). Clearly, the efficiency of the methods comes from the fact that we do not need to compute the fast variables on the whole time interval I n,∆t but only a 1/λ fraction of it. Hence λ should be considered the speed-up factor of HMM.
As already stated, the algorithm above is a high-level version, in that one must do further approximations to make the method implementable. For example, one typically must specify some approximation scheme to integrate (3.1), for instance with Euler-Maruyama we compute the time series by y ε n,m+1 = y ε n,m + δt ε g(x ε n , y ε n,m ) + δt ε σ(x ε n , y ε n,m )ξ n,m , where 0 ≤ m ≤ M is the index within the micro step, ξ n,m are i.i.d. standard Gaussians and the micro-scale step size δt is much smaller than the macro-scale step size ∆t. In the macro step, we would similarly have f (x, y ε n,m ) and M = ∆t/(δtλ). The following observation, which is taken from [FVE04], will allow us to easily describe the average and fluctuations of the above method. On each interval I n,∆t , the high-level HMM scheme described above is equivalently given by x ε n+1 = X ε n ((n + 1)∆t), where X ε n solves the system defined on the interval n∆t ≤ t ≤ (n + 1)∆t, with the initial condition X ε n (n∆t) = x ε n . This can be checked by a simple rescaling of time. It is clear that the efficiency of HMM essentially comes from saying that the fast-slow system is not drastically changed if one replaces ε with the slightly larger, but still very small ελ.

Average and fluctuations in HMM methods
In this section we investigate whether the limit theorems discussed in Section 2, i.e. the averaging principle, the CLT fluctuations and the LDP fluctuations, are also valid in the HMM approximation a fast-slow system. We will see that the averaging principle is the only property that holds, and that both types of fluctuations are inflated by the HMM method.

Averaging
By construction, HMM-type schemes capture the correct averaging principle. More precisely, if we take ε → 0 then the sequence x ε n converges to somex n , wherex n is a numerical approximation of the true averaged systemX. If this numerical approximation is well-posed, the limits ε → 0 and ∆t → 0 commute with one another. Hence the HMM approximation x ε n is consistent, in that it features approximately the same averaging behavior as the original fast-slow system.
We will argue the claim by induction. Suppose that for some n ≥ 0 we know that lim ε→0 x ε n =x n (the n = 0 claim is trivial, since they are both simply the initial condition). Then, using the representation (3.5) we know that x ε n+1 = X ε n ((n + 1)∆t) where X ε n (n∆t) = x ε n . Since (3.5) is a fast-slow system of the form (2.1) we can apply the averaging principle from Section 2. In particular it follows that X ε n →X n uniformly (and almost surely) on I n,∆t , whereX n satisfies the averaged ODE Since the right hand side is a constant, it follows that x ε n+1 →x n+1 as ε → 0, wherē This is nothing more than the Euler approximation of the true averaged variablesX, which completes the induction and hence the claim.
Introducing an integrator in to the micro-step will make things more complicated, as the invariant measures appearing will be those of the discretized fast variables. In [MSH02] it is shown that discretizations of SDEs often do not possess the ergodic properties of the original system. For those situations where no such issues arise, rigorous arguments concerning this scenario, including rates of convergence for the schemes, are given in [ELVE05].

Small fluctuations
For HMM-type methods, the CLT fluctuations about the average become inflated by a factor of √ λ. That is, if we define then as ε → 0, the fluctuations described by z ε n+1 are not consistent with (2.4), but rather with the SDE dZ = B(X)Zdt + √ λη(X)dV , Z(0) = 0 (4.1) whereX satisfies the correct averaged system. As above, by consistency we mean that when we take ε → 0, the sequence {z ε n } n≥0 converges to some well-posed discretization of the SDE (4.1). Since Z(0) = 0, it is easy to see that the solution to this equation is simply √ λ times the solution of (2.4). Hence the fluctuations of the HMM-type scheme are inflated by a factor of √ λ. It is convenient to look instead at the rescaled fluctuationŝ since this allows us to reproduce the argument from Section 2.2, with ε ′ = ελ playing the role of ε. We will again argue by induction, assuming for some n ≥ 0 thatẑ ε n →ẑ n as ε → 0 (the n = 0 case is trivial).
The rescaled fluctuations are given byẑ ε n+1 = Z ε n ((n + 1)∆t) where Z ε n (t) = (X ε n (t) − X n (t))/ √ ελ and X ε n (t) is governed by the system (3.5) with initial condition X ε n (n∆t) = x ε n andX n satisfies dX n dt = F (x n ) with initial conditionX n (n∆t) =x n . We can then obtain the reduced equations for the pair (X ε n , Z ε n ) by arguing exactly as in Section 2. Indeed, the triple (X n , Z ε n , Y ε n ) is governed by the system From here on we can carry out the calculation precisely as in Section 2.2, with the added convenience of the vector fields no longer depending on x as a variable. In doing so we obtain Z ε n → Z n (in distribution) as ε → 0, where with the initial condition defined recursively by Z n (n∆t) =ẑ n . Using the fact thatẑ n+1 = Z n ((n + 1)∆t), we obtain where ξ n are iid standard Gaussians. Hence we obtain the Euler-Maruyama scheme for the correct CLT (2.4). However, sinceẑ ε n describes the rescaled fluctuations, we see that the true fluctuations z ε n of HMM are consistent with the inflated (4.1).

Large fluctuations
As with the CLT, the LDP of the HMM scheme is not consistent with the true LDP of the fast-slow system, but rather a rescaled version of the true LDP. In particular, define u λ,∆t by for t ∈ I n,∆t . If the O(1) fluctuations of HMM were consistent with those of the fast-slow system, we would expect u λ,∆ to converge to the solution of (2.10) as ∆t → 0. Instead, we find that as ∆t → 0, u λ,∆t (t, x) converges to the solution to the Hamilton-Jacobi equation In light of the discussion in Section 2.3, the reverse Varadhan lemma suggests that the HMM scheme is consistent with the wrong LDP. Before proving this claim, we first discuss some implications.
The rescaled Hamilton-Jacobi equation implies that the action functional for HMM will be a rescaled version of that for the true fast-slow system. Indeed, it is easy to see that the Langrangian corresponding to HMM simplifies to where L is the Lagrangian for the true fast-slow system. Thus, the action of the HMM approximation is given by where S is the action of the true fast-slow system.
In particular, it follows immediately from the definition that the HMM approximation has quasi-potential V (x, y) = λ −1 V (x, y), where V is the true quasi-potential. As a consequence, the escape times for the HMM scheme will be drastically faster than those of the fast-slow system. In the terminology of Section 2.3, if we let τ ε,∆t be the escape time for the HMM scheme then for ε, ∆t ≪ 1 we expect where ≍ log-asymptotic equality. Thus, the log-expected escape times are decreasing proportionally with λ. On the other hand, since the HMM action is a multiple of the true action, the minimizers will be unchanged by the HMM approximation. Hence the large deviation transition pathways will be unchanged by the HMM approximation.
To justify the claim for u λ,∆t (4.2), we first introduce some notation. Let S (α) t be the semigroup associated with the Hamilton-Jacobi equation notice that this is the same as the true Hamilton-Jacobi equation (2.10) but with the first argument of the Hamiltonian now frozen as a parameter α. The necessity of the parameter α is due to the fact that in the system for (X ε n , Y ε n ), the x variable in the fast process is frozen to its value at the left end point of the interval, and hence is treated as a parameter on each interval. We also introduce the operator S t ψ(x) = S (α) t ψ(x)| α=x and also S λ,t = λ −1 S t (λ·).

In this notation, it is simple to show that
(4.5) We will verify (4.5) by induction, starting with the n = 1 case. Since, on the interval I 0,∆t , the pair (X ε 0 , Y ε 0 ) is a fast-slow system of the form (2.1) with ε replaced by ελ, it follows from Section 2.3 that X ε 0 satisfies an LDP with action functional derived from the Hamiltonian-Jacobi equation (4.4), with the parameter α set to the value of X ε 0 at the left endpoint, which is X ε 0 (0) = x. Hence, it follows from Varadhan's lemma that for any suitable ψ : as claimed. Now, suppose (4.5) holds for all k with n ≥ k ≥ 1, then By the inductive hypothesis, we have that Applying (4.7) under the expectation in (4.6) (see Remark 4.1) we see that Now applying the inductive hypothesis with n = 1 and ψ(·) = (S λ,∆t ) n ϕ(·) E x exp ε −1 (S λ,∆t ) n ϕ(x ε 1 ) ≍ exp ε −1 S λ,∆t (S λ,∆t ) n ϕ(x) , which completes the induction.
By definition, we therefore have u λ,∆t (t, x) = (S λ,∆t ) n ϕ(x) when t ∈ I n,∆t . All that remains is to argue that u λ,∆t converges to the solution of (4.2) as ∆t → 0. But this can be seen from the expansion of the semigroup which yields the desired limiting equation. Remark 4.2. From the discussion above, it appears that the mean transition time can be estimated from HMM upon exponential rescaling, see (4.3). This is true, but only at the level of the (rough) log-asymptotic estimate of this time. How to rescale the prefactor is by no means obvious. As we will see below PHMM avoids this issue altogether since it does not necessitate any rescaling.

Parallelized HMM
There is a simple variant of the above HMM-type scheme which captures the correct average behavior and fluctuations, both at the level of the CLT and LDP. In a usual HMM type method, the key approximation is given by which only requires computation of the fast variables on the interval [n∆t, (n + 1/λ)∆t]. This approximation is effective at replicating averages, but poor at replicating fluctuations. Indeed, for each j, the time series Y ε n on the interval [(n + j/λ)∆t, (n + (j + 1)/λ)∆t] is replaced with an identical copy of the time series from the interval [n∆t, (n + 1/λ)∆t]. This introduces strong correlations between random variables that should be essentially independent. Parallelized HMM avoids this issue by employing the approximation where Y ε,j n are for each j independent copies of the time series computed in (5.1). Due to their independence, each copy of the fast variables can be computed in parallel, hence we refer to the method as parallel HMM (PHMM). The method is summarized below.
1. (Micro step) On the interval I n,∆t , simulate λ independent copies of the of the fastvariables, each copy simulated precisely as in the usual HMM. That is, let for j = 1, . . . , λ with W j independent Brownian motions. As with ordinary HMM, we will not require the time series of the whole interval I n,∆t but only over the subset [n∆t, (n + 1/λ)∆t).
2. (Macro step) Use the time series from the micro step to update x ε n to x ε n+1 by As with the HMM-type method, it will be convenient to write PHMM as a fast-slow system (when restricted to an interval I n,∆t ). Akin to (3.5), it is easy to verify that the parallel HMM scheme is described by the system for j = 1, . . . , λ with the initial condition X ε n (n∆t) = x ε n .

Average and fluctuations in parallelized HMM
In this section we check that the averaged behavior and the fluctuations in the PHMM method are consistent with those in the original fast slow system.

Averaging
Proceeding exactly as in Section 4.1, it follows that as ε → 0 the PHMM scheme x ε n+1 converges tox n+1 =X n ((n + 1)∆t) where with initial conditionX n (n∆t) =x n . Hence, we are in the exact same situation as with ordinary HMM, so the averaged behavior is consistent with that of the original fast slow system.

Small fluctuations
We now show that the fluctuations are consistent with the correct CLT fluctuations, described by (2.4). As in Section 4.2, we instead look at the rescaled fluctuationŝ In particular we will show that these rescaled fluctuations are consistent with The claim for z ε will follows immediately from the claim forẑ ε . We have thatẑ ε n+1 = Z ε n ((n + 1)∆t) where with X ε n given by the system (5.4) andX n given by the averaged equation (6.1). As in Section 4.2, we derive a system for the triple (X n , Z ε n , Y ε n ), where now the fast process has λ independent components Y ε n = ( Y ε,1 n , . . . , Y ε,λ n ): With a modicum added difficulty, we can now argue as in Section 2.2 with ε ′ = ελ playing the role of ε. The invariant measure µ λ x (dy) associated with the generator of Y ε n is now the product measure µ λ x (dy 1 , . . . , dy λ ) = µ x (dy 1 ) . . . µ x (dy λ ) where µ x is the invariant measure associated with L 0 from Section 2.2. This product structure simplifies the seemingly complicated expressions arising in the perturbation expansion of (6.3). In the setting of Section 2.2 we have that u 0 = u 0 (x, z, t) and the Feynman-Kac representation of (6.4) yields The equation for u 0 is now given by By expanding the product measure, the second term on the right hand side of (6.5) becomes Likewise, using the independence of Y j x for distinct j, the third term becomes where the expectation is taken over realizations of Y j x with Y j x (0) ∼ µ x . Finally, since the ∇ y j E y k term vanishes on the off-diagonal, the last term in (6.5) reduces to It follows immediately that the reduced equation for the pair (X n ,Ẑ ε n ) is with initial conditions Z n (n∆t) =ẑ n andX n (n∆t) =x n . Hence we see thatẑ n+1 is described byẑ which is the Euler-Maruyama scheme for (6.2).

Large fluctuations
In this section we show that the LDP for PHMM is consistent with the true LDP from Section 2.3. In particular, let for t ∈ I n,∆t , where x ε n is the PHMM approximation. We will argue that u λ,∆t (t, x) → u(t, x) as ∆t → 0, where u solves the correct Hamilton-Jacobi equation (2.10).
The argument is a slight modification of that given in Section 4.3. Before proceeding, we recall the notation S where H is the Hamiltonian defined by (2.7). We also define the operator S ∆t ϕ(x) = S (α) ∆t ϕ(x)| α=x . As in Section 4.3, the claim follows from the asymptotic statement Given (6.7), by an identical argument to that started in Equation (4.8), it follows from (6.7) that u λ,∆t is indeed a numerical approximation of the solution to (6.6) and hence u λ,∆t → u as ∆t → 0. We will verify (6.7) by induction, starting with the n = 1 case. Since (X ε , Y ε 0,1 , . . . , Y ε 0,λ ) is a fast-slow system of the form (2.1) with ε replaced by ελ, it follows from Section 2.3 (Varadhan's lemma) that ∆t is the semigroup associated with ∂ t v(t, x) = H(α, ∇v(t, x)) and

Hence we have
But since Y j α are iid for distinct j, the Hamiltonian H reduces to It follows that ∆t ϕ. Combining this with (6.8) completes the claim for n = 1. The proof of the inductive step for arbitrary n ≥ 1 follows identically to Section 4.3.

Numerical evidence
In this section, we investigate the performance of the standard HMM and PHMM methods for systems with well understood fluctuations and metastability properties. These simple experiments confirm that HMM amplifies fluctuations, which can drastically change the system's metastable behavior, and that the PHMM succeeds in avoiding these problems. In Section 7.1 we investigate simple CLT fluctuations for a simple quadratic potential systems, in Section 7.2 we look at large deviation fluctuations for a quartic double-well potential. Finally in Section 7.3 we look at fluctuations for a non-diffusive double well potential, which has large deviation properties that cannot be captured by a so-called 'small noise' diffusion.

Small fluctuations
We examine the small CLT-type fluctuations by looking the following fast-slow system It is simple to check that the averaged system is given by Hence for µ < 1 the averaged system is a gradient flow in a quadratic potential centered at the origin.
We will first illustrate that the HMM-type method described in Section 3 inflates the O( √ ε) fluctuations about the average by a factor of √ λ. In Figure 1 we plot histograms of the slow variable X for different speed-up factors λ. It is clear that the spread of the invariant distribution is increasing with λ. The profile remains Gaussian but the variance is greatly inflated. In Figure 2 we plot the variance of the stationary time series for X as a function of λ. The blue line is computed using HMM and the red line is computed using PHMM. As predicted by the theory in Section 4.2, in the case of HMM the variance is increasing linearly with λ and in the case of PHMM the variance is approximately constant. Note that in this example, the CLT captures the large deviations as well.

Large fluctuations
To investigate the affect of parallelization on O(1) deviations not captured by the CLT, we will look at a fast-slow system which exhibits metastability. Hence it is natural to take It is simple to check that the averaged system is Hence for any µ > 0 the averaged system is a gradient flow in a symmetric double well potential, with stable equilibria at ± √ µ and a saddle point at the origin. The large fluctuations of the fast-slow system can be investigated by looking at the first passage time for transitions from a neighborhood of one stable equilibrium to the other.
In Figure 3 we compare the mean first passage time for HMM and PHMM as a function of λ. Even for λ = 2, the distinction between the two methods is vast, with the mean first passage time for HMM rapidly dropping off and for PHMM staying approximately constant.
In Figure 4 we compare respectively the stationary distributions of the true fast-slow system, HMM (λ = 5) and PHMM (λ = 5). In the case of HMM, the energy barrier separating the two metastable states is now overpopulated, which explains the rapid fall in mean first passage time. In the case of PHMM, the histogram is indistinguishable from the true stationary distribution (with the exception of a slight asymmetry).
In Figure 5 we plot the cumulative distributions function (CDF) for the first passage time, comparing that of the true fast-slow system, with HMM (λ = 5) and PHMM (λ = 5). We see that the HMM first passage times are supported on a much faster time scale than that of the true fast-slow system. In contrast, the CDF of PHMM is almost indistinguishable from that of the true fast-slow system. Hence PHMM is not just replicating the mean first passage time, but also the entire distribution of first passage times.

Asymmetric, non-diffusive fluctuations
We now compare HMM and PHMM for a multiscale model that also displays metastability, but in which the large fluctuations cannot be characterized by a 'small noise' Ito diffusion. In particular, the Hamiltonian describing the LDP of the system is non-quadratic, as opposed the the previous systems. The system has been used [BGTVE15] to illustrate the ineffectiveness of diffusion-type approximations for fast-slow systems. The fast-slow system is given by where γ(x) = x 4 /10 − x 2 + 3 . The averaged equation for this system reads For ν = 1 and σ = √ 3, this averaged equation possesses two stable fixed points at x ≈ 0.555 and x ≈= 2.459 and one unstable fixed point at x ≈ 2.459. The the rates of transition between these stable fixed points is captured by the LDP. By an elementary calculation [BGTVE15], the Hamiltonian of this LDP is found to be non-quadratic and given by The quasi-potential associated with this Hamiltonian satisfies 0 = H(x, V ′ ), i.e.
and is displayed in Figure 6. Whilst there is a significant barrier corresponding to left-to-right transitions, there is almost no barrier corresponding to right-to-left transitions. In Figure 7 we plot CDFs of the first passage times: due to the asymmetry we plot separately the transitions from the left-to-right and right-to-left. For left-to-right transitions, the HMM procedure drastically speeds up transitions because it enhances fluctuations: as is the case with the previous experiment, the HMM transitions are supported on a timescale several orders of magnitude faster than those of the true fast slow system. The PHMM method does not experience this problem and the distribution of first passage times agrees quite well with the true model. For right-to-left transitions, PHMM shows similarly good agreement with the true fast-slow system, but in contrast HMM is not too far off either. This can be accounted for by the 'flatness' of the right potential well, meaning that increasing the amplitude of fluctuations will only decrease the escape time by a linear multiplicative factor. We note that the noise appearing in the CDF plots is due to the scarcity of transitions occurring in the model (7.2).

Discussion
We have investigated HMM methods for fast-slow systems, in particular their ability (or lack thereof) to capture fluctuations, both small (CLT) and large (LDP). We found, both theoretically (Section 4) and numerically (Section 7), that the amplitude of fluctuations is enhanced by an HMM-type method. In particular with an HMM speed up factor λ, in the CLT the variance of Gaussian fluctuations about the average is increased by a factor λ as well. In the LDP, the quasi-potential is decreased by a factor λ, leading to the first passage times being supported on a time scale λ orders of magnitude smaller than in the true fast slow system. This inability to correctly capture fluctuations about the average suggests that HMM can be a poor approximation of fast-slow systems, particularly when metastable behavior is important. As noted in Section 4.3, although the fluctuations of HMM are enhanced, the large deviation transition pathways remain faithful to the true model. Thus we stress that HMM is a reliable method of finding transition pathways in metastable systems, but not for simulating their dynamics. We have introduced a simple modification of HMM, called parallel HMM (PHMM), which avoids these fluctuation issues. In particular, the PHMM method yields fluctuations that are consistent with the true fast slow system for any speed up factor λ (provided that we still have ελ ≪ 1), as was shown both theoretically (Section 6) and numerically (Section 7). The HMM method relies on computing one short burst of the fast variables, and inferring the statistical behavior of the fast-variables by extrapolating this short burst over a large time window. PHMM on the other hand computes an ensemble of λ short bursts, and infers the statistics of the fast variables using the ensemble. Since the ensemble members are independent, they can be computed in parallel. Hence if one has λ CPUs available, then the real computational time required in PHMM is identical to that in HMM.  Interestingly, one can draw connections between the parallel method introduced here and the tau-leaping method used in stochastic chemical kinetics [Gil00]. The tau-leaping method is an approximation used to speed up simulation of stochastic fast-slow systems of the type X ε (t) = X ε (0) + m k=1 εN k ε −1 t 0 a k (X ε (s))ds ν k , (8.1) where N k are independent unit rate Poisson processes, ν k are vectors in R d and a k : R d → R.
The system (8.1) can be solved exactly by the stochastic simulation algorithm (SSA), but when ε is small this can be extremely expensive, due to the Poisson clocks being reset each time a jump occurs. The tau-leaping procedure avoids this issue by chopping the simulation window into sub-intervals of size τ and on each subinterval fixing the Poisson clocks to their value at the left endpoint. The speed-up is a result of the fact that one can simulate the Poisson jumps in parallel, since their clocks are fixed over the τ interval. As a consequence of this analogy, one can check (using calculations similar to those found above) that the tau-leaping method also captures the fluctuations correctly, both at the level of the CLT and that of the LDP. The former observation was made in [AGK11]; to the best of our knowledge, the second one is new. As a final note, we stress that there are non-dissipative fast-slow systems for which the PHMM will not be effective at capturing their long time scale behavior, including metastability. These are system for which the CLT and LDP hold on O(1) timescale, but they either cannot be extended to longer time-scale (in the case of the CLT) or leads to trivial prediction on these time scales (in the case of the LDP). To clarify this point, take for example the fast-slow Langevin systeṁ q 1 = p 1ṗ1 = q 1 − q 3 1 + (q 2 − q 1 ) , (8.2) q 2 = ε −1 p 2ṗ2 = ε −1 (q 1 − q 2 ) − ε −1 γp 2 + 2ε −1 β −1 γη .
As ε → 0, it is easy to check that the slow variables (q 1 , q 2 ) converge to the averaged systeṁ where the averaged vector field is the gradient of the free energy G(q 1 ) = 1 4 q 4 1 − 1 2 q 2 1 + 1 2 q 2 1 = −β −1 log exp(βU(q 1 , q 2 ))dq 2 , with U(q 1 , q 2 ) = 1 4 q 4 1 − 1 2 q 2 1 + 1 2 (q 1 − q 2 ) 2 . Likewise, if we introduce the CLT indicates that the evolution of these variables are captured bẏ η 1 = ζ 1 , dζ 1 = 2β −1 γ dB (8.4) and we can also derive an LDP for (8.2) with action S [0,T ] (q 1 ) = β 4γ T 0 |q 1 − q 1 + q 3 1 | 2 dt (8.5) However, neither (8.4) nor (8.5) capture the long time behavior of the solution to (8.2). The problem stems from the fact that the averaged equation in (8.3) is Hamiltonian, hence nondissipative. As a result, fluctuations accumulate as time goes on. Eventually, the CLT stops being valid, and the LDP becomes trivial -in particular, it is easy to see that the quasipotential associated with the action in (8.5) is flat. For examples of this type, other techniques will have to be employed to describe their long time behavior including, possibly, their metastability (which, in the case of (8.2) is controlled by how small β −1 is, rather than ε). These questions will be investigated elsewhere.